Research Spotlight Archive

Title: GestureMouse

Participants: Kam Lai (ECE ’12), Kai Guo (PhD ’11), Professor Janusz Konrad (ECE), and Professor Prakash Ishwar (ECE)

Funding: Kenneth R. Lutchen Distinguished Fellowship Program, College of Engineering

Status: Ongoing (2011-Present)

Background: In November 2010, Microsoft launched Kinect for Xbox, an infrared range-sensing camera, with only one Kinect-compatible game at launch. Commercial suicide or smart strategy? The Kinect turned out to be an immediate hit, and countless hackers as well as academics began exploring its potential. The floodgates opened in June 2011, when Microsoft released the Kinect Software Development Kit (SDK), which includes a set of powerful algorithms for extracting scene depth and body silhouettes and for building a skeleton model of a person in real time.

Figure: Typical output produced by the Kinect camera indoors

Although Kinect’s target application is video gaming, the 3-D capture and the robustness of its SDK have helped spawn numerous research and “hacking” projects in the field of human-computer interfaces. Most approaches, however, rely on tracking, not recognition. Usually a set of parameters, e.g., thresholds on joint locations, velocities, and accelerations, is specified to localize and track movements; a number of applications, for example, have been developed using USC’s Flexible Action and Articulated Skeleton Toolkit (FAAST). Although many such approaches have been shown to work well in scripted scenarios (there are impressive videos on YouTube), one would expect difficulties when users with very different gaits are considered. Furthermore, a fixed set of parameters (thresholds) makes extension to new gestures difficult, as the sketch below suggests. We believe that gesture recognition based on machine-learning algorithms, with suitably selected features and representations, is likely to be more flexible and robust.
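
To make the contrast concrete, here is a hypothetical sketch of a fixed-threshold detector in the FAAST style (this is not FAAST’s actual API; the joint layout, threshold value, and function name are all assumptions for illustration). A hard-coded velocity threshold like this must be re-tuned for every new user and every new gesture:

```python
# Hypothetical fixed-threshold "swipe left" detector: fires when the right
# hand moves left faster than a set velocity while staying above the hip.
# All constants are illustrative assumptions, not values from any toolkit.
import numpy as np

VX_THRESH = -1.2    # m/s, leftward hand velocity threshold (assumed)
FRAME_DT = 1 / 30   # Kinect skeleton stream runs at 30 frames/sec

def detect_swipe_left(hand_xyz, hip_xyz):
    """hand_xyz, hip_xyz: (T, 3) arrays of joint positions in meters."""
    vx = np.diff(hand_xyz[:, 0]) / FRAME_DT       # horizontal hand velocity
    above_hip = hand_xyz[1:, 1] > hip_xyz[1:, 1]  # simple posture constraint
    return bool(np.any((vx < VX_THRESH) & above_hip))
```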

Summary: We developed a real-time human-computer interface based on recognition of hand gestures captured by a Kinect camera. Gestures can be predefined, or the system can be trained by the user with his/her own gestures. The interface works reliably in real time on a modest PC. However, since the Kinect uses infrared light sources and has a reliable operating range of about 0.8 m to 4 m, our system works well only in indoor environments without very strong lighting. The developed algorithms are supervised in the sense that we compare a query gesture against a dictionary of labeled hand gestures.

We use Kinect’s SDK to extract a time sequence of skeleton joints that together form a “bag of features.” We capture the statistics of these features in an empirical covariance matrix and, since covariance matrices do not lie in a Euclidean space, map it into one by taking the matrix logarithm. For efficiency, we use a nearest-neighbor classifier whose metric is the Frobenius norm of the difference between log-covariance matrices. The covariance, matrix-logarithm, and distance computations are of low enough complexity that, for hand-gesture dictionaries of modest size, they run in real time (30 frames/sec) on a modest computer.
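
A minimal sketch of this pipeline follows, assuming each gesture arrives as a (T, d) array of T skeleton frames, each flattened into d joint coordinates. The function and variable names are ours, not taken from the authors’ C# implementation:

```python
# Sketch of the log-covariance + nearest-neighbor pipeline described above.
import numpy as np
from scipy.linalg import logm

def log_covariance(gesture):
    """Map a (T, d) gesture to its log-covariance matrix, a point in a
    Euclidean space where the Frobenius norm is a valid distance."""
    cov = np.cov(gesture, rowvar=False)    # empirical d x d covariance
    cov += 1e-6 * np.eye(cov.shape[0])     # regularize so logm is well defined
    return logm(cov).real

def classify(query, dictionary):
    """Nearest-neighbor classification: dictionary is a list of
    (label, log_cov) pairs built offline from labeled training gestures."""
    q = log_covariance(query)
    dists = [np.linalg.norm(q - c, ord='fro') for _, c in dictionary]
    return dictionary[int(np.argmin(dists))][0]

# Building the dictionary once, offline:
# dictionary = [(label, log_covariance(g)) for label, g in training_gestures]
```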

Figure: GestureMouse demonstration screenshots

Results: In a leave-one-out cross-validation (LOOCV) test using 8 hand gestures performed by 20 individuals, our method attains an over-97% correct-classification rate (CCR) when removing all gestures of a user, and over 99% CCR when removing a single realization (video) of a gesture for a user (a sketch of the leave-one-subject-out protocol follows the list below). In practice, the algorithm works very well on new users who were not used for training and requires no tuning. Based on this algorithm, we developed GestureMouse, a complete real-time system in C# that permits a user to:

  • move icons left by swinging the right arm to the left,
  • move icons right by swinging the left arm to the right,
  • select icons (go down one level) by jabbing at the screen with either arm,
  • go up one level by executing a “come back” gesture with either arm,
  • zoom in by moving both hands diagonally outwards, and
  • zoom out by moving both hands horizontally inwards.
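
The leave-one-subject-out variant of the evaluation above can be sketched as follows, reusing log_covariance and classify from the previous snippet; the per-subject data layout is an assumption for illustration:

```python
# Leave-one-subject-out cross-validation sketch: hold out each subject in
# turn, train the dictionary on everyone else, and test on the held-out
# subject's gestures. Reuses log_covariance/classify defined earlier.

def loocv_by_subject(subjects):
    """subjects: dict mapping subject id -> list of (label, (T, d) gesture)."""
    correct = total = 0
    for held_out in subjects:
        # Build the dictionary from every subject except the held-out one.
        dictionary = [(label, log_covariance(g))
                      for sid, samples in subjects.items() if sid != held_out
                      for label, g in samples]
        # Classify every gesture performed by the held-out subject.
        for label, g in subjects[held_out]:
            correct += (classify(g, dictionary) == label)
            total += 1
    return correct / total  # correct-classification rate (CCR)
```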

GestureMouse has been accepted for a Kinect demonstration at CVPR’12 in June 2012.
