When watching a video, children have no problem recognizing that a person is walking, jumping or waving. Give the video footage to a computer, however, and the same task becomes daunting.
For a computer, different actions such as walking and running may look very similar, depending on the camera viewing angle and frame rate. Another challenge is action variability: the same action performed by different people may look quite different to a computer. For example, people have different walking gaits.
Over the last decade, engineers have tackled this problem with limited success. No satisfactory system has been developed so far, especially for low-resolution video, but Kai Guo (PhD ’11) and Professors Prakash Ishwar and Janusz Konrad are working to change that.
They developed a new action recognition algorithm that exceeds the performance of state-of-the-art methods and is suitable for real-time use because of its low storage and computational requirements. The algorithm is unique in that it marries features developed for object tracking with a recently proposed classification framework based on ideas from the field of compressive sampling.
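For readers curious how such a pairing might look in practice, the following is a minimal Python sketch, not the authors' implementation. It assumes each video clip has already been reduced to an array of per-pixel feature vectors, summarizes that array with a covariance descriptor of the kind used in object tracking, and then labels a query clip by finding a sparse combination of training descriptors that reconstructs it. The function names, the matrix-logarithm mapping, and the greedy orthogonal matching pursuit solver (swapped in here for the l1 minimization typical of compressive-sampling methods) are all illustrative assumptions.

```python
import numpy as np


def log_covariance_descriptor(features, eps=1e-6):
    """Summarize an (n_samples, d) feature array as a flat vector.

    Hypothetical helper: takes the covariance of the features, applies a
    matrix logarithm, and keeps the upper triangle of the result.
    """
    d = features.shape[1]
    cov = np.cov(features, rowvar=False) + eps * np.eye(d)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # V * diag(log(lambda)) * V^T is the matrix logarithm of a symmetric
    # positive-definite matrix.
    log_cov = (eigvecs * np.log(eigvals)) @ eigvecs.T
    return log_cov[np.triu_indices(d)]


def sparse_code(dictionary, signal, n_nonzero):
    """Greedy orthogonal matching pursuit (a stand-in for l1 minimization)."""
    n_nonzero = max(1, min(n_nonzero, dictionary.shape[1]))
    residual = signal.copy()
    support = []
    for _ in range(n_nonzero):
        correlations = np.abs(dictionary.T @ residual)
        if support:
            correlations[support] = -np.inf  # never pick an atom twice
        support.append(int(np.argmax(correlations)))
        # Refit all selected atoms jointly by least squares.
        solution, *_ = np.linalg.lstsq(dictionary[:, support], signal, rcond=None)
        residual = signal - dictionary[:, support] @ solution
    coeffs = np.zeros(dictionary.shape[1])
    coeffs[support] = solution
    return coeffs


def classify(train_descriptors, train_labels, query, n_nonzero=3):
    """Return the label whose training descriptors best reconstruct the query."""
    dictionary = np.asarray(train_descriptors, dtype=float).T
    dictionary = dictionary / np.linalg.norm(dictionary, axis=0)
    labels = np.asarray(train_labels)
    coeffs = sparse_code(dictionary, np.asarray(query, dtype=float), n_nonzero)

    best_label, best_residual = None, np.inf
    for label in np.unique(labels):
        # Keep only the coefficients that belong to this class.
        class_coeffs = np.where(labels == label, coeffs, 0.0)
        residual = np.linalg.norm(query - dictionary @ class_coeffs)
        if residual < best_residual:
            best_label, best_residual = label, residual
    return best_label
```

In this sketch, a query is assigned to the action class whose training examples account for most of its reconstruction, which is the general idea behind sparse-representation classifiers. Storage stays small because each clip is reduced to a single short vector.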
Their algorithm performed so well, and so consistently, on several databases that Guo, Ishwar, and Konrad were invited to enter the “Aerial View Activity Classification Challenge” in the Semantic Description of Human Actions (SDHA) contest during the 2010 International Conference on Pattern Recognition (ICPR).
The goal of the challenge was to test methodologies on realistic surveillance-type videos, particularly those from low-resolution, far-away cameras. Eight teams entered the contest, and in the final stage Guo, Ishwar, and Konrad edged out a team from the University of Modena, Italy, to win the challenge.
“We are far from a satisfactory understanding of why and when our method works from a machine-learning perspective. There are plenty of other real-world engineering and algorithmic challenges to overcome,” said Ishwar, “but the fact that our method has performed so well consistently across several datasets, including the low-resolution SDHA dataset, is exciting.”
Many fields could benefit greatly from a fast and accurate solution, including homeland security, healthcare, ecological monitoring, and automatic sign-language recognition for assisting the hearing-impaired.
Konrad, who presented the paper, titled “Action Recognition in Video by Sparse Representation on Covariance Manifolds of Silhouette Tunnels,” at ICPR, is cautiously optimistic. “Although we are very happy to have won the contest, in very difficult real-world scenarios our algorithm still misses about five times out of 100,” he said. “This is not acceptable in practice but we are hopeful of making further progress soon.”