After graduating from the City University of Hong Kong with a master’s degree, Kai Guo (PhD ’11) arrived at Boston University in fall 2007 not quite sure how his PhD studies in a new country would turn out.
He had long aspired to earn a doctorate and pursue research in image and video processing, and he began his doctoral studies in Boston University’s Department of Electrical and Computer Engineering.
Three years later, his decision to attend BU has paid off. After writing a paper based on his research with his advisors, Professors Prakash Ishwar and Janusz Konrad, he won the Best Paper Prize at the 7th IEEE International Conference on Advanced Video and Signal-Based Surveillance (AVSS) in September.
In his research, Guo began exploring the shapes of human-body silhouettes for action recognition, that is, determining whether someone is running, jumping or kicking.
“I was excited to work on action recognition for my doctoral research since it has, in recent years, emerged as a hot and challenging research problem with potential to impact many practical applications – from video surveillance to video search and retrieval,” said Guo.
Konrad said that when Guo began his research, action recognition algorithms were either too complicated to implement or did not perform very well. But when Konrad came across recent work on covariance descriptors for object localization and tracking by his colleague, Dr. Fatih Porikli of Mitsubishi Electric Research Laboratories, it immediately occurred to the group that Porikli’s approach might be a perfect fit for action recognition.
This hunch paid off. With a little brainstorming and perseverance, Guo soon attained state-of-the-art results.
Ishwar said that they were both surprised and relieved about the outcome. Initially, they had been unsure what to expect when repurposing a mathematical framework developed for one category of problems for a different set of challenges.
“We knew we were onto something exciting and decided to explore many variations of this approach,” said Ishwar.
Ishwar then encountered a sparse linear representation method for face recognition that had recently been developed at the University of Illinois at Urbana-Champaign (UIUC) by Professor Yi Ma and his collaborators. Inspired by this work and a tip from Professor Pierre Moulin, Ishwar’s Ph.D. advisor at UIUC, Guo, Ishwar, and Konrad quickly integrated covariance descriptors with sparsity-based classification to improve upon their early method.
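For readers curious about what sparsity-based classification looks like in practice, the following is a minimal sketch, not the team’s implementation. It assumes each training example has already been turned into a feature vector (here, toy three-dimensional vectors stand in for vectorized covariance descriptors), represents a test vector as a sparse combination of training vectors, and assigns the class whose training vectors reconstruct it best. The solver (a basic iterative soft-thresholding loop), the function names, and the parameter values are all illustrative assumptions.

```python
import numpy as np

def sparse_code(D, y, lam=0.01, n_iter=500):
    """Approximately solve min_x 0.5*||y - D x||^2 + lam*||x||_1 with ISTA.

    ISTA is a stand-in solver chosen for brevity, not the authors' choice.
    """
    # Step size 1/L, where L is the largest eigenvalue of D^T D.
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = x - step * (D.T @ (D @ x - y))                         # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)   # soft threshold
    return x

def classify(D, labels, y, lam=0.01):
    """Pick the class whose training columns best reconstruct y."""
    x = sparse_code(D, y, lam)
    residuals = {}
    for c in np.unique(labels):
        x_c = np.where(labels == c, x, 0.0)   # keep only class-c coefficients
        residuals[int(c)] = float(np.linalg.norm(y - D @ x_c))
    return min(residuals, key=residuals.get), residuals

# Toy dictionary: columns are unit-norm training vectors (hypothetical data).
atoms = np.array([[1.0, 0.0, 0.0],
                  [0.9, 0.1, 0.0],
                  [0.0, 0.0, 1.0],
                  [0.0, 0.1, 0.9]]).T
D = atoms / np.linalg.norm(atoms, axis=0)
labels = np.array([0, 0, 1, 1])

y = np.array([1.0, 0.05, 0.0])
y = y / np.linalg.norm(y)

predicted, residuals = classify(D, labels, y)
print(predicted, residuals)
```

In this toy setup the test vector lies close to the two class-0 training vectors, so the class-0 reconstruction error is much smaller than the class-1 error.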
The research soon won its first prize at the 2010 ICPR conference, and the framework proved powerful enough that Guo was able to continue exploring different types of action descriptors and classification mechanisms.
Inspired by fluid mechanics, Guo has since further adapted the covariance-descriptor-based classification method to use features extracted from optical flow, the apparent movement of individual image points. In particular, he looked at the gradient, divergence, and vorticity of the optical flow. The resulting method has outperformed other algorithms on more challenging data sets while achieving higher computational throughput.
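To make these quantities concrete, here is a rough sketch of how gradient, divergence, and vorticity can be computed from a dense flow field and summarized in a covariance matrix. It is an illustration under stated assumptions rather than Guo’s method: the exact feature set, the function names, and the synthetic rotating flow are all hypothetical, and a real pipeline would first estimate the flow from video frames.

```python
import numpy as np

def flow_features(u, v):
    """Per-pixel features from a dense flow field.

    u, v: horizontal and vertical flow components, each of shape (H, W).
    The choice of features here is an assumption made for illustration.
    """
    # np.gradient returns derivatives along axis 0 (rows, y), then axis 1 (columns, x).
    du_dy, du_dx = np.gradient(u)
    dv_dy, dv_dx = np.gradient(v)
    divergence = du_dx + dv_dy      # local expansion or contraction
    vorticity = dv_dx - du_dy       # local rotation
    feats = np.stack(
        [u, v, du_dx, du_dy, dv_dx, dv_dy, divergence, vorticity], axis=-1
    )
    return feats.reshape(-1, feats.shape[-1])   # one row per pixel

def covariance_descriptor(features):
    """Covariance of the feature vectors: a (d, d) summary of the region."""
    return np.cov(features, rowvar=False)

# Synthetic flow: a rigid rotation about the origin, u = -y, v = x.
ys, xs = np.mgrid[-4:5, -4:5].astype(float)
u, v = -ys, xs

feats = flow_features(u, v)
C = covariance_descriptor(feats)
print(feats.shape, C.shape)
```

For this pure rotation the divergence is zero everywhere and the vorticity is constant, which is a convenient sanity check before applying the same functions to real flow estimates.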
As a Dean’s Research Fellow during his first 14 months at BU, Guo explored a wide range of research problems – from unwrapping catadioptric images to robust background subtraction. But it was not until his advisors pointed him to a paper on action recognition that his research really began to take off.
Added Konrad: “It really pays to follow the literature and correlate ideas, in our case covariance-descriptors and sparse linear representations for action recognition. Simple, easy to implement, and computationally efficient, yet very powerful. The rest is history.”