From Pose to Action: Semantic Understanding of Human Motion from Video: Dr. Leonid Sigal, Disney Research Pittsburgh (IVC)

11:00 am on Friday, April 19, 2013
12:00 pm on Friday, April 19, 2013
MCS 137
Abstract: In this talk, I will focus on a few recent projects that deal with human pose and activity understanding from video. I will first talk about marker-less motion capture. We estimate human motion from monocular video by recovering three-dimensional controllers capable of implicitly simulating the observed human behavior and replaying this behavior in other environments and under physical perturbations. Our approach employs a state-space biped controller with a balance feedback mechanism that encodes control as a sequence of simple control tasks. Transitions among these tasks are triggered on time and on proprioceptive events (e.g., contact). Inference takes the form of optimal control, where we optimize a high-dimensional vector of control parameters and the structure of the controller based on an objective function that compares the resulting simulated motion with the input observations. We illustrate our approach by automatically estimating controllers for a variety of motions directly from monocular video. I will then focus on action recognition. We observe that actions can typically be recognized from a very sparse sequence of temporally local discriminative keyframes -- collections of partial key-poses of the actor(s), depicting key states in the action sequence. We cast the learning of the keyframes in a max-margin discriminative framework, where we treat keyframes as latent variables. This allows us to (jointly) learn a set of the most discriminative keyframes while also learning the local temporal context between them. Keyframes are encoded using a spatially-localizable poselet-like representation with HoG and BoW components learned from weak annotations. The resulting model allows spatio-temporal localization of actions and gives competitive performance on the UT-Interactions dataset. Time permitting, I will quickly show a few other examples of recent projects in this space of human pose and activity understanding.
The projects above are joint work with Marek Vondrak (Brown University), Michalis Raptis (Disney Research), Jessica Hodgins (CMU/Disney Research), and Chad Jenkins (Brown University).
Bio: Leonid Sigal is a Research Scientist at Disney Research Pittsburgh and an adjunct faculty member at Carnegie Mellon University. Prior to this, he was a postdoctoral fellow in the Department of Computer Science at the University of Toronto. He completed his Ph.D. at Brown University in 2008; he received his B.Sc. degrees in Computer Science and Mathematics from Boston University (1999), his M.A. from Boston University (1999), and his M.S. from Brown University (2003). From 1999 to 2001, he worked as a senior vision engineer at Cognex Corporation, where he developed industrial vision applications for pattern analysis and verification. Leonid's research interests lie mainly in the areas of computer vision, machine learning, and computer graphics, and extend to the bordering fields of psychology and humanoid robotics. His current research spans articulated pose estimation, action recognition, domain adaptation, latent variable models, data-driven simulation, controller design for animated characters, and perception of human motion.