Multi-Modal Embodied Visual Learning

Kristen Grauman, Professor, Department of Computer Science, University of Texas at Austin

When:
10:30am – 11:00am Networking & refreshments
11:00am – 12:00pm Talk

Where:
Kilachand Center, Colloquium Room 101
610 Commonwealth Ave, Boston, MA 02215

Abstract:
Computer vision has seen major success in learning to recognize objects from massive “disembodied” Web photo collections labeled by human annotators. Yet cognitive science tells us that perception is multi-modal, that it develops in the context of acting in the world, and that it proceeds without intensive supervision. Meanwhile, many realistic vision tasks require not only categorizing a well-composed, human-taken photo, but also actively deciding where to look in the first place. In the context of these challenges, we are exploring how machine perception benefits from anticipating the sights and sounds an agent will experience as a function of its own actions. Based on this premise, we introduce methods for learning to navigate intelligently in novel environments, learning from video about the affordances of objects, and analyzing audio-visual streams for both semantic and spatial context. Together, these are steps towards first-person perception, where interaction with the world is itself a supervisory signal.

Bio: Kristen Grauman is a Professor in the Department of Computer Science at the University of Texas at Austin and a Research Scientist at Facebook AI Research. Her research in computer vision and machine learning focuses on visual recognition and search. Before joining UT Austin in 2007, she received her Ph.D. from MIT. She is an IEEE Fellow, AAAI Fellow, and Sloan Fellow, and a recipient of the NSF CAREER and ONR YIP awards, the PECASE, the PAMI Young Researcher Award, and the 2013 IJCAI Computers and Thought Award. She and her collaborators have been recognized with several best paper awards, including a 2017 Helmholtz Prize “test of time” award. She served as a Program Chair of the Conference on Computer Vision and Pattern Recognition (CVPR) in 2015 and of Neural Information Processing Systems (NeurIPS) in 2018, and she currently serves as Associate Editor-in-Chief for the Transactions on Pattern Analysis and Machine Intelligence (PAMI).