Dynamic Neural Networks for Efficient Multimodal Video Understanding

  • Starts: 3:00 pm on Friday, December 3, 2021
  • Ends: 4:00 pm on Friday, December 3, 2021
CISE Seminar & AIR Distinguished Speaker: Rogerio Feris, Principal Scientist and Manager at the MIT-IBM Watson AI Lab, will be presenting.

Abstract: The rapid growth of multimodal video data in recent years has increased the demand for efficient deep neural network models, particularly in domains where real-time inference is essential. While significant progress has been made on model compression and acceleration for video understanding, most existing methods rely on one-size-fits-all models that apply exactly the same amount of computation to every video segment across all modalities, regardless of complexity. In this talk, I will cover methods that adaptively change computation depending on the content of the input. First, I will describe a method that dynamically selects the right video frames, at the right level of detail (resolution), to make video understanding more efficient. Then, in the context of audio-visual action recognition, I will present a method that adaptively decides which modality to use for each video segment, with the goal of improving both accuracy and efficiency. Finally, I will discuss ongoing work on adaptive learning for synthetic training data generation.
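To make the idea of content-dependent computation concrete, the sketch below is a minimal, hypothetical illustration (not the speaker's actual method) of per-frame adaptive resolution selection: a lightweight policy network looks at a cheap downsampled view of each frame and routes it to either a low- or high-resolution recognition branch via a straight-through Gumbel-softmax choice. All names and architecture details (AdaptiveResolutionSelector, the two branches, the 112/224-pixel resolutions) are illustrative assumptions, written in PyTorch.

# Hypothetical sketch of per-frame adaptive resolution selection, assuming PyTorch.
# A tiny policy network picks a resolution per frame; a Gumbel-softmax sample keeps
# the discrete choice differentiable during training.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveResolutionSelector(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Lightweight policy network that scores each frame (2 choices: low / high res).
        self.policy = nn.Sequential(
            nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2)
        )
        # Cheap branch for easy frames.
        self.low_res_branch = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, num_classes)
        )
        # More expensive branch for frames that need detail.
        self.high_res_branch = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_classes)
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, 3, H, W)
        b, t, c, h, w = frames.shape
        flat = frames.reshape(b * t, c, h, w)

        # The policy only sees a cheap 32x32 view of each frame.
        cheap = F.interpolate(flat, size=(32, 32), mode="bilinear", align_corners=False)
        logits = self.policy(cheap)
        # Straight-through Gumbel-softmax: hard per-frame choice, soft gradients.
        choice = F.gumbel_softmax(logits, tau=1.0, hard=True)  # (b*t, 2)

        # Both branches are run here for clarity; a deployed model would execute
        # only the selected branch per frame to realize the computational savings.
        low = self.low_res_branch(
            F.interpolate(flat, size=(112, 112), mode="bilinear", align_corners=False))
        high = self.high_res_branch(
            F.interpolate(flat, size=(224, 224), mode="bilinear", align_corners=False))
        per_frame = choice[:, 0:1] * low + choice[:, 1:2] * high  # (b*t, num_classes)

        # Average per-frame predictions over time for a clip-level score.
        return per_frame.reshape(b, t, -1).mean(dim=1)

if __name__ == "__main__":
    model = AdaptiveResolutionSelector(num_classes=10)
    clip = torch.randn(2, 8, 3, 224, 224)  # 2 clips, 8 frames each
    print(model(clip).shape)  # torch.Size([2, 10])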
