ECE PhD Thesis Defense Zhongkai Shangguan

  • Starts: 11:00 am on Tuesday, February 3, 2026
  • Ends: 12:30 pm on Tuesday, February 3, 2026

Title: Advancing Assistive Systems with Multimodal Machine Learning

Presenter: Zhongkai Shangguan

Advisor: Prof. Eshed Ohn-Bar

Chair: TBD

Committee: Prof. Eshed Ohn-Bar, Prof. Brian Kulis, Prof. Kayhan Batmanghelich, Prof. Robert Kotiuga

Google Scholar Link: https://scholar.google.com/citations?user=5LiWfk4AAAAJ&hl=en

Abstract: Learning to meaningfully assist people in real-world contexts remains a grand challenge for machine intelligence. Despite recent advances in AI, current systems often struggle to determine when, how, or how much to help in realistic settings. For instance, language models may talk too much, overwhelming users with unnecessary information, or provide confident advice that does not fit the situation or the user’s goals. These failures are further compounded at test time, where systems must contend with missing or noisy data, unfamiliar environments, individual differences among users, and strict run-time constraints.

In this dissertation, we propose a unified framework for building and deploying multimodal (i.e., image, audio, video, language) assistive systems at scale. First, to better align different data modalities in real-world settings, we introduce a novel multimodal pre-training method that leverages semi-supervised learning and cross-modal supervision to improve representation learning and reasoning. The method enables training robust models on realistic sensory inputs that are incomplete, noisy, and multimodal. Second, we develop interactive decision-making models for assistance, trained via imitation and reinforcement learning. A key insight is to effectively fuse goal-oriented, temporal cues to better align with user needs and intent, enabling context-aware and adaptive assistance. Finally, to bridge the gap between model capability and practical deployment, we introduce an efficient run-time policy, learned via reinforcement learning, that dynamically balances local and cloud computation. The resulting framework trades off latency, efficiency, and safety, enabling scalable deployment of multimodal assistive models on resource-constrained mobile platforms.
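To make the last point concrete, the following is a minimal, purely illustrative sketch (not the presenter's actual method) of a run-time policy that decides between on-device and cloud inference. All names and parameters (RuntimeState, OffloadPolicy, the linear scoring weights) are hypothetical; in the dissertation such a policy is learned with reinforcement learning, whereas here the weights are simply fixed for illustration.

```python
# Illustrative sketch only: a run-time policy choosing local vs. cloud inference.
# All class names, fields, and weights are hypothetical placeholders.

from dataclasses import dataclass


@dataclass
class RuntimeState:
    local_confidence: float      # confidence of the small on-device model (0..1)
    est_cloud_latency_ms: float  # current estimate of round-trip cloud latency
    latency_budget_ms: float     # how long the user can wait for assistance
    battery_level: float         # remaining battery fraction (0..1)


class OffloadPolicy:
    """Scores the 'offload to cloud' action; a positive score means offload."""

    def __init__(self, w_conf=-2.0, w_latency=-1.0, w_battery=0.5, bias=1.0):
        # Fixed, hand-picked weights for the sketch; an RL algorithm could
        # instead tune them from a reward trading off accuracy, latency,
        # and energy, in the spirit of the abstract.
        self.w = dict(conf=w_conf, latency=w_latency, battery=w_battery)
        self.bias = bias

    def score(self, s: RuntimeState) -> float:
        latency_pressure = s.est_cloud_latency_ms / max(s.latency_budget_ms, 1.0)
        return (self.w["conf"] * s.local_confidence
                + self.w["latency"] * latency_pressure
                + self.w["battery"] * s.battery_level
                + self.bias)

    def act(self, s: RuntimeState) -> str:
        return "cloud" if self.score(s) > 0 else "local"


if __name__ == "__main__":
    policy = OffloadPolicy()
    state = RuntimeState(local_confidence=0.35,
                         est_cloud_latency_ms=120.0,
                         latency_budget_ms=500.0,
                         battery_level=0.8)
    # Offloads when the local model is unsure and the latency budget allows it.
    print(policy.act(state))
```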

Location:
PHO 428
