ECE PhD Prospectus Defense: Zhongkai Shangguan
- Starts: 2:00 pm on Tuesday, May 27, 2025
- Ends: 3:30 pm on Tuesday, May 27, 2025
Title: Advancing Assistive Technologies with Multimodal Machine Learning
Presenter: Zhongkai Shangguan
Advisor: Professor Eshed Ohn-Bar
Chair: Professor Brian Kulis
Committee: Professor Eshed Ohn-Bar, Professor Brian Kulis, Professor Margrit Betke
Google Scholar Profile: https://scholar.google.com/citations?user=5LiWfk4AAAAJ&hl=en
Abstract: Multimodal models integrate diverse data modalities, including vision, language, audio, motion, and human gaze, to enable unified feature learning by leveraging complementary information from different sensors and sources. While such models can be applied in a variety of assistive contexts, such as assistive navigation or education, incorporating goals and task specifications remains a challenge. Moreover, fusing and aligning the different modalities can be difficult, particularly in complex real-world, spatio-temporal contexts with diverse and missing data. In this work, we develop new models capable of capturing the needs of diverse end-users in real-world contexts. First, we propose a novel vision-language-action model that effectively aligns images with goals in the language space to provide navigation instructions for people with visual impairments. Second, we develop a proactive decision-making framework that continuously aligns user needs, spatio-temporal video cues, historical observations, and safety requirements to realize timely interactions with users. Finally, we develop an effective modality-aware pre-training strategy that captures cross-modal correlations and enables more efficient and robust learning from diverse data, including missing observations. The developed frameworks show promise across domains, including assistive navigation and longitudinal educational modeling.
- Location: PHO 339, 8 St Mary's St