ECE PhD Prospectus Defense: Zhongkai Shangguan
- Starts: 2:00 pm on Tuesday, May 27, 2025
- Ends: 3:30 pm on Tuesday, May 27, 2025
Title: Advancing Assistive Technologies with Multimodal Machine Learning
Presenter: Zhongkai Shangguan
Advisor: Professor Eshed Ohn-Bar
Chair: Professor Brian Kulis
Committee: Professor Eshed Ohn-Bar, Professor Brian Kulis, Professor Margrit Betke
Google Scholar Profile: https://scholar.google.com/citations?user=5LiWfk4AAAAJ&hl=en
Abstract: Multimodal models integrate diverse data modalities, including vision, language, audio, motion, and human gaze, to enable unified feature learning by leveraging complementary information from different sensors and sources. While such models can be applied in a variety of assistive contexts, such as assistive navigation or education, incorporating goals and task specifications remains a challenge. Moreover, fusing and aligning the different modalities can be difficult, particularly in complex real-world, spatio-temporal contexts with diverse and missing data. In this work, we develop new models capable of capturing the needs of diverse end-users in real-world contexts. First, we propose a novel vision-language-action model that effectively aligns images with goals in the language space to provide navigation instructions for people with visual impairments. Second, we develop a proactive decision-making framework that continuously aligns user needs, spatio-temporal video cues, historical observations, and safety requirements to realize timely interactions with users. Finally, we develop an effective modality-aware pre-training strategy that captures cross-modal correlations and enables more efficient and robust learning from diverse data, including missing observations. The developed frameworks show promise across domains, including assistive navigation and longitudinal educational modeling.
- Location: PHO 339, 8 St Mary's St