- Starts: 10:00 am on Tuesday, February 24, 2026
- Ends: 12:00 pm on Tuesday, February 24, 2026
ECE PhD Prospectus Defense: Christopher Liao
Title: Bridging the Embedding Space Gap Between Modalities
Presenter: Christopher Liao
Advisor: Professor Brian Kulis
Chair: Professor Wei-Lun Chao
Committee: Professor Brian Kulis, Professor Venkatesh Saligrama, Professor Kayhan Batmanghelich, Professor Wei-Lun Chao
Google Scholar Profile: https://scholar.google.com/citations?user=iockppwAAAAJ
Abstract: Reliable web-scale text-to-image search systems rely on accurate approximate nearest neighbor search (ANNS) methods such as Inverted File Lists (IVF), Locality Sensitive Hashing (LSH), and nearest neighbor graphs (e.g., HNSW). However, the recall of these off-the-shelf methods degrades in multimodal settings, where embeddings from different modalities have disjoint supports. In this prospectus, I will describe our ongoing work on ANNS methods that are robust to the modality gap. This includes paired k-means training to determine better IVF clusters for text-to-image retrieval and constrained hyperplane selection for LSH. We plan to extend these ideas to address the more pronounced embedding gap between queries and keys in LLM attention inputs.
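
For readers unfamiliar with the idea, the sketch below illustrates one plausible reading of "paired k-means" for cross-modal IVF. The function name `paired_kmeans` and the specific pairing rule (assign each pair by its query embedding, update centroids from the paired key embeddings) are illustrative assumptions, not the authors' exact algorithm:

```python
import numpy as np

def paired_kmeans(queries, keys, k, iters=20, seed=0):
    """Hypothetical paired k-means sketch for cross-modal IVF.

    Assumption: each (query, key) pair is assigned to the centroid
    nearest the *query* embedding, while centroids are re-estimated
    as the mean of the assigned *key* embeddings. Probing the
    clusters nearest a text query then surfaces the paired images
    despite the modality gap.
    """
    rng = np.random.default_rng(seed)
    centroids = keys[rng.choice(len(keys), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign every pair by query-to-centroid distance.
        d = ((queries[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(axis=1)
        # Pull each centroid toward the key embeddings of its pairs.
        for c in range(k):
            mask = assign == c
            if mask.any():
                centroids[c] = keys[mask].mean(axis=0)
    return centroids, assign
```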
While attention matrix sparsity is well established, existing sparse attention methods typically rely on heuristics rather than principled nearest neighbor selection. With an oracle for the top-k keys per query, we observe up to a 10× inference speedup with no loss in accuracy. Unfortunately, the large query–key embedding gap causes standard ANNS methods to fail catastrophically. Moreover, nearest neighbor graph–based indexing approaches offer limited practical benefit, as graph construction costs dominate any inference-time savings. We therefore expect LLM inference speedups from applying our modified IVF and LSH indices to sparsify the attention calculation. Finally, we will apply these methods to long-context multimodal language models, where recent work has shown that existing state-of-the-art static sparse attention mechanisms are insufficient.
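
As a concrete reference point, the sketch below shows the oracle baseline the abstract alludes to: exact top-k key selection per query, followed by attention restricted to those keys. Function name, NumPy, and single-head shapes are assumptions for illustration; a practical system would replace the exact `argpartition` with an ANNS index such as the modified IVF or LSH described above:

```python
import numpy as np

def topk_sparse_attention(Q, K, V, k):
    """Oracle top-k sparse attention sketch (single head).

    Q: (nq, d) queries, K: (nk, d) keys, V: (nk, d) values, k <= nk.
    Each query attends only to its k highest-scoring keys.
    """
    scores = Q @ K.T / np.sqrt(Q.shape[-1])            # (nq, nk) full scores
    idx = np.argpartition(scores, -k, axis=1)[:, -k:]  # top-k key ids per query
    top = np.take_along_axis(scores, idx, axis=1)      # (nq, k) selected scores
    w = np.exp(top - top.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                  # softmax over selected keys
    # Gather selected values and mix: V[idx] has shape (nq, k, d).
    return np.einsum('qk,qkd->qd', w, V[idx])
```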
- Location: PHO 339
