Beyond Model Acceleration in Next-Generation Machine Learning Systems - Wenqi Jiang
- Starts: 10:00 am on Monday, January 27, 2025
- Ends: 11:00 am on Monday, January 27, 2025
Title:
Beyond Model Acceleration in Next-Generation Machine Learning Systems
Abstract: Despite the recent popularity of large language models (LLMs), the transformer neural network invented eight years ago has remained largely unchanged. This prompts the question of whether machine learning (ML) systems research is solely about improving hardware and software for tensor operations. In this talk, Wenqi Jiang will argue that the future of machine learning systems extends far beyond model acceleration. Using the increasingly popular retrieval-augmented generation (RAG) paradigm as an example, Wenqi will show that the growing complexity of ML systems demands a deeply collaborative effort spanning data management, systems, computer architecture, and ML. Wenqi will present RAGO and Chameleon, two pioneering works in this field. RAGO is the first systematic performance study of retrieval-augmented generation. It uncovers the intricate interactions between vector data systems and models, revealing drastically different performance characteristics across various RAG workloads. To navigate this complex landscape, RAGO introduces a system optimization framework to explore optimal system configurations for arbitrary RAG algorithms. Building on these insights, Wenqi will introduce Chameleon, the first heterogeneous accelerator system for RAG. Chameleon combines LLM and retrieval accelerators within a disaggregated architecture. The heterogeneity ensures efficient serving of both LLM inference and retrievals, while the disaggregation enables independent scaling of different system components to accommodate diverse RAG workload requirements. Wenqi will conclude the talk by emphasizing the necessity of cross-stack co-design for future ML systems and the abundance of opportunities ahead of us.
Bio: Wenqi Jiang is a fifth-year PhD student at ETH Zurich, advised by Gustavo Alonso and Torsten Hoefler. He aims to enable more efficient, next-generation machine learning systems. Rather than focusing on a single layer in the computing stack, Wenqi's research spans the intersections of data management, computer systems, and computer architecture. His work has driven advancements in several areas, including retrieval-augmented generation (RAG), vector search, and recommender systems. These contributions have earned him recognition as one of the ML and Systems Rising Stars, as well as the AMD HACC Outstanding Researcher Award. See https://wenqijiang.github.io/ for more information.
- Location: CDS 1646