Understanding Language Models by Understanding Training - Naomi Saphra
- Starts: 10:00 am on Monday, February 24, 2025
- Ends: 11:00 am on Monday, February 24, 2025
Abstract: LMs work better than anyone could have predicted just five years ago. But when do they work, and when don’t they? How do they work, and how do they fail? Why do they work, and why do they misbehave? This last question, why, cannot be answered only by inspecting trained LMs. We must understand the underlying factors that produce LM behavior, an understanding grounded in the training process. For a given architecture, training is a recipe with three ingredients: time, data, and luck. Naomi will discuss these factors through controlled experiments that inspect and manipulate training. These experiments answer fundamental questions about why language models learn. How do training breakthroughs produce language competence? How can training data composition determine model capabilities? And when does output behavior depend on random initialization? By answering these questions, we can expose fundamental truths about why modern deep learning works so well, and even uncover the nature of reasoning itself.
Bio: Naomi Saphra is a Research Fellow at the Kempner Institute at Harvard University working to understand NLP training dynamics: how models learn to encode linguistic patterns or other structure, how generalization develops, and how we can introduce useful inductive biases into the training process. She has a particular interest in applying frameworks from evolutionary biology to understand neural networks. Recently, Dr. Saphra has become interested in fish. Previously, she earned a PhD from the University of Edinburgh with a thesis on Training Dynamics of Neural Language Models, and worked at NYU, Google, and Facebook. For fun, she writes historical and meta-scientific surveys of the state of machine learning. Outside of research, she plays roller derby under the name Gaussian Retribution, performs standup comedy, and shepherds disabled programmers into the world of code dictation.
- Location: CDS 1646
- Registration: https://www.bu.edu/cds-faculty/explore/cds-spring-2025-colloquium/