CDS Spring 2024 Colloquium
The CDS Colloquium Series was developed to build intellectual community within and beyond the academic unit. Since its inception, the series has welcomed dozens of scholars and is intended for CDS faculty, staff, and students. However, we welcome interest from across the Boston University campus and beyond.
To view all upcoming lectures, events, and programs, visit the CDS Calendar.
Past Talks
Wednesday, April 10, 2024
Learning optimal supervised representations with the decodable information bottleneck
David J. Schwab is Associate Professor of Biology and Physics at the Graduate Center, CUNY and a member of the Initiative for the Theoretical Sciences.
Time: 10:00am | Location: CDS 1646, 665 Commonwealth Ave., Boston
Abstract: I will discuss the question of characterizing and learning optimal representations for supervised learning. Traditionally, this has been tackled using the Information Bottleneck (IB), which compresses inputs while retaining information about targets, in a decoder-agnostic fashion. In machine learning, however, our goal is generalization, which is intimately linked to the decoder of interest. After discussing two phases of learning dynamics inspired by previous work on IB, I will present the Decodable Information Bottleneck (DIB), an objective that considers information retention and compression from a decoder-aware perspective. As a result, DIB gives rise to representations that are optimal in terms of expected test performance. Empirically, DIB can be used to enforce a small generalization gap on downstream classifiers and to predict the generalization ability of neural networks. I will then turn to out-of-distribution generalization, where recent work has demonstrated that deep neural networks trained using Empirical Risk Minimization (ERM) can generalize under distribution shift, outperforming specialized training algorithms for domain generalization. To understand this, I will discuss the extent to which domain adaptation theory explains the performance of ERM-trained models. Surprisingly, we find that this theory does not provide a tight explanation for the observed out-of-domain generalization and is outperformed by other heuristic measures.
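For background (added context, not part of the talk's new results): the classical IB objective of Tishby, Pereira, and Bialek seeks a stochastic representation Z of the input X that discards as much of X as possible while retaining information about the target Y:

```latex
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)
```

Here \(\beta > 0\) trades compression against prediction and \(I(\cdot\,;\cdot)\) denotes mutual information; both terms are decoder-agnostic. DIB, by contrast, measures retained and compressed information relative to the decoder family that will actually be used downstream.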
Bio: David J. Schwab is a Simons Investigator in the Mathematical Modeling of Living Systems and a Sloan Fellow in Physics. He is interested in the physics of learning, natural and artificial.
Monday, April 1, 2024
Information-theoretic Approaches to Neuroscience and Neuroengineering: Inferring Information Representation and Communication in the Brain
Praveen Venkatesh is a Shanahan Foundation Fellow at the Allen Institute and the University of Washington.
Time: 10:00am | Location: CDS 1646, 665 Commonwealth Ave., Boston
Abstract: Advances in neurotechnology are enabling us to simultaneously record thousands of neurons from multiple brain regions, allowing us to probe neural circuits at unprecedented resolutions. These advances create exciting opportunities for studying how different brain regions encode and communicate information, and call for theoretical and computational tools capable of obtaining insights from high-dimensional datasets. Venkatesh will discuss how information theory can bring a new perspective to our understanding of information representation and communication across different regions of the brain, as well as inform the design of new neural interfaces to extract such information.
To understand how the brain encodes information, we need measures that can quantify the extent of redundant or unique information between different brain regions. Venkatesh will describe how measures of uniqueness and redundancy can be defined, provide the first efficient and scalable estimators of these measures for high-dimensional neural data, and show how they can give rise to new neuroscientific insights. He will also discuss a new theoretical framework for tracking the information flows of "messages" such as those related to stimuli or behavior, and show the advantages of using such a framework to understand communication in the brain, such as in networks with feedback. Venkatesh will also touch upon how the same measures of information flow and unique information can be used to quantify and remove bias in the field of fair machine learning.
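As a toy illustration of why such measures are needed (an illustrative sketch, not Venkatesh's estimators): for Y = X1 XOR X2, each input alone carries no information about Y, yet together they determine it exactly. This purely synergistic structure is invisible to pairwise mutual information, motivating finer decompositions into unique, redundant, and synergistic components.

```python
import numpy as np

def mutual_info(joint):
    """I(A;B) in bits from a joint pmf indexed as joint[a, b]."""
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (pa @ pb)[nz])).sum())

# Y = X1 XOR X2 with X1, X2 independent fair coins.
# Joint over ((x1, x2), y): rows index the input pair, columns index y.
joint_xy = np.zeros((4, 2))
for x1 in (0, 1):
    for x2 in (0, 1):
        joint_xy[2 * x1 + x2, x1 ^ x2] = 0.25

i_pair = mutual_info(joint_xy)                             # I((X1, X2); Y)
i_x1 = mutual_info(joint_xy.reshape(2, 2, 2).sum(axis=1))  # I(X1; Y)
print(i_x1, i_pair)  # → 0.0 1.0
```

Either input alone gives zero bits about Y; the pair gives one full bit, all of it synergy.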
Finally, he will describe his contributions to deriving the fundamental limits of the spatial resolution of electroencephalography (EEG), and his collaborations with neuroscientists to validate this theory experimentally. The understanding Venkatesh developed has led to advances in the non-invasive detection of cortical spreading depolarizations, as well as to the development of new EEG electrodes and subdermal EEG probes. He will conclude with an overview of his future research vision.
Bio: Venkatesh holds a BTech (with Honors) in Electrical Engineering from the Indian Institute of Technology, Madras, and a PhD in Electrical and Computer Engineering from Carnegie Mellon University.
Venkatesh's research applies information theory, machine learning, and statistics to neuroscience and neuroengineering. He has developed new frameworks for information flow and measures for information representation, and has advanced our understanding of the fundamental limits of electroencephalography. These contributions have found applications in neuroscientific data analysis, the development of new neural interfaces, the diagnosis of brain diseases, and fair machine learning. Through these projects, he has led interdisciplinary collaborations with neuroscientists, systems and circuits engineers, and clinicians. His theoretical, computational, and impact-oriented contributions have appeared in prestigious venues, such as the IEEE Transactions on Information Theory, the Proceedings of the IEEE, NeurIPS, and AAAI (for theory and ML work), and the Journal of Neural Engineering, the IEEE Transactions on Biomedical Engineering, Scientific Reports, and IEEE BioCAS (for impact-oriented work). His efforts have earned him numerous fellowships, including a Carnegie Institute of Technology Dean's Fellowship, a Henry L. Hillman Presidential Fellowship, a Dowd Fellowship, a Fellowship in Digital Health from the Center for Machine Learning and Health at CMU, and a Shanahan Foundation Fellowship at the Interface of Data and Neuroscience.
Monday, March 25, 2024
Control Mechanisms for Generative Models
John Thickstun is a postdoctoral researcher at Stanford University advised by Percy Liang.
Time: 10:00 AM | Location: CDS 1646 (in-person event) | 665 Commonwealth Ave., Boston
Abstract: The compelling outputs of modern generative models have motivated widespread efforts to deploy these models as tools in commercial products and user-facing applications. For a model to be an effective tool, we must be able to control its outputs. Users of a model require control to instruct the model to generate outputs that meet their needs. A model provider also requires control, to oversee and regulate the behavior of the model. In this talk, Thickstun will describe his work on methods that improve the controllability of generative models. First, he will describe the Anticipatory Music Transformer: a generative model of music with a novel control mechanism called anticipation, which allows a user to apply fine-grained control while generating long sequences of music. Second, Thickstun will show how to control a pre-trained model of images or audio to generate outputs subject to constraints, and apply this method to source separation: “the cocktail party problem.” Third, he will demonstrate a mechanism for watermarking the outputs of a language model, without altering the distribution of text generated by the model. He will conclude with a vision for future work on controllable generation, and ways that more nuanced control mechanisms can facilitate the use and governance of generative models.
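The third result above hinges on sampling text with keyed randomness so that outputs are detectable by a key holder while the distribution of generated text is unchanged. A minimal sketch of that general idea, using the Gumbel-max trick rather than the specific scheme from the talk (the distribution `p` and key range here are illustrative): given a secret key, the sampled token is deterministic, and hence checkable, yet marginalized over random keys the token follows the model's probabilities exactly.

```python
import numpy as np

def keyed_sample(p, key):
    """Gumbel-max sampling: deterministic given `key`,
    exactly p-distributed when the key is drawn at random."""
    g = np.random.default_rng(key).gumbel(size=len(p))
    return int(np.argmax(np.log(p) + g))

p = np.array([0.5, 0.3, 0.2])  # toy next-token distribution
draws = [keyed_sample(p, k) for k in range(20_000)]
freq = np.bincount(draws, minlength=3) / len(draws)
# empirical frequencies match p, even though each key's choice is fixed
```

A detector who knows the key can recompute each choice and check agreement; an observer without the key sees ordinary samples from `p`.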
Bio: Before Stanford, Thickstun completed a PhD at the University of Washington, advised by Sham M. Kakade and Zaid Harchaoui. John is interested in developing principled methods that advance the capabilities and controllability of generative models. His work has been featured in media outlets including TechCrunch and the Times of London, recognized by outstanding paper awards at NeurIPS and ACL, and supported by an NSF Graduate Fellowship and a Qualcomm Innovation Fellowship.
Wednesday, March 20, 2024
A Step Further Toward Scalable and Automatic Distributed Large Language Model Pre-training
Hongyi Wang is a Senior Project Scientist at the Machine Learning Department of CMU working with Prof. Eric Xing.
Time: 10:00-11:15 AM | Location: CDS 1646 (in-person event) | 665 Commonwealth Ave., Boston
Abstract: Large Language Models (LLMs) are at the forefront of advances in the field of AI. Nonetheless, training these LLMs is computationally daunting and necessitates distributed training methods. However, distributed training generally suffers from bottlenecks, including heavy communication costs and the need for extensive performance tuning. Hybrid parallelism, which combines data and model parallelism, is essential for LLM pre-training. Designing effective hybrid parallelism strategies, though, requires a substantial tuning effort and specialized expertise. In this talk, Wang will first discuss how to automatically design high-throughput hybrid-parallelism training strategies using system cost models. Then, he will demonstrate the use of these automatically designed hybrid parallelism strategies to train state-of-the-art LLMs from scratch. Finally, Wang will introduce a low-rank training framework to enhance communication efficiency in data parallelism. This proposed framework achieves almost ideal scalability without sacrificing model quality by leveraging a full-rank to low-rank training strategy and a layer-wise adaptive rank selection mechanism.
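To make the communication argument concrete (an illustrative sketch under our own assumptions, not Wang's actual framework): if a layer's gradient is factored to rank r before synchronization, each worker ships two thin factors instead of the full matrix, cutting traffic by roughly 2r/n for an n-by-n layer.

```python
import numpy as np

def low_rank_factors(grad, r):
    """Truncated SVD of a gradient: the two thin factors to communicate."""
    U, s, Vt = np.linalg.svd(grad, full_matrices=False)
    return U[:, :r] * s[:r], Vt[:r]

rng = np.random.default_rng(0)
g = rng.normal(size=(512, 512))   # stand-in for one layer's gradient
P, Q = low_rank_factors(g, r=16)
approx = P @ Q                    # what the receiver reconstructs
ratio = (P.size + Q.size) / g.size
print(f"bytes sent vs. dense: {ratio:.4f}")  # → 0.0625
```

The quality question is whether the discarded tail of the spectrum matters, which is why the talk's framework starts full-rank and adapts the rank per layer.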
Bio: Wang obtained his Ph.D. degree from the Department of Computer Sciences at the University of Wisconsin-Madison, where he was advised by Prof. Dimitris Papailiopoulos. Dr. Wang received the Rising Stars Award from the Conference on Parsimony and Learning in 2024 and the Baidu Best Paper Award at the Spicy FL workshop at NeurIPS 2020. He led the distributed training effort of LLM360, an academic research initiative advocating for fully transparent open-source LLMs. His research has been adopted by companies like IBM, Sony, and FedML Inc., and he is currently funded by NSF, DARPA, and Semiconductor Research Corporation.
Tuesday, March 5, 2024
Transformers and Attention
Time: 1:00-2:00 PM | Location: CDS 1646 (in-person event) | 665 Commonwealth Ave., Boston
Scott Ladenheim is a senior research computing applications and data specialist at BU for Research Computing Services (RCS).
Abstract: In this lecture, Ladenheim will introduce the transformer architecture. This architecture forms the building blocks of large language models like ChatGPT. The transformer architecture consists of encoder and decoder blocks. The focus of the talk will be on the layers within the encoder blocks. In particular, he will discuss the attention layer and how attention allows the model to understand relationships between words in a sentence. This lecture assumes students have knowledge of linear algebra and the following topics in deep learning: neural networks, recurrent neural networks, and word embeddings. This mock lecture is intended for an audience in DS-340 Introduction to Machine Learning and AI.
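The core computation of the attention layer described above can be sketched in a few lines of NumPy (a pedagogical single-head version; real transformers add multiple heads, masking, and per-layer learned projections):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: each query mixes the value vectors,
    weighted by its similarity to every key."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = softmax(scores)   # each row is a distribution over positions
    return weights @ V, weights

# toy "sentence" of 4 tokens with 8-dimensional embeddings
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, w = attention(X @ Wq, X @ Wk, X @ Wv)
```

Each row of `w` sums to one: token i's output is a convex combination of every token's value vector, which is how the model relates words across the sentence.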
Bio: Ladenheim is originally from Syracuse, New York, and has an educational background in mathematics, earning his PhD from Temple University in 2015. After his PhD, he worked as a postdoctoral research assistant at the University of Manchester in the UK, where he built tools to model heat flow in computer circuits. Following his postdoc, he worked for a startup company called Next Ocean in the Netherlands. At Next Ocean, Ladenheim developed software to predict future ship motions based on a time series history of ship motion and radar data. In 2021, he returned to the US. Prior to joining RCS, he worked for a data analytics/AI division within Capgemini Engineering.
Wednesday, February 28, 2024
Accelerating Drug Discovery by Generative and Geometric AI
Time: 10:00-11:15 AM | Location: CDS 1646 (in-person event) | 665 Commonwealth Ave, Boston
Wengong Jin is a postdoctoral fellow in the Eric and Wendy Schmidt Center at Broad Institute.
Abstract: AI for drug discovery is an emerging field that aims to computationally design new proteins or molecules with desired properties. Traditional experimental approaches to drug discovery are time-consuming and labor-intensive, due to the large combinatorial search space of molecule and protein structures. In this talk, Jin will describe how to accelerate drug discovery via novel generative and geometric deep learning methods. First, he will introduce the junction tree variational autoencoder (JT-VAE), a generative model for molecular graphs. Inspired by probabilistic graphical models, JT-VAE leverages the low tree-width of molecular graphs and represents a molecule as a junction tree of chemical motifs. Second, he will present Neural Euler's Rotation Equation (NERE), an equivariant rotation prediction network inspired by rigid-body dynamics. Based on NERE, he developed an unsupervised binding energy prediction method that estimates the likelihood of a protein complex via SE(3) denoising score matching. Lastly, Jin will demonstrate how these algorithmic innovations make real-world impacts on drug discovery. Through collaboration with biologists in wet labs, he and his collaborators successfully designed new antibiotics to fight antimicrobial resistance and new antibodies with potential for cancer immunotherapy.
Bio: Previously, Jin obtained his PhD at MIT CSAIL, advised by Prof. Regina Barzilay and Prof. Tommi Jaakkola. His research focuses on machine learning for drug discovery. He is particularly interested in developing geometric deep learning and generative AI models for virtual drug screening, de novo drug design, antibody design, and protein-ligand/protein binding. His work has been published in leading AI conferences and biology journals, including ICML, NeurIPS, ICLR, Nature, Science, Cell, and PNAS. Jin's research has received extensive media coverage, including The Guardian, BBC News, CBS Boston, and the Financial Times. He is the recipient of the BroadIgnite Award, the Dimitris N. Chorafas Prize, and the MIT EECS Outstanding Thesis Award.
Thursday, February 22, 2024
Foundations of Multisensory Artificial Intelligence
Time: 10:00 AM | Location: CDS 1646 (in-person event) | 665 Commonwealth Ave, Boston
Paul Liang is a final-year Ph.D. student in the Machine Learning Department at Carnegie Mellon University whose research lies in the foundations of multimodal machine learning with applications in socially intelligent AI, natural language processing, healthcare, and education.
Abstract: Building multisensory AI systems that learn from multiple sensory inputs such as text, speech, audio, video, real-world sensors, wearable devices, and medical data holds great promise for impact in many scientific areas with practical benefits, such as in supporting human health and well-being, enabling multimedia content processing, and enhancing real-world autonomous agents.
In this talk, Paul will discuss his research on the machine learning principles of multisensory intelligence, as well as practical methods for building multisensory foundation models over many modalities and tasks. In the first half, he will present a new theoretical framework formalizing how modalities interact with each other to give rise to new information for a task. These interactions are the basic building blocks in all multimodal problems, and their quantification enables users to understand their multimodal datasets and design principled approaches to learn these interactions. In the second half, he will present his work in cross-modal attention and multimodal transformer architectures that now underpin many of today’s multimodal foundation models. Finally, Paul will discuss his collaborative efforts in scaling AI to many modalities and tasks for real-world impact: (1) aiding mental health practitioners by predicting daily mood fluctuations in patients using multimodal smartphone data, (2) supporting doctors in cancer prognosis using histology images and multiomics data, and (3) enabling robust control of physical robots using cameras and touch sensors.
Bio: Paul Liang is a Ph.D. student in Machine Learning at CMU, advised by Louis-Philippe Morency and Ruslan Salakhutdinov. He studies the machine learning foundations of multisensory intelligence to design practical AI systems that integrate, learn from, and interact with a diverse range of real-world sensory modalities. His work has been applied in affective computing, mental health, pathology, and robotics. He is a recipient of the Siebel Scholars Award, Waibel Presidential Fellowship, Facebook PhD Fellowship, Center for ML and Health Fellowship, Rising Stars in Data Science, and 3 best paper/honorable mention awards at ICMI and NeurIPS workshops. Outside of research, he received the Alan J. Perlis Graduate Student Teaching Award for instructing courses on multimodal ML and advising students around the world in directed research.
Friday, February 23, 2024
Digital Safety and Security for Survivors of Technology-Mediated Harms
Time: 11:00 AM-12:15 PM | Location: CDS 1646 (in-person event) | 665 Commonwealth Ave, Boston
Emily Tseng is a PhD candidate in Information Science at Cornell University.
Abstract: Platforms, devices, and algorithms are increasingly weaponized to control and harass the most vulnerable among us. Some of these harms occur at the individual and interpersonal level: for example, abusers in intimate partner violence (IPV) use smartphones and social media to surveil and stalk their victims. Others are more subtle, at the level of social structure: for example, in organizations, workplace technologies can inadvertently scaffold exploitative labor practices. This talk will discuss Tseng's research (1) investigating these harms via online measurement studies, (2) building interventions to directly assist survivors with their security and privacy, and (3) instrumenting these interventions to enable scientific research into new types of harms as attackers and technologies evolve. She will close by sharing her vision for centering inclusion and equity in digital safety, security, and privacy, towards brighter technological futures for us all.
Bio: Tseng's research explores the systems, interventions, and design principles we need to make digital technology safe and affirming for everyone. Emily’s work has been published at top-tier venues in human-computer interaction (ACM CHI, CSCW) and computer security and privacy (USENIX Security, IEEE Oakland). For 5 years, she has served as a researcher-practitioner with the Clinic to End Tech Abuse, where her work has enabled specialized security services for over 500 survivors of intimate partner violence (IPV). Emily is the recipient of a Microsoft Research PhD Fellowship, Rising Stars in EECS, Best Paper Awards at CHI, CSCW, and USENIX Security, and third place in the Internet Defense Prize. She has additionally completed internships at Google and with the Social Media Collective at Microsoft Research. She holds a B.A. from Princeton University.
Monday, February 5, 2024
A Radical New Future for (Astro)physics Enabled by AI
Time: 10:00-11:15 AM | Location: CDS 1646 (in-person event) | 665 Commonwealth Ave, Boston
Siddharth Mishra-Sharma is a Fellow at The NSF AI Institute for Artificial Intelligence and Fundamental Interactions (IAIFI), working between MIT and Harvard.
Abstract: AI for science is currently at an inflection point, with significant potential for impacting domain sciences and spurring methodological advances in AI. Astrophysics is a prime example of this bidirectional synergy. The next several years will witness an influx of data that will enable us to map out the Universe to unprecedented precision, and while these observations will have significant discovery potential, the complexity of the data and underlying physical models presents novel challenges. Siddharth will describe how maximizing the scientific return of these observations over a wide range of scales and modalities will require a qualitative shift in how we interact with the data, bringing together advances in probabilistic machine learning with mechanistic modeling. While showcasing domain applications within astrophysics, he will highlight how the unique nature of astrophysical data motivates the advancement of machine learning methods with broad relevance to the physical sciences and beyond.
Bio: Siddharth is broadly interested in the application and development of machine learning methods motivated by problems in the physical sciences, with a particular focus on astrophysics. Previously, he was a postdoc at NYU’s Center for Cosmology and Particle Physics and obtained his Ph.D. in Physics from Princeton University.
Wednesday, February 7, 2024
Decoding Abusive Adversaries for Safer Digital Systems
Time: 10:00-11:15 AM | Location: CDS 1646 (in-person event) | 665 Commonwealth Ave, Boston
Rosanna Bellini is a Postdoctoral Associate at Cornell Tech in New York City.
Abstract: People today face threats to their digital safety that most computing systems were never designed to protect them from: those closest to them. Abusive adversaries take ample advantage of standard user interfaces and ineffective anti-abuse mechanisms, leveraging their close social and physical proximity to their target to stalk, harass, and control. In this talk, Bellini will describe her research on intimate partner violence, in which she: (1) pioneers approaches to engaging with abusive adversaries firsthand across online and in-person contexts, (2) designs and deploys bespoke systems to challenge abusive behaviors via community-based interventions, and (3) develops new frameworks for building abuse-resilient technologies. Bellini will outline her research vision to achieve digital safety for all people across critical domains, including finance, healthcare, and research.
Bio: Bellini's research develops data-driven and engaged research methods to tackle complex societal challenges, such as technology-enabled harms. Her work has been published in top-tier human-computer interaction (HCI) and computer security venues, including USENIX Security, IEEE S&P, CHI, and CSCW, and featured on the BBC World Service. She has received multiple Best Paper awards from CHI and CSCW, as well as Distinguished Paper awards from USENIX Security. Her research has helped to prompt legislative changes and improvements to consumer-facing financial applications, benefiting tens of millions of customers. She also helps to lead the Clinic to End Tech Abuse, a frontline service for survivors of technology-facilitated abuse, and has personally helped over 150 survivors reclaim their privacy, security, and financial freedom.
Friday, February 9, 2024
Instance-Optimization: Rethinking Database Design for the Next 1000X
Time: 10:00-11:15 AM | Location: CDS 1646 (in-person event) | 665 Commonwealth Ave, Boston
Jialin Ding is an Applied Scientist at Amazon Web Services.
Abstract: Modern database systems aim to support a large class of different use cases while simultaneously achieving high performance. However, as a result of their generality, databases often achieve adequate performance for the average use case but do not achieve the best performance for any individual use case. In this talk, Ding will describe his work on designing databases that use machine learning and optimization techniques to automatically achieve performance much closer to the optimal for each individual use case. In particular, he will present his work on instance-optimized database storage layouts, in which the co-design of data structures and optimization policies improves query performance in analytic databases by orders of magnitude. Ding will highlight how these instance-optimized data layouts address various challenges posed by real-world database workloads and how he implemented and deployed them in production within Amazon Redshift, a widely-used commercial database system.
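As a small illustration of why storage layout matters (a generic zone-map sketch under our own assumptions, not Redshift's implementation): when data is laid out sorted on the filter key, per-block min/max statistics let a range scan skip most blocks outright while still returning an exact answer.

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.sort(rng.integers(0, 10_000, size=8_000))  # layout choice: sorted on key
blocks = data.reshape(80, 100)                       # 80 storage blocks
zmin, zmax = blocks.min(axis=1), blocks.max(axis=1)  # "zone map" metadata

def range_count(lo, hi):
    live = (zmax >= lo) & (zmin <= hi)   # prune blocks using metadata only
    vals = blocks[live]
    return int(((vals >= lo) & (vals <= hi)).sum()), int(live.sum())

matches, blocks_read = range_count(2_000, 2_100)
# only a handful of the 80 blocks are read, yet the count is exact
```

An instance-optimized system goes further by choosing the sort order, block boundaries, and related policies to fit the actual query workload.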
Bio: Prior to AWS, Ding received his PhD in computer science from MIT, advised by Tim Kraska. He works broadly on applying machine learning and optimization techniques to improve data management systems, with a focus on building databases that automatically self-optimize to achieve high performance for any specific application. His work has appeared in top conferences such as SIGMOD, VLDB, and CIDR, and has been recognized by a Meta Research PhD Fellowship.
Wednesday, February 21, 2024
The Simplest Neural Models, and a Hypothesis for Language in the Brain
Time: 10:00 AM | Location: CDS 1646 (in-person event) | 665 Commonwealth Ave, Boston
Dan Mitropolsky is a final-year Ph.D. student at Columbia University. His main interest is the computational and mathematical theory of the brain, especially understanding the brain's algorithms behind language and possible applications to AI. In complexity theory, he also works on the theory of total functions and their connection to cryptography.
Abstract: How would a computer scientist go about understanding the brain? As Nobel laureate Richard Axel recently put it, "We do not have a logic for the transformation of neural activity into thought and action"; Axel views discerning this logic as the most important future direction of neuroscience. In this talk, Dan will consider how one would come up with a computational, mathematical model of the brain, and define a neural model, or NEMO, whose key ingredients are spiking neurons, random synapses and weights, local inhibition, and Hebbian plasticity (no backpropagation). Concepts are represented by interconnected co-firing assemblies of neurons that emerge organically from the dynamical system of its equations. It turns out it is possible to carry out complex operations on these concept representations, such as copying, merging, completion from small subsets, and sequence memorization. This opens up a "computer science of the brain", in which one can study algorithms and complexity in an unusual, but biologically motivated, model of computation. Dan will present how to use NEMO to implement an efficient parser of a small but non-trivial subset of English, and a more recent model of the language organ in the baby brain that learns the meaning of words, and basic syntax, from whole sentences with grounded input. In addition to constituting hypotheses as to the logic of the brain, Dan will discuss how principles from these brain-like models might be used to improve AI, which, despite astounding recent progress, still lags behind humans in several key dimensions such as creativity, hard constraints, and energy consumption. The talk is intended for a computing audience and does not assume any background in neuroscience.
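A toy version of the model's central phenomenon, offered as an outsider's sketch of the assembly-formation dynamics described above (our own parameter choices, not Mitropolsky's code): random synapses, top-k firing as local inhibition, and multiplicative Hebbian updates. Under repeated stimulation, the set of firing neurons stabilizes into an assembly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, p, beta, rounds = 1000, 50, 0.05, 0.10, 12  # neurons, cap, edge prob, plasticity

stim_w = (rng.random((k, n)) < p).astype(float)   # stimulus -> area synapses
rec_w = (rng.random((n, n)) < p).astype(float)    # recurrent synapses

firing = np.zeros(n, dtype=bool)
overlap = []
for _ in range(rounds):
    drive = stim_w.sum(axis=0) + rec_w[firing].sum(axis=0)
    new = np.zeros(n, dtype=bool)
    new[np.argsort(drive)[-k:]] = True            # local inhibition: only top-k fire
    stim_w[:, new] *= 1 + beta                    # Hebbian: fired-into synapses strengthen
    rec_w[np.ix_(firing, new)] *= 1 + beta
    overlap.append((firing & new).sum() / k)
    firing = new
# overlap between consecutive rounds rises toward 1: an assembly has formed
```

The interesting part is what one can prove and compute about such assemblies, which is the "computer science of the brain" the abstract refers to.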
Bio: Dan Mitropolsky is a PhD student at Columbia University, advised by Christos Papadimitriou and Tal Malkin. His main research interest is developing a computer science theory of the brain, especially understanding the brain's algorithms behind language and possible applications to AI. He is also a complexity theorist and works on the theory of total functions and their connection to cryptography. He has published his work at NLP, AI, neuroscience, and theoretical CS venues (e.g., TACL, AAAI, PNAS, CCNeuro, ITCS), and has been selected as a finalist in the upcoming 3 Minute Thesis (3MT) competition in March. In addition to computer scientists, Dan has collaborated with neuroscientists and cognitive scientists at Columbia, CUNY, The Rockefeller University, and Google, and will be a full-time participant in the upcoming summer program on "AI, Psychology, and Neuroscience" at Berkeley's Simons Institute. Dan is also excited about teaching; he has taught NLP and Advanced Cryptography while a PhD student at Columbia. Before the PhD, he worked at Google and completed a double B.S. in Mathematics and Computer Science at Yale. In addition to his research, Dan is deeply passionate about languages; he is fluent in 13 languages and is currently trying to add 2 more (and loves an opportunity to practice any of them).