Computing and Data Science PhD Student Seminar Series
The Boston University PhD program is home to a wide range of students, all studying various facets of data science. To help give students a friendly opportunity to practice and develop their research skills, we are launching the Computing and Data Science PhD Student Seminar Series. This series is focused on allowing doctoral students to present their research within a supportive and collaborative environment. Each seminar offers students a chance to share their findings, practice presentation skills, and receive constructive feedback from peers and faculty in a friendly, non-judgmental setting. This format not only helps students refine their work but also builds communication skills that are crucial for their academic and professional careers.
Beyond the academic benefits, the seminar series is a community-building endeavor that seeks to strengthen connections among CDS students. By creating a space for students to share their work with the public, the series lets students from various backgrounds learn from each other's experiences and methodologies.
The seminar series, organized by students Freddy Reiber, Lingyi Xu, and Yan (Stella) Si, meets weekly throughout the year on Fridays from noon to 1 PM, with lunch during the talk. Students interested in giving a talk should reach out to the organizers through email.
You can also view more details at the link here.
Current Talks
We are planning for the spring semester. Check back soon for more details.
Past Talks
Fall 2025
PhD Seminar Series Holiday Party
December 12, 12 PM - CDS 1635
Abstract: End of the Fall semester - Hooray! Join us for a holiday celebration to wrap up another successful semester of student research presentations.
WinoReferral: A Benchmark to Evaluate LLM Responses to Prompts Implying Mental Health Disorders with Micah Benson
December 5, 12 PM - CDS 1646
Abstract: The use of LLMs has contributed to serious mental health conditions among a subset of users, most alarmingly cases of delusional thinking and suicide. My talk will begin by surveying leading theories from psychology and human-computer interaction that explain how LLM use interacts with user mental health. I will then present my research on evaluating LLM responses to prompts that imply a user has depression and discuss our current results, which indicate high variance in AI companies’ safety policies for referring users to mental health professionals. I will conclude by proposing policy changes that could limit the harms of LLMs on user mental health.
Bio: Micah studies the societal impacts of large language models (LLMs) as a PhD student at Boston University's Faculty of Computing & Data Sciences. He uses interpretability methods to investigate how LLMs represent social concepts such as identity and politics, with the goal of developing techniques to improve model fairness. He also conducts audits that simulate new uses of LLMs to analyze potential benefits and risks of the technology. Before BU, Micah graduated from WashU with a double major in data science and English.
Bayesian Predictive Modeling: Towards Martingale Posterior Distributions for Dynamical Systems with Clark Ikezu
November 21, 12 PM - CDS 1646
Abstract: Bayesian inference is a principled way to quantify uncertainty over parameters. The predominant approach involves specifying a prior and a likelihood in order to compute a posterior distribution. Prediction is then achieved through computing the posterior predictive distribution. However, prediction can also be viewed as the primary task of Bayesian inference, in which specifying a predictive model comes first and inferring the posterior distribution follows. This approach is appealing in several ways, including that one reasons over quantities that can be observed, as opposed to parameters that cannot. In this talk I will introduce this Bayesian "predictive approach" and discuss a particular method called the martingale posterior distribution, implemented by the predictive resampling algorithm. Next I will present preliminary work in which I show how the predictive resampling algorithm can be useful for posterior inference in the setting of non-i.i.d. observations generated by a dynamical system.
Bio: Clark is a second-year PhD student at Boston University's Faculty of Computing and Data Sciences. He is broadly interested in understanding biological systems and spatiotemporal processes with statistical modeling. Previously he worked at the Mayo Clinic at Jacksonville, FL, and before that earned a Master of Science in Bioengineering from Stanford University and a Bachelor of Science from Boston University in Biomedical Engineering.
Multi-Stain Learning for Robust Neuropathology Evaluation with Lingyi Xu
November 14, 12 PM - CDS 1646
Abstract: Postmortem histopathology remains the gold standard for diagnosing neurodegenerative diseases, yet the diagnostic workflows are often constrained by variable stain availability. We introduce a multimodal deep learning framework for whole slide image analysis that models inter-stain relationships and performs robustly under missing-modality conditions. Through cross-stain representation learning, the framework achieves more reliable and accessible diagnostic performance, offering a step toward data-efficient and resource-flexible digital pathology.
Bio: Lingyi Xu is a Ph.D. student in the Faculty of Computing & Data Sciences at Boston University. She works with Professor Vijaya B. Kolachalama to seek solutions to data missingness in multimodal learning. Her work investigates how different data modalities can be represented and aligned to make learning more adaptable and their relationships more interpretable.
MCP Servers: Why You Need to Know About Them and How They Work with Jeff Hastings
November 7, 12 PM - CDS 1646
Abstract: Model Context Protocol (MCP) servers have revolutionized the way AI connects to resources. Standard API approaches required the developer to customize each connection. With MCP servers, a standardized protocol replaces these fragmented API connections with a single, universal connection method. MCP servers enable the researcher to connect to multiple data sources simultaneously, create reproducible data pipelines, get the most out of agentic AI, and build large libraries that can be quickly and easily queried and summarized.
Bio: Jeff Hastings is a PhD student in the Faculty of Computing & Data Sciences at Boston University, advised by Dr. Joshua Peterson. He earned a BA and MA in Political Science from Utah State University, followed by an MS in Computational Social Science from the University of California, San Diego. His research applies machine learning, deep learning, and reinforcement learning to better understand, explain, and improve human, artificial, and agentic decision-making. Prior to his PhD, he worked as an AI Data Scientist at Thermo Fisher Scientific.
Spherical CNNs and DeepSurv for Psychosis Conversion and The Trick-or-Treat Index with Phillip Angelos
October 31, 12 PM - CDS 1646
Abstract: Psychosis is a serious mental illness that involves altered perceptions of reality and symptoms such as hallucinations and delusions. Many questions about the disorder remain unanswered, and here we'll discuss two projects that take computational approaches to investigating psychosis. The first is a generative computational model that allows us to quantify the dysfunction in the brain's top-down expectations and bottom-up sensory signaling. The second, still ongoing, is a Spherical Convolutional Neural Network (SCNN) designed to analyze neuroimaging data across the brain, with an adapted integration of DeepSurv to predict conversion in Clinical High Risk (CHR) patients. The presentation will conclude with a few slides about the "Trick or Treat Index" and the best neighborhoods for trick-or-treating in Boston!
Bio: Phillip Angelos is a PhD student in the Faculty of Computing and Data Sciences at Boston University, advised by Dr. Joshua Peterson. He earned a Bachelor of Science in Psychology from Michigan State University and spent two years at Yale University researching positive symptom progression in psychosis. His research examines the intersection of artificial intelligence and psychology, with a focus on deep learning, decision-making, impulsivity, and related behavioral patterns.
Policy Modeling for Sex Trafficking Legislation in Massachusetts with Gabe McDonnell-Maayan
October 24, 12 PM - CDS 1646
Abstract: Gabe will present a work-in-progress project that develops a decision-support tool to guide policymaking on sex trafficking legislation in Massachusetts. Sex trafficking, the largest form of modern-day slavery, remains a serious issue across the United States. In Massachusetts, advocacy organizations are actively pushing for competing legislative approaches. In collaboration with a subject-matter expert, Gabe's team constructed a simulation of the commercial sex system and calibrated it to Massachusetts using diverse data sources. Preliminary results from the model are intended to inform upcoming deliberations of the state senate judiciary committee.
Gabe will discuss the problem of sex trafficking and the broader landscape of commercial sex work, with a focus on Massachusetts. This includes an examination of the limited data on sex work in the United States and the methods we use to generate Massachusetts-specific estimates. Gabe will also review potential legislative approaches and the history of advocacy efforts in the state. Next, Gabe will walk through the process of developing a simulation model of commercial sex work and sex trafficking. Finally, Gabe will present preliminary results, highlight their implications for current policy debates, and show how the model can serve as a tool for evaluating future intervention strategies.
Bio: Gabe McDonnell-Maayan is a PhD candidate in Boston University's Faculty of Computing and Data Sciences whose work bridges computational innovation and pressing societal challenges. His research applies tools from complexity science—such as system dynamics modeling, agent-based modeling, and machine learning—to understand and influence the behavior of complex social systems. Gabe’s primary focus is advancing suicide prevention through computational modeling, enabling policymakers and practitioners to test interventions in silico before implementing them in the real world. Beyond suicide prevention, he has contributed to projects addressing sex trafficking, political polarization, pandemic response, and food security. Prior to his doctoral studies, Gabe worked as a software engineer at the MITRE Corporation and earned his Bachelor of Science in Computer Science from Rensselaer Polytechnic Institute.
Data Assimilation under Environmental Disturbance with Jacob Epstein and Projected Changes to Western U.S. Atmospheric Rivers with Vlad Munteanu
October 17, 12 PM - CDS 1646
Abstract:
Jacob: Mathematical models are commonly used to monitor changes in soil and plant carbon pools over time. To ensure that modeled quantities align with observed data, a technique called State Data Assimilation (SDA) can be employed, which updates model states to better match observations. However, during ecological disturbance events such as wildfires, floods, or pest outbreaks, observed carbon pools can change rapidly—causing traditional SDA methods to struggle. This talk will focus on how a modified version of SDA, which combines a discrete multinomial state-and-transition framework with conventional ensemble filtering approaches, can be used to assimilate data during periods of ecological disturbance.
Vlad: In a warming climate, the characteristics of landfalling atmospheric rivers (ARs) over the West Coast of the United States are expected to change. Recent work using a variable-intensity AR-identification method (Hughes et al. 2022) showed that the end-of-21st-century changes in West Coast AR landfall frequency depended on their intensity: extreme ARs increased in both frequency and intensity, whereas moderate ARs decreased in frequency by as much as 10%. Until now, this methodology has been applied only to a small set of regional climate models. In this work, we apply it to a large set of global climate models (GCMs). We investigate the shifts of AR frequency as a function of AR intensity in the present and future climate over the United States in a set of 40 members from the CESM2 Large Ensemble Community Project (LENS2). We find that the changes to ARs in GCMs largely confirm the trends in ARs seen in Hughes et al. 2022, but we note limitations to using GCMs to study ARs. The observed shifts in AR frequency have direct implications for Western U.S. precipitation, with increased extreme precipitation and decreased moderate precipitation.
Bio:
Jacob: Jacob Epstein is a first-year PhD student at Boston University's Faculty of Computing and Data Sciences. As an undergraduate, he worked on multiple projects applying machine learning techniques to data analysis problems in environmental science, and received a Bachelor of Science in Computer Science and Mathematics at the University of Massachusetts Amherst. His current research interests are in applications of statistical and learning-based methods to environmental science.
Vlad: Vlad Munteanu is a first-year PhD student in CDS advised by Prof. Elizabeth Barnes and is interested in applications of data science and machine learning to answer complex questions about climate dynamics and extremes. He has conducted research on the response of atmospheric rivers to warming in climate simulations of the U.S. West Coast at the NOAA Physical Sciences Laboratory. Prior to that, he studied the formation and organization of deep tropical convection at the University of Washington.
How do I get LLMs Up and Running Quickly? with Yan (Stella) Si
October 10, 12 PM - CDS 1646
Abstract: With the rise of large language models (LLMs), new experimental methods are emerging across disciplines — especially in the social sciences. Researchers are increasingly interested in using LLMs as reasoning tools or even as synthetic study participants. But how do you actually work with these tools in practice? In this workshop, we will walk through how to get an LLM up and running, whether through an API or locally on your own machine. We will focus on the most accessible ways that are cheap and high-quality so you can begin experimenting right away.
Bio: Stella is a PhD student at Boston University Computing and Data Sciences, where she works at the intersection of cognitive science and AI.
Her research centers on modeling human decision making, combining neural networks with traditional cognitive models to uncover the psychological principles behind how we choose. She is also building large-scale, high-quality datasets to drive this work forward.
Interdependent Bilateral Trade: Information vs Approximation
October 3, 12 PM - CDS 1646
Abstract: This talk will introduce the area of mechanism design, and then focus on the problem of bilateral trade. Welfare maximization in bilateral trade has been extensively studied in recent years, primarily for the private values case. This talk will focus on welfare maximization in bilateral trade with interdependent values. Designing mechanisms for interdependent settings is much more challenging because the values of the players depend on the private information of others, requiring complex belief updates and strategic inference. Based on Interdependent Bilateral Trade: Information vs Approximation (EC25).
Spring 2025
Attention-Based Deep Learning for Analysis of Pathology Images and Gene Expression Data in Lung Squamous Premalignant Lesions with Lingyi Xu
April 18, 12 PM - CDS 1646
Abstract: Molecular and cellular alterations to the normal pseudostratified columnar bronchial epithelium result in the development of bronchial premalignant lesions through a histologic progression from normal to hyperplasia, metaplasia, dysplasia, carcinoma in situ, and invasive carcinoma. Endobronchial biopsies obtained via various bronchoscopy techniques are formalin-fixed, paraffin-embedded, and hematoxylin and eosin (H&E) stained to assess the pathologic features and histologic grade of the tissue. The broad and continuous spectrum of histologic and molecular changes makes reproducible stratification of lesions across multiple studies challenging.
Here we propose a transformer-based framework that flexibly utilizes transcriptomic and histologic patterns to distinguish lesions with bronchial dysplasia or worse from normal, hyperplasia, and metaplasia. We leveraged H&E whole slide images of endobronchial biopsies and bulk gene expression data from previously published studies as well as new data obtained from high-risk patients. Our framework maximizes the use of training data by allowing sample inputs with one or both data modalities. The flexibility of our framework to make predictions when a data modality is missing and its ability to integrate data from different modalities and studies are important for advancing our stratification of bronchial premalignant lesions.
Bio: Lingyi is a PhD student in Computing & Data Sciences at Boston University. Her research investigates the potential of multimodal medical data in enhancing disease diagnosis and assessment. Her current work involves applying graph models and machine learning algorithms in digital pathology and cancer genomics to advance the understanding of lung precancerous conditions.
Gender Inclusivity Fairness Index (GIFI): A Multilevel Framework for Evaluating Gender Diversity in Large Language Models with Zhengyang Shan
April 4, 12 PM - CDS 1646
Abstract: We introduce a comprehensive framework for assessing gender fairness in large language models (LLMs), particularly in their treatment of both binary and non-binary genders. Existing research has largely focused on binary gender distinctions, neglecting the inclusivity of non-binary identities. To address this, we propose a novel metric that evaluates LLMs across seven dimensions. We conduct extensive evaluations on 15 popular LLMs, revealing significant discrepancies in their ability to fairly represent diverse gender identities.
Bio: Zhengyang is a third-year PhD student at Boston University’s Faculty of Computing and Data Sciences. Her research interests lie in the evaluation, interpretability, and fairness of Large Language Models (LLMs).
Computing and Collective Action with Freddy Reiber
March 21, 12 PM - CDS 1646
Abstract: Labor unions are a critical component of ensuring dignified working conditions for laborers. However, as a byproduct of neoliberalization, American labor unions have been in free fall in terms of membership numbers. Drawing from sociological work on organized labor, we seek to analyze how labor organizers try to bring workers together to act collectively through digital communication technologies. Toward that end, we interviewed ~19 labor union members who engaged in digital worker-to-worker organizing through tools like Slack or Discord, focusing on how workers used and interacted with each other on these digital platforms. In this talk, we provide preliminary results from this study along with some early discussion of developing useful technical tools for supporting worker-to-worker organizing.
Bio: Freddy is a third-year PhD student in the Computing and Data Science department at Boston University, advised by the fantastic Allison McDonald. His work explores how power dynamics are shifted by technology, with a focus on applying human-driven methods to complex issues. Currently, his projects are on second-order dynamics in digital spaces within labor unions and the motivations used by cryptographers for their research.
Audited Auctions: Addressing Externalities in One-sided Mechanisms with Tejovan Parker & Gabe Maayan
February 21, 12 PM - CDS 1646
Abstract: We consider the form of externalities where some agent(s) have preferences over the outcome of a mechanism, and their preferences cannot be known before the mechanism is run. Often, this problem is solved by an auctioneer inserting dummy bids to represent the externalities. However, there may be ethical, trust, or power issues with delegating the determination of one's values to a central entity. And, it is extremely unreasonable to have all agents constantly estimate and report their values for the actions of all other agents in a system. Even if it were acceptable for a central entity to estimate externalities, it is more efficient to only audit what is more likely to be harmful, rather than auditing everything.
To address this, we consider auctions where the auctioneer has the power to (randomly) audit bidders to learn their externality, and impose penalties accordingly. In this setting, the power to audit results in bidder behavior equivalent to letting the auctioneer set individualized entry fees for bidders as a function of their non-manipulable externality type, and this results in thresholds of participation as functions of externality.
This setting is motivated by a variety of practical scenarios. For example, an auctioneer might run a social or traditional media platform where bidders compete to post news or ads on user feeds. In this setting, end users can experience bidders' posts as nuisance costs, incurring negative externalities.
Our objective is to maximize total welfare, i.e. the sum of individual value and externalities. In this paper, we show how penalty functions induce thresholds of participation, and prove analytically that welfare optimal participation thresholds in the i.i.d. setting with no competition are linear. Additionally, in the setting with competition for a single item and where i.i.d. bidders may only take two discrete types, the optimal threshold is linear and behaves analogously to Myersonian revenue-maximizing reserve-prices.
To illustrate results in more complicated settings, we use simulation with computational optimization to characterize welfare increases over participation threshold functions. We collect a dataset from X (formerly Twitter) to create an empirical joint-distribution of sender and receiver value, and simulate auctions from this empirical data. We find that optimal thresholds shift welfare from producers to users and increase overall welfare in all settings. We also observe that optimal thresholds are linear even with the empirical type distributions. However, the penalty functions will not be linear in general, which makes an interesting comparison to linear contracts. Our results suggest that auditing and penalizing externalities in real-world sponsored-search and advertising auctions have the potential to create substantial increases in social welfare.
Bio: Tejovan Parker is a third-year PhD student at Boston University’s Faculty of Computing and Data Sciences. Previously, he studied Mechanical and Global Engineering at the University of Colorado Boulder. He is interested in better management of social, political, and economic systems through mathematical and algorithmic methods. Tejovan began his PhD studies at BU in Fall 2022. In his first two years at CDS, he is building his expertise and looking to assist in existing research within misinformation markets.
Gabe is a third-year PhD student at Boston University’s Faculty of Computing and Data Sciences. Previously, he worked on a variety of projects at the MITRE Corporation and received his Bachelor of Science in Computer Science at Rensselaer Polytechnic Institute. His research interests are in Complexity Science, Complex Systems Analysis and Modeling, and Agent-Based Modeling.
The Suicidal Mind with Gabe McDonnell-Maayan
January 31, 12 PM - CDS 1646
Abstract: I present the "Suicidal Mind", a theory synthesis modeling exercise aiming to simulate a distressed mind. I will briefly cover the topics of the first talk: the challenges of, and different approaches to, suicide research; relevant theories of suicide; and an overview of our model. I will then focus on various forms of model validation, including parameter sweeps, scenario recreation, GAM surface fitting, and multivariate time-series clustering.
Bio: Gabe is a third-year PhD student at Boston University’s Faculty of Computing and Data Sciences. Previously, he worked on a variety of projects at the MITRE Corporation and received his Bachelor of Science in Computer Science at Rensselaer Polytechnic Institute. His research interests are in Complexity Science, Complex Systems Analysis and Modeling, and Agent-Based Modeling.
Fall 2024
Labor Unions and Digital Democracy with Freddy Reiber
November 22, 11 AM - CDS 1646
Abstract: Labor unions have served an important role in giving workers a voice within the economy; however, this does not mean they are without critique. Central to many union critiques is the lack of meaningful democracy within unions, or what Robert Michels calls the “Iron law of oligarchy”. In the 2000s, researchers thought that information and communication technology (ICT) might serve as a solution to these problems; however, as the empirical literature developed, it became clear that ICTs were not the silver bullet theorists had originally hoped for. This talk reviews both the theoretical and empirical work of labor scholars and HCI researchers on why ICTs didn’t provide a meaningful shift in union democracy, and offers some ideas for future work.
Bio: Freddy is a third-year PhD student in the Computing and Data Science department at Boston University, advised by the fantastic Allison McDonald. His work explores how power dynamics are shifted by technology, with a focus on applying human-driven methods to complex issues. Currently, his projects are on second-order dynamics in digital spaces within labor unions and the motivations used by cryptographers for their research.