Computing and Data Science PhD Student Seminar Series

The Boston University PhD program is home to a wide range of students, all studying various facets of data science. To help give students a friendly opportunity to practice and develop their research skills, we are launching the Computing and Data Science PhD Student Seminar Series. This series is focused on allowing doctoral students to present their research within a supportive and collaborative environment. Each seminar offers students a chance to share their findings, practice presentation skills, and receive constructive feedback from peers and faculty in a friendly, non-judgmental setting. This format not only helps students refine their work but also fosters essential communication skills that are crucial for their academic and professional careers.

Beyond the academic benefits, the seminar series is a community-building endeavor that seeks to strengthen connections among CDS students. By creating a space for students to share their work with the public, students from various backgrounds can learn from each other's experiences and methodologies.

The seminar series, organized by students Freddy Reiber, Lingyi Xu, and Yan (Stella) Si, meets weekly throughout the year on Fridays from noon to 1 PM, with lunch during the talk. Students interested in giving a talk should reach out to the organizers through email.

You can also view more details at the link here.

CDS PhD Student Lightning Talk Competition

April 24, 2026, 12-1 PM - CDS 1646

Abstract: Come listen to the cutting-edge ideas CDS PhD students are working on, and vote for your favorite talks! PhD students will each deliver a 2-minute lightning talk (plus 1 minute of Q&A) on their research interests, in-progress work, or new project ideas. You, the audience, anonymously pick the winners across three categories: Most Interesting Subject Matter, Most Entertaining Speaker, and Most Out There / Avant-Garde. The event is open to all students, faculty, and staff.

Stop the Nonconsensual Use of Nude Images in Research (Published at NeurIPS 2025 - Oral)

May 1, 2026, 12-1 PM - CDS 1635

Abstract: In order to train, test, and evaluate nudity detection models, machine learning researchers typically rely on nude images scraped from the Internet. Our research finds that this content is collected and, in some cases, subsequently distributed by researchers without consent, leading to potential misuse and exacerbating harm against the subjects depicted. We argue that the distribution of nonconsensually collected nude images by researchers perpetuates image-based sexual abuse and that the machine learning community should stop the nonconsensual use of nude images in research. To characterize the scope and nature of this problem, we conducted a systematic review of papers published in computing venues that collect and use nude images. Our results paint a grim reality: norms around the usage of nude images are sparse, leading to a litany of problematic practices like distributing and publishing nude images with uncensored faces, and intentionally collecting and sharing abusive content. We conclude with a call-to-action for publishing venues and a vision for research in nudity detection that balances user agency with concrete research objectives. You can check out the paper here: openreview.net/pdf?id=Ev5xwr3vWh

Bio: Princessa Cintaqia is a PhD student at Boston University's Faculty of Computing and Data Sciences working with Allison McDonald. Previously, she earned her bachelor's from the University of Indonesia in her beautiful home country of Indonesia. She is interested in socially aware computer security, especially in the context of sexual privacy and human-centered cryptography.

Past Talks

Spring 2026

Calibrated Information Extraction from Coastal Ecosystems Literature

April 17, 2026, 12-1 PM - CDS 1646

Abstract: A large portion of data for freshwater and coastal ecosystems exists within text, tables, and figures from PDF research papers. Generative AI is increasingly used as a tool for extracting such data, but is subject to high-risk inaccuracies (e.g., 'hallucinations'). We propose to surmount this drawback through a novel technique: calibrated information extraction. We develop mechanistic interpretability tools for probing an LLM's internal activation patterns and producing confidence scores for extracted data points. In turn, we show that strong calibration among scores suggests a path for reliably supporting ecological research in downstream statistical models or analyses.

Bio: Kevin is a PhD student at Boston University working with Professor Mark Crovella and Professor Evimaria Terzi. Kevin's research focuses on the design and application of interpretable machine learning models, specifically for unsupervised clustering problems. He previously completed a BA in mathematics and computer science at BU.

Union Busting and Workplace Resistance & What is Alt-Tech? with Freddy Reiber and Tyler Calabrese

April 3, 2026, 12-1 PM - CDS 1646

Abstract:

Freddy: Union Busting and Workplace Resistance: Freddy will be talking about the role of technologies in union busting and the future of workplace resistance.

Tyler: What is Alt-Tech?: Tyler will present a literature review of the alt-right/alt-tech media ecosystem.

Bio:

Freddy: Freddy is a third-year PhD student in the Computing and Data Science department at Boston University, advised by the fantastic Allison McDonald. His work explores how power dynamics are shifted by technology, with a focus on applying human-driven methods to complex issues. Currently, his projects are on second-order dynamics in digital spaces within labor unions and the motivations used by cryptographers for their research.

Tyler: Tyler Calabrese is a PhD student at Boston University's Faculty of Computing and Data Sciences, working with Allison McDonald. Previously, he worked as a Software Developer at Strike Technologies and earned his Bachelor's in Computer Science and English from Tufts University. His research interests include usable privacy and security, particularly in the context of police surveillance.

Evaluating Language Model Responses to Mental Health Symptom Disclosures & Survey of Predictive Recursive Algorithms for Inference with Micah Benson and Clark Ikezu

April 10, 2026, 12-1 PM - CDS 1646

Abstract:

Micah: Evaluating Language Model Responses to Mental Health Symptom Disclosures: We use depression and anxiety questionnaires to build an evaluation dataset that simulates mental health symptom disclosures by language model users. We analyze patterns in language model responses and explore how common jailbreaks change these behaviors.

Clark: Survey of Predictive Recursive Algorithms for Inference: There has been growing interest in Bayesian predictive inference. This talk will survey predictive recursive algorithms and other related stochastic approximation algorithms for inferring quantities of interest given noisy, (possibly partially) exchangeable observations from some unknown, underlying system.

Bio:

Micah: Micah studies the societal impacts of large language models (LLMs) as a PhD Student at Boston University's Faculty of Computing & Data Sciences. He uses interpretability methods to investigate how LLMs represent social concepts such as identity and politics, with the goal of developing techniques to improve model fairness. He also conducts audits that simulate new uses of LLMs to analyze potential benefits and risks of the technology. Before BU, Micah graduated from WashU with a double major in data science and English.

Clark: Clark is a second-year PhD student at Boston University's Faculty of Computing and Data Sciences. He is broadly interested in understanding biological systems and spatiotemporal processes with statistical modeling. Previously he worked at the Mayo Clinic at Jacksonville, FL, and before that earned a Master of Science in Bioengineering from Stanford University and a Bachelor of Science from Boston University in Biomedical Engineering.

Western Pacific tropical cyclones over the past 500 years: when a deep-learning climate emulator meets a Chinese handwritten historical record

March 27, 2026, 12-1 PM - CDS 1646

Abstract: Digitized handwritten Chinese historical records (REACHES) show that tropical cyclone (TC) landfall frequency peaked in 1650-1680 AD over the past 500 years. However, the environmental conditions that led to this peak remain unknown. This study uses a novel deep-learning climate emulator, ACE2, and a dynamical model, HiRAM, both forced with the last-millennium reconstructed sea surface temperatures and sea ice to uncover the large-scale climate states that drive the long-term variability in Western Pacific TC frequency and track. We find that simulated TC landfall frequency in East Asia also peaks in ACE2 during the 1650-1680 AD period, consistent with REACHES data. Furthermore, the seasonal cycle of Western Pacific TC activity has two peaks during this period, different from a single peak in the current climate, possibly associated with the shift from the East Asian monsoon to the South Asian monsoon. We investigate the large-scale circulation and environmental conditions that drive changes in TC genesis, track, and seasonal cycle over the past 500 years. Our lessons learned have implications for future changes in TC activities in the Western Pacific. Meanwhile, our work proposes a framework to investigate paleoclimate TCs by combining an AI global climate emulator with proxy data.

Bio: Mu-Ting Chien is a postdoc in Libby Barnes's group. Her research focuses on tropical cyclones and climate change using machine learning and global climate simulations. Before coming to BU, she was a postdoc at Colorado State University. She received her PhD in Atmospheric Science from the University of Washington in 2024.

Public Goods Games with Nonlinearities

March 20, 2026, 12-1 PM - CDS 1635

Abstract: Public goods games are a model of many-player social dilemmas; we study these games from the perspective of evolutionary game theory, and particularly the evolution of cooperation and altruism. We introduce non-linearities to the benefit of the public good, finding that non-linearities have impacts on the relationship between resource inequality and evolutionary dynamics.

Bio: Gavin Rees is a PhD student in Boston University's Faculty of Computing & Data Sciences whose work combines mathematics and evolutionary biology. His research focuses on social behavior and combines approaches from theoretical biology, statistics, and evolutionary game theory to understand ecological and evolutionary dynamics of intertwined systems. His primary focus is on biological complexity, and he has worked on the evolution of cooperation in many-player social dilemmas, as well as on inferring social dynamics in political bodies. Prior to his doctoral studies, Gavin earned his Bachelor's in Mathematics from Harvard University with a secondary in Computer Science, and worked as a software engineer at Markforged and as a research assistant at the Institute of Science and Technology Austria and the Complexity Science Hub, Vienna.

Vision-Language Modeling for Neuropathological Evaluation

March 6, 2026, 12-1 PM - CDS 1635

Abstract: Recent developments in vision-language models have enabled flexible multimodal understanding and instruction-following. In this work, we introduce a vision-language framework for neuropathology that emphasizes diagnostic accuracy through visual QA. Without dense spatial supervision, this framework achieves accurate and reliable diagnostic decision making for a wide array of comorbid neuropathologies, offering a disease-agnostic approach for neuropathological evaluation.

Bio: Lingyi Xu is a Ph.D. student in the Faculty of Computing & Data Sciences at Boston University. She works with Professor Vijaya B. Kolachalama to seek solutions to data missingness in multimodal learning. Her work investigates how different data modalities can be represented and aligned to make learning more adaptable and their relationships more interpretable.

Guaranteed Speech

February 27, 2026, 12-1 PM - CDS 1635

Abstract: I will discuss a strategic scenario inspired by the misinformation problem. A decision maker receives advice from an informed source. However, the source's incentives may be misaligned with the decision maker's. When the source has the option to guarantee their message, this may improve the situation for the decision maker. Technically, the model is a combination of cheap talk and costly signaling, analyzed for strategic equilibrium with commitment power (Stackelberg).

Bio: Tejovan is interested in using computation and the complex-systems perspective to understand how to better manage multi-agent, or socio-economic, systems. He is working with Marshall Van Alstyne, Xuezhou Zhang, and Francisco Marmolejo-Cossío to investigate simple economic mechanisms to improve equilibrium behaviors in partially observed stochastic games with imperfect incentive alignments. Tejovan began his PhD studies at BU in September 2022. Prior to this, he studied Mechanical and Global Engineering at the University of Colorado Boulder.

Propagating Surrogate Uncertainty in Bayesian Inverse Problems

February 20, 2026, 12-1 PM - CDS 1635

Abstract: Standard Bayesian inference schemes are infeasible for inverse problems with computationally expensive forward models. A common solution is to replace the model with a cheaper surrogate. To avoid overconfident conclusions, it is essential to acknowledge the surrogate approximation by propagating its uncertainty. At present, a variety of distinct uncertainty propagation methods have been suggested, with little understanding of how they vary. To fill this gap, we propose a mixture distribution termed the expected posterior (EP) as a general baseline for uncertainty-aware posterior approximation, justified by decision theoretic and modular Bayesian inference arguments. We compare this distribution to popular alternatives, present an approximate Markov chain Monte Carlo sampler for EP-based inference, and highlight future directions.

Bio: Andrew Roberts is a PhD student in Computing and Data Sciences at Boston University, working with Professor Jonathan Huggins and Professor Michael Dietze. He is broadly interested in scientific machine learning, Bayesian modeling, and uncertainty quantification, with the goal of developing new methodologies for environmental and ecological applications. Andrew's current work focuses on developing statistical and computational methods to better utilize process-based models of the terrestrial carbon cycle. Learn more about Andrew here: arob5.github.io

Quantitative evaluation frameworks for the trustworthiness of large language model outputs in medical domains

February 13, 2026, 12-1 PM - CDS 1646

Abstract: Although large language model (LLM)–based tools have become increasingly popular, their deployment in real-world clinical settings demands a much higher level of precision and reliability, where the cost of diagnostic errors is substantial. Currently, clinicians remain skeptical about relying on LLMs for clinical decision-making, largely due to the lack of rigorous evidence supporting individual model outputs and limited understanding of how such outputs are generated. Even when an LLM produces a correct answer, clinicians often find it difficult to trust the result without transparent justification. Addressing this trust gap is therefore an urgent need. In Yi’s first project, she proposes a scalable, entity-centric evaluation framework for medical question answering, which assesses the clinical alignment and informativeness of LLM-generated responses by tracing and verifying clinically relevant medical entities within patient-specific contexts. This framework enables more faithful and interpretable evaluation of medical LLM outputs beyond surface-level correctness. Building on this work, Yi’s ongoing research explores interpretability methods to analyze the decision flow of LLMs, examining how patient information is processed through internal model representations and transformed into diagnostic summaries or clinical decisions. Together, these efforts aim to improve the transparency and trustworthiness of LLMs for clinical applications.

Bio: Yi is advised by Professor Vijaya Kolachalama and has a general interest in free-form text evaluation and in methods for assessing open-ended model output and making it more reliable. Yi's work focuses on large language models in clinical settings, particularly in medical question answering and diagnostic reasoning for Alzheimer’s disease. She is especially interested in evaluation frameworks and interpretability methods that help reveal how medical evidence is represented, transformed, and utilized inside LLMs, as well as approaches for detecting reasoning errors and improving the accuracy of model-generated clinical summaries.

Bayesian Online Model Selection with Yuke Zhang and Aida Afshar

February 6, 2026, 12-1 PM - CDS 1646

Abstract: Online model selection in Bayesian bandits poses a fundamental challenge of exploration: for an unknown environment instance drawn from the prior distribution, how can one adaptively explore multiple bandit learners and compete with the best one in terms of performance? Yuke and Aida address this problem by introducing a novel Bayesian algorithm for online model selection in stochastic bandits. They establish an oracle-best guarantee of Õ(d⋆ M√T + √(MT)) on the Bayesian regret, where M is the number of base learners, d⋆ is the regret coefficient of the optimal base learner, and T is the time horizon. They further validate their algorithm through experiments across various stochastic bandit settings, demonstrating that its performance is competitive with that of the best base learner.

Bio:

Yuke: Advised by Professor Aguêmon Yves Atchadé, Yuke is generally interested in the statistical foundations of deep learning from a Bayesian perspective. Specifically, Yuke is focusing on online learning on high-dimensional sparse spatio-temporal data, with emphasis on algorithms like Thompson Sampling.

Aida: Aida Afshar is a PhD student at Boston University’s Faculty of Computing and Data Sciences. Previously, she received her bachelor's degree in Mathematics with a minor in Computer Science from Sharif University of Technology. Her primary research interests are Sequential Decision-Making and Statistical Foundations of Continual Learning. Learn more about Aida here: aidaafshar.github.io

Fall 2025

PhD Seminar Series Holiday Party

December 12, 12 PM - CDS 1635

Abstract: End of the Fall semester - Hooray! Join us for a holiday celebration to wrap up another successful semester of student research presentations.

WinoReferral: A Benchmark to Evaluate LLM Responses to Prompts Implying Mental Health Disorders with Micah Benson

December 5, 12 PM - CDS 1646

Abstract: The use of LLMs has contributed to serious mental health conditions among a subset of users, most alarmingly cases of delusional thinking and suicide. My talk will begin by surveying leading theories from psychology and human-computer interaction that explain how LLM use interacts with user mental health. I will then present my research on evaluating LLM responses to prompts that imply a user has depression and discuss our current results, which indicate high variance in AI companies’ safety policies for referring users to mental health professionals. I will conclude by proposing policy changes that could limit the harms of LLMs to user mental health.

Bio: Micah studies the societal impacts of large language models (LLMs) as a PhD Student at Boston University's Faculty of Computing & Data Sciences. He uses interpretability methods to investigate how LLMs represent social concepts such as identity and politics, with the goal of developing techniques to improve model fairness. He also conducts audits that simulate new uses of LLMs to analyze potential benefits and risks of the technology. Before BU, Micah graduated from WashU with a double major in data science and English.

Bayesian Predictive Modeling: Towards Martingale Posterior Distributions for Dynamical Systems with Clark Ikezu

November 21, 12 PM - CDS 1646

Abstract: Bayesian inference is a principled way to quantify uncertainty over parameters. The predominant approach involves specifying a prior and a likelihood in order to compute a posterior distribution. Prediction is then achieved through computing the posterior predictive distribution. However, prediction can also be viewed as the primary task of Bayesian inference, in which specifying a predictive model comes first and inferring the posterior distribution follows. This approach is appealing in several ways, including that one reasons over quantities that can be observed, as opposed to parameters that cannot. In this talk I will introduce this Bayesian "predictive approach" and discuss a particular method called the martingale posterior distribution, implemented by the predictive resampling algorithm. Next I will present preliminary work in which I show how the predictive resampling algorithm can be useful for posterior inference in the setting of non-i.i.d. observations generated by a dynamical system.

Bio: Clark is a second-year PhD student at Boston University's Faculty of Computing and Data Sciences. He is broadly interested in understanding biological systems and spatiotemporal processes with statistical modeling. Previously he worked at the Mayo Clinic at Jacksonville, FL, and before that earned a Master of Science in Bioengineering from Stanford University and a Bachelor of Science from Boston University in Biomedical Engineering.

Multi-Stain Learning for Robust Neuropathology Evaluation with Lingyi Xu

November 14, 12 PM - CDS 1646

Abstract: Postmortem histopathology remains the gold standard for diagnosing neurodegenerative diseases, yet the diagnostic workflows are often constrained by variable stain availability. We introduce a multimodal deep learning framework for whole slide image analysis that models inter-stain relationships and performs robustly under missing-modality conditions. Through cross-stain representation learning, the framework achieves more reliable and accessible diagnostic performance, offering a step toward data-efficient and resource-flexible digital pathology.

MCP Servers: Why You Need to Know About Them and How They Work with Jeff Hastings

November 7, 12 PM - CDS 1646

Abstract: Model Context Protocol (MCP) servers have revolutionized the way AI connects to resources. Standard API approaches required the developer to customize each connection. With MCP servers, a standardized protocol replaces these fragmented API connections with a single, universal connection method. MCP servers enable the researcher to connect to multiple data sources simultaneously, create reproducible data pipelines, get the most out of agentic AI, and build large libraries that can be quickly and easily queried and summarized.

Bio: Jeff Hastings is a PhD student in the Faculty of Computing & Data Sciences at Boston University, advised by Dr. Joshua Peterson. He earned a BA and MA in Political Science from Utah State University, followed by an MS in Computational Social Science from the University of California, San Diego. His research applies machine learning, deep learning, and reinforcement learning to better understand, explain, and improve human, artificial, and agentic decision-making. Prior to his PhD, he worked as an AI Data Scientist at Thermo Fisher Scientific.

Spherical CNNs and DeepSurv for Psychosis Conversion and The Trick-or-Treat Index with Phillip Angelos

October 31, 12 PM - CDS 1646

Abstract: Psychosis is a serious mental illness that involves altered perceptions of reality and symptoms such as hallucinations and delusions. Many of our questions regarding the disorder are still unanswered, and here we'll discuss two projects taking computational approaches to investigating psychosis. The first is a generative computational model that allows us to quantify the dysfunction in the brain's top-down expectations and bottom-up sensory signaling. The second, still ongoing, is a Spherical Convolutional Neural Network (SCNN) designed to analyze neuroimaging data across the brain, with an adapted integration of DeepSurv to predict conversion in Clinical High Risk (CHR) patients. The presentation will conclude with a few slides about the "Trick or Treat Index" and the best neighborhoods for trick-or-treating in Boston!

Bio: Phillip Angelos is a PhD student in the Faculty of Computing and Data Sciences at Boston University, advised by Dr. Joshua Peterson. He earned a Bachelor of Science in Psychology from Michigan State University and spent two years at Yale University researching positive symptom progression in psychosis. His research examines the intersection of artificial intelligence and psychology, with a focus on deep learning, decision-making, impulsivity, and related behavioral patterns.

Policy Modeling for Sex Trafficking Legislation in Massachusetts with Gabe McDonnell-Maayan

October 24, 12 PM - CDS 1646

Abstract: Gabe will present a work-in-progress project that develops a decision-support tool to guide policymaking on sex trafficking legislation in Massachusetts. Sex trafficking, the largest form of modern-day slavery, remains a serious issue across the United States. In Massachusetts, advocacy organizations are actively pushing for competing legislative approaches. In collaboration with a subject-matter expert, Gabe's team constructed a simulation of the commercial sex system and calibrated it to Massachusetts using diverse data sources. Preliminary results from the model are intended to inform upcoming deliberations of the state senate judiciary committee.

Gabe will discuss the problem of sex trafficking and the broader landscape of commercial sex work, with a focus on Massachusetts. This includes an examination of the limited data on sex work in the United States and the methods we use to generate Massachusetts-specific estimates. Gabe will also review potential legislative approaches and the history of advocacy efforts in the state. Next, Gabe will walk through the process of developing a simulation model of commercial sex work and sex trafficking. Finally, Gabe will present preliminary results, highlight their implications for current policy debates, and show how the model can serve as a tool for evaluating future intervention strategies.

Bio: Gabe McDonnell-Maayan is a PhD candidate in Boston University's Faculty of Computing and Data Sciences whose work bridges computational innovation and pressing societal challenges. His research applies tools from complexity science—such as system dynamics modeling, agent-based modeling, and machine learning—to understand and influence the behavior of complex social systems. Gabe’s primary focus is advancing suicide prevention through computational modeling, enabling policymakers and practitioners to test interventions in silico before implementing them in the real world. Beyond suicide prevention, he has contributed to projects addressing sex trafficking, political polarization, pandemic response, and food security. Prior to his doctoral studies, Gabe worked as a software engineer at the MITRE Corporation and earned his Bachelor of Science in Computer Science from Rensselaer Polytechnic Institute.

Data Assimilation under Environmental Disturbance with Jacob Epstein and Projected Changes to Western U.S. Atmospheric Rivers with Vlad Munteanu

October 17, 12 PM - CDS 1646

Abstract:

Jacob: Mathematical models are commonly used to monitor changes in soil and plant carbon pools over time. To ensure that modeled quantities align with observed data, a technique called State Data Assimilation (SDA) can be employed, which updates model states to better match observations. However, during ecological disturbance events such as wildfires, floods, or pest outbreaks, observed carbon pools can change rapidly—causing traditional SDA methods to struggle. This talk will focus on how a modified version of SDA, which combines a discrete multinomial state-and-transition framework with conventional ensemble filtering approaches, can be used to assimilate data during periods of ecological disturbance.

Vlad: In a warming climate, the characteristics of landfalling atmospheric rivers (ARs) over the West Coast of the United States are expected to change. Recent work using a variable-intensity AR-identification method (Hughes et al. 2022) showed that the end-of-21st-century changes in West Coast AR landfall frequency depended on their intensity: extreme ARs increased in both frequency and intensity, whereas moderate ARs decreased in frequency by as much as 10%. Until now, this methodology has been applied only to a small set of regional climate models; in this work, we apply it to a large set of global climate models (GCMs). We investigate the shifts of AR frequency as a function of AR intensity in the present and future climate over the United States in a set of 40 members from the CESM2 Large Ensemble Community Project (LENS2). We find that the changes to ARs in GCMs largely confirm the trends in ARs seen in Hughes et al. 2022 but note limitations to using GCMs to study ARs. The observed shifts in AR frequency have direct implications for Western U.S. precipitation, with increased extreme precipitation and decreased moderate precipitation.

Bio:

Jacob: Jacob Epstein is a first-year PhD student at Boston University's Faculty of Computing and Data Sciences. As an undergraduate, he worked on multiple projects applying machine learning techniques to data analysis problems in environmental science, and received a Bachelor of Science in Computer Science and Mathematics at the University of Massachusetts Amherst. His current research interests are in applications of statistical and learning-based methods to environmental science.

Vlad: Vlad Munteanu is a first-year PhD student in CDS advised by Prof. Elizabeth Barnes and is interested in applications of data science and machine learning to answer complex questions about climate dynamics and extremes. He has conducted research on the response of atmospheric rivers to warming in climate simulations of the U.S. West Coast at the NOAA Physical Sciences Laboratory. Prior to that, he studied the formation and organization of deep tropical convection at the University of Washington.

How do I get LLMs Up and Running Quickly? with Yan (Stella) Si

October 10, 12 PM - CDS 1646

Abstract: With the rise of large language models (LLMs), new experimental methods are emerging across disciplines — especially in the social sciences. Researchers are increasingly interested in using LLMs as reasoning tools or even as synthetic study participants. But how do you actually work with these tools in practice? In this workshop, we will walk through how to get an LLM up and running, whether through an API or locally on your own machine. We will focus on the most accessible ways that are cheap and high-quality so you can begin experimenting right away.

Bio: Stella is a PhD student at Boston University Computing and Data Sciences, where she works at the intersection of cognitive science and AI.

Her research centers on modeling human decision making, combining neural networks with traditional cognitive models to uncover the psychological principles behind how we choose. She is also building large-scale, high-quality datasets to drive this work forward.

Interdependent Bilateral Trade: Information vs Approximation

October 3, 12 PM - CDS 1646

Abstract: This talk will introduce the area of mechanism design, and then focus on the problem of bilateral trade. Welfare maximization in bilateral trade has been extensively studied in recent years, primarily for the private values case. This talk will focus on welfare maximization in bilateral trade with interdependent values. Designing mechanisms for interdependent settings is much more challenging because the values of the players depend on the private information of others, requiring complex belief updates and strategic inference. Based on Interdependent Bilateral Trade: Information vs Approximation (EC25).

Spring 2025

Attention-Based Deep Learning for Analysis of Pathology Images and Gene Expression Data in Lung Squamous Premalignant Lesions with Lingyi Xu

April 18, 12 PM - CDS 1646

Abstract: Molecular and cellular alterations to the normal pseudostratified columnar bronchial epithelium result in the development of bronchial premalignant lesions through a histologic progression from normal to hyperplasia, metaplasia, dysplasia, carcinoma in situ, and invasive carcinoma. Endobronchial biopsies obtained via various bronchoscopy techniques are formalin-fixed, paraffin-embedded, and hematoxylin and eosin (H&E) stained to assess the pathologic features and histologic grade of the tissue. The broad and continuous spectrum of histologic and molecular changes makes reproducible stratification of lesions across multiple studies challenging.

Here we propose a transformer-based framework that flexibly utilizes transcriptomic and histologic patterns to distinguish lesions with bronchial dysplasia or worse from normal, hyperplasia, and metaplasia. We leveraged H&E whole slide images of endobronchial biopsies and bulk gene expression data from previously published studies as well as new data obtained from high-risk patients. Our framework maximizes the use of training data by allowing sample inputs with one or both data modalities. The flexibility of our framework to make predictions when a data modality is missing and its ability to integrate data from different modalities and studies are important for advancing our stratification of bronchial premalignant lesions.

Bio: Lingyi is a PhD student in Computing & Data Sciences at Boston University. Her research investigates the potential of multimodal medical data in enhancing disease diagnosis and assessment. Her current work involves applying graph models and machine learning algorithms in digital pathology and cancer genomics to advance the understanding of lung precancerous conditions.

Gender Inclusivity Fairness Index (GIFI): A Multilevel Framework for Evaluating Gender Diversity in Large Language Models with Zhengyang Shan

April 4, 12 PM - CDS 1646

Abstract: We introduce a comprehensive framework for assessing gender fairness in large language models (LLMs), particularly in their treatment of both binary and non-binary genders. Existing research has largely focused on binary gender distinctions, neglecting the inclusivity of non-binary identities. To address this, we propose a novel metric that evaluates LLMs across seven dimensions. We conduct extensive evaluations on 15 popular LLMs, revealing significant discrepancies in their ability to fairly represent diverse gender identities.

Bio: Zhengyang is a third-year PhD student at Boston University’s Faculty of Computing and Data Sciences. Her research interests lie in the evaluation, interpretability, and fairness of Large Language Models (LLMs).

Computing and Collective Action with Freddy Reiber

March 21, 12 PM - CDS 1646

Abstract: Labor unions are a critical component of ensuring dignified working conditions for laborers. However, as a byproduct of neoliberalization, American labor unions have been in a free-fall in terms of membership numbers. Drawing from sociological work on organized labor, we seek to analyze how labor organizers try to bring workers together to act collectively through digital communication technologies. Towards that end, we interviewed ~19 labor union members who engaged in digital worker-to-worker organizing through tools like Slack or Discord, focusing on how workers utilized these digital platforms and interacted with each other on them. In this talk we provide preliminary results from this study along with some early discussions around developing useful technical tools for supporting worker-to-worker organizing.

Bio: Freddy is a third-year PhD student in the Computing and Data Science department at Boston University, advised by the fantastic Allison McDonald. His work explores how power dynamics are shifted by technology with a focus on applying human-driven methods to complex issues. Currently, his projects focus on second-order dynamics in digital spaces within labor unions and the motivations used by cryptographers for their research.

Audited Auctions: Addressing Externalities in One-sided Mechanisms with Tejovan Parker & Gabe Maayan

February 21, 12 PM - CDS 1646

Abstract: We consider the form of externalities where some agent(s) have preferences over the outcome of a mechanism, and their preferences cannot be known before the mechanism is run. Often, this problem is solved by an auctioneer inserting dummy bids to represent the externalities. However, there may be ethical, trust, or power issues with delegating the determination of one's values to a central entity. Moreover, it is extremely unreasonable to have all agents constantly estimate and report their values for the actions of all other agents in a system. Even if it were acceptable for a central entity to estimate externalities, it is more efficient to audit only what is more likely to be harmful, rather than auditing everything.

To address this, we consider auctions where the auctioneer has the power to (randomly) audit bidders to learn their externality, and impose penalties accordingly. In this setting, the power to audit results in bidder behavior equivalent to letting the auctioneer set individualized entry fees for bidders as a function of their non-manipulable externality type, which in turn results in thresholds of participation as functions of externality.

This setting is motivated by a variety of practical scenarios. For example, an auctioneer might run a social or traditional media platform where bidders compete to post news or ads on user feeds. In this setting, end users can experience bidders' posts as nuisance costs, incurring negative externalities.

Our objective is to maximize total welfare, i.e., the sum of individual values and externalities. In this paper, we show how penalty functions induce thresholds of participation, and prove analytically that welfare-optimal participation thresholds in the i.i.d. setting with no competition are linear. Additionally, in the setting with competition for a single item where i.i.d. bidders may take only two discrete types, the optimal threshold is linear and behaves analogously to Myersonian revenue-maximizing reserve prices.

To illustrate results in more complicated settings, we use simulation with computational optimization to characterize welfare increases over participation threshold functions. We collect a dataset from X (formerly Twitter) to create an empirical joint distribution of sender and receiver value, and simulate auctions from this empirical data. We find that optimal thresholds shift welfare from producers to users and increase overall welfare in all settings. We also observe that optimal thresholds are linear even with the empirical type distributions. However, the penalty functions will not be linear in general, which makes an interesting comparison to linear contracts. Our results suggest that auditing and penalizing externalities in real-world sponsored-search and advertising auctions have the potential to create substantial increases in social welfare.
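
The welfare objective and participation thresholds described in this abstract can be illustrated with a toy sketch. This is not the authors' model or simulation: the uniform distributions, the function names, and the specific threshold rule are all assumptions chosen for illustration. Each bidder is treated independently (no competition), welfare is the bidder's value plus a non-positive externality, and a bidder participates only if their value clears a threshold that depends on their externality.

```python
import random


def simulate_welfare(threshold, n_bidders=10_000, seed=0):
    """Average welfare per bidder in a toy no-competition setting.

    A bidder participates only if value >= threshold(externality).
    Both distributions below are illustrative assumptions.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_bidders):
        value = rng.uniform(0.0, 1.0)         # bidder's private value (assumed)
        externality = rng.uniform(-1.0, 0.0)  # nuisance cost to users (assumed)
        if value >= threshold(externality):
            # Welfare counts both the bidder's value and the externality.
            total += value + externality
    return total / n_bidders


def no_threshold(externality):
    # Hypothetical baseline: every bidder participates.
    return 0.0


def linear_threshold(externality):
    # Hypothetical linear rule: participate only when value + externality >= 0.
    return -externality


baseline = simulate_welfare(no_threshold)
audited = simulate_welfare(linear_threshold)
print(f"no threshold: {baseline:.3f}, linear threshold: {audited:.3f}")
```

Because both runs share a seed, they see the same bidders, so the linear rule can only drop bidders whose participation would have lowered welfare. In this toy case the rule is linear in the externality by construction; the abstract's analytical and empirical results concern richer settings.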

Bio: Tejovan Parker is a third-year PhD student at Boston University’s Faculty of Computing and Data Sciences. Previously, he studied Mechanical and Global Engineering at the University of Colorado Boulder. He is interested in better management of social, political, and economic systems through mathematical and algorithmic methods. Tejovan began his PhD studies at BU in Fall 2022. In his first two years at CDS, he is building his expertise and looking to assist in existing research within misinformation markets.

Gabe is a third-year PhD student at Boston University’s Faculty of Computing and Data Sciences. Previously, he worked on a variety of projects at the MITRE Corporation and received his Bachelor of Science in Computer Science at Rensselaer Polytechnic Institute. His research interests are in Complexity Science, Complex Systems Analysis and Modeling, and Agent-Based Modeling.

The Suicidal Mind with Gabe McDonnell-Maayan

January 31, 12 PM - CDS 1646

Abstract: I present the "Suicidal Mind", a theory synthesis modeling exercise aiming to simulate a distressed mind. I will briefly cover the topics of the first talk: the challenges of, and different approaches to, suicide research, relevant theories of suicide, and an overview of our model. I will then focus on various forms of model validation, including parameter sweeps, scenario recreation, GAM surface fitting, and multivariate time-series clustering.

Bio: Gabe is a third-year PhD student at Boston University’s Faculty of Computing and Data Sciences. Previously, he worked on a variety of projects at the MITRE Corporation and received his Bachelor of Science in Computer Science at Rensselaer Polytechnic Institute. His research interests are in Complexity Science, Complex Systems Analysis and Modeling, and Agent-Based Modeling.

CDS PhD Student Lightning Talk Competition

April 24, 2026, 12-1 PM - CDS 1646

Stop the Nonconsensual Use of Nude Images in Research (Published at NeurIPS 2025 - Oral)

May 1, 2026, 12-1 PM - CDS 1635

Past Talks

Spring 2026

Calibrated Information Extraction from Coastal Ecosystems Literature

April 17, 2026, 12-1 PM - CDS 1646

Union Busting and Workplace Resistance & What is Alt-Tech? with Freddy Reiber and Tyler Calabrese

April 3, 2026, 12-1 PM - CDS 1646

Evaluating Language Model Responses to Mental Health Symptom Disclosures & Survey of Predictive Recursive Algorithms for Inference with Micah Benson and Clark Ikezu

April 10, 2026, 12-1 PM - CDS 1646

Western Pacific tropical cyclones over the past 500 years: when a deep-learning climate emulator meets a Chinese handwritten historical record

March 27, 2026, 12-1 PM - CDS 1646

Public Goods Games with Nonlinearities

March 20, 2026, 12-1 PM - CDS 1635

Vision-Language Modeling for Neuropathological Evaluation

March 6, 2026, 12-1 PM - CDS 1635

Guaranteed Speech

February 27, 2026, 12-1 PM - CDS 1635

Propagating Surrogate Uncertainty in Bayesian Inverse Problems

February 20, 2026, 12-1 PM - CDS 1635

Quantitative evaluation frameworks for the trustworthiness of large language model outputs in medical domains

February 13, 2026, 12-1 PM - CDS 1646

Bayesian Online Model Selection with Yuke Zhang and Aida Afshar

February 6, 2026, 12-1 PM - CDS 1646

Fall 2025

PhD Seminar Series Holiday Party

December 12, 12 PM - CDS 1635

WinoReferral: A Benchmark to Evaluate LLM Responses to Prompts Implying Mental Health Disorders with Micah Benson

December 5, 12 PM - CDS 1646

Bayesian Predictive Modeling: Towards Martingale Posterior Distributions for Dynamical Systems with Clark Ikezu

November 21, 12 PM - CDS 1646

Multi-Stain Learning for Robust Neuropathology Evaluation with Lingyi Xu

November 14, 12 PM - CDS 1646

MCP Servers: Why You Need to Know About Them and How They Work with Jeff Hastings

November 7, 12 PM - CDS 1646

Spherical CNNs and DeepSurv for Psychosis Conversion and The Trick-or-Treat Index with Phillip Angelos

October 31, 12 PM - CDS 1646

Policy Modeling for Sex Trafficking Legislation in Massachusetts with Gabe McDonnell-Maayan

October 24, 12 PM - CDS 1646

Data Assimilation under Environmental Disturbance with Jacob Epstein and Projected Changes to Western U.S. Atmospheric Rivers with Vlad Munteanu

October 17, 12 PM - CDS 1646

How do I get LLMs Up and Running Quickly? with Yan (Stella) Si

October 10, 12 PM - CDS 1646

Interdependent Bilateral Trade: Information vs Approximation

October 3, 12 PM - CDS 1646

Spring 2025

Attention-Based Deep Learning for Analysis of Pathology Images and Gene Expression Data in Lung Squamous Premalignant Lesions with Lingyi Xu

April 18, 12 PM - CDS 1646

Gender Inclusivity Fairness Index (GIFI): A Multilevel Framework for Evaluating Gender Diversity in Large Language Models with Zhengyang Shan

April 4, 12 PM - CDS 1646

Computing and Collective Action with Freddy Reiber

March 21, 12 PM - CDS 1646

Audited Auctions: Addressing Externalities in One-sided Mechanisms with Tejovan Parker & Gabe Maayan

February 21, 12 PM - CDS 1646

The Suicidal Mind with Gabe McDonnell-Maayan

January 31, 12 PM - CDS 1646

Fall 2024

Labor Unions and Digital Democracy with Freddy Reiber

November 22, 11 AM - CDS 1646