Computing and Data Science PhD Student Seminar Series
The Boston University PhD program is home to a wide range of students, all studying various facets of data science. To give students a friendly opportunity to practice and develop their research skills, we are launching the Computing and Data Science PhD Student Seminar Series. This series is focused on allowing doctoral students to present their research within a supportive and collaborative environment. Each seminar offers students a chance to share their findings, practice presentation skills, and receive constructive feedback from peers and faculty in a friendly, non-judgmental setting. This format not only helps students refine their work but also builds communication skills that are crucial for their academic and professional careers.
Beyond the academic benefits, the seminar series is a community-building endeavor that seeks to strengthen connections among CDS students. By creating a space for students to share their work with the public, students from various backgrounds can learn from each other's experiences and methodologies.
The seminar series, organized by students Freddy Reiber, Lingyi Xu, and Yan (Stella) Si, meets weekly throughout the year on Fridays from noon to 1 PM, with lunch during the talk. Students interested in giving a talk should reach out to the organizers through email.
How do I get LLMs Up and Running Quickly? with Yan (Stella) Si
October 10, 12 PM - CDS 1646
Abstract: With the rise of large language models (LLMs), new experimental methods are emerging across disciplines — especially in the social sciences. Researchers are increasingly interested in using LLMs as reasoning tools or even as synthetic study participants. But how do you actually work with these tools in practice? In this workshop, we will walk through how to get an LLM up and running, whether through an API or locally on your own machine. We will focus on the most accessible ways that are cheap and high-quality so you can begin experimenting right away.
Bio: Stella is a PhD student at Boston University Computing and Data Sciences, where she works at the intersection of cognitive science and AI.
Her research centers on modeling human decision making, combining neural networks with traditional cognitive models to uncover the psychological principles behind how we choose. She is also building large-scale, high-quality datasets to drive this work forward.
Policy Modeling for Sex Trafficking Legislation in Massachusetts with Gabe McDonnell-Maayan
October 24, 12 PM - CDS 1646
Abstract: Gabe will present a work-in-progress project that develops a decision-support tool to guide policymaking on sex trafficking legislation in Massachusetts. Sex trafficking, the largest form of modern-day slavery, remains a serious issue across the United States. In Massachusetts, advocacy organizations are actively pushing for competing legislative approaches. In collaboration with a subject-matter expert, Gabe's team constructed a simulation of the commercial sex system and calibrated it to Massachusetts using diverse data sources. Preliminary results from the model are intended to inform upcoming deliberations of the state senate judiciary committee.
Gabe will discuss the problem of sex trafficking and the broader landscape of commercial sex work, with a focus on Massachusetts. This includes an examination of the limited data on sex work in the United States and the methods we use to generate Massachusetts-specific estimates. Gabe will also review potential legislative approaches and the history of advocacy efforts in the state. Next, Gabe will walk through the process of developing a simulation model of commercial sex work and sex trafficking. Finally, Gabe will present preliminary results, highlight their implications for current policy debates, and show how the model can serve as a tool for evaluating future intervention strategies.
Bio: Gabe McDonnell-Maayan is a PhD candidate in Boston University's Faculty of Computing and Data Sciences whose work bridges computational innovation and pressing societal challenges. His research applies tools from complexity science—such as system dynamics modeling, agent-based modeling, and machine learning—to understand and influence the behavior of complex social systems. Gabe’s primary focus is advancing suicide prevention through computational modeling, enabling policymakers and practitioners to test interventions in silico before implementing them in the real world. Beyond suicide prevention, he has contributed to projects addressing sex trafficking, political polarization, pandemic response, and food security. Prior to his doctoral studies, Gabe worked as a software engineer at the MITRE Corporation and earned his Bachelor of Science in Computer Science from Rensselaer Polytechnic Institute.
Spherical CNNs and DeepSurv for Psychosis Conversion and The Trick-or-Treat Index with Phillip Angelos
October 31, 12 PM - CDS 1646
Abstract: Short presentation on incomplete SCNN for Psychosis Progression plus a Halloween science-related presentation.
Bio: Phillip Angelos is a PhD student in the Faculty of Computing and Data Sciences at Boston University, advised by Dr. Joshua Peterson. He earned a Bachelor of Science in Psychology from Michigan State University and spent two years at Yale University researching positive symptom progression in psychosis. His research examines the intersection of artificial intelligence and psychology, with a focus on deep learning, decision-making, impulsivity, and related behavioral patterns.
MCP Servers: Why You Need to Know About Them and How They Work with Jeff Hastings
November 7, 12 PM - CDS 1646
Abstract: MCP (Model Context Protocol) servers have revolutionized the way AI connects to resources. Standard API approaches required the developer to customize each connection. With MCP servers, a standardized protocol replaces these fragmented API connections with a single, universal connection method. MCP servers enable researchers to connect to multiple data sources simultaneously, create reproducible data pipelines, get the most out of agentic AI, and build large libraries that can be quickly and easily queried and summarized.
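As a rough illustration of the "single, universal connection method" idea, here is a toy, in-process sketch of the request and response shapes a Model Context Protocol exchange uses. It assumes the JSON-RPC 2.0 framing and the `tools/list` and `tools/call` method names from the published MCP specification. The `word_count` tool, the `TOOLS` registry, and the `handle` and `request` helpers are hypothetical names invented for this example. It is not the official SDK, it skips the initialization handshake, and it is not necessarily how the talk will present the material.

```python
import json

# Hypothetical tool registry for this sketch; a real server exposes its own tools.
TOOLS = {
    "word_count": {
        "description": "Count the words in a text string.",
        "inputSchema": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
        "handler": lambda args: str(len(args["text"].split())),
    }
}


def request(method, params=None, request_id=1):
    """Build a JSON-RPC 2.0 request string (client side)."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


def handle(raw):
    """Toy server: answer tools/list and tools/call, reject everything else."""
    req = json.loads(raw)
    if req["method"] == "tools/list":
        result = {
            "tools": [
                {
                    "name": name,
                    "description": tool["description"],
                    "inputSchema": tool["inputSchema"],
                }
                for name, tool in TOOLS.items()
            ]
        }
    elif req["method"] == "tools/call":
        params = req["params"]
        tool = TOOLS[params["name"]]
        text = tool["handler"](params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        # -32601 is the standard JSON-RPC "Method not found" error code.
        error = {"code": -32601, "message": "Method not found"}
        return json.dumps({"jsonrpc": "2.0", "id": req["id"], "error": error})
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})


if __name__ == "__main__":
    print(handle(request("tools/list")))
    call = {"name": "word_count", "arguments": {"text": "model context protocol"}}
    print(handle(request("tools/call", call, request_id=2)))
```

The point of the sketch is that the client code never changes per data source: discovering and invoking any tool goes through the same two message types, and only the registry on the server side differs.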
Bio: Jeff Hastings is a PhD student in the Faculty of Computing & Data Sciences at Boston University, advised by Dr. Joshua Peterson. He earned a BA and MA in Political Science from Utah State University, followed by an MS in Computational Social Science from the University of California, San Diego. His research applies machine learning, deep learning, and reinforcement learning to better understand, explain, and improve human, artificial, and agentic decision-making. Prior to his PhD, he worked as an AI Data Scientist at Thermo Fisher Scientific.
Multi-Stain Learning for Neuropathology Evaluation with Lingyi Xu
November 14, 12 PM - CDS 1646
Abstract: Definitive diagnosis of neurodegenerative diseases traditionally relies on postmortem histopathology. While whole slide imaging has modernized pathology workflows, diagnostic performance still depends heavily on stain availability. We introduce a deep learning framework for multi-stain WSI analysis that operates effectively when some stains are missing. This approach offers a promising pathway to improve diagnostic accuracy in settings with limited staining resources.
Bio: Lingyi Xu is a Ph.D. student in the Faculty of Computing & Data Sciences at Boston University. She is currently working with Professor Vijaya B. Kolachalama on computation-assisted methods that help with cancer diagnosis and treatment. Her research focuses on graph representation learning, especially in clinical settings, to improve diagnostic accuracy, efficiency, and interpretability.
Modeling Group Interactions of Heterogeneous Voters in the US Senate with Gavin Rees
November 21, 12 PM - CDS 1646
Abstract: Statistical models of interacting systems on discrete spaces can be effective causal models - for example, of yes/no voting - but their discrete sample space can turn normalization into a combinatorially complex endeavor: for example, normalizing the pairwise Ising model on the N-dimensional binary (hyper)cube is NP-Complete. This lack of normalization can limit their utility and prevent rigorous comparisons to other models. Pairwise interacting models also suffer from quadratic parameter growth as the dimensionality of the sample space grows, unless interactions are structured in some way: for example, homogeneous interactions between groups (a block-structured model). Group-structured pairwise interacting models can be effective causal models as well, and are easily normalizable, but aren’t able to capture the individual heterogeneity that we suspect exists in some systems, e.g., political systems where every representative/voter has their own ideology (that there is individual heterogeneity is part of our prior). We describe results in exactly normalizing group-interacting pairwise Ising models with heterogeneous individual (linear and local) preferences within polynomial time complexity N^k, where N is the number of individuals and k is the number of groups. We discuss generalizations of this approach to effective low-rank approximations of interacting systems, as well as potential applications to social systems, namely the US Senate.
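To make the N^k claim concrete, here is a minimal sketch of one way such a normalization can be organized. It is an illustration under stated assumptions, not necessarily the speaker's method. Assume spins s_i in {-1, +1}, an unnormalized log-weight of sum_i h_i s_i + 0.5 * sum_{a,b} J[a][b] m_a m_b, where m_a is the total spin of group a, and a symmetric k-by-k coupling matrix J (self-pairs within a group are included and only contribute a constant). Because the coupling term depends on the spins only through the group totals, the sum over 2^N configurations collapses to a sum over the number of up-spins in each group, with the heterogeneous fields h_i absorbed by a per-group polynomial expansion. The function names are hypothetical.

```python
import itertools
import math


def group_weights(fields):
    """weights[u] = sum over states of one group with exactly u spins up
    of exp(sum_i h_i * s_i), built by expanding prod_i (e^-h_i + e^h_i * x)."""
    weights = [1.0]
    for h in fields:
        new = [0.0] * (len(weights) + 1)
        for u, w in enumerate(weights):
            new[u] += w * math.exp(-h)      # this spin is -1
            new[u + 1] += w * math.exp(h)   # this spin is +1
        weights = new
    return weights


def partition_function(fields_by_group, J):
    """Exact Z by summing over up-spin counts per group: at most (N/k + 1)^k terms."""
    sizes = [len(f) for f in fields_by_group]
    weights = [group_weights(f) for f in fields_by_group]
    k = len(sizes)
    Z = 0.0
    for ups in itertools.product(*(range(n + 1) for n in sizes)):
        m = [2 * u - n for u, n in zip(ups, sizes)]  # group total spins
        coupling = 0.5 * sum(J[a][b] * m[a] * m[b] for a in range(k) for b in range(k))
        field_term = math.prod(weights[a][ups[a]] for a in range(k))
        Z += field_term * math.exp(coupling)
    return Z


def brute_force(fields_by_group, J):
    """Reference answer: enumerate all 2^N spin configurations (small N only)."""
    labels = [a for a, f in enumerate(fields_by_group) for _ in f]
    fields = [h for f in fields_by_group for h in f]
    k = len(fields_by_group)
    Z = 0.0
    for spins in itertools.product((-1, 1), repeat=len(fields)):
        m = [0] * k
        for s, a in zip(spins, labels):
            m[a] += s
        log_w = sum(h * s for h, s in zip(fields, spins))
        log_w += 0.5 * sum(J[a][b] * m[a] * m[b] for a in range(k) for b in range(k))
        Z += math.exp(log_w)
    return Z


if __name__ == "__main__":
    fields = [[0.3, -0.1, 0.5], [0.2, -0.4]]   # two groups, individual biases
    J = [[0.1, -0.2], [-0.2, 0.05]]            # within- and between-group couplings
    print(partition_function(fields, J), brute_force(fields, J))
```

For a 100-member body split into two groups of 50, the grouped sum has 51 * 51 = 2,601 terms, whereas direct enumeration would need 2^100.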
Bio: Gavin Rees is a PhD student in Boston University’s Faculty of Computing & Data Sciences whose work combines mathematics and evolutionary biology. His research focuses on social behavior and combines approaches from theoretical biology, statistics, and evolutionary game theory to understand ecological and evolutionary dynamics of intertwined systems. His primary focus is on biological complexity, and he has worked on the evolution of cooperation in many-player social dilemmas, as well as on inferring social dynamics in political bodies. Prior to his doctoral studies, Gavin earned his Bachelor’s in Mathematics from Harvard University with a secondary in Computer Science, and worked as a software engineer at Markforged and as a research assistant at the Institute of Science and Technology Austria and the Complexity Science Hub, Vienna.
PhD Seminar Series Holiday Party
December 12, 12 PM - CDS 1646
Abstract: End of the Fall semester - Hooray! Join us for a holiday celebration to wrap up another successful semester of student research presentations.
Past Talks
Fall 2025
Interdependent Bilateral Trade: Information vs Approximation
October 3, 12 PM - CDS 1646
Abstract: This talk will introduce the area of mechanism design, and then focus on the problem of bilateral trade. Welfare maximization in bilateral trade has been extensively studied in recent years, primarily for the private values case. This talk will focus on welfare maximization in bilateral trade with interdependent values. Designing mechanisms for interdependent settings is much more challenging because the values of the players depend on the private information of others, requiring complex belief updates and strategic inference. Based on Interdependent Bilateral Trade: Information vs Approximation (EC25).
Spring 2025
CANCELED: Attention-Based Deep Learning for Analysis of Pathology Images and Gene Expression Data in Lung Squamous Premalignant Lesions with Lingyi Xu
April 18, 12 PM - CDS 1646
Abstract: Molecular and cellular alterations to the normal pseudostratified columnar bronchial epithelium result in the development of bronchial premalignant lesions through a histologic progression from normal to hyperplasia, metaplasia, dysplasia, carcinoma in situ, and invasive carcinoma. Endobronchial biopsies obtained via various bronchoscopy techniques are formalin-fixed, paraffin-embedded, and hematoxylin and eosin (H&E) stained to assess the pathologic features and histologic grade of the tissue. The broad and continuous spectrum of histologic and molecular changes makes reproducible stratification of lesions across multiple studies challenging.
Here we propose a transformer-based framework that flexibly utilizes transcriptomic and histologic patterns to distinguish lesions with bronchial dysplasia or worse from normal, hyperplasia, and metaplasia. We leveraged H&E whole slide images of endobronchial biopsies and bulk gene expression data from previously published studies as well as new data obtained from high-risk patients. Our framework maximizes the use of training data by allowing sample inputs with one or both data modalities. The flexibility of our framework to make predictions when a data modality is missing and its ability to integrate data from different modalities and studies are important for advancing our stratification of bronchial premalignant lesions.
Bio: Lingyi is a PhD student in Computing & Data Sciences at Boston University. Her research investigates the potential of multimodal medical data in enhancing disease diagnosis and assessment. Her current work involves applying graph models and machine learning algorithms in digital pathology and cancer genomics to advance the understanding of lung precancerous conditions.
Gender Inclusivity Fairness Index (GIFI): A Multilevel Framework for Evaluating Gender Diversity in Large Language Models with Zhengyang Shan
April 4, 12 PM - CDS 1646
Abstract: We introduce a comprehensive framework for assessing gender fairness in large language models (LLMs), particularly in their treatment of both binary and non-binary genders. Existing research has largely focused on binary gender distinctions, neglecting the inclusivity of non-binary identities. To address this, we propose a novel metric that evaluates LLMs across seven dimensions. We conduct extensive evaluations on 15 popular LLMs, revealing significant discrepancies in their ability to fairly represent diverse gender identities.
Bio: Zhengyang is a third-year PhD student at Boston University’s Faculty of Computing and Data Sciences. Her research interests lie in the evaluation, interpretability, and fairness of Large Language Models (LLMs).
Computing and Collective Action with Freddy Reiber
March 21, 12 PM - CDS 1646
Abstract: Labor unions are a critical component of ensuring dignified working conditions for laborers. However, as a byproduct of neoliberalization, American labor unions have been in free fall in terms of membership numbers. Drawing from sociological work on organized labor, we seek to analyze how labor organizers try to bring workers together to act collectively through digital communication technologies. Toward that end, we interviewed ~19 labor union members who engaged in digital worker-to-worker organizing through tools like Slack or Discord, focusing on how workers utilized these digital platforms and interacted with each other on them. In this talk we provide preliminary results from this study along with some early discussion of developing useful technical tools for supporting worker-to-worker organizing.
Bio: Freddy is a third-year PhD student in the Computing and Data Science department at Boston University, advised by the fantastic Allison McDonald. His work explores how power dynamics are shifted by technology, with a focus on applying human-driven methods to complex issues. Currently, his projects are on 2nd order dynamics in digital spaces within labor unions and the motivations used by cryptographers for their research.
Audited Auctions: Addressing Externalities in One-sided Mechanisms with Tejovan Parker & Gabe Maayan
February 21, 12 PM - CDS 1646
Abstract: We consider the form of externalities where some agent(s) have preferences over the outcome of a mechanism, and their preferences cannot be known before the mechanism is run. Often, this problem is solved by an auctioneer inserting dummy bids to represent the externalities. However, there may be ethical, trust, or power issues with delegating the determination of one's values to a central entity. And, it is extremely unreasonable to have all agents constantly estimate and report their values for the actions of all other agents in a system. Even if it were acceptable for a central entity to estimate externalities, it is more efficient to only audit what is more likely to be harmful, rather than auditing everything.
To address this, we consider auctions where the auctioneer has the power to (randomly) audit bidders to learn their externality, and impose penalties accordingly. In this setting, the power to audit induces the same bidder behavior as letting the auctioneer set individualized entry fees for bidders as a function of their non-manipulable externality type, and this results in thresholds of participation as functions of externality.
This setting is motivated by a variety of practical scenarios. For example, an auctioneer might run a social or traditional media platform where bidders compete to post news or ads on user feeds. In this setting, end users can experience bidders' posts as nuisance costs, incurring negative externalities.
Our objective is to maximize total welfare, i.e., the sum of individual value and externalities. In this paper, we show how penalty functions induce thresholds of participation, and prove analytically that welfare-optimal participation thresholds in the i.i.d. setting with no competition are linear. Additionally, in the setting with competition for a single item and where i.i.d. bidders may only take two discrete types, the optimal threshold is linear and behaves analogously to Myersonian revenue-maximizing reserve prices.
To illustrate results in more complicated settings, we use simulation with computational optimization to characterize welfare increases over participation threshold functions. We collect a dataset from X (formerly Twitter) to create an empirical joint-distribution of sender and receiver value, and simulate auctions from this empirical data. We find that optimal thresholds shift welfare from producers to users and increase overall welfare in all settings. We also observe that optimal thresholds are linear even with the empirical type distributions. However, the penalty functions will not be linear in general, which makes an interesting comparison to linear contracts. Our results suggest that auditing and penalizing externalities in real-world sponsored-search and advertising auctions have the potential to create substantial increases in social welfare.
Bio: Tejovan Parker is a third-year PhD student at Boston University’s Faculty of Computing and Data Sciences. Previously, he studied Mechanical and Global Engineering at the University of Colorado Boulder. He is interested in better management of social, political, and economic systems through mathematical and algorithmic methods. Tejovan began his PhD studies at BU in Fall 2022. In his first two years at CDS, he is building his expertise and looking to assist in existing research within misinformation markets.
Gabe is a third-year PhD student at Boston University’s Faculty of Computing and Data Sciences. Previously, he worked on a variety of projects at the MITRE Corporation and received his Bachelor of Science in Computer Science at Rensselaer Polytechnic Institute. His research interests are in Complexity Science, Complex Systems Analysis and Modeling, and Agent-Based Modeling.
The Suicidal Mind with Gabe McDonnell-Maayan
January 31, 12 PM - CDS 1646
Abstract: I present the "Suicidal Mind", a theory-synthesis modeling exercise aiming to simulate a distressed mind. I will briefly cover the topics of the first talk: the challenges of, and different approaches to, suicide research; relevant theories of suicide; and an overview of our model. I will then focus on various forms of model validation, including parameter sweeps, scenario recreation, GAM surface fitting, and multivariate time-series clustering.
Bio: Gabe is a third-year PhD student at Boston University’s Faculty of Computing and Data Sciences. Previously, he worked on a variety of projects at the MITRE Corporation and received his Bachelor of Science in Computer Science at Rensselaer Polytechnic Institute. His research interests are in Complexity Science, Complex Systems Analysis and Modeling, and Agent-Based Modeling.
Fall 2024
Labor Unions and Digital Democracy with Freddy Reiber
November 22, 11 AM - CDS 1646
Abstract: Labor unions have served an important role in giving workers a voice within the economy; however, this does not mean they are without critique. Central to many union critiques is the lack of meaningful democracy within unions, or what Robert Michels calls the “Iron law of oligarchy”. In the 2000s, researchers thought that information and communication technologies (ICTs) might serve as a solution to these problems; however, as the empirical literature developed, it became clear that ICTs were not the silver bullet theorists had originally hoped for. This talk reviews both the theoretical and the empirical work of labor scholars and HCI researchers on why ICTs didn’t provide a meaningful shift in union democracy, and offers some ideas for future work.
Bio: Freddy is a third-year PhD student in the Computing and Data Science department at Boston University, advised by the fantastic Allison McDonald. His work explores how power dynamics are shifted by technology, with a focus on applying human-driven methods to complex issues. Currently, his projects are on 2nd order dynamics in digital spaces within labor unions and the motivations used by cryptographers for their research.