Biostatistics Seminars and Working Groups

Throughout the academic year in the department, faculty, students, and collaborators meet to discuss various research projects in progress. These meetings and working groups are great opportunities for students to sit in and participate in research discussions in the areas of clinical trials, statistical genetics, and observational studies. Students gain firsthand experience in developing research with faculty and master- and PhD-level students. Additionally, there are opportunities for students to present on a developing research area. Students are encouraged to attend and take advantage of these opportunities.



The Biostatistics Seminar Series is designed to engage faculty and students in research projects happening within our department and outside of Boston University. The purpose of the seminars is to widen and deepen participants’ knowledge of research in the field of Biostatistics and encourage collaboration in the field. We invite speakers from diverse research backgrounds to present their latest findings. Attendees are encouraged to participate in discussions and provide feedback. Lunch will be served.

Time and Location

The seminars are held from 1:00 to 2:00 p.m. at 801 Massachusetts Avenue, Crosstown Building, Room 305.

Upcoming Seminars

Interested speakers should contact Yorghos Tripodis, PhD. Please include abstract(s) in your correspondence. You can also email Dr. Tripodis to be added to our seminar mailing list.

Thursday, Sept. 8, 2016
George W. Cobb, Robert L. Rooke Professor of Statistics at Mount Holyoke College
How important is Markov Chain Monte Carlo?
My premise is that at base Markov Chain Monte Carlo (MCMC) is an answer to the challenge of computing an integral, with “integral” interpreted loosely as “adding up a very large number of very small numbers.” The challenge dates back to the paradoxes of Zeno of Elea (490 – 430 BCE), was solved in principle by Archimedes (287 – 212 BCE), was solved analytically by Newton and Leibniz around 1665, and has been solved numerically and more broadly, over the last 60 years, by MCMC. At the risk of seeming grandiose, I suggest that only twice in history has mathematical theory invented a major new computational crank for Science to turn. The first, calculus, is well-known. The second, Markov Chain Monte Carlo, is still in the early stages of development. Its reach remains largely unrecognized outside of Bayesian statistics and some areas of physics, and the depth of its influence is open to dispute.
Wednesday, Sept. 28, 2016
Jukka Corander, Professor of Biostatistics at the University of Oslo
100 years after William Bateson – what can we learn about epistasis by today’s statistical machine learning?
Roughly 100 years ago, William Bateson coined the term epistasis, which today is generally considered a form of interaction between two loci in DNA. Research in epistasis has over the years revealed fascinating biological insights; however, until very recently, progress has been slowed by the limited availability of densely sampled population data. With the advent of the latest sequencing technology, we finally have the opportunity to query what experiments nature has performed concerning epistasis. I will present the latest research results from a close collaboration with the Pathogen Genomics group at the Sanger Institute on how we can advance understanding of epistasis in genomes by exploiting core concepts from statistical physics and statistical machine learning algorithms for ultra-high dimensional models. Our genome-wide epistasis analysis reveals interacting networks of resistance, virulence and core machinery genes in Streptococcus pneumoniae.
Thursday, Oct. 13, 2016
Rebecca Betensky, Professor of Biostatistics at Harvard T.H. Chan School of Public Health
Statistical Significance and P-values
I will discuss misconceptions and criticisms of p-values and significance testing as background for discussion of the American Statistical Association’s recent statement on p-values. I will discuss implications for education, collaboration, publication, and grant funding, along with some proposals for improved analysis.
Thursday, Nov. 10, 2016
Eric Tchetgen Tchetgen, Professor of Biostatistics and Epidemiologic Methods at Harvard T.H. Chan School of Public Health
Unification of the instrumental variable approach for causal inference and missing data
Unobserved confounding is a well-known threat to causal inference with observational data. Likewise, selection bias can arise in the presence of missing data if there is an unobserved common cause of the nonresponse process and the potentially unobserved outcome. An instrumental variable for unobserved confounding (IV-C) is a pre-exposure correlate of exposure known to only affect the outcome through its association with exposure. Likewise, an instrumental variable for missing data (IV-M) is a predictor of missingness which is otherwise independent of the outcome in the underlying population. We give general necessary and sufficient conditions for nonparametric identification with an IV in settings (IV-C) or (IV-M), thus providing a unification of identification for causal inference and missing data with an IV. The approach equally applies for discrete or continuous IV and outcome. Interestingly, the proposed approach provides an elegant solution to the identification problem of the marginal effect of treatment on the treated with an IV which has been a longstanding problem in causal inference. For statistical inference incorporating high dimensional covariates, we present generalizations of inverse-probability weighting, outcome regression and doubly robust estimation with an instrumental variable that equally apply to IV-C and IV-M. In case identification fails, we describe novel IV bounds for the nonidentified parameter of interest and corresponding methods to account for all sources of uncertainty. We illustrate the approach with simulation studies and several empirical examples.
Thursday, Dec. 15, 2016
Ian Marschner, Professor of Statistics at Macquarie University, Australia
Underestimation of treatment effects in sequentially monitored clinical trials that did not stop early
Over the last decade there has been a prominent discussion in the literature about the potential for overestimation of the treatment effect when a clinical trial stops at an interim analysis due to the experimental treatment showing a benefit over the control. However, there has been much less attention paid to the converse problem, namely, that sequentially monitored clinical trials which do not stop early tend to underestimate the treatment effect. In meta-analyses of many studies these two sources of bias will tend to balance each other to produce an unbiased estimate of the treatment effect. However, for the interpretation of a single study in isolation, underestimation due to interim analysis may be an important consideration. In this paper we discuss the nature of this underestimation, including theoretical and simulation results demonstrating that it can be substantial in some contexts. Furthermore, we show how a conditional approach to estimation, in which we condition on the study reaching its final analysis, may be used to validly inflate the observed treatment difference from a sequentially monitored clinical trial. As well as simulation results demonstrating the validity of these conditional estimation methods, we present a data analysis example from a pivotal clinical trial in cardiovascular disease. The methods will be most useful in contexts where an unbiased estimate of the treatment effect is of particular importance, such as in cost-effectiveness analysis or risk prediction.
Thursday, Mar. 9, 2017
Constantine Gatsonis, Henry Ledyard Goddard University Professor of Biostatistics and Chair of Biostatistics at Brown University
Reproducibility in Research
The question of reproducibility of research findings has become the topic of a broad and multi-faceted discussion in recent years. In this seminar we will begin with an overview of the various approaches to reproducibility, highlighting the conceptual issues. We will then examine the link of reproducibility to the assessment of “statistical significance” and discuss ways in which the practice of reporting research findings may be contributing to the phenomenon.
Thursday, May 11, 2017
Vijaya B. Kolachalama, Adjunct Assistant Professor of Cardiovascular Medicine at Boston University
Machine learning and image processing for precision medicine
Over the past few years, the scientific community has witnessed a rapid increase in the adoption of cutting-edge data analytic tools such as machine learning (ML) to address several questions in clinical medicine. ML techniques give computers the ability to integrate discrete sets of data in an agnostic manner to find hidden insights and generate a disease-specific fingerprint. These tools are now rapidly being adopted in several specialties as unbiased, self-learning approaches for pathologic assessment. In this talk, I will present a few examples in this area to discuss ongoing work in our laboratory at BUSM that is focused on digitized images as one form of input data to ML models, and show how we could associate them with several clinical outcomes of interest.
Thursday, Jun. 8, 2017
Kyu Ha Lee, Research Associate, Department of Biostatistics at Harvard University
Hierarchical Models for the Analysis of Semicompeting Risks Data
In the statistical literature, data that arise when the observation of the time to some non-terminal event is subject to some terminal event are referred to as ‘semicompeting risks data’. A major challenge in the analysis of semicompeting risks data arises due to the fact that study participants under investigation are often subject to competing forces of the terminal event that may not be independent of the non-terminal event. In addition, other challenges may arise due to practical considerations regarding data collection such that the observation of events of interest is subject to a complex censoring mechanism. In this talk, I will present some motivating examples from ongoing studies and discuss the challenges involved in semicompeting risks data analysis. I will illustrate hierarchical models that have been recently developed to handle a range of different data scenarios.
Thursday, Jun. 15, 2017
Jason Flannick, Senior Group Leader and Research Associate at Mass General Hospital and the Broad Institute
Approaches to understand the genetic architecture and biology of type 2 diabetes
Genome-wide association studies (GWAS) have identified numerous common variants associated with modest increases in risk of type 2 diabetes (T2D), suggesting clues into the causes and biology of disease. And yet, these findings have explained only a limited fraction of the genetic basis of T2D and have been slow to translate to improved patient care. In this talk, I will discuss what large scale next-generation sequencing of thousands to tens of thousands of T2D patients has taught us beyond GWAS, and what this means for the future of genetic research into T2D and other complex diseases. I will first present methods and an analysis to quantify the genetic architecture of T2D based on large-scale sequencing, suggesting that models motivating next-generation sequencing may have been overly optimistic; I will then, however, present specific examples where next-generation sequencing has in fact identified high-impact alleles leading to important biological insights into T2D. I will argue that these findings suggest two paths toward the future of T2D genetics research: studies of larger and larger scale but analyses that are richer and more individualistic. I will conclude with thoughts on new research paradigms and bioinformatics methods development to leverage the valuable sequence data that now exists for T2D and other complex diseases, in order to more rapidly translate genetic associations to new medicines or improved patient care.
Thursday, Jul. 13, 2017
Nicholas J. Horton, Professor of Statistics at Amherst College
Big ideas to help statistics students learn to ‘think with data’
This is an exciting time to be a statistician. The contribution of the discipline of statistics (the science of learning from data) to scientific knowledge is widely recognized.  But there are challenges as well as opportunities in this new world of data.  In this talk, I will discuss a number of questions and big ideas with major implications for how we teach statistics and data science.  All too often we teach two-sample comparisons when the true relationship depends on other factors. In a world of found data, what issues of design and confounding are needed to disentangle complex relationships?  What theoretical foundations are needed for statisticians?  Can the long list of mathematical prerequisites for this course be reconsidered?  Statistics is increasingly a ‘team sport’.  How do we teach students to work effectively in groups and communicate their results?  In an era of increasingly big data, it is imperative that students develop data-related capacities, beginning with the introductory course. How do we integrate these precursors to data science into our curricula—early and often?  By fostering more multi-variable thinking, teaching about confounding, developing simulation-based problem solving, and building data-related skills, we can help to ensure that statisticians are fully engaged in data science and the analysis of the abundance of data now available to us.
Thursday, Sep. 14, 2017
Bethany Hedt-Gauthier, Assistant Professor of Global Health and Social Medicine at Harvard Medical School
Thursday, Oct. 5, 2017
Laura White, Associate Professor of Biostatistics at Boston University School of Public Health
Thursday, Nov. 9, 2017
Joseph Hogan, Professor of Biostatistics and Director of the Biostatistics Graduate Program at Brown University

This methods group is co-organized with faculty in the Departments of Biostatistics; Epidemiology; Health Law, Policy & Management; and Global Health. The group will meet weekly for 2 hours during the Fall semester. Over the course of 10 meetings, we will discuss frontier topics in applied econometrics and their relevance to population health science and health services research. As an organizing text, we will cover Susan Athey and Guido Imbens’ 2016 review paper “The State of Applied Econometrics – Causality and Policy Evaluation”. Each session will cover a different topic, and student participants will be asked to present on the topic and place it in the context of the prior literature. The discussion will focus on understanding the methods, identifying questions for further inquiry, identifying population health and health services applications, and discussing how the methods might be implemented. As a final product, students and post-docs taking the course will prepare a brief research proposal to implement one of the discussed methods in a future research project, with potential for future mentorship.

Please contact Dr. Yorghos Tripodis (Biostatistics) or Dr. Jacob Bor (Global Health) for more information.

Interested speakers should contact Gheorghe Doros or Sandeep Menon. Please include abstract(s) in your correspondence. You may also email Dr. Doros or Dr. Menon to be added to the Working Group mailing list.

Lead Faculty: Gheorghe Doros

The Statistical Genetics Working Group meets from 9:30 to 11 a.m. every other Friday at 801 Massachusetts Avenue, Crosstown Building, Room 305. The goal is to get to know each other, learn about cutting-edge research, foster collaboration, and get help. The group brainstorms together in September to lay out the yearlong topics of interest for discussion. At each session, one person, a group of participants, or an invited outside speaker presents the material, formally or informally, usually pertaining to his or her area of expertise, interest, or research, and leads the discussion. Our participants include a mix of Biostatistics and Bioinformatics students in addition to faculty members involved in genetics research.

Lead Faculty: Ching-Ti Liu

The Genetic Analysis Workshops (GAWs) are a collaborative effort among genetic epidemiologists and statistical geneticists to develop, evaluate, and compare statistical genetic methods. They are coordinated by the Southwest Foundation for Biomedical Research.

A monthly student luncheon is followed by a seminar featuring innovative speakers in the area of statistical genetics. Contact: Haldan Smith