Methodological Research

Bayesian Methods

The Bayesian approach to modern data analysis provides a complete paradigm for both statistical inference and decision making under uncertainty. The Bayesian framework requires a sampling model for the observed data, as well as a prior distribution on all unknown quantities in the model (parameters as well as missing data). The prior and the likelihood are combined to compute the conditional (posterior) distribution of all unknowns given the observed data, and statistical inference is based on this posterior. Many frequently used classical statistical procedures arise as particular cases of Bayesian methods, and the Bayesian approach resolves many of the difficulties faced by conventional statistical methods. Implementing Bayesian methodology often relies on computationally intensive algorithms such as Monte Carlo and Markov chain Monte Carlo (MCMC) simulation. Bayesian inference has been widely used in many common statistical settings, such as linear and generalized linear regression, model selection, stochastic processes, multivariate statistics, nonparametric statistics and measurement error models, and new and improved Bayesian modeling approaches in these areas are continually under development. With the enormous progress and increased availability of powerful computers over the last decade, Bayesian methods are having a major impact in fields of application as varied as clinical trials, epidemiological studies, public policy, internet-based marketing and climate change. Faculty in the Biostatistics department are actively engaged in developing novel Bayesian methodology and computational algorithms, and in applying them in many areas of biomedical research.
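As a minimal illustration of how a prior and a likelihood combine into a posterior, the Python sketch below assumes a binomial sampling model (for example, the number of responders among treated patients) and a conjugate Beta prior, so the posterior is available in closed form. It then summarizes the same posterior by Monte Carlo simulation, the kind of computation that MCMC generalizes to models whose posteriors have no closed form. The data and the uniform Beta(1, 1) prior are hypothetical and chosen only for illustration.

```python
import random
import statistics

def beta_binomial_posterior(alpha, beta, successes, trials):
    """Conjugate update: Beta(alpha, beta) prior + binomial likelihood -> Beta posterior."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    return alpha + successes, beta + (trials - successes)

def monte_carlo_summary(a, b, draws=20000, seed=1):
    """Summarize a Beta(a, b) posterior by simulation instead of closed-form formulas."""
    rng = random.Random(seed)
    sample = sorted(rng.betavariate(a, b) for _ in range(draws))
    mean = statistics.fmean(sample)
    # Empirical 2.5% and 97.5% quantiles give an approximate 95% credible interval
    lower = sample[int(0.025 * draws)]
    upper = sample[int(0.975 * draws) - 1]
    return mean, (lower, upper)

# Hypothetical data: 12 responders among 40 patients, uniform Beta(1, 1) prior
a_post, b_post = beta_binomial_posterior(1, 1, successes=12, trials=40)
exact_mean = a_post / (a_post + b_post)
mc_mean, interval = monte_carlo_summary(a_post, b_post)
print(a_post, b_post)          # 13 29
print(round(exact_mean, 4))    # 0.3095
print(round(mc_mean, 3), interval)
```

The simulated posterior mean should agree closely with the exact value 13/42; in realistic models only the simulation route is available.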

Collaborative Research Topics: Aging, Cancer, Genetics/Genomics, System Biology
Research Studies: Framingham Heart Study, Long Life Family Study, MAVERIC, New England Centenarian Study, SickleGen
Faculty involved: Doros, Gupta, Sebastiani, Xing

Clinical Trials

Clinical trials are one of the most important types of experimental study design used in medical research. A trial is defined by the population to be studied, the intervention, the comparison (control) and the outcome of interest (PICO). Clinical trials may involve a drug, device, education or other type of intervention. Since Fisher introduced the concept of randomized experiments in 1925, clinical trials have come a long way, with numerous innovations in study design creating a corresponding need for new statistical tools for data analysis. A number of department faculty collaborate with medical researchers on clinical trials, working on problems relating to the conduct and monitoring of trials, the analysis of efficacy and safety data, and Bayesian methods for clinical trials.

Collaborative Research Topics: Addiction, Aging, Autism, Cancer, Cardiovascular Disease, Complementary and Alternative Medicine (CAM)
Research Studies: MAVERIC
Faculty involved: Cheng, D’Agostino, Doros, Fish, Heeren, LaValley, Massaro, Pencina, Sullivan, Sun, Weinberg

Correlated Data Analysis

Correlated data analysis is the analysis of any outcome where the units of observation cannot be assumed to be independent. This includes longitudinal data analysis of repeated observations of the same variable over time, and analysis of clustered data, such as outcomes from members of the same family, classroom, doctor’s office or geographic region. Correlated data can arise in any study, including observational studies and clinical trials, and require specialized methods of statistical analysis that account for the correlation. If the correlation is ignored, standard errors, confidence intervals and p-values may be incorrect, and statistical inference may be invalid. Specialized methods include generalized linear models for correlated data, mixed or random effects models, nonlinear mixed effects models, generalized estimating equations (GEE) and hierarchical models. Bayesian methods are particularly powerful and flexible in this setting.
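To make the consequence of ignoring correlation concrete, the Python sketch below uses the standard (Kish) design effect for a mean, 1 + (m - 1)ρ. It assumes equal cluster sizes m and a common within-cluster correlation ρ (an exchangeable structure); the family size, correlation and standard deviation are hypothetical. The naive standard error treats all observations as independent, while the adjusted one inflates it by the square root of the design effect.

```python
import math

def design_effect(cluster_size, icc):
    """Kish design effect for equal-sized clusters with exchangeable correlation icc."""
    return 1 + (cluster_size - 1) * icc

def adjusted_standard_error(sd, n_clusters, cluster_size, icc):
    """Standard error of an overall mean: naive (independence) versus correlation-adjusted."""
    n = n_clusters * cluster_size
    naive = sd / math.sqrt(n)
    adjusted = naive * math.sqrt(design_effect(cluster_size, icc))
    return naive, adjusted

# Hypothetical study: 50 families of 4 members each, intra-family correlation 0.3
deff = design_effect(4, 0.3)
naive_se, adj_se = adjusted_standard_error(sd=10.0, n_clusters=50, cluster_size=4, icc=0.3)
effective_n = 50 * 4 / deff
print(round(deff, 2), round(effective_n, 1))   # 1.9 105.3
print(round(naive_se, 3), round(adj_se, 3))    # 0.707 0.975
```

Under these assumptions, 200 correlated observations carry roughly the information of 105 independent ones, so the naive standard error is too small. Model-based approaches such as mixed models, GEE and hierarchical models handle unequal clusters and more general correlation structures.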

Collaborative Research Topics: Addiction, Aging, Cancer, Cardiovascular Disease, Genetics/Genomics, Infectious Disease, Musculoskeletal Diseases, System Biology
Research Studies: Black Women’s Health Study, Framingham Heart Study, Long Life Family Study, MAVERIC, New England Centenarian Study, SickleGen, Superfund
Faculty involved: Cabral, Cupples, Cheng, Demissie, Dupuis, Gagnon, Liu, Nelson, Sebastiani, Sullivan, Weinberg, White, Xing, Yang

Meta-Analysis

Different studies of the same research question in the medical literature often reach differing conclusions, leaving practitioners and policymakers uncertain about which set of results to follow. Meta-analysis is an area of statistics that combines the results of a set of studies on a topic, with the goal of providing an improved, more precise estimate of the effect of a treatment or exposure. Meta-analysis provides a set of systematic statistical procedures to combine and summarize results across studies, highlighting the consistency or inconsistency of the results and evaluating the various threats to the validity of the combined estimate.
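One standard way to combine study results is inverse-variance weighted (fixed-effect) pooling, with Cochran's Q and the I² statistic quantifying inconsistency across studies. The Python sketch below is a minimal version of that calculation, assuming each study reports an effect estimate on a common scale (here, hypothetical log odds ratios) together with its standard error. It illustrates the general technique and is not a description of any particular faculty method.

```python
import math

def fixed_effect_meta(estimates, std_errors):
    """Inverse-variance fixed-effect pooling with Cochran's Q and I^2."""
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need matching estimates and standard errors for >= 2 studies")
    weights = [1.0 / se ** 2 for se in std_errors]   # more precise studies get more weight
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    # Cochran's Q: weighted squared deviations of each study from the pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    # I^2: share of total variation attributed to between-study heterogeneity
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i2

# Hypothetical log odds ratios and standard errors from three studies
log_or = [-0.2, -0.5, 0.1]
se = [0.1, 0.2, 0.2]
pooled, pooled_se, q, i2 = fixed_effect_meta(log_or, se)
ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
print(round(pooled, 3), round(pooled_se, 4))   # -0.2 0.0816
print(round(q, 2), round(i2, 3))               # 4.5 0.556
print(round(math.exp(pooled), 3))              # 0.819 (pooled odds ratio)
```

When I² is large, a random-effects model (for example, the DerSimonian and Laird estimator) is usually preferred to the fixed-effect estimate shown here.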

Collaborative Research Topics: Aging, Arthritis, Genetics/Genomics
Research Studies: Framingham Heart Study
Faculty involved: Cabral, Cupples, Demissie, Dupuis, Liu, Lunetta, LaValley

Pharmaco Statistics

With the increasing availability of large administrative databases in the healthcare field, there is a need for statistical methods to evaluate medications, ranging from new methods for clinical trials to methods for large observational studies. In large epidemiological studies based on such databases, statistical methods are needed to complement the work being done in pharmacoepidemiology, including propensity score methods and instrumental variables, as well as techniques to model drug effects over time, data reduction techniques and longitudinal data methods.

Collaborative Research Topics: Addiction, Cancer, Pharmaco Epidemiology
Research Studies: MAVERIC
Faculty involved: Gagnon

Risk Prediction

Risk prediction models are an integral part of modern medicine and public health. The main purpose of a risk prediction model is to identify individuals at high risk of developing a specific disease within a given time horizon. Such models have been developed for all leading causes of mortality and morbidity, including cardiovascular disease, its components and risk factors, as well as different forms of cancer. Some of these models have been incorporated into treatment guidelines, and many are routinely used by physicians. Recently, new biomarkers and genetic factors have been proposed to improve existing prediction models, and the question of how to assess the incremental value of these new risk factors has taken center stage. Risk prediction models are based on a variety of statistical techniques, with regression approaches among the most popular. Emerging methods are based on Bayesian machine learning techniques that have been very successful in other fields such as finance, credit scoring and fraud detection. Various statistical methods have been proposed for model development, testing and validation. The performance of these models is characterized in terms of discrimination, calibration and, more recently, reclassification, with numerous statistical metrics proposed for each. Quantifying the improvement in model performance contributed by new risk factors remains an area of active research, and new methods continue to be proposed. Department faculty have pioneered many theoretical developments in risk prediction and model performance, and have been the lead authors on publications proposing some of the most widely used risk prediction models. Department faculty have also introduced novel approaches to risk prediction using genetic and genomic data.
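Two of the performance characteristics mentioned above can be sketched in a few lines of Python: discrimination via the concordance (C) statistic, the proportion of case/non-case pairs in which the case received the higher predicted risk, and a crude calibration check comparing observed events with the number expected from the predicted risks. This minimal sketch assumes a binary outcome over a fixed horizon with no censoring; time-to-event data, as in many cohort-based models, would need a censoring-aware version such as Harrell's C. Reclassification metrics are not shown, and the data are hypothetical.

```python
def c_statistic(risks, outcomes):
    """Concordance (C) statistic for a binary outcome; tied risks count as 1/2."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    non_cases = [r for r, y in zip(risks, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    score = 0.0
    for rc in cases:
        for rn in non_cases:
            if rc > rn:
                score += 1.0      # concordant pair: case ranked above non-case
            elif rc == rn:
                score += 0.5      # tied predicted risks
    return score / (len(cases) * len(non_cases))

def calibration_in_the_large(risks, outcomes):
    """Return (observed event count, expected count from summed predicted risks)."""
    return sum(outcomes), sum(risks)

# Hypothetical predicted risks and observed outcomes (1 = developed disease)
risks = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60]
outcomes = [0, 0, 1, 0, 1, 1]
print(round(c_statistic(risks, outcomes), 3))   # 0.889
observed, expected = calibration_in_the_large(risks, outcomes)
print(observed, round(expected, 2))             # 3 1.65
```

In this toy example the model ranks individuals well (8 of 9 pairs concordant) but underpredicts the number of events, showing that discrimination and calibration measure different things.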

Collaborative Research Topics: Cardiovascular Disease, Genetics/Genomics
Research Studies: Black Women’s Health Study, Framingham Heart Study, New England Centenarian Study
Faculty involved: Beiser, D’Agostino, Massaro, Nelson, Pencina, Sebastiani, Sullivan

Spatial Statistics

Spatial and spatio-temporal statistics involve analyzing data collected over a geographical area (and possibly referenced at different times). With the advent of geographic information systems (GIS) that can display and summarize spatial data, there is a growing need for methods that can handle such data efficiently and accurately for statistical inference. Spatial statistics is widely used in many different fields, such as the environmental sciences (identifying health effects of environmental contamination, tracking migratory animal populations), epidemiology (detecting disease clusters, monitoring the spread of disease outbreaks and epidemics) and general public health (water needs, land use).

Collaborative Research Topics: Environmental Epidemiology
Research Studies: Superfund
Faculty involved: Weinberg, White

Statistical Computing

With the continual development of new and improved experimental technologies in the physical and medical sciences, an ongoing challenge is to develop statistical methods for analyzing larger and increasingly complex data sets. Many standard statistical methods break down or become highly inefficient with high-dimensional data. With the availability of powerful computers over the last decade, questions that statisticians could not resolve with standard analytic estimation techniques have been successfully addressed with computational methods and sampling-based tools such as the bootstrap, the jackknife, Monte Carlo and Markov chain Monte Carlo. These techniques are becoming increasingly prevalent in many fields of application in public health and medicine, and are an important area of interdisciplinary research bringing together statistics, computer science and the physical sciences.
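The bootstrap and the jackknife both estimate sampling variability by recomputing a statistic on perturbed versions of the data. The minimal Python sketch below applies the jackknife to the mean, where it reproduces the textbook standard error exactly (a useful check), and the bootstrap to the median, where no simple closed-form standard error exists. The jackknife is known to be unreliable for non-smooth statistics such as the median, which is why the bootstrap is used there. The measurements, the number of resamples and the seed are hypothetical choices.

```python
import random
import statistics

def bootstrap_se(data, stat, n_boot=2000, seed=0):
    """Bootstrap standard error: resample with replacement and recompute the statistic."""
    rng = random.Random(seed)
    n = len(data)
    reps = [stat([data[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)]
    return statistics.stdev(reps)

def jackknife_se(data, stat):
    """Jackknife standard error: recompute the statistic leaving out one point at a time."""
    n = len(data)
    reps = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    mean_rep = statistics.fmean(reps)
    return ((n - 1) / n * sum((r - mean_rep) ** 2 for r in reps)) ** 0.5

# Hypothetical measurements
data = [2.1, 3.4, 1.9, 5.6, 4.4, 3.3, 2.8, 6.1, 3.9, 4.7, 2.5, 3.0]
analytic = statistics.stdev(data) / len(data) ** 0.5   # textbook SE of the mean
print(round(jackknife_se(data, statistics.fmean), 4), round(analytic, 4))   # these agree
print(round(bootstrap_se(data, statistics.median), 3))  # SE of the median by resampling
```

The same two functions accept any statistic written as a Python function of a list, which is the main appeal of sampling-based tools over case-by-case analytic formulas.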

Collaborative Research Topics: Aging, Cancer, Genetics/Genomics, Infectious Disease
Research Studies: Framingham Heart Study, Long Life Family Study, MAVERIC, New England Centenarian Study, SickleGen
Faculty involved: Demissie, Gagnon, Gupta, Liu, Sebastiani, White, Yang

Statistical Genetics/Genomics

Statistical genetics, genomics and bioinformatics are areas of biomedical research that have made huge strides over the last decade and continue to hold great potential for major breakthroughs in the coming years. Traditionally, statistical genetics is considered a branch of genetics that applies Mendelian inheritance to population groups and studies the frequencies of alleles and genotypes in populations and their effect on disease risk. Statistical genomics originally focused on problems arising in molecular biology, such as extracting information from DNA and protein sequences to understand their functional and evolutionary relationships. However, with developments in modern experimental technologies, such as high-throughput microarrays and deep sequencing, as well as a deeper understanding of biological systems, the two fields are rapidly converging. Department faculty are heavily involved in many areas of genetics and genomics research, such as quantitative trait mapping, analysis of family-based and genome-wide association studies, effects of population stratification, gene transcription regulation analysis, gene network modeling, DNA sequence and structure analysis, analysis of next-generation sequencing data and proteomics.

Collaborative Research Topics: Aging, Cancer, Cardiovascular Disease, Genetics/Genomics
Research Studies: Black Women’s Health Study, Framingham Heart Study, Long Life Family Study, MAVERIC, New England Centenarian Study, SickleGen
Faculty involved: Cupples, Demissie, Destefano, Dupuis, Gupta, Liu, Lunetta, Sebastiani, Xing, Yang

Public Health Surveillance

Monitoring the health of populations is an integral part of public health practice. Surveillance systems are designed to detect disease outbreaks, observe the health of populations and inform public policy decisions. Statistical methods for public health surveillance include classical epidemiological tools, survey sampling, and spatial and time series methods, as well as novel methods for detecting disease outbreaks. These methods must be able to rapidly combine existing knowledge with incoming information to detect aberrant events, such as a bioterrorist attack or the emergence of a new disease like avian influenza, and to provide data that accurately describe the health of populations.
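As one illustration of aberration detection, the Python sketch below implements a one-sided cumulative sum (CUSUM) chart on standardized case counts. CUSUM is a classical technique chosen here as an illustrative stand-in; the text does not say which detection methods the faculty use. The allowance k = 0.5 and threshold h = 4 are commonly used values for standardized CUSUMs, the baseline mean and standard deviation are assumed known and constant (no seasonality or trend), and the weekly counts are hypothetical.

```python
def cusum_alarms(counts, baseline_mean, baseline_sd, k=0.5, h=4.0):
    """One-sided upper CUSUM on standardized counts; returns time indices of alarms."""
    s = 0.0
    alarms = []
    for t, x in enumerate(counts):
        z = (x - baseline_mean) / baseline_sd   # standardized excess over baseline
        s = max(0.0, s + z - k)                 # accumulate excess beyond the allowance k
        if s > h:                               # decision threshold h crossed
            alarms.append(t)
            s = 0.0                             # restart monitoring after an alarm
    return alarms

# Hypothetical weekly case counts; baseline assumed known from past data
weekly_counts = [9, 11, 10, 8, 12, 10, 15, 18, 21, 24, 11]
print(cusum_alarms(weekly_counts, baseline_mean=10, baseline_sd=3))   # [8, 9]
```

Because the statistic accumulates small excesses, it signals at week 8 even though no single earlier week was dramatically high; in practice the baseline would be estimated from historical data with adjustment for seasonal patterns.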

Collaborative Research Topics: Infectious Disease, Cancer, Environmental Epidemiology
Research Studies: Harvard Center for Communicable Disease Dynamics
Faculty involved: White