
Boston University to Hold Symposium on Statistics and Life Sciences

1 October 2019
Josée Dupuis and Eric D. Kolaczyk

    The Boston University School of Public Health, together with the Department of Biostatistics and the Department of Mathematics and Statistics, will host a dean's symposium, titled "Statistics and the Life Sciences: Creating a Healthier World," on November 15.

    Co-hosted by the American Statistical Association, Institute of Mathematical Statistics, and National Institute of Statistical Sciences and open to attendees both in person and via webinar, the one-day symposium will feature short presentations and discussion of statistical challenges, and progress toward solutions, in a handful of emerging and mission-critical areas of the health sciences. Specifically, the symposium will focus on digital health, machine learning in causal inference, and networks for public health.

    The symposium will feature two plenary speakers—Joseph Lehar of Janssen Pharmaceutical Companies of Johnson & Johnson and Susan Murphy of Harvard University—and two keynote speakers—David Dunson of Duke University and Vadim Zipunnikov of The Johns Hopkins University. As a warm-up for the symposium, each of the four speakers was asked a set of three questions central to its intended focus: how statistics has most affected the health sciences in the recent past, what constitutes the biggest statistical challenges in the health sciences for the coming decade, and how we might best meet these challenges. Their responses are summarized below.

    In what way do you feel statistics has had the biggest impact on the life sciences in the past decade?

    The common response was that this impact was two-fold in nature, consisting of (i) support for the use of massive, diverse, and complex forms of data and (ii) the development of statistical machine learning methods for their analysis. Lehar noted how statistics has been key to “enabling the integration and analysis of very complex data sets across very diverse sources of information.” As an example of the impact of machine learning, he highlighted the use of such methods “to automate classifying disease phenotypes that used to rely on subjective and imprecise expert opinions (e.g., deep neural networks on cancer pathology images or machine learning on molecular profiles to produce actionable clinical biomarkers for matching patients to therapies).”

    Similarly, Dunson spoke of the transition from more traditional ‘small data’ to ‘big data,’ citing a host of new measurement technologies whose use is enabled by statistics—from single-cell RNA sequencing to electronic medical records, and from mobile health devices to social media. Summing up, he stated, “Statistics has had a fundamental impact on this paradigm shift in the way life science is being conducted; there is no use in collecting such data unless we have reliable and reproducible methods for analysis and interpretation. The development of ‘big data’ statistics has freed up scientists to be creative in developing and exploring new sources of data.”

    What do you think constitutes the biggest statistical challenge(s) in the health sciences for the coming decade?

    Here, the responses were diverse, reflecting in many ways the diversity of interests and research areas among the speakers. Murphy, speaking from her perspective at the forefront of clinical trials design and analysis, asked, “How do we harness vast amounts of data—both from many individuals as well as on any one individual—to enhance and increase the impact of clinical trials?”

    On the other hand, speaking from the vantage of his expertise in digital mobile health, Zipunnikov pointed to the challenges posed by the need to extract value and insight from the massive, complex, and diverse data resulting from “multi-system real-time monitoring of human physiology and ambient environmental exposure.” He further commented, “[The] main analytic challenges are centered around the complexity of digital mobile health measures that are inherently intensively longitudinal, have different time scales, have different measurement, have differences in subjective interpretation of scales, exhibit huge between and within subject heterogeneity across days and weeks of observation, follow significant diurnal and weekly patterns, and often have substantial potentially informative missingness.” All of these challenges are further complicated by substantial cross-dependence among measurement modalities.

    From the perspective of someone working across industry and academia at the frontier of oncology research, Lehar summed up in just two words: “incomplete data.” He added, “Rarely do we have good coverage of enough data types across many patients. This limits the extent to which machine learning can be applied, and thus the problems we can address.”

    Finally, Dunson provided a sobering comment about statistics and the health sciences in general, saying, “It is definitely the case that the rapid pace of production of data of unprecedented size and complexity has overwhelmed the statistical community. We lack the necessary tools to properly analyze these data streams, and we lack the necessary pool of talent to implement current tools appropriately, while also developing transformative new tools in a data/science-driven manner.” He further pointed to cultural challenges within statistics, particularly in contrast to the culture of the broader machine learning community, professing that “the priorities in statistics departments in academia often run counter to meeting these challenges.” The stakes are high: “The increasing focus on ML algorithms, instead of statistical methods having a formal framework for accommodating uncertainty quantification and dealing with critical issues such as selection bias, has been leading to a critical reproducibility problem in science.”

    What is needed to meet these challenges?

    Zipunnikov called for engagement of and by statisticians to meet the challenges he raised, saying, “The process of transforming data into knowledge is impossible without active intellectual participation of statisticians in major multidisciplinary efforts that focus on conceptualization, measurement, analysis, and treatment of myriad physiological, behavioral, and mental health conditions.” As a positive example, he pointed to the mobile Motor Activity Research Consortium for Health (mMARCH) that he and others recently formed as an international network to leverage the potential of digital mobile health.

    In a similar vein, Lehar called for increased data sharing to address the challenge of incomplete data, noting too much of a tendency toward data ‘silos.’ Emphasizing the central importance of this step, he stated, “A more concerted effort to share data across diverse providers is essential to truly realize the dream of precision medicine.”

    Alternately, Murphy called for increased attention to the “development of conceptual ideas for harnessing big data in clinical trial design and execution.” Further, she cited the need for “training in the underlying principles of trial design (e.g., going back to Fisher and Hill) combined with training in computational methods and statistical principles related to replicability.”

    Last, Dunson called for nothing short of a revolution, echoing other recent calls of a similar nature (e.g., the NSF Crossroads project): “We need to fundamentally revamp the statistics education curriculum to prepare students with high-quality tools for analyzing and interpreting the massive-scale complex data being routinely collected. We need to revamp the reward system in academics to favor the development of truly innovative methods that are actually of direct utility in analysis of large-scale scientific data sets over incremental methods with seemingly strong asymptotic support. Less focus on publication volume and more focus on impact/innovation of a few key publications in tenure decisions. We need fundamentally new ways of analyzing and interpreting data and more of a paradigm for appropriately dealing with truly complex data that require pre-processing and face computational challenges in storage, transfer, and processing.”

    The Boston University symposium promises to serve as a forum for discussion of these and other cutting-edge topics at the intersection of statistics and the health sciences in a format broadly accessible to the larger data science community.

    Registration for this free symposium is open to the public, in person or online.
