‘We Need Fundamentally New Ways of Analyzing and Interpreting Data’

October 16, 2019

A version of this article was originally published in Amstat News.

The School of Public Health, in coordination with the Department of Biostatistics and the Boston University Department of Mathematics and Statistics, will jointly host a Dean’s Symposium, titled “Statistics and the Life Sciences: Creating a Healthier World,” on November 15.

Co-hosted by the American Statistical Association, the Institute of Mathematical Statistics, and the National Institute of Statistical Sciences, and open to attendees both in person and via webinar, the one-day symposium will feature short presentations and discussion of statistical challenges and progress toward solutions in a handful of emerging and mission-critical areas of the health sciences. Specifically, the symposium will focus on digital health, machine learning in causal inference, and networks for public health.

The symposium will feature two plenary speakers—Joseph Lehar of Janssen Pharmaceutical Companies of Johnson & Johnson, and Susan Murphy of Harvard University—and two keynote speakers—David Dunson of Duke University and Vadim Zipunnikov of Johns Hopkins University.

As a warm-up for the symposium, each of the four speakers was asked a set of three questions central to its intended focus: how statistics has most affected the health sciences in the recent past, what constitutes the biggest statistical challenges in the health sciences for the coming decade, and how we might best meet these challenges. Their responses are summarized below.

In what way do you feel statistics has had the biggest impact on the life sciences in the past decade?

The common response was that this impact was two-fold in nature, consisting of (i) support for the use of massive, diverse, and complex forms of data and (ii) the development of statistical machine learning methods for their analysis. Lehar noted how statistics has been key to “enabling the integration and analysis of very complex data sets across very diverse sources of information.” As an example of the impact of machine learning, he highlighted the use of such methods “to automate classifying disease phenotypes that used to rely on subjective and imprecise expert opinions (e.g. deep neural networks on cancer pathology images or machine learning on molecular profiles to produce actionable clinical biomarkers for matching patients to therapies).”

Similarly, Dunson spoke of the transition from more traditional ‘small data’ to ‘big data,’ citing a host of new measurement technologies whose use is enabled by statistics—from single cell RNA sequencing to electronic medical records, and from mobile health devices to social media. Summing up, he stated, “Statistics has had a fundamental impact on this paradigm shift in the way life science is being connected; there is no use in collecting such data unless we have reliable and reproducible methods for analysis and interpretation. The development of ‘big data’ statistics has freed up scientists to be creative in developing and exploring new sources of data.”

What do you think constitutes the biggest statistical challenge(s) in the health sciences for the coming decade?

Here, the responses were diverse, reflecting in many ways the diversity of interests and research areas among the speakers. Murphy, speaking from her perspective at the forefront of clinical trials design and analysis, asked, “How do we harness vast amounts of data—both from many individuals as well as on any one individual—to enhance and increase the impact of clinical trials?”

On the other hand, speaking from the vantage of his expertise in digital mobile health, Zipunnikov pointed to the challenges posed by the need to extract value and insight from the massive, complex, and diverse data resulting from “multi-system real-time monitoring of human physiology and ambient environmental exposure.” He further commented, “[The] main analytic challenges are centered around the complexity of digital mobile health measures that are inherently intensively longitudinal, have different time scales, have different measurement, have differences in subjective interpretation of scales, exhibit huge between and within subject heterogeneity across days and weeks of observation, follow significant diurnal and weekly patterns, and often have substantial potentially informative missingness.” All of these issues are further complicated by substantial cross-dependence among measurement modalities.

From the perspective of someone working across industry and academia at the frontier of oncology research, Lehar summed up in just two words: “incomplete data.” He added, “Rarely do we have good coverage of enough data types across many patients. This limits the extent to which machine learning can be applied, and thus the problems we can address.”

Finally, Dunson offered a broad and sobering comment about statistics and the health sciences, saying, “It is definitely the case that the rapid pace of production of data of unprecedented size and complexity has overwhelmed the statistical community. We lack the necessary tools to properly analyze these data streams, and we lack the necessary pool of talent to implement current tools appropriately, while also developing transformative new tools in a data/science-driven manner.” He further pointed to cultural challenges within statistics, particularly in contrast to the culture of the broader machine learning community, professing that “the priorities in statistics departments in academia often run counter to meeting these challenges.” The stakes are high: “The increasing focus on ML algorithms, instead of statistical methods having a formal framework for accommodating uncertainty quantification and dealing with critical issues such as selection bias, has been leading to a critical reproducibility problem in science.”

What is needed to meet the challenge(s)?

Zipunnikov called for engagement of and by statisticians to meet the challenges he raised, saying, “The process of transforming data into knowledge is impossible without active intellectual participation of statisticians in major multidisciplinary efforts that focus on conceptualization, measurement, analysis, and treatment of myriad physiological, behavioral, and mental health conditions.” As a positive example, he pointed to the mobile Motor Activity Research Consortium for Health (mMARCH) that he and others recently formed as an international network to leverage the potential of digital mobile health.

In a similar vein, Lehar called for increased data sharing to address the challenge of incomplete data, noting that data too often remain in “silos.” Emphasizing the central importance of this step, he stated, “A more concerted effort to share data across diverse providers is essential to truly realize the dream of precision medicine.”

Alternatively, Murphy called for increased attention to the “development of conceptual ideas for harnessing big data in clinical trial design and execution.” Further, she cited the need for “training in the underlying principles of trial design (e.g., going back to Fisher and Hill) combined with training in computational methods and statistical principles related to replicability.”

Last, Dunson called for nothing short of a revolution, echoing other recent calls of a similar nature (e.g., the NSF Crossroads project): “We need to fundamentally revamp the statistics education curriculum to prepare students with high-quality tools for analyzing and interpreting the massive-scale complex data being routinely collected. We need to revamp the reward system in academics to favor the development of truly innovative methods that are actually of direct utility in analysis of large-scale scientific data sets over incremental methods with seemingly strong asymptotic support. Less focus on publication volume and more focus on impact/innovation of a few key publications in tenure decisions. We need fundamentally new ways of analyzing and interpreting data and more of a paradigm for appropriately dealing with truly complex data that require pre-processing and face computational challenges in storage, transfer, and processing.”

The Boston University symposium promises to serve as a forum for discussion of these and other cutting-edge topics at the intersection of statistics and the health sciences in a format broadly accessible to the larger data science community.

Register for this free symposium. It is open to the public to join in person or online.

Josée Dupuis is professor and chair of the Department of Biostatistics at the Boston University School of Public Health. Eric D. Kolaczyk is a professor and data science faculty fellow in the Department of Mathematics and Statistics at Boston University.
