Event summary: From Data to Action – April 30, 2024
Data science has broad potential to improve our ability to predict the spread of infectious diseases. On April 30, 2024, CEID welcomed Caroline Buckee, PhD; Alessandro Vespignani, PhD; and Larry Madoff, MD, to discuss how data science is used today and how it could be used in the future.
Dr. Madoff, who serves as Medical Director of the Bureau of Infectious Disease and Laboratory Sciences for the Massachusetts Department of Public Health (DPH), spoke about the massive resource of aggregated data from state vaccination records. These records allow state epidemiologists to examine case reports in aggregate, for example to identify adverse reactions to vaccines. Similarly, all hospitalizations are reported to the state's database, which allows state epidemiologists to spot patterns in cases and to identify new outbreaks and trends in existing and emerging infectious diseases.
“We use a lot of our data sources to provide what you know as situational awareness. Where are we headed? What are the problems? What do we need to focus on?” He explained that this rich source of data alerts the DPH in near real time to events such as the start of flu season or a cluster of norovirus cases within a school.
However, Dr. Madoff also addressed the tension between making data available for research and maintaining sufficient patient privacy. Another barrier to sharing data, he noted, is that different electronic health record systems code and format data differently. Commercial and legal barriers can also get in the way, and he cited insurance companies as one example. “Insurers have vast databases that they won’t necessarily share. The payer claims database that I was talking about in hospitalizations is a wealth of data around hospitalizations. But it’s not available for a year after the data are input because of regulatory and anti-competitive concerns and so forth. And so, we don’t have access to those data until it’s too late.”
Dr. Buckee acknowledged the current strengths and weak points of various methods of data collection. She noted that mobile phone data, as its use during the COVID-19 pandemic showed, still has a lot of room for improvement as a method of contact tracing. Conversely, as an example of a successful method of transmission tracing, Dr. Buckee cited pathogen genomics, which is frequently used to distinguish local transmission from imported cases for pathogens such as malaria. This allows researchers to better understand patterns of transmission and to recommend travel regulations based on that information.
A common theme echoed by all the speakers was the need to reassess how academia, medicine, and public policy collaborate and what type of work is incentivized. Dr. Vespignani noted that during the COVID-19 pandemic, many people turned to academics to explain what was happening and how to proceed. However, many academics are not versed in how to put their findings into action. Dr. Buckee said she would like to see schools of public health put greater emphasis on teaching real-world practical skills for data collection and program implementation. In her view, this is how real, on-the-ground change will be achieved.
Expanding on the idea of training and preparing future public health practitioners, Dr. Vespignani asserted that within the next twenty years, we will see a shift across all fields that will require some understanding of and familiarity with data science and AI. Though not everyone will need to be an expert in computer science, he expects a baseline level of understanding to be required. Dr. Buckee agreed, while acknowledging that as science and technology progress, it is unrealistic to expect every public health practitioner to be an expert in every intersecting skill and discipline. Instead, she recommended creating organizational structures in which expertise in data science and analytics is available as a resource to the team as a whole.
Along with diversifying research teams’ collective areas of expertise and training, Dr. Buckee discussed the need to include communities in research about them in order to achieve greater equity. This starts with how and where data are collected.
“We have to say, ‘How can we use things like AI and smart data platforms at the community level?’ [Using] simple computers, like one laptop, one phone, offline. When we go [to Guyana] and talk to communities, they know exactly what’s going on. They know what species of malaria they’ve had. They know why. They know who’s bringing it. They know what’s happening. It’s about trying to get the whole notion of data science and computation down at that lower community level so that we can start using that data more sensibly and informing policy from the bottom up rather than always thinking of it as a high-tech solution.”
Lastly, the speakers cautioned against using AI just for the sake of AI. They all agreed that some level of human oversight is still needed. This is where programmers and data collectors must be able to step in and examine results for potential bias.