Large Language Models Advance Healthcare and Public Health

By Chloe de Leon

Artificial intelligence (AI) usage has exploded in the past three years, following the public release of several large language models (LLMs). As the technology gained public traction, researchers found ways to apply it in healthcare and medicine, improving the efficiency and outcomes of existing systems. Yannis Paschalidis (ECE, BME, SE), Distinguished Professor of Engineering and Director of the Rafik B. Hariri Institute for Computing and Computational Science & Engineering, leads such developments by applying AI models to healthcare and public health.

Paschalidis and his students work with both traditional statistical models and neural networks, computational models made up of many neurons (like the human brain) whose behavior is adjusted by tuning the neurons’ weights on large datasets. These models process data and represent complex relationships, mapping inputs to decisions, e.g., classifying a patient’s health state.
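
As a concrete, purely illustrative sketch (in PyTorch, and not the group’s actual model), the snippet below builds a tiny network that maps synthetic patient features to a health-state label, then takes one gradient step, the operation by which training data adjusts the weights:

```python
import torch
import torch.nn as nn

# A small feed-forward network: each layer holds learnable weights that
# training adjusts so that inputs map to a useful decision.
model = nn.Sequential(
    nn.Linear(8, 16),   # 8 hypothetical patient features -> 16 hidden neurons
    nn.ReLU(),
    nn.Linear(16, 1),   # one output: a logit for "condition present"
)

x = torch.randn(4, 8)                        # a batch of 4 synthetic patients
y = torch.randint(0, 2, (4, 1)).float()      # synthetic health-state labels
loss = nn.BCEWithLogitsLoss()(model(x), y)   # classification error
loss.backward()                              # gradients nudge the weights
```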

Most of their current projects use LLMs, a type of neural network trained to process and generate natural-language text. LLMs power popular tools like ChatGPT, Gemini, Copilot, and Claude.

“If you go back five, six years ago, before the era of LLMs, people were developing NLP pipelines to process text from electronic health records, which was tedious as one had to build a new pipeline for each use case. LLMs have simplified and greatly improved such development,” Paschalidis said.  

Development of LLMs extends beyond just human, or English, language, according to Paschalidis. He explained that LLM use in computational biology might rely on models trained on protein sequences (the sequence of amino acids defining a protein).

“People think about language models as human language, right?” Paschalidis said. “That’s not necessarily the only application.” 

Paschalidis’s group works with LLMs and other machine learning models and further trains them with distributionally robust methods. Such methods are better equipped to handle shifts in the data distribution because they optimize performance against worst-case disturbances in the data.

“The data in real practice that the model is going to be applied to may be somewhat different than the data that you used in order to train the model,” Paschalidis said.  
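
One simple instance of this idea, sketched below with toy data and not the group’s specific algorithm, is adversarial training: before each update, find a small worst-case disturbance of the inputs, then train the model against the disturbed data rather than the data as given:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 1)
loss_fn = nn.BCEWithLogitsLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(32, 8)                    # toy training data
y = torch.randint(0, 2, (32, 1)).float()

# Inner step: find a small disturbance that makes the loss as bad as possible.
delta = torch.zeros_like(x, requires_grad=True)
loss_fn(model(x + delta), y).backward()
with torch.no_grad():
    delta = 0.1 * delta.grad.sign()       # bounded worst-case shift of the data

# Outer step: update the model against the shifted data.
opt.zero_grad()
loss_fn(model(x + delta), y).backward()
opt.step()
```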

The final steps of developing the models vary from project to project. New methodologies must be developed as each application of AI presents different challenges, according to Paschalidis. 

One common way to fine-tune LLMs combines human feedback on the model’s output with a technique called reinforcement learning (RL).

“[RL from Human Feedback is] the part of the training where you try to align what the model is producing as an output with what a human would like or would prefer a model to produce,” Paschalidis said.  
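
The alignment step he describes typically begins with a reward model trained on human preference pairs. A minimal sketch of that preference loss, with illustrative scores in place of a real reward model:

```python
import torch
import torch.nn.functional as F

# Stand-in reward-model scores for two candidate LLM outputs.
reward_chosen = torch.tensor([1.2], requires_grad=True)    # human-preferred
reward_rejected = torch.tensor([0.3], requires_grad=True)  # human-rejected

# The loss shrinks as the preferred response out-scores the rejected one.
loss = -F.logsigmoid(reward_chosen - reward_rejected).mean()
loss.backward()  # in practice, this gradient updates the reward model
```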

Projects 

Paschalidis and his students apply their LLM development and fine-tuning to several topics. Paschalidis described four main areas of interest: antibody and antigen sequences, medical imaging, recognizing disease events, and Alzheimer’s disease.

One project uses an LLM trained on protein language to predict how antibodies and antigens might bind.

“[It is] very useful in many different domains [like] drug development [and] in developing antibodies that can target specific pathways of disease,” Paschalidis said. 
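
The general recipe, sketched below with hypothetical pieces (the `embed` function merely stands in for a real protein language model), is to embed both amino-acid sequences and train a small classifier on the paired embeddings:

```python
import torch
import torch.nn as nn

def embed(sequence: str) -> torch.Tensor:
    # Placeholder: a real protein language model would map the amino-acid
    # sequence to a learned embedding; random numbers stand in here.
    return torch.randn(128)

# A small classifier over the paired embeddings predicts binding.
binder = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 1))

antibody = "EVQLVESGGGLV"   # toy amino-acid sequences
antigen = "MFVFLVLLPLVS"
pair = torch.cat([embed(antibody), embed(antigen)])
p_bind = torch.sigmoid(binder(pair))   # predicted probability of binding
```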

While this research has not yet reached the stage of drug development, LLMs in this application will help address the variability that exists in protein sequences. The ability to perform well despite high variability also applies to computer vision models. Paschalidis explained that vision models have applications in accelerated magnetic resonance imaging (MRI) scans of the brain, for stroke assessment and response.

These quick MRI scans often contain far more noise, making signs of stroke less apparent. Vision models can be developed specifically for such scans and identify strokes during the prime intervention period, according to Paschalidis.

“That helps with triage because in stroke, there’s sort of this golden hour to intervene,” Paschalidis said. “You have much worse outcomes if you don’t intervene within that time period.” 
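
In outline, and only as a toy illustration of the kind of model involved, such a vision model takes a noisy image and produces a stroke/no-stroke score:

```python
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),  # a 1-channel MRI slice
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),                    # pool to a single feature vector
    nn.Flatten(),
    nn.Linear(8, 1),                            # stroke logit
)

noisy_slice = torch.randn(1, 1, 64, 64)         # synthetic, noise-heavy scan
p_stroke = torch.sigmoid(cnn(noisy_slice))      # probability of stroke
```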

Training LLMs has also proved useful in public health, with the development of the Biothreats Emergence, Analysis and Communications Network (BEACON), a new disease outbreak reporting system launched by the BU Center for Emerging Infectious Diseases.

“It looks for any report about…any disease outbreak that is happening anywhere in the world,” Paschalidis said.  

Any report discovered is processed by a custom LLM trained by Paschalidis and his group. Infectious disease experts review LLM-generated reports on disease events before they are published on the website. Paschalidis is now exploring additional ways to integrate AI into BEACON. 
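
In skeleton form, with `call_llm` as a hypothetical stand-in for the group’s custom model, the processing step might look like this:

```python
import json

PROMPT = """Extract from the report below, as JSON:
disease, location, date, case_count.
Report: {report}"""

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a fine-tuned LLM; returns a canned answer
    # here so the sketch runs end to end.
    return '{"disease": "measles", "location": "example", ' \
           '"date": "2024-01-01", "case_count": 12}'

def extract_event(report: str) -> dict:
    # Drafts like this still go to infectious-disease experts for review.
    return json.loads(call_llm(PROMPT.format(report=report)))

print(extract_event("Example news report about an outbreak."))
```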

Paschalidis’s group also uses LLMs to aid in diagnosis and predict conditions such as Alzheimer’s disease. His team trained their model with thousands of recorded neurological exams from the Framingham Heart Study. 

“We use speech recognition to obtain a text transcript of what happened in this exam, and then using large language models we extract features from that text,” Paschalidis said. 
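
The pipeline in that quote has three stages. A purely illustrative skeleton, with every function a hypothetical stand-in rather than the group’s code:

```python
import math

def transcribe(audio_path: str) -> str:
    # Stand-in for a speech-recognition system; returns a canned transcript.
    return "subject recalled 3 of 10 words after a delay"

def extract_features(transcript: str) -> list[float]:
    # Stand-in for LLM feature extraction; a real system would prompt the
    # model for clinically meaningful features of the exam transcript.
    return [float(transcript.count("recalled"))]

def score_impairment(features: list[float]) -> float:
    return 1.0 / (1.0 + math.exp(-sum(features)))  # toy logistic score

print(score_impairment(extract_features(transcribe("exam_001.wav"))))
```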

The models can recognize levels of cognitive impairment associated with aging and predict the progression of cognitive decline over a six-year period. Access to large datasets has also allowed the team to explore other applications, such as predicting pregnancy and IVF treatment outcomes.

With LLMs and other AI models making it easier to process human text, their potential applications continue to grow.  

“We communicate as humans by speaking, and we communicate by writing,” Paschalidis said. “Having models that are able to process and understand that is very, very important.”