Green lab’s new RNA design method combines novel neural networks

By Jennifer Rosenberg

Ribonucleic acid, also called RNA, is a molecule present in all living cells. It plays a critical role in transmitting genetic instructions from DNA and creating proteins. With the power to execute a plethora of functions, the little RNA “messenger” has led to important innovations across therapeutics, diagnostics, and vaccines, and made us rethink our understanding of life itself.

A team of researchers from Boston University’s Biological Design Center and the Department of Biomedical Engineering recently made significant steps forward in the development of the next generation of computational RNA tools. They published a study in Nature Communications that describes a novel, AI-assisted technique for designing different types of RNA molecules with improved function. Much like a large language model that can be used to compose entirely new texts, this model can compose new RNA sequences tailored for specific tasks in the cell or in a diagnostic assay. Their research has shown it is possible to predict and generate RNA sequences that have specific functions across a broad array of potential applications.

In this Q&A, Associate Professor Alex Green (BME) discusses the power of two new tools, sequence and structure of RNA molecules (SANDSTORM) and Generative Adversarial RNA Design Networks (GARDN), in developing diagnostic and therapeutic RNAs with improved functions.

Why are you interested in engineering RNA with novel functions?

RNA engineering has tremendous potential because RNA is the only system in the body that both encodes genetic information and has diverse functional activity across a broad array of cellular processes. Engineered RNAs have applications in gene editing for research or therapeutic purposes, diagnostics, and synthetic biological systems. Additionally, RNA engineering could provide a simpler and more streamlined approach to ensure cells produce the right amount of protein at the right time, which has tremendous potential in enabling novel protein-based therapeutics and targeted therapies. The Green lab focuses on engineering RNA in a programmable way so it can function as a device to achieve a specific end goal. For instance, we work on designing RNAs that are active in cancer cells to produce a therapy, but are completely silent if they encounter a healthy cell, which minimizes unwanted side effects.

What have been the challenges to date in realizing the potential of RNA engineering?

Alex Green (BME). Photo by Cydney Scott

A key challenge has been the need to synthesize and screen RNA in experimental systems, which can be both time-consuming and resource-intensive. Several computational tools have been developed to overcome the challenges of experimental approaches, but they all use different coding platforms and architectures, which makes it very difficult to integrate them. Additionally, most of these existing methods have been designed to predict the function of specific types of RNAs, which means there isn’t a single tool that can be broadly applied to answer all the questions and make all the predictions we’d like to make.

How do SANDSTORM and GARDN overcome these challenges?

SANDSTORM is a deep machine learning (ML) approach that incorporates information about RNA sequence and RNA secondary structure to predict the function of diverse classes of RNAs. We can use SANDSTORM neural networks, which learn and improve over time as we get more data, to predict the functional activity of the ends of RNA molecules (which play important roles in RNA stability, trafficking, and translation), parts of the RNA that interact with ribosomes, and RNAs used in CRISPR diagnostics. GARDN is a generative adversarial network architecture: a system tasked with generating realistic examples of functional RNA and discriminating between realistic and unrealistic examples.

When we combine SANDSTORM with GARDN, we have a powerful system that can generate and select RNA sequences that provide desired functions while also being highly computationally efficient. Fewer parameters are needed for training and prediction compared with other computational approaches, which makes the system faster and related workflows easier. A graduate student can use this system on their personal computer, as Aidan Riley, a PhD student in my lab who championed the work, can attest. In fact, Aidan was a driver for this project because he wanted to use ML to screen and test engineered RNA faster and more efficiently, and his background in ML played a critical role in the ultimate design of the system.

What are the next steps in this research?

To date we have demonstrated the utility of SANDSTORM and GARDN in engineering the ends of RNA molecules (known as the 5-prime and 3-prime ends). Part of our current focus is on designing the coding region between the two ends. Toward this end, we are merging these computational tools with other developments at BU, including self-amplifying RNA technology and more specific delivery of therapeutics. Self-amplifying RNA technology is a more efficient way to generate the RNAs we want to evaluate in experimental systems. Further down the road we also want to engineer RNA to improve the efficiency of protein production, which could have important implications in enabling new targeted therapies and making production of therapeutic proteins more efficient.

In what ways might this research help further the development of vaccines or drug therapies, or in the treating of diseases?

This research can help make vaccines and therapies smarter and more effective. Using our models, we are finding that we can significantly increase the amount of payload protein that an mRNA generates. This is valuable since it can lead to more potent treatments or it can be used to reduce the amount of RNA that is required for the therapy, lowering costs and potentially reducing side effects. We also use the models to design RNAs that only activate in response to different biomarkers that they detect. This capability could be used to develop therapies that only activate in specific tissues or in tumor cells only, for example.

How could biotech companies potentially benefit from the knowledge gained by this research?

Our technology could be used to help biotech companies develop RNA drug candidates much faster and more efficiently. Training new machine learning models often requires testing tens of thousands of different candidates, which is expensive and time-consuming. With SANDSTORM, we’ve found that we can generate effective models from only a few hundred test sequences, which is considerably fewer than I thought would be possible. For small biotech companies in particular, this cost and time benefit could be decisive in getting a new drug to market.


This work was supported by startup funds from Boston University, the Defense Advanced Research Projects Agency (DARPA), the National Institutes of Health (NIH), and the National Science Foundation (NSF) Graduate Research Fellowship.