Mapping Molecular Pathways
Boris Shakhnovich: Predicting the Roles of Proteins
The recent completion of the mapping of the human genome has uncovered the genetic sequence for the approximately 30,000 human genes, but scientists are largely in the dark concerning the role of those genes and the proteins they encode in development, physiology, and disease. One way to decipher the biologically relevant meaning of the human genome’s code of letters is to change, or mutate, each sequence in the laboratory and examine the physical or physiological changes in a model organism. Bioinformaticist Boris Shakhnovich has developed an easier way: exploiting the power of computer modeling to narrow the potential roles of a protein before beginning lab experiments on the genes themselves.
“If someone sequences a new genome or finds a pet new gene, they are, of course, interested in what it does,” explains Shakhnovich. “The easiest thing to do is compare the new sequence to other genes. But there’s inherent ambiguity because even the function of a protein’s closest relative is not exactly a match.” The sequence of a gene can be used to predict the secondary structure, or building blocks, of the protein it encodes. But protein function is tied to how those blocks are put together—its three-dimensional shape, or tertiary structure, which is quite hard to predict from sequence alone. Predicting a protein’s function is further complicated by the fact that structure and function have a many-to-many relationship, meaning that there are many functions that can result from one structure and many structures that can give rise to the same function.
According to Charles DeLisi, Arthur G. B. Metcalf Professor of Science and Engineering, one of the “founding fathers” of the Human Genome Project and Shakhnovich’s mentor, “Although researchers had previously developed ways to approximate relationships between structure and function, Shakhnovich significantly improved on previous work by creating a mathematical method that is able to correlate protein structure and function with a high degree of accuracy.”
Shakhnovich began with a previously established ontology, a structured, controlled vocabulary that maps gene products in terms of their molecular function, such as DNA-binding or ATP-producing. By measuring the “distance” between two proteins on the ontology, he could then assign a score of functional similarity between protein structures.
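To make the idea of "distance" on an ontology concrete, here is a minimal sketch in Python. It is not Shakhnovich's published measure, whose exact definition the article does not give. It assumes a simple path-length distance: the number of steps from each of two terms up to a shared ancestor, minimized over all shared ancestors. The tiny ontology, the term names, and the functions `ancestor_depths`, `term_distance`, and `protein_distance` are all hypothetical, invented for illustration only.

```python
from collections import deque

# Hypothetical toy ontology (NOT a real vocabulary): term -> parent terms.
PARENTS = {
    "molecular_function": [],
    "binding": ["molecular_function"],
    "catalytic_activity": ["molecular_function"],
    "nucleic_acid_binding": ["binding"],
    "DNA_binding": ["nucleic_acid_binding"],
    "RNA_binding": ["nucleic_acid_binding"],
    "ATP_producing": ["catalytic_activity"],
}


def ancestor_depths(term):
    """Map each ancestor of `term` (and the term itself, at 0) to the
    fewest edges needed to reach it by walking up the ontology."""
    depths = {term: 0}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for parent in PARENTS[current]:
            if parent not in depths:  # breadth-first, so first visit is shortest
                depths[parent] = depths[current] + 1
                queue.append(parent)
    return depths


def term_distance(a, b):
    """Assumed distance: shortest path between two terms through a shared
    ancestor. Every term in this toy ontology shares the root, so the set
    of shared ancestors is never empty here."""
    up_a = ancestor_depths(a)
    up_b = ancestor_depths(b)
    shared = set(up_a) & set(up_b)
    return min(up_a[t] + up_b[t] for t in shared)


def protein_distance(terms_a, terms_b):
    """Assumed protein-level score: the closest pair of annotated terms."""
    return min(term_distance(a, b) for a in terms_a for b in terms_b)


if __name__ == "__main__":
    print(term_distance("DNA_binding", "RNA_binding"))    # sibling terms
    print(term_distance("DNA_binding", "ATP_producing"))  # related only via the root
```

In this sketch, two proteins annotated with sibling terms score as functionally close, while proteins whose terms meet only at the root score as distant. A real measure might also weight terms by how specific or how frequently used they are; the article does not say which choice was made.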
“This was the first instance of distance in functional space,” says Shakhnovich, “but we wondered if it corresponded to known relationships between structure and function.” After comparing his measure with the structural distance, he saw a tight correlation that let him know he was on the right track. Shakhnovich then compared his functional distances to a measure of phylogenetic distance, a gauge of similar distributions on an evolutionary tree, and again observed a correlation. By mapping structural, phylogenetic, and functional distances on one functional landscape, he realized that function could best be predicted by considering structural distance in a genomic context.
The goal of the work is to narrow the range of possible functions of a particular protein to a few candidates, making it easier to design experiments that determine the protein’s function exactly. “Given the new definition of functional distance, we can say that the protein function is a certain distance away from known functions, with one of those nearby functions being a perfect match,” Shakhnovich says. He offers an analogy: “If the only thing you know is that a structure has four legs, it could be a chair or a table or an elephant.” Researchers can then test which of the four-legged possibilities is correct. “My hope is to redefine the notion of how we predict functions,” he adds. The aim is not to hit the mark exactly but to predict the range of options accurately, so that researchers can quickly test a gene’s function in the laboratory and sooner understand its role in the organism.
This research was featured in the June 2005 issue of the Public Library of Science journal PLoS Computational Biology.
For more information, visit http://romi.bu.edu/research/overview.htm and http://cagt.bu.edu/.
—by Leah Eisenstadt