AF: Small: Manifold optimization algorithms for protein-protein docking

Sponsor: National Science Foundation (NSF)

Award Number: 1527292

PI: Dmytro Kozakov

Co-Is/Co-PIs: Ioannis (Yannis) Ch. Paschalidis, Pirooz Vakili, Sandor Vajda

Abstract:

Proteins are the major building blocks of the cell. Many proteins perform their function by interacting with other proteins. In a typical cell, hundreds of thousands of different protein interactions take place. Characterizing these interactions helps elucidate how living organisms function at the molecular level, contributes to the development of treatments for diseases such as cancer, and facilitates the design of novel bio-inspired materials. Detailed understanding of protein interaction mechanisms requires determining the three-dimensional structures of protein-protein complexes. Because these structures are very difficult to obtain using experimental techniques, computational approaches can be very useful. The team of Kozakov, Paschalidis, Vajda and Vakili has developed algorithms and software that, according to the worldwide evaluation experiment CAPRI (Critical Assessment of Predicted Interactions), are among the best for predicting the structures of protein-protein complexes. These methods have been implemented in the fully automated docking server ClusPro, which is free for academic use and has over 10,000 regular users. However, the current tools are computationally too demanding to serve such a large user base or to model protein interactions on a genomic scale. The goal of this project is to use rigorous geometrical and biophysical principles to substantially improve the efficiency of docking algorithms while retaining the accuracy of the generated models. Faster modeling of protein complexes will lead to better understanding of fundamental biological questions at both the cellular and system levels and will facilitate biochemical, biomedical, and biotechnology research. In addition, the methods will be used in training graduate students and teaching undergraduate and high school students.

The protein-docking problem is to computationally determine the 3-dimensional (3D) structure of the complex formed by two unbound proteins, given their individual 3D structures, by finding the global minimum of an energy-based scoring function. If one of the proteins, considered the receptor, is in fixed position and orientation, the search space includes the 6D rotational/translational space of the other protein, considered the ligand, as well as additional degrees of freedom that represent the flexibility of the two proteins. Since the energy function has a large number of local minima separated by high barriers, the minimization problem is extremely challenging. The proposed project aims to draw on the innovative manifold and optimization-based approaches that the group has developed for various components of docking protocols in order to (i) put the algorithms on a solid theoretical footing, (ii) study their performance and behavior more rigorously, and (iii) develop generalizable features of the algorithms that can be applied to other application domains. The work will focus on four key algorithms. First, the Fast Fourier Transform on Manifolds (MFFT) will be studied, which enables global systematic search in the space of rigid body motions of one protein with respect to the other. The computational complexity of MFFT will be analyzed and numerical performance tests for different bandwidth settings will be conducted. Second, underestimation-based sampling techniques will be explored, acting on the same manifold and targeting a medium-range search to refine MFFT solutions. A variety of underestimators will be developed and tested. Third, manifold-based local optimization approaches for flexible protein optimization will be developed. In each case, different manifold parameterizations and different minimization approaches will be compared. Finally, a new formulation of side-chain packing and other problems arising in protein docking will be developed using principles from the theory of Markov Random Fields. In this context, different solution approaches will be developed involving the Maximum Weight Independent Set (MWIS) problem and generalized belief propagation. The algorithms developed will be released as an open source software library to be used both for protein docking and in other application domains.
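
To give a concrete sense of the Maximum Weight Independent Set (MWIS) problem mentioned above, the following is a minimal sketch on a toy instance. It is not the project's algorithm: it uses exhaustive enumeration, which is only feasible for very small graphs, whereas the abstract refers to approaches such as generalized belief propagation. The mapping used here (one node per residue/rotamer pair, node weights as favorability scores, edges between mutually exclusive or clashing choices) and all names and numbers are illustrative assumptions.

```python
from itertools import combinations


def max_weight_independent_set(weights, edges):
    """Brute-force MWIS.

    weights: dict mapping node -> nonnegative weight
    edges:   iterable of (node, node) pairs that may not be chosen together
    Returns (best_weight, best_set). Exponential time; for illustration only.
    """
    nodes = list(weights)
    conflicts = {frozenset(e) for e in edges}
    best_weight, best_set = 0.0, frozenset()
    for size in range(1, len(nodes) + 1):
        for subset in combinations(nodes, size):
            # Skip subsets that contain any conflicting pair of nodes.
            if any(frozenset(p) in conflicts for p in combinations(subset, 2)):
                continue
            total = sum(weights[v] for v in subset)
            if total > best_weight:
                best_weight, best_set = total, frozenset(subset)
    return best_weight, best_set


# Hypothetical toy instance: residues "A" and "B", two rotamers each.
# Weights are made-up favorability scores (higher is better).
weights = {("A", 0): 3.0, ("A", 1): 2.0, ("B", 0): 2.5, ("B", 1): 1.0}
edges = [
    (("A", 0), ("A", 1)),  # a residue can adopt only one rotamer
    (("B", 0), ("B", 1)),
    (("A", 0), ("B", 0)),  # assumed steric clash between these two rotamers
]

best_weight, best_set = max_weight_independent_set(weights, edges)
print(best_weight, sorted(best_set))
```

In this toy instance the two individually best rotamers clash, so the best compatible assignment pairs rotamer 1 of residue A with rotamer 0 of residue B.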
