Optimizing and Learning Strategies for Protein Docking

Sponsor: National Institutes of Health (NIH)

Award Number: 5R01GM135930-02

PI: Pirooz Vakili

Co-Is/Co-PIs: Ioannis (Yannis) Ch. Paschalidis, Sandor Vajda

Abstract:

Protein docking is defined as predicting the three-dimensional structure of a docked complex from knowledge of the structures of its components. Experimental techniques for this purpose are often expensive, time-consuming, and in some cases infeasible; hence the need for computational docking methods. The problem of finding the docked conformation is generally formulated as the minimization of an energy-based scoring function. This function is composed of multiple energy terms that act on different spatial scales and exhibit multi-frequency behavior, leading to an enormous number of local minima. Furthermore, the process of docking/binding involves conformational changes to the component molecules, leading to a highly complex search space for the optimization problem. These features render the optimization problem extremely difficult. Most state-of-the-art docking protocols employ a multi-stage and multi-scale approach. They begin with a global search of the conformational space using a simplified scoring function to identify promising areas of the space, followed by local optimization using a more detailed and complete scoring function to remove clashes. In the final, so-called refinement stage, promising areas found in the first two stages are explored further using a medium space-scale search to provide a set of final solutions. It has recently become evident that, due to the inaccuracy of the scoring function/energy potentials, the optimization stage outlined above invariably generates a number of false positives at the final phase, namely conformations that have low scores but are far from the native conformation. This motivates the learning methods introduced in this proposal, which combine energy with additional features in order to rank clusters of conformations at the refinement stage and improve the final solutions. The proposal has two distinct thrusts: optimization and learning.
On the optimization front, the project team in its past research has defined the docking problem as an optimization on manifolds. In this project, two novel elements are introduced into the manifold optimization formulation that are expected to lead to significant improvements in the performance of docking algorithms. On the learning front, a new and more rigorous approach to robust regression, classification, and outlier detection, based on novel robust optimization techniques, is introduced in order to (i) obtain improved ranking of clusters in the refinement stage, and (ii) address the important problem of distinguishing between binders and non-binders. The project aims to improve the performance of computational docking used to predict whether, and if so how, proteins interact with each other and with small molecules. Understanding and predicting protein-protein and protein-small molecule interactions is an important component of the process of rational drug design. More effective protein docking algorithms are therefore expected to improve the rational drug design process.
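
The multi-stage idea described in the abstract (a global search with a simplified score, followed by local optimization with a more detailed score) can be illustrated with a toy one-dimensional sketch. Everything here is an assumption made for illustration: the functions `simplified_score`, `detailed_score`, `global_search`, `local_minimize`, and `dock` are hypothetical names, the energy terms are invented, and plain gradient descent stands in for the project's manifold optimization, which the abstract does not specify.

```python
import math

def detailed_score(x):
    # Toy "detailed" energy (illustrative assumption): a broad well plus a
    # high-frequency term that creates many local minima.
    return 0.1 * (x - 2.0) ** 2 + math.sin(8.0 * x)

def simplified_score(x):
    # Toy "simplified" energy: only the smooth, long-range term.
    return 0.1 * (x - 2.0) ** 2

def global_search(lo, hi, n_points, n_keep):
    # Stage 1: coarse grid scan, ranked by the simplified score.
    step = (hi - lo) / (n_points - 1)
    grid = [lo + i * step for i in range(n_points)]
    return sorted(grid, key=simplified_score)[:n_keep]

def local_minimize(x, step=0.01, iters=500, h=1e-5):
    # Stage 2: gradient descent on the detailed score, using a
    # central finite-difference estimate of the gradient.
    for _ in range(iters):
        grad = (detailed_score(x + h) - detailed_score(x - h)) / (2.0 * h)
        x -= step * grad
    return x

def dock(lo=-10.0, hi=10.0, n_points=41, n_keep=5):
    # Run both stages and rank the refined candidates by detailed score.
    starts = global_search(lo, hi, n_points, n_keep)
    refined = [local_minimize(x) for x in starts]
    return sorted(refined, key=detailed_score)

if __name__ == "__main__":
    for x in dock():
        print(f"x = {x:.3f}, detailed score = {detailed_score(x):.3f}")
```

In this toy setting the candidate that ranks first under the simplified score is not a minimum of the detailed score, which loosely mirrors why the abstract proposes re-ranking candidates with additional features rather than relying on a single energy function.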
