Antibody Modeling and Epitope Prediction

Dmitri Beglov, Sandor Vajda, and Dima Kozakov

Supported by:
NIH R43 GM134769 Software For Antibody Epitope Prediction (PI: Dmitri Beglov)

Accurate epitope prediction is important for the development of antibody-based therapies. When multiple new antibodies are discovered against the whole antigen, their epitopes and, therefore, their potential novelty and mechanism of action are usually unknown. Site-directed mutagenesis, the routine method for epitope mapping, requires testing a large number of mutants, since any part of the antigen can potentially form an epitope. The goal of this new project is to develop methodology and software for the accurate computational prediction of discontinuous B-cell epitopes based on the structure of an antigen and the structure or sequence of an antibody. Our starting point is PIPER, a protein-protein docking program that serves as the docking engine of the public server ClusPro. PIPER has a special option for antibody-antigen docking and has been used for epitope prediction by several groups. However, in its present form the software generally yields a large number of putative epitopes, and more accurate prediction requires substantial experimental effort, e.g., site-directed mutagenesis. We will modify PIPER to maximize the information available from docking by generating a large ensemble of low-energy docked structures and calculating a contact map rather than relying on discrete docked structures. The number of potential epitopes will be further reduced by a template-based approach that uses vector contact maps to characterize antibody-antigen interfaces (Aim 1). We will also explore predicting the epitope based on models of the CDR regions (Aim 2). Generating large ensembles of docked structures with a wide variety of CDR conformations will reduce the sensitivity of the method to inevitable modeling and docking uncertainty. By increasing the reliability of the predicted epitopes, we expect to reduce or even eliminate the need for mutagenesis experiments.
Finally, we will develop a machine-learning algorithm for mapping the amino acid composition of CDR regions into epitope composition, a method that can be used when only the antibody sequence is available and structure prediction is uncertain due to the lack of suitable templates (Aim 3).

Aim 1: Developing a contact analysis and combined probability scheme to improve epitope prediction. While docking methods may generate a very diverse set of docked structures, they usually define only a few well-populated interface regions. Surprisingly, this property has not been exploited for epitope prediction. PIPER will be modified to generate a large ensemble of docked structures in order to determine how frequently each antigen residue makes contacts with the antibody. Previous studies on predicting protein-protein interfaces and our preliminary data indicate that this approach substantially improves prediction accuracy. The accuracy will be further improved by redesigning the structure-based antibody-antigen interaction potential used in PIPER.
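
The contact-frequency idea in Aim 1 can be illustrated with a minimal sketch. It is not PIPER code: the function name, the array layout (one representative point per antigen residue, a fixed antigen frame, antibody coordinates per docked pose), and the 5 Å cutoff are all assumptions made here for illustration.

```python
import numpy as np

def contact_frequencies(antigen_xyz, antibody_poses, cutoff=5.0):
    """Fraction of docked poses in which each antigen residue contacts the antibody.

    antigen_xyz    : (R, 3) array, one representative point per antigen residue
                     (hypothetical simplification; all-atom contacts are also possible).
    antibody_poses : (P, A, 3) array, antibody coordinates for P docked poses.
    cutoff         : contact distance in angstroms (assumed value, not from the source).
    """
    antigen_xyz = np.asarray(antigen_xyz, dtype=float)
    antibody_poses = np.asarray(antibody_poses, dtype=float)
    counts = np.zeros(len(antigen_xyz))
    for pose in antibody_poses:
        # Pairwise residue-to-antibody distances, shape (R, A).
        dist = np.linalg.norm(antigen_xyz[:, None, :] - pose[None, :, :], axis=-1)
        # A residue is in contact if any antibody point lies within the cutoff.
        counts += dist.min(axis=1) <= cutoff
    return counts / len(antibody_poses)
```

Residues with high frequencies across the ensemble would be candidate epitope residues. In practice the poses might also be weighted by docking energy; that refinement is omitted here.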

Aim 2: Modeling of the CDR antibody regions to improve epitope scoring. We will explore the influence of conformational changes in the CDR regions of antibodies on antibody-antigen docking and epitope prediction. We will also test whether docking multiple structures of the antibody with different CDR loop conformations can improve the ranking of the native epitope in the ensemble of possible solutions. After grafting statistically derived backbone templates, we will restore the true amino acid sequence by placing the most probable rotamers, followed by minimization. We will apply the docking and contact analysis techniques developed in Aim 1 to a large collection of initial models of the antibody in order to optimize epitope ranking.
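
One simple way to combine results from several antibody models is sketched below. The pooling rule (a plain mean over models) and the function name are assumptions for illustration only; the text above does not specify how per-model results are merged.

```python
import numpy as np

def pool_and_rank(per_model_freqs, top_k=3):
    """Average per-residue contact frequencies over antibody models and rank residues.

    per_model_freqs : (M, R) array, one row of antigen-residue contact frequencies
                      per antibody model (e.g. per CDR loop conformation).
    top_k           : number of top-ranked residue indices to return.
    """
    freqs = np.asarray(per_model_freqs, dtype=float)
    # Unweighted mean over models: an assumed, deliberately simple pooling rule.
    pooled = freqs.mean(axis=0)
    # Stable sort so that tied residues keep their original order.
    order = np.argsort(-pooled, kind="stable")
    return pooled, order[:top_k].tolist()
```

Averaging over many CDR conformations dampens the effect of any single poorly modeled loop, which is the motivation stated above for docking multiple models.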

Aim 3: Epitope prediction for known antigen structures and antibody sequences. In this aim we address the common situation in which the structure of the antigen is known, but the antibody is specified only by its sequence. Based on the known dataset of structures of antibody-antigen complexes, we will develop a machine-learning algorithm for mapping the amino acid composition of antibody CDR regions into epitope composition. An artificial neural net will be trained to make the predictions. We will then examine the entire antigen surface for local composition matches to the neural net predictions. The matching will result in a probability score for possible epitopes. When the antibody structure is known, the sequence-based probability score will be combined with the docking-based probability measures described in Specific Aim 1.
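
The composition-matching step of Aim 3 can be sketched as follows. The neural net itself is not shown: its output is represented by a 20-element predicted epitope composition vector. The function names, the patch representation (a string of one-letter residue codes per surface patch), and the use of cosine similarity as the match score are assumptions for illustration; a cosine score is not a probability and would need calibration.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(residues):
    """Return the 20-element amino acid fraction vector of a residue string."""
    counts = Counter(c for c in residues.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def score_patches(predicted_epitope_comp, patches):
    """Rank surface patches by similarity to the predicted epitope composition.

    patches : dict mapping a hypothetical patch id to the residues in that patch.
    Returns a list of (patch_id, score) pairs sorted from best to worst match.
    """
    scored = [(pid, cosine(predicted_epitope_comp, composition(res)))
              for pid, res in patches.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Here `composition` would also supply the CDR composition used as the neural net input, and the ranked patch scores would be the sequence-based quantity that is later combined with the docking-based measures.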