

Title: Predicting protein conformational changes for unbound and homology docking: learning from intrinsic and induced flexibility
ABSTRACT

Predicting protein conformational changes from unbound structures or even homology models to bound structures remains a critical challenge for protein docking. Here we present a study that directly addresses this challenge by reducing the dimensionality and narrowing the range of the corresponding conformational space. The study builds on cNMA, our new framework of partner‐ and contact‐specific normal mode analysis that exploits encounter complexes and considers both intrinsic and induced flexibility. First, we established on a CAPRI (Critical Assessment of PRedicted Interactions) target set that the direction of conformational changes from unbound structures and homology models can be reproduced to a great extent by a small set of cNMA modes. In particular, homology‐to‐bound interface root‐mean‐square deviation (iRMSD) can be reduced by 40% on average with the slowest 30 modes. Second, we developed novel, interpretable features from cNMA and used various machine learning approaches to predict the extent of conformational changes. Models trained on a set of unbound‐to‐bound conformational changes predicted the actual extent of iRMSD with errors of around 0.6 Å for unbound proteins in a held‐out benchmark subset, around 0.8 Å for unbound proteins in the CAPRI set, and around 1 Å even for homology models in the CAPRI set. Our results shed new light on the origins of conformational differences between homology models and bound structures and provide new support for the low dimensionality of conformational adjustment during protein association. The results also provide new tools for ensemble generation and conformational sampling in unbound and homology docking. Proteins 2017; 85:544–556. © 2016 Wiley Periodicals, Inc.
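The reported iRMSD reductions come from moving a structure along its slowest cNMA modes. One way to measure such a reduction is to project the interface displacement vector onto a set of orthonormal modes and report the RMSD that remains. The Python below is a minimal, generic sketch of that idea and is not the authors' cNMA code.

It rests on these assumptions:
* The unbound (or model) interface atoms and the bound interface atoms are already paired and superimposed.
* The displacement is flattened as (x1, y1, z1, x2, ...).
* The modes are orthonormal 3N-vectors.

Computing the modes themselves is not shown, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def rmsd_of_displacement(delta: Sequence[float]) -> float:
    """RMSD implied by a flattened 3N displacement vector (x1, y1, z1, x2, ...)."""
    n_atoms = len(delta) // 3
    return math.sqrt(sum(d * d for d in delta) / n_atoms)


def project_onto_modes(delta: Sequence[float],
                       modes: Sequence[Sequence[float]]) -> List[float]:
    """Component of delta inside the span of the modes.

    Assumes the modes are mutually orthonormal (as eigenvectors of a
    symmetric Hessian are), so each coefficient is a plain dot product.
    """
    projection = [0.0] * len(delta)
    for mode in modes:
        coeff = sum(d * m for d, m in zip(delta, mode))
        for i, m in enumerate(mode):
            projection[i] += coeff * m
    return projection


def residual_rmsd(delta: Sequence[float],
                  modes: Sequence[Sequence[float]]) -> float:
    """Smallest RMSD reachable by moving only along the given modes
    (ignoring any re-superposition after the move)."""
    projection = project_onto_modes(delta, modes)
    return rmsd_of_displacement([d - p for d, p in zip(delta, projection)])


# Toy example: two atoms, one displaced along x and one along y.
delta = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
first_mode = [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
before = rmsd_of_displacement(delta)        # 1.0
after = residual_rmsd(delta, first_mode)    # sqrt(0.5), about 0.707
print(f"relative reduction: {1 - after / before:.1%}")
```

With real cNMA modes, `modes` would hold the slowest k modes (k = 30 in the abstract). The relative reduction is then analogous to the averaged iRMSD decrease reported above.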

 
NSF-PAR ID:
10032152
Author(s) / Creator(s):
 ;  ;  
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Proteins: Structure, Function, and Bioinformatics
Volume:
85
Issue:
3
ISSN:
0887-3585
Page Range / eLocation ID:
p. 544-556
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More like this
  1. ABSTRACT

    We report the performance of protein–protein docking predictions by our group in recent rounds of the Critical Assessment of PRedicted Interactions (CAPRI), a community‐wide assessment of state‐of‐the‐art docking methods. Our prediction procedure uses LZerD, a protein–protein docking program developed in our group. LZerD represents a protein surface with 3D Zernike descriptors (3DZD), which are based on a mathematical series expansion of a 3D function. This soft representation of the protein surface makes the method more tolerant of conformational changes upon docking, which is an advantage in unbound docking. Docking was guided by interface residue prediction performed with BindML and cons‐PPISP, as well as by literature information when available. The generated docking models were ranked by a combination of scoring functions, including PRESCO, which evaluates the native‐likeness of residues' spatial environments in structure models. First, we discuss the overall performance of our group in the CAPRI prediction rounds and investigate the reasons for unsuccessful cases. Then, we examine the performance of several knowledge‐based scoring functions and their combinations for ranking docking models. We found that the quality of a pool of docking models generated by LZerD, that is, whether or not the pool includes near‐native models, can be predicted from the correlation among multiple scores. Although the current analysis used docking models generated by LZerD, the findings on scoring functions are expected to be universally applicable to other docking methods. Proteins 2017; 85:513–527. © 2016 Wiley Periodicals, Inc.

     
  2. Abstract

    Motivation

    Deep learning has recently revolutionized protein tertiary structure prediction. Cutting-edge deep learning methods such as AlphaFold can predict high-accuracy tertiary structures for most individual protein chains. However, the accuracy of predicting quaternary structures of protein complexes consisting of multiple chains is still relatively low due to a lack of advanced deep learning methods in the field. Because interchain residue–residue contacts can be used as distance restraints to guide quaternary structure modeling, here we develop a deep dilated convolutional residual network method (DRCon) to predict interchain residue–residue contacts in homodimers. Its inputs are residue–residue co-evolutionary signals derived from multiple sequence alignments of monomers, intrachain residue–residue contacts of monomers (extracted from true/predicted tertiary structures or predicted by deep learning), and other sequence and structural features.

    Results

    Tested on three homodimer test datasets (Homo_std dataset, DeepHomo dataset and CASP-CAPRI dataset), DRCon achieves a precision for the top L/5 interchain contact predictions (L: length of the monomer in a homodimer) of 43.46%, 47.10% and 33.50%, respectively, at a 6 Å contact threshold. This is substantially better than DeepHomo and DNCON2_inter and similar to Glinter. Moreover, our experiments demonstrate that DRCon still performs well when predicted tertiary structures or intrachain contacts of monomers in the unbound state are used as input, although its accuracy is lower than when true tertiary structures in the bound state are used. Finally, our case study shows that good interchain contact predictions can be used to build high-accuracy quaternary structure models of homodimers.
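    The top-L/5 precision quoted above can be computed in three steps:
    1. Rank candidate interchain residue pairs by predicted contact score.
    2. Keep the top L/5 pairs (at least one).
    3. Count how many of those pairs are true contacts in the bound structure.
    The minimal Python sketch below follows these steps under illustrative assumptions that are not taken from the DRCon code:
    * Dictionaries are keyed by residue-index pairs.
    * Duplicate symmetric pairs of the homodimer have already been removed.
    * The contact test is a strict "< 6 Å" distance check.
    * The function name is hypothetical.

```python
from typing import Dict, Tuple

Pair = Tuple[int, int]  # (residue index in chain A, residue index in chain B)


def top_l5_precision(scores: Dict[Pair, float],
                     distances: Dict[Pair, float],
                     length: int,
                     threshold: float = 6.0) -> float:
    """Fraction of the top L/5 predicted interchain pairs that are true contacts.

    scores    -- predicted contact probability for each residue pair
    distances -- true interchain distance (in Å) for each pair in the bound structure
    length    -- L, the length of one monomer of the homodimer
    """
    k = max(1, length // 5)
    # Highest-scoring pairs first; sorted() is stable, so ties keep input order.
    top = sorted(scores, key=lambda pair: scores[pair], reverse=True)[:k]
    # Pairs with no recorded distance are treated as non-contacts.
    hits = sum(1 for pair in top
               if distances.get(pair, float("inf")) < threshold)
    return hits / k


scores = {(0, 0): 0.9, (1, 3): 0.8, (2, 2): 0.1}
distances = {(0, 0): 4.5, (1, 3): 7.2, (2, 2): 3.0}
print(top_l5_precision(scores, distances, length=10))  # 0.5
```

    The denominator stays at L/5 even when fewer pairs are scored, so missing predictions count against precision.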

    Availability and implementation

    The source code of DRCon is available at https://github.com/jianlin-cheng/DRCon. The datasets are available at https://zenodo.org/record/5998532#.YgF70vXMKsB.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  3. Abstract

    Computational modeling of protein–DNA complex structures has important implications in biomedical applications such as structure‐based, computer‐aided drug design. A key step in developing methods for accurate modeling of protein–DNA complexes is similarity assessment between models and their reference complex structures. Existing methods primarily rely on distance‐based metrics and generally do not consider important functional features of the complexes, such as interface hydrogen bonds that are critical to specific protein–DNA interactions. Here, we present a new scoring function, ComparePD, which takes interface hydrogen bond energy and strength into account in addition to distance‐based metrics to measure the similarity of protein–DNA complexes accurately. ComparePD was tested on two datasets of computational models of protein–DNA complexes, generated using docking (classified as easy, intermediate, and difficult cases) and homology modeling methods. The results were compared with PDDockQ, a modified version of DockQ tailored for protein–DNA complexes, as well as with the metrics employed by the community‐wide experiment CAPRI (Critical Assessment of PRedicted Interactions). We demonstrated that ComparePD provides an improved similarity measure over PDDockQ and the CAPRI classification method by considering both the conformational similarity and the functional importance of the complex interface. In every case where ComparePD and PDDockQ selected different top models, ComparePD identified more meaningful models, except for one intermediate docking case.
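    For reference, the original protein–protein DockQ score, which PDDockQ adapts to protein–DNA complexes, combines three quantities: the fraction of native contacts (Fnat), the interface RMSD (iRMS) and the ligand RMSD (LRMS). The minimal Python sketch below shows that combination, together with the DockQ bands commonly used to mirror the CAPRI incorrect/acceptable/medium/high classes. It does not reproduce the protein–DNA modifications in PDDockQ or the hydrogen-bond terms of ComparePD, because their exact form is not given here. The function names are hypothetical.

```python
def scaled_rms(rms: float, d: float) -> float:
    """Map an RMSD (in Å) to (0, 1]; d is the RMSD at which the value is 0.5."""
    return 1.0 / (1.0 + (rms / d) ** 2)


def dockq(fnat: float, irms: float, lrms: float) -> float:
    """DockQ as defined for protein-protein models (d = 1.5 Å for iRMS, 8.5 Å for LRMS)."""
    return (fnat + scaled_rms(irms, 1.5) + scaled_rms(lrms, 8.5)) / 3.0


def capri_like_class(score: float) -> str:
    """DockQ bands that approximate the CAPRI quality classes."""
    if score >= 0.80:
        return "high"
    if score >= 0.49:
        return "medium"
    if score >= 0.23:
        return "acceptable"
    return "incorrect"


score = dockq(fnat=0.5, irms=1.5, lrms=8.5)
print(score, capri_like_class(score))  # 0.5 medium
```

    Each term lies in [0, 1], so the average is also bounded. This is what lets a single continuous score stand in for the discrete CAPRI classes.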

     
  4. Abstract

    Targets in the protein docking experiment CAPRI (Critical Assessment of Predicted Interactions) generally present new challenges and contribute to new developments in methodology. In rounds 38 to 45 of CAPRI, most targets could be effectively predicted using template‐based methods. However, the ClusPro server required structures rather than sequences as input, and hence we had to generate and dock homology models. The available templates also provided distance restraints that were used directly as input to the server. We show here that this approach has some advantages. Free docking with template‐based restraints using ClusPro reproduced some interfaces suggested by weak or ambiguous templates but not others, resulting in correct server‐predicted models. More recently, we developed the fully automated ClusPro TBM server, which performs template‐based modeling and thus accepts sequences rather than structures of the component proteins as input. The performance of the server, freely available for noncommercial use at https://tbm.cluspro.org, is demonstrated by predicting the protein–protein targets of rounds 38 to 45 of CAPRI.

     
  5. ABSTRACT

    The heavily used protein–protein docking server ClusPro performs three computational steps: (1) rigid‐body docking, (2) RMSD‐based clustering of the 1000 lowest‐energy structures, and (3) removal of steric clashes by energy minimization. In response to challenges encountered in recent CAPRI targets, we added three new options to ClusPro: (1) accounting for small‐angle X‐ray scattering data in docking; (2) considering pairwise interaction data as restraints; and (3) enabling discrimination between biological and crystallographic dimers. In addition, we have developed an extremely fast docking algorithm based on 5D rotational manifold FFT, and an algorithm for docking flexible peptides that include known sequence motifs. We feel that these developments will further improve the utility of ClusPro. However, CAPRI emphasized several shortcomings of the current server. These include the problem of selecting the right energy parameters among the five options provided and the problem of selecting the best models among the 10 generated for each parameter set. In addition, the results convinced us that further development is needed for docking homology models. Finally, we discuss the difficulties we encountered when attempting to develop a refinement algorithm that is computationally efficient enough for inclusion in a heavily used server. Proteins 2017; 85:435–444. © 2016 Wiley Periodicals, Inc.
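    Step (2) above, RMSD-based clustering of the lowest-energy poses, can be illustrated with a simple greedy scheme. The pose with the most neighbours within a fixed RMSD radius becomes a cluster centre; it and its neighbours are removed, and the process repeats. The Python below is a minimal sketch of that scheme, not the ClusPro implementation. It assumes that:
    * The pairwise RMSD matrix is precomputed.
    * Rows are ordered from lowest to highest energy, so ties go to the lower-energy pose.
    * The 9 Å default radius is suitable; this value is an assumption, not one stated in this abstract.
    The function name is hypothetical.

```python
from typing import List, Sequence


def greedy_rmsd_clustering(rmsd: Sequence[Sequence[float]],
                           radius: float = 9.0) -> List[List[int]]:
    """Greedy neighbour-count clustering of docked poses.

    rmsd   -- symmetric matrix of pairwise RMSDs (Å) between poses, with rows
              assumed to be ordered from lowest to highest energy
    radius -- poses within this RMSD of a centre join its cluster (assumed default)
    Returns clusters as lists of pose indices, centre first.
    """
    remaining = set(range(len(rmsd)))
    clusters: List[List[int]] = []
    while remaining:
        # Neighbours (including the pose itself) among poses not yet clustered.
        neighbours = {i: [j for j in sorted(remaining) if rmsd[i][j] <= radius]
                      for i in remaining}
        # max() returns the first maximum, so iterating in index order breaks
        # ties in favour of the lower-energy pose.
        centre = max(sorted(remaining), key=lambda i: len(neighbours[i]))
        members = neighbours[centre]
        clusters.append([centre] + [j for j in members if j != centre])
        remaining.difference_update(members)
    return clusters


# Toy example: poses 0-2 are close to one another, pose 3 is far away.
rmsd = [
    [0.0, 2.0, 3.0, 20.0],
    [2.0, 0.0, 4.0, 20.0],
    [3.0, 4.0, 0.0, 20.0],
    [20.0, 20.0, 20.0, 0.0],
]
print(greedy_rmsd_clustering(rmsd))  # [[0, 1, 2], [3]]
```

    Each pass recomputes the neighbour lists in O(n²) time. That stays manageable for a pool of 1000 poses but could be made incremental for larger pools.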

     