Title: Accurate ligand–protein docking in CASP15 using the ClusPro LigTBM server
Abstract

In the ligand prediction category of CASP15, the challenge was to predict the positions and conformations of small molecules binding to proteins that were provided as amino acid sequences or as models generated by the AlphaFold2 program. For most targets, we used our template‐based ligand docking program ClusPro ligTBM, also implemented as a public server available at https://ligtbm.cluspro.org/. Since many targets had multiple chains and a number of ligands, several templates and some manual interventions were required. In a few cases, no templates were found, and we had to resort to direct docking with the Glide program. Nevertheless, ligTBM was shown to be a very useful tool, and by any ranking criteria, our group was ranked among the five best‐performing teams. In fact, all the best groups used template‐based docking methods. Thus, it appears that the AlphaFold2‐generated models, despite the high accuracy of the predicted backbone, have local differences from the x‐ray structure that make the use of direct docking methods more challenging. The results of CASP15 confirm that this limitation can frequently be overcome by homology‐based docking.

 
NSF-PAR ID:
10475548
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Journal Name:
Proteins: Structure, Function, and Bioinformatics
Volume:
91
Issue:
12
ISSN:
0887-3585
Format(s):
Medium: X Size: p. 1822-1828
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Targets in the protein docking experiment CAPRI (Critical Assessment of Predicted Interactions) generally present new challenges and contribute to new developments in methodology. In rounds 38 to 45 of CAPRI, most targets could be effectively predicted using template‐based methods. However, the server ClusPro required structures rather than sequences as input, and hence we had to generate and dock homology models. The available templates also provided distance restraints that were directly used as input to the server. We show here that such an approach has some advantages. Free docking with template‐based restraints using ClusPro reproduced some interfaces suggested by weak or ambiguous templates while not reproducing others, resulting in correct server‐predicted models. More recently we developed the fully automated ClusPro TBM server that performs template‐based modeling and thus can use sequences rather than structures of component proteins as input. The performance of the server, freely available for noncommercial use at https://tbm.cluspro.org, is demonstrated by predicting the protein‐protein targets of rounds 38 to 45 of CAPRI.

     
  2. Abstract

    Substantial progress in protein structure prediction has been made by utilizing deep learning and residue‐residue distance prediction since CASP13. Inspired by these advances, we improve our CASP14 MULTICOM protein structure prediction system by incorporating three new components: (a) a new deep learning‐based protein inter‐residue distance predictor to improve template‐free (ab initio) tertiary structure prediction, (b) an enhanced template‐based tertiary structure prediction method, and (c) distance‐based model quality assessment methods empowered by deep learning. In the 2020 CASP14 experiment, the MULTICOM predictor was ranked seventh out of 146 predictors in tertiary structure prediction and third out of 136 predictors in inter‐domain structure prediction. The results demonstrate that template‐free modeling based on deep learning and residue‐residue distance prediction can predict the correct topology for almost all template‐based modeling targets and a majority of hard targets (template‐free targets or targets whose templates cannot be recognized), which is a significant improvement over the CASP13 MULTICOM predictor. Moreover, template‐free modeling performs better than template‐based modeling not only on hard targets but also on targets that have homologous templates. The performance of template‐free modeling largely depends on the accuracy of distance prediction, which is closely related to the quality of the multiple sequence alignments. The structural model quality assessment works well on targets for which enough good models can be predicted, but it may perform poorly when only a few good models are predicted for a hard target and the distribution of model quality scores is highly skewed. MULTICOM is available at https://github.com/jianlin-cheng/MULTICOM_Human_CASP14/tree/CASP14_DeepRank3 and https://github.com/multicom-toolbox/multicom/tree/multicom_v2.0.

     
  3. Abstract

    We report the results of the “UM‐TBM” and “Zheng” groups in CASP15 for protein monomer and complex structure prediction. These prediction sets were obtained using the D‐I‐TASSER and DMFold‐Multimer algorithms, respectively. For monomer structure prediction, D‐I‐TASSER introduced four new features during CASP15: (i) a multiple sequence alignment (MSA) generation protocol that combines multi‐source MSA searching and a structural modeling‐based MSA ranker; (ii) attention‐network based spatial restraints; (iii) a multi‐domain module containing domain partition and arrangement for domain‐level templates and spatial restraints; (iv) an optimized I‐TASSER‐based folding simulation system for full‐length model creation guided by a combination of deep learning restraints, threading alignments, and knowledge‐based potentials. For 47 free modeling targets in CASP15, the final models predicted by D‐I‐TASSER showed an average TM‐score 19% higher than that of the standard AlphaFold2 program. We thus showed that traditional Monte Carlo‐based folding simulations, when appropriately coupled with deep learning algorithms, can generate models with improved accuracy over end‐to‐end deep learning methods alone. For protein complex structure prediction, DMFold‐Multimer generated models by integrating a new MSA generation algorithm (DeepMSA2) with the end‐to‐end modeling module from AlphaFold2‐Multimer. For the 38 complex targets, DMFold‐Multimer generated models with an average TM‐score of 0.83 and Interface Contact Score of 0.60, both significantly higher than those of competing complex prediction tools. Our analyses on complexes highlighted the critical role played by MSA generation, ranking, and pairing in protein complex structure prediction. We also discuss room for future improvement in the areas of viral protein modeling and complex model ranking.

     
  4. Abstract

    As a participant in the joint CASP13‐CAPRI46 assessment, the ClusPro server debuted its new template‐based modeling functionality. The addition of this feature, called ClusPro TBM, was motivated by the previous CASP‐CAPRI assessments and by the proven ability of template‐based methods to produce higher‐quality models, provided templates are available. In prior assessments, ClusPro submissions consisted of models that were produced via free docking of pre‐generated homology models. This method was successful in terms of the number of acceptable predictions across targets; however, analysis of results showed that purely template‐based methods produced a substantially higher number of medium‐quality models for targets for which there were good templates available. The addition of template‐based modeling has expanded ClusPro's ability to produce higher accuracy predictions, primarily for homomeric but also for some heteromeric targets. Here we review the newest additions to the ClusPro web server and discuss examples of CASP‐CAPRI targets that continue to drive further development. We also describe ongoing work not yet implemented in the server. This includes the development of methods to improve template‐based models and the use of co‐evolutionary information for data‐assisted free docking.

     
  5. Abstract

    Threading a query protein sequence onto a library of weakly homologous structural templates remains challenging, even when sequence‐based predicted contact or distance information is used. Contact‐assisted or distance‐assisted threading methods utilize only the spatial proximity of the interacting residue pairs for template selection and alignment, ignoring their orientation. Moreover, existing threading methods fail to consider the neighborhood effect induced by the query–template alignment. We present a new distance‐ and orientation‐based covariational threading method called DisCovER that effectively integrates information from inter‐residue distance and orientation along with the topological network neighborhood of a query–template alignment. Our method first selects a subset of templates using standard profile‐based threading coupled with topological network similarity terms to account for the neighborhood effect and subsequently performs distance‐ and orientation‐based query–template alignment using an iterative double dynamic programming framework. Multiple large‐scale benchmarking results on query proteins classified as weakly homologous from the continuous automated model evaluation experiment and from the current literature show that our method outperforms several existing state‐of‐the‐art threading approaches, and that the integration of the neighborhood effect with the inter‐residue distance and orientation information synergistically contributes to the improved performance of DisCovER. DisCovER is freely available at https://github.com/Bhattacharya-Lab/DisCovER.

     