Abstract Substantial progresses in protein structure prediction have been made by utilizing deep‐learning and residue‐residue distance prediction since CASP13. Inspired by the advances, we improve our CASP14 MULTICOM protein structure prediction system by incorporating three new components: (a) a new deep learning‐based protein inter‐residue distance predictor to improve template‐free (ab initio) tertiary structure prediction, (b) an enhanced template‐based tertiary structure prediction method, and (c) distance‐based model quality assessment methods empowered by deep learning. In the 2020 CASP14 experiment, MULTICOM predictor was ranked seventh out of 146 predictors in tertiary structure prediction and ranked third out of 136 predictors in inter‐domain structure prediction. The results demonstrate that the template‐free modeling based on deep learning and residue‐residue distance prediction can predict the correct topology for almost all template‐based modeling targets and a majority of hard targets (template‐free targets or targets whose templates cannot be recognized), which is a significant improvement over the CASP13 MULTICOM predictor. Moreover, the template‐free modeling performs better than the template‐based modeling on not only hard targets but also the targets that have homologous templates. The performance of the template‐free modeling largely depends on the accuracy of distance prediction closely related to the quality of multiple sequence alignments. The structural model quality assessment works well on targets for which enough good models can be predicted, but it may perform poorly when only a few good models are predicted for a hard target and the distribution of model quality scores is highly skewed. MULTICOM is available athttps://github.com/jianlin-cheng/MULTICOM_Human_CASP14/tree/CASP14_DeepRank3andhttps://github.com/multicom-toolbox/multicom/tree/multicom_v2.0.
more »
« less
Protein tertiary structure prediction and refinement using deep learning and Rosetta in CASP14
Abstract The trRosetta structure prediction method employs deep learning to generate predicted residue‐residue distance and orientation distributions from which 3D models are built. We sought to improve the method by incorporating as inputs (in addition to sequence information) both language model embeddings and template information weighted by sequence similarity to the target. We also developed a refinement pipeline that recombines models generated by template‐free and template utilizing versions of trRosetta guided by the DeepAccNet accuracy predictor. Both benchmark tests and CASP results show that the new pipeline is a considerable improvement over the original trRosetta, and it is faster and requires less computing resources, completing the entire modeling process in a median < 3 h in CASP14. Our human group improved results with this pipeline primarily by identifying additional homologous sequences for input into the network. We also used the DeepAccNet accuracy predictor to guide Rosetta high‐resolution refinement for submissions in the regular and refinement categories; although performance was quite good on a CASP relative scale, the overall improvements were rather modest in part due to missing inter‐domain or inter‐chain contacts.
more »
« less
- Award ID(s):
- 1937533
- PAR ID:
- 10365368
- Publisher / Repository:
- Wiley Blackwell (John Wiley & Sons)
- Date Published:
- Journal Name:
- Proteins: Structure, Function, and Bioinformatics
- Volume:
- 89
- Issue:
- 12
- ISSN:
- 0887-3585
- Page Range / eLocation ID:
- p. 1722-1733
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Abstract Threading a query protein sequence onto a library of weakly homologous structural templates remains challenging, even when sequence‐based predicted contact or distance information is used. Contact‐assisted or distance‐assisted threading methods utilize only the spatial proximity of the interacting residue pairs for template selection and alignment, ignoring their orientation. Moreover, existing threading methods fail to consider the neighborhood effect induced by the query–template alignment. We present a new distance‐ and orientation‐based covariational threading method called DisCovER by effectively integrating information from inter‐residue distance and orientation along with the topological network neighborhood of a query–template alignment. Our method first selects a subset of templates using standard profile‐based threading coupled with topological network similarity terms to account for the neighborhood effect and subsequently performs distance‐ and orientation‐based query–template alignment using an iterative double dynamic programming framework. Multiple large‐scale benchmarking results on query proteins classified as weakly homologous from the continuous automated model evaluation experiment and from the current literature show that our method outperforms several existing state‐of‐the‐art threading approaches, and that the integration of the neighborhood effect with the inter‐residue distance and orientation information synergistically contributes to the improved performance of DisCovER. DisCovER is freely available athttps://github.com/Bhattacharya-Lab/DisCovER.more » « less
-
Abstract As a participant in the joint CASP13‐CAPRI46 assessment, the ClusPro server debuted its new template‐based modeling functionality. The addition of this feature, called ClusPro TBM, was motivated by the previous CASP‐CAPRI assessments and by the proven ability of template‐based methods to produce higher‐quality models, provided templates are available. In prior assessments, ClusPro submissions consisted of models that were produced via free docking of pre‐generated homology models. This method was successful in terms of the number of acceptable predictions across targets; however, analysis of results showed that purely template‐based methods produced a substantially higher number of medium‐quality models for targets for which there were good templates available. The addition of template‐based modeling has expanded ClusPro's ability to produce higher accuracy predictions, primarily for homomeric but also for some heteromeric targets. Here we review the newest additions to the ClusPro web server and discuss examples of CASP‐CAPRI targets that continue to drive further development. We also describe ongoing work not yet implemented in the server. This includes the development of methods to improve template‐based models and the use of co‐evolutionary information for data‐assisted free docking.more » « less
-
Abstract Protein structure prediction is an important problem in bioinformatics and has been studied for decades. However, there are still few open-source comprehensive protein structure prediction packages publicly available in the field. In this paper, we present our latest open-source protein tertiary structure prediction system—MULTICOM2, an integration of template-based modeling (TBM) and template-free modeling (FM) methods. The template-based modeling uses sequence alignment tools with deep multiple sequence alignments to search for structural templates, which are much faster and more accurate than MULTICOM1. The template-free (ab initio or de novo) modeling uses the inter-residue distances predicted by DeepDist to reconstruct tertiary structure models without using any known structure as template. In the blind CASP14 experiment, the average TM-score of the models predicted by our server predictor based on the MULTICOM2 system is 0.720 for 58 TBM (regular) domains and 0.514 for 38 FM and FM/TBM (hard) domains, indicating that MULTICOM2 is capable of predicting good tertiary structures across the board. It can predict the correct fold for 76 CASP14 domains (95% regular domains and 55% hard domains) if only one prediction is made for a domain. The success rate is increased to 3% for both regular and hard domains if five predictions are made per domain. Moreover, the prediction accuracy of the pure template-free structure modeling method on both TBM and FM targets is very close to the combination of template-based and template-free modeling methods. This demonstrates that the distance-based template-free modeling method powered by deep learning can largely replace the traditional template-based modeling method even on TBM targets that TBM methods used to dominate and therefore provides a uniform structure modeling approach to any protein. Finally, on the 38 CASP14 FM and FM/TBM hard domains, MULTICOM2 server predictors (MULTICOM-HYBRID, MULTICOM-DEEP, MULTICOM-DIST) were ranked among the top 20 automated server predictors in the CASP14 experiment. After combining multiple predictors from the same research group as one entry, MULTICOM-HYBRID was ranked no. 5. The source code of MULTICOM2 is freely available athttps://github.com/multicom-toolbox/multicom/tree/multicom_v2.0.more » « less
-
null (Ed.)Abstract The inter-residue contact prediction and deep learning showed the promise to improve the estimation of protein model accuracy (EMA) in the 13th Critical Assessment of Protein Structure Prediction (CASP13). To further leverage the improved inter-residue distance predictions to enhance EMA, during the 2020 CASP14 experiment, we integrated several new inter-residue distance features with the existing model quality assessment features in several deep learning methods to predict the quality of protein structural models. According to the evaluation of performance in selecting the best model from the models of CASP14 targets, our three multi-model predictors of estimating model accuracy (MULTICOM-CONSTRUCT, MULTICOM-AI, and MULTICOM-CLUSTER) achieve the averaged loss of 0.073, 0.079, and 0.081, respectively, in terms of the global distance test score (GDT-TS). The three methods are ranked first, second, and third out of all 68 CASP14 predictors. MULTICOM-DEEP, the single-model predictor of estimating model accuracy (EMA), is ranked within top 10 among all the single-model EMA methods according to GDT-TS score loss. The results demonstrate that inter-residue distance features are valuable inputs for deep learning to predict the quality of protein structural models. However, larger training datasets and better ways of leveraging inter-residue distance information are needed to fully explore its potentials.more » « less
An official website of the United States government
