skip to main content


Title: AWSEM-Suite: a protein structure prediction server based on template-guided, coevolutionary-enhanced optimized folding landscapes
Abstract The accurate and reliable prediction of the 3D structures of proteins and their assemblies remains difficult even though the number of solved structures soars and prediction techniques improve. In this study, a free and open access web server, AWSEM-Suite, whose goal is to predict monomeric protein tertiary structures from sequence is described. The model underlying the server’s predictions is a coarse-grained protein force field which has its roots in neural network ideas that has been optimized using energy landscape theory. Employing physically motivated potentials and knowledge-based local structure biasing terms, the addition of homologous template and co-evolutionary restraints to AWSEM-Suite greatly improves the predictive power of pure AWSEM structure prediction. From the independent evaluation metrics released in the CASP13 experiment, AWSEM-Suite proves to be a reasonably accurate algorithm for free modeling, standing at the eighth position in the free modeling category of CASP13. The AWSEM-Suite server also features a front end with a user-friendly interface. The AWSEM-Suite server is a powerful tool for predicting monomeric protein tertiary structures that is most useful when a suitable structure template is not available. The AWSEM-Suite server is freely available at: https://awsem.rice.edu.  more » « less
Award ID(s):
2019745
NSF-PAR ID:
10233872
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ; ; ; ;
Date Published:
Journal Name:
Nucleic Acids Research
Volume:
48
Issue:
W1
ISSN:
0305-1048
Page Range / eLocation ID:
W25 to W30
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Protein structure prediction is an important problem in bioinformatics and has been studied for decades. However, there are still few open-source comprehensive protein structure prediction packages publicly available in the field. In this paper, we present our latest open-source protein tertiary structure prediction system—MULTICOM2, an integration of template-based modeling (TBM) and template-free modeling (FM) methods. The template-based modeling uses sequence alignment tools with deep multiple sequence alignments to search for structural templates, which are much faster and more accurate than MULTICOM1. The template-free (ab initio or de novo) modeling uses the inter-residue distances predicted by DeepDist to reconstruct tertiary structure models without using any known structure as template. In the blind CASP14 experiment, the average TM-score of the models predicted by our server predictor based on the MULTICOM2 system is 0.720 for 58 TBM (regular) domains and 0.514 for 38 FM and FM/TBM (hard) domains, indicating that MULTICOM2 is capable of predicting good tertiary structures across the board. It can predict the correct fold for 76 CASP14 domains (95% regular domains and 55% hard domains) if only one prediction is made for a domain. The success rate is increased to 3% for both regular and hard domains if five predictions are made per domain. Moreover, the prediction accuracy of the pure template-free structure modeling method on both TBM and FM targets is very close to the combination of template-based and template-free modeling methods. This demonstrates that the distance-based template-free modeling method powered by deep learning can largely replace the traditional template-based modeling method even on TBM targets that TBM methods used to dominate and therefore provides a uniform structure modeling approach to any protein. Finally, on the 38 CASP14 FM and FM/TBM hard domains, MULTICOM2 server predictors (MULTICOM-HYBRID, MULTICOM-DEEP, MULTICOM-DIST) were ranked among the top 20 automated server predictors in the CASP14 experiment. After combining multiple predictors from the same research group as one entry, MULTICOM-HYBRID was ranked no. 5. The source code of MULTICOM2 is freely available athttps://github.com/multicom-toolbox/multicom/tree/multicom_v2.0.

     
    more » « less
  2. null (Ed.)
    The phase problem in X-ray crystallography arises from the fact that only the intensities, and not the phases, of the diffracting electromagnetic waves are measured directly. Molecular replacement can often estimate the relative phases of reflections starting with those derived from a template structure, which is usually a previously solved structure of a similar protein. The key factor in the success of molecular replacement is finding a good template structure. When no good solved template exists, predicted structures based partially on templates can sometimes be used to generate models for molecular replacement, thereby extending the lower bound of structural and sequence similarity required for successful structure determination. Here, the effectiveness is examined of structures predicted by a state-of-the-art prediction algorithm, the Associative memory, Water-mediated, Structure and Energy Model Suite ( AWSEM-Suite ), which has been shown to perform well in predicting protein structures in CASP13 when there is no significant sequence similarity to a solved protein or only very low sequence similarity to known templates. The performance of AWSEM-Suite structures in molecular replacement is discussed and the results show that AWSEM-Suite performs well in providing useful phase information, often performing better than I-TASSER-MR and the previous algorithm AWSEM-Template . 
    more » « less
  3. Abstract

    Substantial progresses in protein structure prediction have been made by utilizing deep‐learning and residue‐residue distance prediction since CASP13. Inspired by the advances, we improve our CASP14 MULTICOM protein structure prediction system by incorporating three new components: (a) a new deep learning‐based protein inter‐residue distance predictor to improve template‐free (ab initio) tertiary structure prediction, (b) an enhanced template‐based tertiary structure prediction method, and (c) distance‐based model quality assessment methods empowered by deep learning. In the 2020 CASP14 experiment, MULTICOM predictor was ranked seventh out of 146 predictors in tertiary structure prediction and ranked third out of 136 predictors in inter‐domain structure prediction. The results demonstrate that the template‐free modeling based on deep learning and residue‐residue distance prediction can predict the correct topology for almost all template‐based modeling targets and a majority of hard targets (template‐free targets or targets whose templates cannot be recognized), which is a significant improvement over the CASP13 MULTICOM predictor. Moreover, the template‐free modeling performs better than the template‐based modeling on not only hard targets but also the targets that have homologous templates. The performance of the template‐free modeling largely depends on the accuracy of distance prediction closely related to the quality of multiple sequence alignments. The structural model quality assessment works well on targets for which enough good models can be predicted, but it may perform poorly when only a few good models are predicted for a hard target and the distribution of model quality scores is highly skewed. MULTICOM is available athttps://github.com/jianlin-cheng/MULTICOM_Human_CASP14/tree/CASP14_DeepRank3andhttps://github.com/multicom-toolbox/multicom/tree/multicom_v2.0.

     
    more » « less
  4. Abstract

    Predicting residue‐residue distance relationships (eg, contacts) has become the key direction to advance protein structure prediction since 2014 CASP11 experiment, while deep learning has revolutionized the technology for contact and distance distribution prediction since its debut in 2012 CASP10 experiment. During 2018 CASP13 experiment, we enhanced our MULTICOM protein structure prediction system with three major components: contact distance prediction based on deep convolutional neural networks, distance‐driven template‐free (ab initio) modeling, and protein model ranking empowered by deep learning and contact prediction. Our experiment demonstrates that contact distance prediction and deep learning methods are the key reasons that MULTICOM was ranked 3rd out of all 98 predictors in both template‐free and template‐based structure modeling in CASP13. Deep convolutional neural network can utilize global information in pairwise residue‐residue features such as coevolution scores to substantially improve contact distance prediction, which played a decisive role in correctly folding some free modeling and hard template‐based modeling targets. Deep learning also successfully integrated one‐dimensional structural features, two‐dimensional contact information, and three‐dimensional structural quality scores to improve protein model quality assessment, where the contact prediction was demonstrated to consistently enhance ranking of protein models for the first time. The success of MULTICOM system clearly shows that protein contact distance prediction and model selection driven by deep learning holds the key of solving protein structure prediction problem. However, there are still challenges in accurately predicting protein contact distance when there are few homologous sequences, folding proteins from noisy contact distances, and ranking models of hard targets.

     
    more » « less
  5. Abstract

    Small angle X‐ray scattering (SAXS) measures comprehensive distance information on a protein's structure, which can constrain and guide computational structure prediction algorithms. Here, we evaluate structure predictions of 11 monomeric and oligomeric proteins for which SAXS data were collected and provided to predictors in the 13th round of the Critical Assessment of protein Structure Prediction (CASP13). The category for SAXS‐assisted predictions made gains in certain areas for CASP13 compared to CASP12. Improvements included higher quality data with size exclusion chromatography‐SAXS (SEC‐SAXS) and better selection of targets and communication of results by CASP organizers. In several cases, we can track improvements in model accuracy with use of SAXS data. For hard multimeric targets where regular folding algorithms were unsuccessful, SAXS data helped predictors to build models better resembling the global shape of the target. For most models, however, no significant improvement in model accuracy at the domain level was registered from use of SAXS data, when rigorously comparing SAXS‐assisted models to the best regular server predictions. To promote future progress in this category, we identify successes, challenges, and opportunities for improved strategies in prediction, assessment, and communication of SAXS data to predictors. An important observation is that, for many targets, SAXS data were inconsistent with crystal structures, suggesting that these proteins adopt different conformation(s) in solution. This CASP13 result, if representative of PDB structures and future CASP targets, may have substantive implications for the structure training databases used for machine learning, CASP, and use of prediction models for biology.

     
    more » « less