skip to main content

Title: DISTEVAL: a web server for evaluating predicted protein distances
Abstract Background Protein inter-residue contact and distance prediction are two key intermediate steps essential to accurate protein structure prediction. Distance prediction comes in two forms: real-valued distances and ‘binned’ distograms, which are a more finely grained variant of the binary contact prediction problem. The latter has been introduced as a new challenge in the 14th Critical Assessment of Techniques for Protein Structure Prediction (CASP14) 2020 experiment. Despite the recent proliferation of methods for predicting distances, few methods exist for evaluating these predictions. Currently only numerical metrics, which evaluate the entire prediction at once, are used. These give no insight into the structural details of a prediction. For this reason, new methods and tools are needed. Results We have developed a web server for evaluating predicted inter-residue distances. Our server, DISTEVAL, accepts predicted contacts, distances, and a true structure as optional inputs to generate informative heatmaps, chord diagrams, and 3D models. All of these outputs facilitate visual and qualitative assessment. The server also evaluates predictions using other metrics such as mean absolute error, root mean squared error, and contact precision. Conclusions The visualizations generated by DISTEVAL complement each other and collectively serve as a powerful tool for both quantitative and qualitative assessments of predicted contacts and distances, even in the absence of a true 3D structure.  more » « less
Award ID(s):
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
BMC Bioinformatics
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Abstract Background Driven by deep learning, inter-residue contact/distance prediction has been significantly improved and substantially enhanced ab initio protein structure prediction. Currently, most of the distance prediction methods classify inter-residue distances into multiple distance intervals instead of directly predicting real-value distances. The output of the former has to be converted into real-value distances to be used in tertiary structure prediction. Results To explore the potentials of predicting real-value inter-residue distances, we develop a multi-task deep learning distance predictor (DeepDist) based on new residual convolutional network architectures to simultaneously predict real-value inter-residue distances and classify them into multiple distance intervals. Tested on 43 CASP13 hard domains, DeepDist achieves comparable performance in real-value distance prediction and multi-class distance prediction. The average mean square error (MSE) of DeepDist’s real-value distance prediction is 0.896 Å 2 when filtering out the predicted distance ≥ 16 Å, which is lower than 1.003 Å 2 of DeepDist’s multi-class distance prediction. When distance predictions are converted into contact predictions at 8 Å threshold (the standard threshold in the field), the precision of top L/5 and L/2 contact predictions of DeepDist’s multi-class distance prediction is 79.3% and 66.1%, respectively, higher than 78.6% and 64.5% of its real-value distance prediction and the best results in the CASP13 experiment. Conclusions DeepDist can predict inter-residue distances well and improve binary contact prediction over the existing state-of-the-art methods. Moreover, the predicted real-value distances can be directly used to reconstruct protein tertiary structures better than multi-class distance predictions due to the lower MSE. Finally, we demonstrate that predicting the real-value distance map and multi-class distance map at the same time performs better than predicting real-value distances alone. 
    more » « less
  2. Martelli, Pier Luigi (Ed.)
    Abstract Motivation Accurate prediction of residue-residue distances is important for protein structure prediction. We developed several protein distance predictors based on a deep learning distance prediction method and blindly tested them in the 14th Critical Assessment of Protein Structure Prediction (CASP14). The prediction method uses deep residual neural networks with the channel-wise attention mechanism to classify the distance between every two residues into multiple distance intervals. The input features for the deep learning method include co-evolutionary features as well as other sequence-based features derived from multiple sequence alignments (MSAs). Three alignment methods are used with multiple protein sequence/profile databases to generate MSAs for input feature generation. Based on different configurations and training strategies of the deep learning method, five MULTICOM distance predictors were created to participate in the CASP14 experiment. Results Benchmarked on 37 hard CASP14 domains, the best performing MULTICOM predictor is ranked 5th out of 30 automated CASP14 distance prediction servers in terms of precision of top L/5 long-range contact predictions (i.e. classifying distances between two residues into two categories: in contact (< 8 Angstrom) and not in contact otherwise) and performs better than the best CASP13 distance prediction method. The best performing MULTICOM predictor is also ranked 6th among automated server predictors in classifying inter-residue distances into 10 distance intervals defined by CASP14 according to the precision of distance classification. The results show that the quality and depth of MSAs depend on alignment methods and sequence databases and have a significant impact on the accuracy of distance prediction. Using larger training datasets and multiple complementary features improves prediction accuracy. However, the number of effective sequences in MSAs is only a weak indicator of the quality of MSAs and the accuracy of predicted distance maps. In contrast, there is a strong correlation between the accuracy of contact/distance predictions and the average probability of the predicted contacts, which can therefore be more effectively used to estimate the confidence of distance predictions and select predicted distance maps. Availability The software package, source code, and data of DeepDist2 are freely available at and 
    more » « less
  3. Proteins interact to form complexes. Predicting the quaternary structure of protein complexes is useful for protein function analysis, protein engineering, and drug design. However, few user-friendly tools leveraging the latest deep learning technology for inter-chain contact prediction and the distance-based modelling to predict protein quaternary structures are available. To address this gap, we develop DeepComplex, a web server for predicting structures of dimeric protein complexes. It uses deep learning to predict inter-chain contacts in a homodimer or heterodimer. The predicted contacts are then used to construct a quaternary structure of the dimer by the distance-based modelling, which can be interactively viewed and analysed. The web server is freely accessible and requires no registration. It can be easily used by providing a job name and an email address along with the tertiary structure for one chain of a homodimer or two chains of a heterodimer. The output webpage provides the multiple sequence alignment, predicted inter-chain residue-residue contact map, and predicted quaternary structure of the dimer. DeepComplex web server is freely available at 
    more » « less
  4. null (Ed.)
    Abstract Protein 3D structure prediction has advanced significantly in recent years due to improving contact prediction accuracy. This improvement has been largely due to deep learning approaches that predict inter-residue contacts and, more recently, distances using multiple sequence alignments (MSAs). In this work we present AttentiveDist, a novel approach that uses different MSAs generated with different E-values in a single model to increase the co-evolutionary information provided to the model. To determine the importance of each MSA’s feature at the inter-residue level, we added an attention layer to the deep neural network. We show that combining four MSAs of different E-value cutoffs improved the model prediction performance as compared to single E-value MSA features. A further improvement was observed when an attention layer was used and even more when additional prediction tasks of bond angle predictions were added. The improvement of distance predictions were successfully transferred to achieve better protein tertiary structure modeling. 
    more » « less
  5. Approaches to in silico prediction of protein structures have been revolutionized by AlphaFold2, while those to predict interfaces between proteins are relatively underdeveloped, owing to the overly complicated yet relatively limited data of protein–protein complexes. In short, proteins are 1D sequences of amino acids folding into 3D structures, and interact to form assemblies to function. We believe that such intricate scenarios are better modeled with additional indicative information that reflects their multi-modality nature and multi-scale functionality. To improve binary prediction of inter-protein residue-residue contacts, we propose to augment input features with multi-modal representations and to synergize the objective with auxiliary predictive tasks. (i) We first progressively add three protein modalities into models: protein sequences, sequences with evolutionary information, and structure-aware intra-protein residue contact maps. We observe that utilizing all data modalities delivers the best prediction precision. Analysis reveals that evolutionary and structural information benefit predictions on the difficult and rigid protein complexes, respectively, assessed by the resemblance to native residue contacts in bound complex structures. (ii) We next introduce three auxiliary tasks via self-supervised pre-training (binary prediction of protein-protein interaction (PPI)) and multi-task learning (prediction of inter-protein residue–residue distances and angles). Although PPI prediction is reported to benefit from predicting intercontacts (as causal interpretations), it is not found vice versa in our study. Similarly, the finer-grained distance and angle predictions did not appear to uniformly improve contact prediction either. This again reflects the high complexity of protein–protein complex data, for which designing and incorporating synergistic auxiliary tasks remains challenging. 
    more » « less