Title: Atomic protein structure refinement using all-atom graph representations and SE(3)-equivariant graph transformer
Abstract

Motivation

State-of-the-art protein structure prediction methods such as AlphaFold are widely used to predict the structures of uncharacterized proteins in biomedical research. There is a significant need to further improve the quality and nativeness of the predicted structures to enhance their usability. In this work, we develop ATOMRefine, a deep learning-based, end-to-end, all-atom protein structural model refinement method. It uses an SE(3)-equivariant graph transformer network to directly refine protein atomic coordinates in a predicted tertiary structure represented as a molecular graph.
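To make the two ideas in this paragraph concrete, the following minimal sketch builds a distance-cutoff graph over atoms and applies a toy coordinate update that is equivariant to rotation and translation. It is an illustration only and not ATOMRefine's network: the function names, the 2.0 Å cutoff, the step size, and the four toy atoms are all assumptions made for this example.

```python
import math

def build_graph(coords, cutoff):
    """Return undirected edges (i, j) for atom pairs closer than `cutoff`."""
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                edges.append((i, j))
    return edges

def refine_step(coords, edges, step=0.1):
    """Toy update (hypothetical): move each atom toward the mean of its neighbors."""
    neighbors = [[] for _ in coords]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    updated = []
    for i, x in enumerate(coords):
        if not neighbors[i]:
            updated.append(tuple(x))  # isolated atoms are left unchanged
            continue
        mean = [sum(coords[j][k] for j in neighbors[i]) / len(neighbors[i])
                for k in range(3)]
        updated.append(tuple(x[k] + step * (mean[k] - x[k]) for k in range(3)))
    return updated

def rigid_transform(coords, angle, shift):
    """Rotate about the z-axis by `angle` (radians), then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for x, y, z in coords]

# Four made-up atoms; the last one is too far away to have neighbors.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (5.0, 5.0, 5.0)]
edges = build_graph(coords, cutoff=2.0)

# Equivariance check: transforming then refining equals refining then transforming.
moved = rigid_transform(coords, 0.7, (1.0, -2.0, 3.0))
refined_after_move = refine_step(moved, build_graph(moved, cutoff=2.0))
moved_after_refine = rigid_transform(refine_step(coords, edges), 0.7, (1.0, -2.0, 3.0))
max_diff = max(abs(p - q)
               for a, b in zip(refined_after_move, moved_after_refine)
               for p, q in zip(a, b))
print(edges, max_diff < 1e-9)
```

The update is equivariant because each new position is a weighted average of existing positions with weights summing to one, and the edge set depends only on interatomic distances, which rigid motions preserve.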

Results

The method is first trained and tested on the structural models in AlphaFoldDB whose experimental structures are known, and then blindly tested on 69 CASP14 regular targets and 7 CASP14 refinement targets. ATOMRefine improves the quality of both backbone atoms and all-atom conformation of the initial structural models generated by AlphaFold. It also performs better than two state-of-the-art refinement methods in multiple evaluation metrics including an all-atom model quality score—the MolProbity score based on the analysis of all-atom contacts, bond length, atom clashes, torsion angles, and side-chain rotamers. As ATOMRefine can refine a protein structure quickly, it provides a viable, fast solution for improving protein geometry and fixing structural errors of predicted structures through direct coordinate refinement.

Availability and implementation

The source code of ATOMRefine is available in the GitHub repository (https://github.com/BioinfoMachineLearning/ATOMRefine). All the required data for training and testing are available at https://doi.org/10.5281/zenodo.6944368.

 
Award ID(s):
1759934
PAR ID:
10414151
Publisher / Repository:
Oxford University Press
Journal Name:
Bioinformatics
Volume:
39
Issue:
5
ISSN:
1367-4803
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Motivation

    Proteins interact to form complexes to carry out essential biological functions. Computational methods such as AlphaFold-multimer have been developed to predict the quaternary structures of protein complexes. An important yet largely unsolved challenge in protein complex structure prediction is to accurately estimate the quality of predicted protein complex structures without any knowledge of the corresponding native structures. Such estimations can then be used to select high-quality predicted complex structures to facilitate biomedical research such as protein function analysis and drug discovery.

    Results

    In this work, we introduce a new gated neighborhood-modulating graph transformer to predict the quality of 3D protein complex structures. It incorporates node and edge gates within a graph transformer framework to control information flow during graph message passing. We trained, evaluated and tested the method (called DProQA) on newly curated protein complex datasets before the 15th Critical Assessment of Techniques for Protein Structure Prediction (CASP15) and then blindly tested it in the 2022 CASP15 experiment. The method was ranked 3rd among the single-model quality assessment methods in CASP15 in terms of the ranking loss of TM-score on 36 complex targets. The rigorous internal and external experiments demonstrate that DProQA is effective in ranking protein complex structures.

    Availability and implementation

    The source code, data, and pre-trained models are available at https://github.com/jianlin-cheng/DProQA.

     
  2. Abstract

    Motivation

    Quality assessment (QA) of predicted protein tertiary structure models plays an important role in ranking and using them. With the recent development of deep learning end-to-end protein structure prediction techniques for generating highly confident tertiary structures for most proteins, it is important to explore corresponding QA strategies to evaluate and select the structural models predicted by them since these models have better quality and different properties than the models predicted by traditional tertiary structure prediction methods.

    Results

    We develop EnQA, a novel graph-based 3D-equivariant neural network method that is equivariant to rotation and translation of 3D objects to estimate the accuracy of protein structural models by leveraging the structural features acquired from the state-of-the-art tertiary structure prediction method—AlphaFold2. We train and test the method on both traditional model datasets (e.g. the datasets of the Critical Assessment of Techniques for Protein Structure Prediction) and a new dataset of high-quality structural models predicted only by AlphaFold2 for the proteins whose experimental structures were released recently. Our approach achieves state-of-the-art performance on protein structural models predicted by both traditional protein structure prediction methods and the latest end-to-end deep learning method—AlphaFold2. It performs even better than the model QA scores provided by AlphaFold2 itself. The results illustrate that the 3D-equivariant graph neural network is a promising approach to the evaluation of protein structural models. Integrating AlphaFold2 features with other complementary sequence and structural features is important for improving protein model QA.

    Availability and implementation

    The source code is available at https://github.com/BioinfoMachineLearning/EnQA.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  3. Abstract

    Motivation

    Accurate modeling of protein–protein interaction interfaces is essential for high-quality protein complex structure prediction. Existing approaches for estimating the quality of a predicted protein complex structural model utilize only the physicochemical properties or energetic contributions of the interacting atoms, ignoring evolutionary information or inter-atomic multimeric geometries, including interaction distance and orientations.

    Results

    Here, we present PIQLE, a deep graph learning method for protein–protein interface quality estimation. PIQLE leverages multimeric interaction geometries and evolutionary information along with sequence- and structure-derived features to estimate the quality of individual interactions between the interfacial residues using a multi-head graph attention network and then probabilistically combines the estimated quality for scoring the overall interface. Experimental results show that PIQLE consistently outperforms existing state-of-the-art methods including DProQA, TRScore, GNN-DOVE and DOVE on multiple independent test datasets across a wide range of evaluation metrics. Our ablation study and comparison with the self-assessment module of AlphaFold-Multimer repurposed for protein complex scoring reveal that the performance gains are connected to the effectiveness of the multi-head graph attention network in leveraging multimeric interaction geometries and evolutionary information along with other sequence- and structure-derived features adopted in PIQLE.

    Availability and implementation

    An open-source software implementation of PIQLE is freely available at https://github.com/Bhattacharya-Lab/PIQLE.

    Supplementary information

    Supplementary data are available at Bioinformatics Advances online.

     
  4. Abstract

    Substantial progress in protein structure prediction has been made by utilizing deep learning and residue‐residue distance prediction since CASP13. Inspired by these advances, we improve our CASP14 MULTICOM protein structure prediction system by incorporating three new components: (a) a new deep learning‐based protein inter‐residue distance predictor to improve template‐free (ab initio) tertiary structure prediction, (b) an enhanced template‐based tertiary structure prediction method, and (c) distance‐based model quality assessment methods empowered by deep learning. In the 2020 CASP14 experiment, the MULTICOM predictor was ranked seventh out of 146 predictors in tertiary structure prediction and third out of 136 predictors in inter‐domain structure prediction. The results demonstrate that template‐free modeling based on deep learning and residue‐residue distance prediction can predict the correct topology for almost all template‐based modeling targets and a majority of hard targets (template‐free targets or targets whose templates cannot be recognized), which is a significant improvement over the CASP13 MULTICOM predictor. Moreover, template‐free modeling performs better than template‐based modeling not only on hard targets but also on targets that have homologous templates. The performance of template‐free modeling largely depends on the accuracy of distance prediction, which is closely related to the quality of multiple sequence alignments. The structural model quality assessment works well on targets for which enough good models can be predicted, but it may perform poorly when only a few good models are predicted for a hard target and the distribution of model quality scores is highly skewed. MULTICOM is available at https://github.com/jianlin-cheng/MULTICOM_Human_CASP14/tree/CASP14_DeepRank3 and https://github.com/multicom-toolbox/multicom/tree/multicom_v2.0.

     
  5. Since the 14th Critical Assessment of Techniques for Protein Structure Prediction (CASP14), AlphaFold2 has become the standard method for protein tertiary structure prediction. One remaining challenge is to further improve its predictions. We developed a new version of the MULTICOM system to sample diverse multiple sequence alignments (MSAs) and structural templates to improve the input for AlphaFold2 to generate structural models. The models are then ranked by both the pairwise model similarity and the AlphaFold2 self-reported model quality score. The top-ranked models are refined by a novel structure alignment-based refinement method powered by Foldseek. Moreover, for a monomer target that is a subunit of a protein assembly (complex), MULTICOM integrates tertiary and quaternary structure predictions to account for tertiary structural changes induced by protein-protein interaction. The system participated in tertiary structure prediction in the 2022 CASP15 experiment. Our server predictor MULTICOM_refine ranked 3rd among 47 CASP15 server predictors and our human predictor MULTICOM ranked 7th among all 132 human and server predictors. The average GDT-TS score and TM-score of the first structural models that MULTICOM_refine predicted for 94 CASP15 domains are ~0.80 and ~0.92, which are 9.6% and 8.2% higher than the ~0.73 and ~0.85 of the standard AlphaFold2 predictor, respectively.
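
The relative improvements quoted in item 5 can be checked with a short calculation. The sketch below assumes the percentages are relative gains over the AlphaFold2 baseline, computed from the rounded averages given in the text; the helper name `relative_gain` is introduced here only for illustration.

```python
def relative_gain(new, baseline):
    """Percentage by which `new` exceeds `baseline`."""
    return 100.0 * (new - baseline) / baseline

# Rounded average scores as quoted in the abstract (MULTICOM_refine vs. AlphaFold2).
gdt_gain = relative_gain(0.80, 0.73)
tm_gain = relative_gain(0.92, 0.85)

print(f"GDT-TS: {gdt_gain:.1f}%  TM-score: {tm_gain:.1f}%")
# prints "GDT-TS: 9.6%  TM-score: 8.2%"
```

Because the inputs are themselves approximate, the result only confirms that the reported percentages are consistent with the rounded scores.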

     