Abstract MotivationProteins interact to form complexes to carry out essential biological functions. Computational methods such as AlphaFold-multimer have been developed to predict the quaternary structures of protein complexes. An important yet largely unsolved challenge in protein complex structure prediction is to accurately estimate the quality of predicted protein complex structures without any knowledge of the corresponding native structures. Such estimations can then be used to select high-quality predicted complex structures to facilitate biomedical research such as protein function analysis and drug discovery. ResultsIn this work, we introduce a new gated neighborhood-modulating graph transformer to predict the quality of 3D protein complex structures. It incorporates node and edge gates within a graph transformer framework to control information flow during graph message passing. We trained, evaluated and tested the method (called DProQA) on newly-curated protein complex datasets before the 15th Critical Assessment of Techniques for Protein Structure Prediction (CASP15) and then blindly tested it in the 2022 CASP15 experiment. The method was ranked 3rd among the single-model quality assessment methods in CASP15 in terms of the ranking loss of TM-score on 36 complex targets. The rigorous internal and external experiments demonstrate that DProQA is effective in ranking protein complex structures. Availability and implementationThe source code, data, and pre-trained models are available at https://github.com/jianlin-cheng/DProQA.
more »
« less
In the Age of Machine Learning Cryo‐EM Research is Still Necessary: A Path toward Precision Medicine
Abstract Machine learning has proven useful in analyzing complex biological data and has greatly influenced the course of research in structural biology and precision medicine. Deep neural network models oftentimes fail to predict the structure of complex proteins and are heavily dependent on experimentally determined structures for their training and validation. Single‐particle cryogenic electron microscopy (cryoEM) is also advancing the understanding of biology and will be needed to complement these models by continuously supplying high‐quality experimentally validated structures for improvements in prediction quality. In this perspective, the significance of structure prediction methods is highlighted, but the authors also ask, what if these programs cannot accurately predict a protein structure important for preventing disease? The role of cryoEM is discussed to help fill the gaps left by artificial intelligence predictive models in resolving targetable proteins and protein complexes that will pave the way for personalized therapeutics.
more »
« less
- Award ID(s):
- 1764201
- PAR ID:
- 10524831
- Publisher / Repository:
- Wiley
- Date Published:
- Journal Name:
- Advanced Biology
- Volume:
- 7
- Issue:
- 8
- ISSN:
- 2701-0198
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Abstract Critical Assessment of Structure Prediction (CASP) is an organization aimed at advancing the state of the art in computing protein structure from sequence. In the spring of 2020, CASP launched a community project to compute the structures of the most structurally challenging proteins coded for in the SARS‐CoV‐2 genome. Forty‐seven research groups submitted over 3000 three‐dimensional models and 700 sets of accuracy estimates on 10 proteins. The resulting models were released to the public. CASP community members also worked together to provide estimates of local and global accuracy and identify structure‐based domain boundaries for some proteins. Subsequently, two of these structures (ORF3a and ORF8) have been solved experimentally, allowing assessment of both model quality and the accuracy estimates. Models from the AlphaFold2 group were found to have good agreement with the experimental structures, with main chain GDT_TS accuracy scores ranging from 63 (a correct topology) to 87 (competitive with experiment).more » « less
-
Abstract MotivationQuality assessment (QA) of predicted protein tertiary structure models plays an important role in ranking and using them. With the recent development of deep learning end-to-end protein structure prediction techniques for generating highly confident tertiary structures for most proteins, it is important to explore corresponding QA strategies to evaluate and select the structural models predicted by them since these models have better quality and different properties than the models predicted by traditional tertiary structure prediction methods. ResultsWe develop EnQA, a novel graph-based 3D-equivariant neural network method that is equivariant to rotation and translation of 3D objects to estimate the accuracy of protein structural models by leveraging the structural features acquired from the state-of-the-art tertiary structure prediction method—AlphaFold2. We train and test the method on both traditional model datasets (e.g. the datasets of the Critical Assessment of Techniques for Protein Structure Prediction) and a new dataset of high-quality structural models predicted only by AlphaFold2 for the proteins whose experimental structures were released recently. Our approach achieves state-of-the-art performance on protein structural models predicted by both traditional protein structure prediction methods and the latest end-to-end deep learning method—AlphaFold2. It performs even better than the model QA scores provided by AlphaFold2 itself. The results illustrate that the 3D-equivariant graph neural network is a promising approach to the evaluation of protein structural models. Integrating AlphaFold2 features with other complementary sequence and structural features is important for improving protein model QA. Availability and implementationThe source code is available at https://github.com/BioinfoMachineLearning/EnQA. Supplementary informationSupplementary data are available at Bioinformatics online.more » « less
-
Abstract BackgroundNetwork propagation has been widely used for nearly 20 years to predict gene functions and phenotypes. Despite the popularity of this approach, little attention has been paid to the question of provenance tracing in this context, e.g., determining how much any experimental observation in the input contributes to the score of every prediction. ResultsWe design a network propagation framework with 2 novel components and apply it to predict human proteins that directly or indirectly interact with SARS-CoV-2 proteins. First, we trace the provenance of each prediction to its experimentally validated sources, which in our case are human proteins experimentally determined to interact with viral proteins. Second, we design a technique that helps to reduce the manual adjustment of parameters by users. We find that for every top-ranking prediction, the highest contribution to its score arises from a direct neighbor in a human protein-protein interaction network. We further analyze these results to develop functional insights on SARS-CoV-2 that expand on known biology such as the connection between endoplasmic reticulum stress, HSPA5, and anti-clotting agents. ConclusionsWe examine how our provenance-tracing method can be generalized to a broad class of network-based algorithms. We provide a useful resource for the SARS-CoV-2 community that implicates many previously undocumented proteins with putative functional relationships to viral infection. This resource includes potential drugs that can be opportunistically repositioned to target these proteins. We also discuss how our overall framework can be extended to other, newly emerging viruses.more » « less
-
Abstract MotivationThe state-of-art protein structure prediction methods such as AlphaFold are being widely used to predict structures of uncharacterized proteins in biomedical research. There is a significant need to further improve the quality and nativeness of the predicted structures to enhance their usability. In this work, we develop ATOMRefine, a deep learning-based, end-to-end, all-atom protein structural model refinement method. It uses a SE(3)-equivariant graph transformer network to directly refine protein atomic coordinates in a predicted tertiary structure represented as a molecular graph. ResultsThe method is first trained and tested on the structural models in AlphaFoldDB whose experimental structures are known, and then blindly tested on 69 CASP14 regular targets and 7 CASP14 refinement targets. ATOMRefine improves the quality of both backbone atoms and all-atom conformation of the initial structural models generated by AlphaFold. It also performs better than two state-of-the-art refinement methods in multiple evaluation metrics including an all-atom model quality score—the MolProbity score based on the analysis of all-atom contacts, bond length, atom clashes, torsion angles, and side-chain rotamers. As ATOMRefine can refine a protein structure quickly, it provides a viable, fast solution for improving protein geometry and fixing structural errors of predicted structures through direct coordinate refinement. Availability and implementationThe source code of ATOMRefine is available in the GitHub repository (https://github.com/BioinfoMachineLearning/ATOMRefine). All the required data for training and testing are available at https://doi.org/10.5281/zenodo.6944368.more » « less
An official website of the United States government

