A central challenge in template-free protein structure prediction is controlling the quality of computed tertiary structures, also known as decoys. Given the size, dimensionality, and inherent characteristics of the protein structure space, this is non-trivial. Current decoy generation algorithms address the problem by generating as many decoys as can be afforded, an approach that is impractical and not informed by any metric of interest on the resulting decoy dataset. In this paper, we propose to equip a decoy generation algorithm with an evolving map of the protein structure space. The map uses low-dimensional representations of protein structure and serves as a memory whose granularity can be controlled. Evaluations on diverse target sequences show that drastic reductions in storage do not sacrifice decoy quality, indicating the promise of the proposed mechanism for decoy generation algorithms in template-free protein structure prediction.
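As a rough illustration of a map that acts as a memory with controllable granularity, the sketch below discretizes a low-dimensional representation into grid cells and keeps one decoy per cell. Everything in it is an assumption made for illustration and is not taken from the paper: the names `Decoy` and `StructureMap`, the uniform grid, the `cell_width` parameter, and the rule that a lower-energy decoy replaces the current occupant of a cell.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Tuple

Cell = Tuple[int, ...]


@dataclass
class Decoy:
    coords: Tuple[float, ...]  # low-dimensional representation (assumed precomputed)
    energy: float              # score used to choose a cell's representative (assumption)


@dataclass
class StructureMap:
    cell_width: float                      # hypothetical granularity knob: larger = coarser map
    cells: Dict[Cell, Decoy] = field(default_factory=dict)

    def cell_of(self, decoy: Decoy) -> Cell:
        # Map each coordinate to the index of the grid cell that contains it.
        return tuple(math.floor(c / self.cell_width) for c in decoy.coords)

    def add(self, decoy: Decoy) -> bool:
        # Store the decoy if its cell is empty or it improves on the occupant.
        key = self.cell_of(decoy)
        current = self.cells.get(key)
        if current is None or decoy.energy < current.energy:
            self.cells[key] = decoy
            return True
        return False


# Usage: the same decoys occupy fewer cells as the map gets coarser.
decoys = [
    Decoy((0.1, 0.2), -10.0),
    Decoy((0.3, 0.4), -12.0),
    Decoy((1.5, 0.2), -8.0),
]
fine = StructureMap(cell_width=0.25)
coarse = StructureMap(cell_width=1.0)
for d in decoys:
    fine.add(d)
    coarse.add(d)

print(len(fine.cells), len(coarse.cells))
```

In this toy setup, a larger `cell_width` stores fewer decoys, which is the storage-versus-resolution trade-off the abstract refers to; how the actual method defines cells and selects representatives may differ.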
- PAR ID: 10164963
- Date Published:
- Journal Name: Molecules
- Volume: 25
- Issue: 9
- ISSN: 1420-3049
- Page Range / eLocation ID: 2228
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract: Estimating the accuracy of protein structural models is a critical task in protein bioinformatics. Robust methods for the estimation of protein model accuracy (EMA) are needed throughout protein structure prediction, where computationally predicted structures must be screened rapidly for their overall quality and for the reliability of the positions predicted for each of their amino acid residues. Current EMA methods are either coupled tightly to existing protein structure prediction methods or evaluate protein structures without sufficiently leveraging the rich geometric information available in such structures to guide accuracy estimation. In this work, we propose a geometric message passing neural network, the geometry-complete perceptron network for protein structure EMA (GCPNet-EMA). Rigorous computational benchmarks show that GCPNet-EMA's accuracy estimations are 47% faster and more than 10% (6%) more correlated with ground-truth measures of per-residue (per-target) structural accuracy than baseline state-of-the-art methods for tertiary (multimer) structure EMA, including AlphaFold 2. The source code and data for GCPNet-EMA are available on GitHub, and a public web server implementation is freely available.
-
Abstract
Motivation: Proteins interact to form complexes to carry out essential biological functions. Computational methods such as AlphaFold-multimer have been developed to predict the quaternary structures of protein complexes. An important yet largely unsolved challenge in protein complex structure prediction is to accurately estimate the quality of predicted protein complex structures without any knowledge of the corresponding native structures. Such estimations can then be used to select high-quality predicted complex structures to facilitate biomedical research such as protein function analysis and drug discovery.
Results: In this work, we introduce a new gated neighborhood-modulating graph transformer to predict the quality of 3D protein complex structures. It incorporates node and edge gates within a graph transformer framework to control information flow during graph message passing. We trained, evaluated and tested the method (called DProQA) on newly-curated protein complex datasets before the 15th Critical Assessment of Techniques for Protein Structure Prediction (CASP15) and then blindly tested it in the 2022 CASP15 experiment. The method was ranked 3rd among the single-model quality assessment methods in CASP15 in terms of the ranking loss of TM-score on 36 complex targets. The rigorous internal and external experiments demonstrate that DProQA is effective in ranking protein complex structures.
Availability and implementation: The source code, data, and pre-trained models are available at https://github.com/jianlin-cheng/DProQA.
-
Abstract
Motivation: Quality assessment (QA) of predicted protein tertiary structure models plays an important role in ranking and using them. Recent end-to-end deep learning techniques for protein structure prediction generate highly confident tertiary structures for most proteins. Because these models have better quality and different properties than those predicted by traditional tertiary structure prediction methods, it is important to explore QA strategies suited to evaluating and selecting them.
Results: We develop EnQA, a novel graph-based 3D-equivariant neural network method that is equivariant to rotation and translation of 3D objects. It estimates the accuracy of protein structural models by leveraging structural features acquired from the state-of-the-art tertiary structure prediction method, AlphaFold2. We train and test the method on both traditional model datasets (e.g. the datasets of the Critical Assessment of Techniques for Protein Structure Prediction) and a new dataset of high-quality structural models predicted only by AlphaFold2 for proteins whose experimental structures were released recently. Our approach achieves state-of-the-art performance on protein structural models predicted by both traditional protein structure prediction methods and the latest end-to-end deep learning method, AlphaFold2. It performs even better than the model QA scores provided by AlphaFold2 itself. The results illustrate that the 3D-equivariant graph neural network is a promising approach to the evaluation of protein structural models. Integrating AlphaFold2 features with other complementary sequence and structural features is important for improving protein model QA.
Availability and implementation: The source code is available at https://github.com/BioinfoMachineLearning/EnQA.
Supplementary information: Supplementary data are available at Bioinformatics online.
-
SE3Lig: SE(3)-equivariant CNNs for the reconstruction of cofactors and ligands in protein structures
Protein structure prediction algorithms such as AlphaFold2 and ESMFold have dramatically increased the availability of high-quality models of protein structures. Because these algorithms predict only the structure of the protein itself, there is a growing need for methods that can rapidly screen protein structures for ligands. Previous work on similar tasks has shown promise but is lacking scope in the classes of atoms predicted and can benefit from the recent architectural developments in convolutional neural networks (CNNs). In this work, we introduce SE3Lig, a model for semantic in-painting of small molecules in protein structures. Specifically, we report SE(3)-equivariant CNNs trained to predict the atomic densities of common classes of cofactors (hemes, flavins, etc.) and the water molecules and inorganic ions in their vicinity. While the models are trained on high-resolution crystal structures of enzymes, they perform well on structures predicted by AlphaFold2, which suggests that the algorithm correctly represents cofactor-binding cavities.