skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: SE3Lig: SE(3)-equivariant CNNs for the reconstruction of cofactors and ligands in protein structures
Protein structure prediction algorithms such as AlphaFold2 and ESMFold have dramatically increased the availability of high-quality models of protein structures. Because these algorithms predict only the structure of the protein itself, there is a growing need for methods that can rapidly screen protein structures for ligands. Previous work on similar tasks has shown promise but is lacking scope in the classes of atoms predicted and can benefit from the recent architectural developments in convolutional neural networks (CNNs). In this work, we introduce SE3Lig, a model for semantic in-painting of small molecules in protein structures. Specifically, we report SE(3)-equivariant CNNs trained to predict the atomic densities of common classes of cofactors (hemes, flavins, etc.) and the water molecules and inorganic ions in their vicinity. While the models are trained on high-resolution crystal structures of enzymes, they perform well on structures predicted by AlphaFold2, which suggests that the algorithm correctly represents cofactor-binding cavities.  more » « less
Award ID(s):
2226816
PAR ID:
10490039
Author(s) / Creator(s):
Publisher / Repository:
NeurIPS 2023
Date Published:
Journal Name:
Proceedings of NeurIPS "Machine Learning in Structural Biology” workshop
Format(s):
Medium: X
Location:
New Orleans
Sponsoring Org:
National Science Foundation
More Like this
  1. ResNet and, more recently, AlphaFold2 have demonstrated that deep neural networks can now predict a tertiary structure of a given protein amino-acid sequence with high accuracy. This seminal development will allow molecular biology researchers to advance various studies linking sequence, structure, and function. Many studies will undoubtedly focus on the impact of sequence mutations on stability, fold, and function. In this paper, we evaluate the ability of AlphaFold2 to predict accurate tertiary structures of wildtype and mutated sequences of protein molecules. We do so on a benchmark dataset in mutation modeling studies. Our empirical evaluation utilizes global and local structure analyses and yields several interesting observations. It shows, for instance, that AlphaFold2 performs similarly on wildtype and variant sequences. The placement of the main chain of a protein molecule is highly accurate. However, while AlphaFold2 reports similar confidence in its predictions over wildtype and variant sequences, its performance on placements of the side chains suffers in comparison to main-chain predictions. The analysis overall supports the premise that AlphaFold2-predicted structures can be utilized in further downstream tasks, but that further refinement of these structures may be necessary. 
    more » « less
  2. Abstract AlphaFold2 has revolutionized protein structure prediction from amino‐acid sequence. In addition to protein structures, high‐resolution dynamics information about various protein regions is important for understanding protein function. Although AlphaFold2 has neither been designed nor trained to predict protein dynamics, it is shown here how the information returned by AlphaFold2 can be used to predict dynamic protein regions at the individual residue level. The approach, which is termed cdsAF2, uses the 3D protein structure returned by AlphaFold2 to predict backbone NMR NHS2order parameters using a local contact model that takes into account the contacts made by each peptide plane along the backbone with its environment. By combining for each residue AlphaFold2's pLDDT confidence score for the structure prediction accuracy with the predictedS2value using the local contact model, an estimator is obtained that semi‐quantitatively captures many of the dynamics features observed in experimental backbone NMR NHS2order parameter profiles. The method is demonstrated for a set nine proteins of different sizes and variable amounts of dynamics and disorder. 
    more » « less
  3. Artificial intelligence-driven advances in protein structure prediction in recent years have raised the question: has the protein structure-prediction problem been solved? Here, with a focus on nonglobular proteins, we highlight the many strengths and potential weaknesses of DeepMind’s AlphaFold2 in the context of its biological and therapeutic applications. We summarize the subtleties associated with evaluation of AlphaFold2 model quality and reliability using the predicted local distance difference test (pLDDT) and predicted aligned error (PAE) values. We highlight various classes of proteins that AlphaFold2 can be applied to and the caveats involved. Concrete examples of how AlphaFold2 models can be integrated with experimental data in the form of small-angle X-ray scattering (SAXS), solution NMR, cryo-electron microscopy (cryo-EM) and X-ray diffraction are discussed. Finally, we highlight the need to move beyond structure prediction of rigid, static structural snapshots toward conformational ensembles and alternate biologically relevant states. The overarching theme is that careful consideration is due when using AlphaFold2-generated models to generate testable hypotheses and structural models, rather than treating predicted models as de facto ground truth structures. 
    more » « less
  4. Significant research on deep neural networks, culminating in AlphaFold2, convincingly shows that deep learning can predict the na- tive structure of a given protein sequence with high accuracy. In contrast, work on deep learning frameworks that can account for the structural plasticity of protein molecules remains in its infancy. Many researchers are now investigating deep generative models to explore the structure space of a protein. Current models largely use 2D convolution, leveraging representations of protein structures as contact maps or distance matri- ces. The goal is exclusively to generate protein-like, sequence-agnostic tertiary structures, but no rigorous metrics are utilized to convincingly make this case. This paper makes several contributions. It builds on momentum in graph representation learning and formalizes a protein tertiary structure as a contact graph. It demonstrates that graph repre- sentation learning outperforms models based on image convolution. This work also equips graph-based deep latent variable models with the abil- ity to learn from experimentally-available tertiary structures of proteins of varying lengths. The resulting models are shown to outperform state- of-the-art ones on rigorous metrics that quantify both local and distal patterns in physically-realistic protein structures. We hope this work will spur further research in deep generative models for obtaining a broader view of the structure space of a protein molecule. 
    more » « less
  5. Abstract MotivationQuality assessment (QA) of predicted protein tertiary structure models plays an important role in ranking and using them. With the recent development of deep learning end-to-end protein structure prediction techniques for generating highly confident tertiary structures for most proteins, it is important to explore corresponding QA strategies to evaluate and select the structural models predicted by them since these models have better quality and different properties than the models predicted by traditional tertiary structure prediction methods. ResultsWe develop EnQA, a novel graph-based 3D-equivariant neural network method that is equivariant to rotation and translation of 3D objects to estimate the accuracy of protein structural models by leveraging the structural features acquired from the state-of-the-art tertiary structure prediction method—AlphaFold2. We train and test the method on both traditional model datasets (e.g. the datasets of the Critical Assessment of Techniques for Protein Structure Prediction) and a new dataset of high-quality structural models predicted only by AlphaFold2 for the proteins whose experimental structures were released recently. Our approach achieves state-of-the-art performance on protein structural models predicted by both traditional protein structure prediction methods and the latest end-to-end deep learning method—AlphaFold2. It performs even better than the model QA scores provided by AlphaFold2 itself. The results illustrate that the 3D-equivariant graph neural network is a promising approach to the evaluation of protein structural models. Integrating AlphaFold2 features with other complementary sequence and structural features is important for improving protein model QA. Availability and implementationThe source code is available at https://github.com/BioinfoMachineLearning/EnQA. Supplementary informationSupplementary data are available at Bioinformatics online. 
    more » « less