ResNet and, more recently, AlphaFold2 have demonstrated that deep neural networks can now predict a tertiary structure of a given protein amino-acid sequence with high accuracy. This seminal development will allow molecular biology researchers to advance various studies linking sequence, structure, and function. Many studies will undoubtedly focus on the impact of sequence mutations on stability, fold, and function. In this paper, we evaluate the ability of AlphaFold2 to predict accurate tertiary structures of wildtype and mutated sequences of protein molecules. We do so on a benchmark dataset in mutation modeling studies. Our empirical evaluation utilizes global and local structure analyses and yields several interesting observations. It shows, for instance, that AlphaFold2 performs similarly on wildtype and variant sequences. The placement of the main chain of a protein molecule is highly accurate. However, while AlphaFold2 reports similar confidence in its predictions over wildtype and variant sequences, its performance on placements of the side chains suffers in comparison to main-chain predictions. The analysis overall supports the premise that AlphaFold2-predicted structures can be utilized in further downstream tasks, but that further refinement of these structures may be necessary.
- Award ID(s):
- 1759472
- NSF-PAR ID:
- 10379954
- Date Published:
- Journal Name:
- Pacific symposium on biocomputing
- Volume:
- 27
- ISSN:
- 2335-6928
- Page Range / eLocation ID:
- 46-55
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Abstract AlphaFold2 has revolutionized protein structure prediction from amino‐acid sequence. In addition to protein structures, high‐resolution dynamics information about various protein regions is important for understanding protein function. Although AlphaFold2 has neither been designed nor trained to predict protein dynamics, it is shown here how the information returned by AlphaFold2 can be used to predict dynamic protein regions at the individual residue level. The approach, which is termed cdsAF2, uses the 3D protein structure returned by AlphaFold2 to predict backbone NMR NH
S 2order parameters using a local contact model that takes into account the contacts made by each peptide plane along the backbone with its environment. By combining for each residue AlphaFold2's pLDDT confidence score for the structure prediction accuracy with the predictedS 2value using the local contact model, an estimator is obtained that semi‐quantitatively captures many of the dynamics features observed in experimental backbone NMR NHS 2order parameter profiles. The method is demonstrated for a set nine proteins of different sizes and variable amounts of dynamics and disorder. -
SE3Lig: SE(3)-equivariant CNNs for the reconstruction of cofactors and ligands in protein structuresProtein structure prediction algorithms such as AlphaFold2 and ESMFold have dramatically increased the availability of high-quality models of protein structures. Because these algorithms predict only the structure of the protein itself, there is a growing need for methods that can rapidly screen protein structures for ligands. Previous work on similar tasks has shown promise but is lacking scope in the classes of atoms predicted and can benefit from the recent architectural developments in convolutional neural networks (CNNs). In this work, we introduce SE3Lig, a model for semantic in-painting of small molecules in protein structures. Specifically, we report SE(3)-equivariant CNNs trained to predict the atomic densities of common classes of cofactors (hemes, flavins, etc.) and the water molecules and inorganic ions in their vicinity. While the models are trained on high-resolution crystal structures of enzymes, they perform well on structures predicted by AlphaFold2, which suggests that the algorithm correctly represents cofactor-binding cavities.more » « less
-
Abstract Protein side chain packing (PSCP) is a fundamental problem in the field of protein engineering, as high‐confidence and low‐energy conformations of amino acid side chains are crucial for understanding (and designing) protein folding, protein–protein interactions, and protein‐ligand interactions. Traditional PSCP methods (such as the Rosetta Packer) often rely on a library of discrete side chain conformations, or rotamers, and a forcefield to guide the structure to low‐energy conformations. Recently, deep learning (DL) based methods (such as DLPacker, AttnPacker, and DiffPack) have demonstrated state‐of‐the‐art predictions and speed in the PSCP task. Building off the success of geometric graph neural networks for protein modeling, we present the Protein Invariant Point Packer (PIPPack) which effectively processes local structural and sequence information to produce realistic, idealized side chain coordinates using ‐angle distribution predictions and geometry‐aware invariant point message passing (IPMP). On a test set of ∼1400 high‐quality protein chains, PIPPack is highly competitive with other state‐of‐the‐art PSCP methods in rotamer recovery and per‐residue RMSD but is significantly faster.
-
Abstract Motivation Quality assessment (QA) of predicted protein tertiary structure models plays an important role in ranking and using them. With the recent development of deep learning end-to-end protein structure prediction techniques for generating highly confident tertiary structures for most proteins, it is important to explore corresponding QA strategies to evaluate and select the structural models predicted by them since these models have better quality and different properties than the models predicted by traditional tertiary structure prediction methods.
Results We develop EnQA, a novel graph-based 3D-equivariant neural network method that is equivariant to rotation and translation of 3D objects to estimate the accuracy of protein structural models by leveraging the structural features acquired from the state-of-the-art tertiary structure prediction method—AlphaFold2. We train and test the method on both traditional model datasets (e.g. the datasets of the Critical Assessment of Techniques for Protein Structure Prediction) and a new dataset of high-quality structural models predicted only by AlphaFold2 for the proteins whose experimental structures were released recently. Our approach achieves state-of-the-art performance on protein structural models predicted by both traditional protein structure prediction methods and the latest end-to-end deep learning method—AlphaFold2. It performs even better than the model QA scores provided by AlphaFold2 itself. The results illustrate that the 3D-equivariant graph neural network is a promising approach to the evaluation of protein structural models. Integrating AlphaFold2 features with other complementary sequence and structural features is important for improving protein model QA.
Availability and implementation The source code is available at https://github.com/BioinfoMachineLearning/EnQA.
Supplementary information Supplementary data are available at Bioinformatics online.