This content will become publicly available on April 3, 2023

Title: DLPacker: Deep learning for prediction of amino acid side chain conformations in proteins
Prediction of side chain conformations of amino acids in proteins (also termed “packing”) is an important and challenging part of protein structure prediction, with many interesting applications in protein design. A variety of packing methods have been developed, but more accurate ones are still needed. Machine learning (ML) methods have recently become a powerful tool for solving various problems in diverse areas of science, including structural biology. In this study, we evaluate the potential of deep neural networks (DNNs) for predicting amino acid side chain conformations. We formulate the problem as an image-to-image transformation and train a U-net style DNN to solve it. We show that our method outperforms physics-based methods by a significant margin. Reconstruction RMSDs for most amino acids are about 20% smaller than those of SCWRL4 and Rosetta Packer, and RMSDs for the bulky hydrophobic amino acids Phe, Tyr, and Trp are up to 50% smaller.
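The reconstruction RMSD quoted above is the root-mean-square distance between predicted and reference side-chain atom positions. The sketch below computes that metric and the relative reduction behind figures such as "20% smaller". It assumes the atoms are already matched one-to-one and share the native backbone frame, so no superposition is applied. The function names and toy coordinates are hypothetical and are not taken from DLPacker.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def side_chain_rmsd(predicted: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """RMSD over matched side-chain atoms in the fixed backbone frame (no superposition; an assumption)."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need the same, non-zero number of atoms in both structures")
    squared = sum(
        (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
        for (px, py, pz), (rx, ry, rz) in zip(predicted, reference)
    )
    return math.sqrt(squared / len(predicted))


def relative_reduction(new_rmsd: float, baseline_rmsd: float) -> float:
    """Fractional RMSD reduction versus a baseline method; 0.5 means '50% smaller'."""
    return (baseline_rmsd - new_rmsd) / baseline_rmsd


# Toy example: two atoms, each displaced by 1 Å along x from the reference.
pred = [(1.0, 0.0, 0.0), (2.0, 1.0, 0.0)]
ref = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
print(side_chain_rmsd(pred, ref))    # 1.0
print(relative_reduction(0.5, 1.0))  # 0.5
```

Per-residue-type comparisons such as those for Phe, Tyr, and Trp would average this quantity over all residues of one type before taking the relative reduction.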
Authors:
; ; ;
Editors:
EDITOR-IN-CHIEF: Dokholyan, Nikolay V.; ASSOCIATE EDITORS: Bahar, Ivet; Feig, Michael; Varadarajan, Raghavan; Wodak, Shoshana; Moult, John Center
Award ID(s):
2019745
Publication Date:
NSF-PAR ID:
10320625
Journal Name:
Proteins: Structure, Function, and Bioinformatics
ISSN:
0887-3585
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract. Motivation: Protein structure and function are essentially determined by how the side-chain atoms interact with each other. Thus, accurate protein side-chain packing (PSCP) is a critical step toward protein structure prediction and protein design. Despite the importance of the problem, however, the accuracy and speed of current PSCP programs are still not satisfactory. Results: We present FASPR for fast and accurate PSCP, which uses an optimized scoring function in combination with a deterministic search algorithm. The performance of FASPR was compared with four state-of-the-art PSCP methods (CISRR, RASP, SCATD and SCWRL4) on both native and non-native protein backbones. On native backbones, FASPR correctly predicted 69.1% of all side-chain dihedral angles using a stringent tolerance criterion of 20°. This compares favorably with SCWRL4, CISRR, RASP and SCATD, which correctly predicted 68.8%, 68.6%, 67.8% and 61.7%, respectively. Additionally, FASPR was the fastest method, packing the 379 test protein structures in only 34.3 s, significantly faster than the control methods. On non-native backbones, FASPR showed equivalent or better performance on I-TASSER predicted backbones and on backbones perturbed from experimental structures. Detailed analyses showed that the major advantage of FASPR lies in the optimal combination of dead-end elimination and tree decomposition with a well-optimized scoring function. This makes FASPR of practical use for both protein structure modeling and protein design studies. Availability and implementation: The web server, source code and datasets are freely available at https://zhanglab.ccmb.med.umich.edu/FASPR and https://github.com/tommyhuangthu/FASPR. Supplementary information: Supplementary data are available at Bioinformatics online.
  2. Herein we report the synthesis of ternary statistical methacrylate copolymers comprising cationic ammonium (aminoethyl methacrylate: AEMA), carboxylic acid (propanoic acid methacrylate: PAMA) and hydrophobic (ethyl methacrylate: EMA) side chain monomers. The aim was to study the functional role of anionic groups in the copolymers' antimicrobial and hemolytic activities and in the conformation of the polymer chains. The hydrophobic monomer EMA was maintained at 40 mol% in all the polymers, with different percentages of cationic ammonium (AEMA) and anionic carboxylate (PAMA) side chains, resulting in different total net charges for the polymers. For copolymers with a net charge of +3 or larger, antimicrobial and hemolytic activities were determined by net charge. This suggests that the anionic carboxylate groups had no distinct effect on these activities. However, pH titration and atomistic molecular dynamics simulations suggest that anionic groups may play a strong role in controlling polymer conformation. This occurs through the formation of salt bridges between cationic and anionic groups, which transiently crosslink the polymer chain and allow dynamic switching between compact and extended conformations. These results suggest broader implications for antimicrobial design. Functional groups other than the canonical hydrophobic and cationic groups may help the polymer acquire the functional structures required for adequate antimicrobial activity. To explain these implications, we propose a molecular model in which intra-chain, transient salt bridges act as “adhesives”. These bridges form because both anionic and cationic groups are present along the polymer, and they facilitate compact packing of the chain so that functional groups can interact. They do so without rigidly locking down the overall polymer structure, which could adversely affect its functional roles.
  3. Abstract

    Short hydrogen bonds (SHBs), whose donor and acceptor heteroatoms lie within 2.7 Å, exhibit a prominent quantum mechanical character and are connected to a wide range of essential biomolecular processes. However, exact determination of the geometry and functional roles of SHBs requires protein structures at atomic resolution. In this work, we analyze 1260 high-resolution peptide and protein structures from the Protein Data Bank and develop a boosting-based machine learning model to predict the formation of SHBs between amino acids. This model, which we name machine learning assisted prediction of short hydrogen bonds (MAPSHB), takes into account 21 structural, chemical and sequence features and their interaction effects. It effectively classifies each hydrogen bond in a protein as a short or normal hydrogen bond. The MAPSHB model reveals that the type of the donor amino acid plays a major role in determining the class of a hydrogen bond. It also shows that the side chain Tyr-Asp pair has a significant probability of forming an SHB. Combining electronic structure calculations and energy decomposition analysis, we elucidate how the interplay of competing intermolecular interactions stabilizes Tyr-Asp SHBs more than other commonly observed combinations of amino acid side chains. The MAPSHB model is freely available on our web server. It allows one to accurately and efficiently predict the presence of SHBs in a protein structure of moderate or low resolution, and it will facilitate the experimental and computational refinement of protein structures.

  4. Predicting protein side chains is important for both protein structure prediction and protein design. Among modeling approaches to side-chain prediction, SCWRL4 has become one of the most widely used tools of its type owing to its fast and highly accurate predictions. Motivated by the recent success of AlphaFold2 in CASP14, our group adapted a 3D equivariant neural network architecture to predict protein side-chain conformations. We focus specifically on side chains within a protein-protein interface, a problem that AlphaFold2 has not fully addressed.
  5. Disordered proline-rich motifs are common across the proteomes of many species and are often involved in protein-protein interactions. Proline is a unique amino acid due to the covalent bond between the backbone nitrogen and the proline side chain. The resulting five-membered ring allows proline to sample the cis state about its peptide bond, which other residues cannot do as readily. Proline-rich disordered sequences exist as ensembles that likely include structures with the proline peptide bond in cis. A robust methodology to accurately account for these conformations in the overall ensemble is therefore crucial. Observing the cis conformations of proline in a disordered sequence is challenging both experimentally and computationally. Nitrogen-hydrogen NMR spectroscopy cannot directly observe proline residues, which lack an amide hydrogen. Computational methods, for their part, struggle to overcome the large kinetic barrier between the cis and trans states, since isomerization usually occurs on the order of seconds. In the current work, Gaussian accelerated molecular dynamics was used to overcome this free energy barrier. With it we simulated proline isomerization in a tetrapeptide (KPTP) and in the 12-residue proline-rich SH3 binding peptide ArkA. We found that Gaussian accelerated molecular dynamics, when combined with a lowered peptide bond dihedral angle potential energy barrier (15 kcal/mol), allowed sufficient sampling of the proline cis and trans states on a microsecond timescale. All ArkA prolines spend a significant fraction of time in cis. This leads to a more compact ensemble with less polyproline II helix structure than an ArkA ensemble with all peptide bonds in trans. The ensemble containing cis prolines also matches in vitro circular dichroism data more closely than the all-trans ensemble. The ability of the ArkA prolines to isomerize likely affects the peptide’s ability to bind its partner SH3 domain and should be studied further. To our knowledge, this is the first molecular dynamics simulation study of proline isomerization in a biologically relevant proline-rich sequence. A similar protocol could be applied to study multi-proline isomerization in other proline-containing proteins, improving conformational diversity and agreement with in vitro data.
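The FASPR abstract (item 1 above) credits much of that method's advantage to combining dead-end elimination (DEE) with tree decomposition. Below is a minimal sketch of the classic Goldstein DEE criterion on a toy pairwise energy model. The dictionary layout, the function names `pair` and `goldstein_dee`, and the toy energies are hypothetical. FASPR's own scoring function and its tree-decomposition search are not reproduced here.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: self_e[(i, r)] is the self energy of rotamer r at position i;
# pair_e[(i, r, j, s)] is the pair energy between (i, r) and (j, s), stored once with i < j.
SelfE = Dict[Tuple[int, int], float]
PairE = Dict[Tuple[int, int, int, int], float]


def pair(pair_e: PairE, i: int, r: int, j: int, s: int) -> float:
    """Look up a pair energy regardless of which position index is smaller."""
    return pair_e[(i, r, j, s)] if i < j else pair_e[(j, s, i, r)]


def goldstein_dee(rotamers: Dict[int, List[int]], self_e: SelfE, pair_e: PairE) -> Dict[int, List[int]]:
    """Iteratively prune rotamers that the Goldstein criterion proves cannot be in the global minimum."""
    alive = {i: list(rs) for i, rs in rotamers.items()}
    changed = True
    while changed:
        changed = False
        for i in alive:
            for r in list(alive[i]):
                for t in alive[i]:
                    if t == r:
                        continue
                    # Lower bound on E(..., r, ...) - E(..., t, ...) over all choices at the other positions.
                    gap = self_e[(i, r)] - self_e[(i, t)]
                    for j in alive:
                        if j != i:
                            gap += min(pair(pair_e, i, r, j, s) - pair(pair_e, i, t, j, s) for s in alive[j])
                    if gap > 0:  # swapping r for t always lowers the energy, so r is a dead end
                        alive[i].remove(r)
                        changed = True
                        break
    return alive


# Toy two-position problem with two rotamers per position.
rotamers = {0: [0, 1], 1: [0, 1]}
self_e = {(0, 0): 0.0, (0, 1): 5.0, (1, 0): 0.0, (1, 1): 0.0}
pair_e = {(0, 0, 1, 0): 0.0, (0, 0, 1, 1): 1.0, (0, 1, 1, 0): 0.0, (0, 1, 1, 1): 0.0}
print(goldstein_dee(rotamers, self_e, pair_e))  # {0: [0], 1: [0]}
```

DEE only removes rotamers that provably cannot appear in the lowest-energy assignment. When more than one rotamer survives at some positions, an exact search over the remainder is still needed, and the abstract says FASPR uses tree decomposition for that step.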
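Item 2 holds EMA at 40 mol% and varies the AEMA/PAMA split, which changes the net charge per chain. The back-of-the-envelope sketch below assumes full ionization: +1 per AEMA ammonium and −1 per PAMA carboxylate, ignoring the pH dependence that the titration probes. It also assumes a hypothetical degree of polymerization of 20. The compositions in the loop are illustrative, not the ones synthesized in the study.

```python
def net_charge(dp: int, aema_pct: int, pama_pct: int) -> float:
    """Average net charge per chain, assuming full ionization (+1 per AEMA, -1 per PAMA)."""
    if aema_pct < 0 or pama_pct < 0 or aema_pct + pama_pct > 100:
        raise ValueError("mol% values must be non-negative and sum to at most 100")
    return dp * (aema_pct - pama_pct) / 100


EMA_PCT = 40  # hydrophobic EMA content fixed at 40 mol%, as stated in the abstract
DP = 20       # hypothetical degree of polymerization, not taken from the study

# Hypothetical AEMA/PAMA splits of the remaining 60 mol%.
for aema_pct in (60, 50, 45, 40, 30):
    pama_pct = 100 - EMA_PCT - aema_pct
    q = net_charge(DP, aema_pct, pama_pct)
    print(f"AEMA {aema_pct}% / PAMA {pama_pct}%: net charge {q:+.0f}, at least +3: {q >= 3}")
```

Replacing AEMA with PAMA lowers the net charge twice as fast as simply removing AEMA would, because each swap removes a positive charge and adds a negative one.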
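Item 3 defines a short hydrogen bond (SHB) as one whose donor and acceptor heteroatoms lie within 2.7 Å. The sketch below shows that geometric labeling rule, which gives the short/normal classes a model such as MAPSHB is trained to predict. It assumes the input pair is already known to be hydrogen-bonded and counts a distance of exactly 2.7 Å as short. The function name and coordinates are hypothetical, and this is only the distance cutoff, not the boosting model.

```python
import math
from typing import Tuple

Coord = Tuple[float, float, float]

SHB_CUTOFF_ANGSTROM = 2.7  # donor-acceptor heteroatom distance cutoff quoted in the abstract


def classify_hydrogen_bond(donor: Coord, acceptor: Coord, cutoff: float = SHB_CUTOFF_ANGSTROM) -> str:
    """Label an (assumed) hydrogen bond 'short' if the heteroatom distance is within the cutoff, else 'normal'."""
    distance = math.dist(donor, acceptor)  # Euclidean distance in Å (Python 3.8+)
    return "short" if distance <= cutoff else "normal"


print(classify_hydrogen_bond((0.0, 0.0, 0.0), (2.55, 0.0, 0.0)))  # short
print(classify_hydrogen_bond((0.0, 0.0, 0.0), (2.95, 0.0, 0.0)))  # normal
```

The point of a learned model is to predict this label when the structure's resolution is too low for the measured distance to be trusted.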
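Item 5 relies on Gaussian accelerated molecular dynamics (GaMD) to cross the cis/trans barrier. In the standard GaMD formulation, a harmonic boost ΔV = ½k(E − V)² is added whenever the system potential V falls below a threshold E. The sketch below implements that boost, assuming the lower-bound choice E = Vmax and k = k0/(Vmax − Vmin) with 0 < k0 ≤ 1. The energy range is made up for illustration and is not the study's setting. The study's additional step of lowering the peptide-bond dihedral barrier to 15 kcal/mol is not modeled.

```python
def gamd_boost(v: float, v_min: float, v_max: float, k0: float = 1.0) -> float:
    """Harmonic GaMD boost dV = 0.5 * k * (E - V)^2 for V < E, using the threshold E = v_max (assumption)."""
    if v_max <= v_min:
        raise ValueError("v_max must exceed v_min")
    if not 0.0 < k0 <= 1.0:
        raise ValueError("k0 must lie in (0, 1]")
    threshold = v_max              # lower-bound threshold choice E = Vmax
    k = k0 / (v_max - v_min)       # harmonic force constant
    if v >= threshold:
        return 0.0                 # no boost at or above the threshold
    return 0.5 * k * (threshold - v) ** 2


# Made-up potential-energy range (kcal/mol), chosen so the arithmetic is exact.
V_MIN, V_MAX = -96.0, -64.0
for v in (-96.0, -80.0, -64.0):
    boost = gamd_boost(v, V_MIN, V_MAX)
    print(v, boost, v + boost)  # original V, boost dV, modified potential V + dV
# -96.0 16.0 -80.0
# -80.0 4.0 -76.0
# -64.0 0.0 -64.0
```

Deep wells are raised more than shallow ones, which flattens the landscape and shortens barrier-crossing times. Because k0 ≤ 1, the slope of the modified potential, 1 − k(E − V), stays non-negative over [Vmin, Vmax], so the original ordering of energies is preserved.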