Title: DLPacker: Deep learning for prediction of amino acid side chain conformations in proteins
Prediction of side chain conformations of amino acids in proteins (also termed “packing”) is an important and challenging part of protein structure prediction with many interesting applications in protein design. A variety of methods for packing have been developed but more accurate ones are still needed. Machine learning (ML) methods have recently become a powerful tool for solving various problems in diverse areas of science, including structural biology. In this study, we evaluate the potential of deep neural networks (DNNs) for prediction of amino acid side chain conformations. We formulate the problem as image-to-image transformation and train a U-net style DNN to solve it. We show that our method outperforms physics-based methods by a significant margin: reconstruction RMSDs for most amino acids are about 20% smaller than those of SCWRL4 and Rosetta Packer, and RMSDs for the bulky hydrophobic amino acids Phe, Tyr, and Trp are up to 50% smaller.
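The reconstruction RMSD quoted above can be illustrated with a minimal sketch. It assumes that predicted and reference side chain atoms are given as coordinate arrays in the same atom order and that no superposition is applied, since packing keeps the backbone fixed. The function name `side_chain_rmsd` is hypothetical, and the abstract does not state the exact evaluation protocol (for example, how symmetry-equivalent atoms in Phe or Tyr rings are handled).

```python
import numpy as np


def side_chain_rmsd(predicted, reference):
    """Root-mean-square deviation between two (n_atoms, 3) coordinate arrays.

    Assumption: atoms are listed in the same order in both arrays and the
    structures already share a frame (fixed backbone), so no fitting is done.
    The result is in the units of the inputs, typically angstroms.
    """
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if predicted.shape != reference.shape:
        raise ValueError("coordinate arrays must have the same shape")
    # Squared distance per atom, averaged over atoms, then square-rooted.
    squared_dist = np.sum((predicted - reference) ** 2, axis=1)
    return float(np.sqrt(squared_dist.mean()))


if __name__ == "__main__":
    # Toy coordinates, not taken from any real structure.
    reference = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
    predicted = reference + np.array([0.0, 0.0, 1.0])
    print(side_chain_rmsd(predicted, reference))
```

Under this definition, a statement such as "about 20% smaller" compares the RMSD values of two methods computed against the same reference structures.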
Award ID(s):
Author(s) / Creator(s):
EDITOR-IN-CHIEF: Dokholyan, Nikolay V.; ASSOCIATE EDITORS: Bahar, Ivet; Feig, Michael; Varadarajan, Raghavan; Wodak, Shoshana; Moult, John Center
Date Published:
Journal Name:
Proteins: Structure, Function, and Bioinformatics
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    In the prediction of protein structure from amino acid sequence, loops are challenging regions for computational methods. Since loops are often located on the protein surface, they can have significant roles in determining protein functions and binding properties. Loop prediction without the aid of a structural template requires extensive conformational sampling and energy minimization, which are computationally difficult. In this article we present a new de novo loop sampling method, the Parallely filtered Energy Targeted All‐atom Loop Sampler (PETALS) to rapidly locate low energy conformations. PETALS explores both backbone and side‐chain positions of the loop region simultaneously according to the energy function selected by the user, and constructs a nonredundant ensemble of low energy loop conformations using filtering criteria. The method is illustrated with the DFIRE potential and DiSGro energy function for loops, and shown to be highly effective at discovering conformations with near‐native (or better) energy. Using the same energy function as the DiSGro algorithm, PETALS samples conformations with both lower RMSDs and lower energies. PETALS is also useful for assessing the accuracy of different energy functions. PETALS runs rapidly, requiring an average time cost of 10 minutes for a length 12 loop on a single 3.2 GHz processor core, comparable to the fastest existing de novo methods for generating an ensemble of conformations. Proteins 2017; 85:1402–1412. © 2017 Wiley Periodicals, Inc.

  2. Abstract

    Protein side chain packing (PSCP) is a fundamental problem in the field of protein engineering, as high‐confidence and low‐energy conformations of amino acid side chains are crucial for understanding (and designing) protein folding, protein–protein interactions, and protein‐ligand interactions. Traditional PSCP methods (such as the Rosetta Packer) often rely on a library of discrete side chain conformations, or rotamers, and a forcefield to guide the structure to low‐energy conformations. Recently, deep learning (DL) based methods (such as DLPacker, AttnPacker, and DiffPack) have demonstrated state‐of‐the‐art predictions and speed in the PSCP task. Building off the success of geometric graph neural networks for protein modeling, we present the Protein Invariant Point Packer (PIPPack) which effectively processes local structural and sequence information to produce realistic, idealized side chain coordinates using χ‐angle distribution predictions and geometry‐aware invariant point message passing (IPMP). On a test set of ∼1400 high‐quality protein chains, PIPPack is highly competitive with other state‐of‐the‐art PSCP methods in rotamer recovery and per‐residue RMSD but is significantly faster.

  3. Knowles, David A ; Mostafavi, Sara (Ed.)
    Accurately modeling protein 3D structure is essential for the design of functional proteins. An important sub-task of structure modeling is protein side-chain packing: predicting the conformation of side-chains (rotamers) given the protein’s backbone structure and amino-acid sequence. Conventional approaches for this task rely on expensive sampling procedures over hand-crafted energy functions and rotamer libraries. Recently, several deep learning methods have been developed to tackle the problem in a data-driven way, albeit with vastly different formulations (from image-to-image translation to directly predicting atomic coordinates). Here, we frame the problem as a joint regression over the side-chains’ true degrees of freedom: the dihedral $\chi$ angles. We carefully study possible objective functions for this task, while accounting for the underlying symmetries of the task. We propose Holographic Packer (H-Packer), a novel two-stage algorithm for side-chain packing built on top of two light-weight rotationally equivariant neural networks. We evaluate our method on CASP13 and CASP14 targets. H-Packer is computationally efficient and shows favorable performance against conventional physics-based algorithms and is competitive against alternative deep learning solutions. 
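    To make the $\chi$-angle formulation above concrete, the following is a minimal sketch of how a dihedral angle can be computed from four atom positions and how an angular error can respect periodicity. The function names `dihedral` and `chi_error` are hypothetical, and the `period=180` option is an assumption that illustrates one kind of symmetry (a $\chi$ angle whose terminal group is unchanged by a 180° flip, such as $\chi_2$ of Phe). The abstract does not specify how H-Packer itself handles these symmetries.

```python
import numpy as np


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points.

    The angle is measured about the p1 -> p2 axis; 0 corresponds to a cis
    arrangement of p0 and p3, and +/-180 to a trans arrangement.
    """
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def chi_error(pred_deg, true_deg, period=360.0):
    """Smallest absolute difference between two angles modulo `period`.

    Assumption: pass period=180.0 for a chi angle that is symmetric under a
    180-degree flip, so that flipped predictions are not penalized.
    """
    diff = abs(pred_deg - true_deg) % period
    return min(diff, period - diff)


if __name__ == "__main__":
    # Toy points, not taken from any real residue.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
    print(chi_error(170.0, -170.0))
    print(chi_error(10.0, 190.0, period=180.0))
```

    Wrapping the difference in this way keeps a prediction of 170° close to a true value of -170°, which a plain subtraction would score as a 340° error.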
  4. Abstract

    Several studies have suggested that protein structures solved by NMR spectroscopy and X‐ray crystallography show significant differences. To understand the origin of these differences, we assembled a database of high‐quality protein structures solved by both methods. We also find significant differences between NMR and crystal structures in the root‐mean‐square deviations of the Cα atomic positions, identities of core amino acids, backbone and side‐chain dihedral angles, and packing fraction of core residues. In contrast to prior studies, we identify the physical basis for these differences by modeling protein cores as jammed packings of amino acid‐shaped particles. We find that we can tune the jammed packing fraction by varying the degree of thermalization used to generate the packings. For an athermal protocol, we find that the average jammed packing fraction is identical to that observed in the cores of protein structures solved by X‐ray crystallography. In contrast, highly thermalized packing‐generation protocols yield jammed packing fractions that are even higher than those observed in NMR structures. These results indicate that thermalized systems can pack more densely than athermal systems, which suggests a physical basis for the structural differences between protein structures solved by NMR and X‐ray crystallography.

  5. Abstract Over the past two decades, mass spectrometric (MS)-based proteomics technologies have facilitated the study of signaling pathways throughout biology. Nowhere is this needed more than in plants, where an evolutionary history of genome duplications has resulted in large gene families involved in posttranslational modifications and regulatory pathways. For example, at least 5% of the Arabidopsis thaliana genome (ca. 1,200 genes) encodes protein kinases and protein phosphatases that regulate nearly all aspects of plant growth and development. MS-based technologies that quantify covalent changes in the side chains of amino acids are critically important, but they only address one piece of the puzzle. An even more important mechanistic question is how noncovalent interactions, which are more difficult to study, dynamically regulate the proteome’s 3D structure. Improvements in protein 3D technologies such as cryo-electron microscopy, nuclear magnetic resonance, and X-ray crystallography have allowed considerable progress at this level, but these methods are typically limited to proteins that can be expressed and purified in milligram quantities. Newly emerging MS-based technologies have recently been developed for studying the 3D structure of proteins. Importantly, these methods do not require protein samples to be purified and require smaller amounts of sample, opening the wider proteome for structural analysis in complex mixtures, crude lysates, and even in intact cells. These MS-based methods include covalent labeling, crosslinking, thermal proteome profiling, and limited proteolysis, all of which can be leveraged by established MS workflows, as well as newly emerging methods capable of analyzing intact macromolecules and the complexes they form.
In this review, we discuss these recent innovations in MS-based “structural” proteomics to provide readers with an understanding of the opportunities they offer and the remaining challenges for understanding the molecular underpinnings of plant structure and function. 