Title: Neural Upscaling from Residue-Level Protein Structure Networks to Atomistic Structures
Coarse-graining is a powerful tool for extending the reach of dynamic models of proteins and other biological macromolecules. Topological coarse-graining, in which biomolecules or sets thereof are represented via graph structures, is a particularly useful way of obtaining highly compressed representations of molecular structures, and simulations operating on such representations can achieve substantial computational savings. A drawback of coarse-graining, however, is the loss of atomistic detail, an effect that is especially acute for topological representations such as protein structure networks (PSNs). Here, we introduce an approach based on a combination of machine learning and physically guided refinement for inferring atomic coordinates from PSNs. This "neural upscaling" procedure exploits the constraints that PSNs impose on possible configurations, as well as differences in the likelihood of observing different configurations with the same PSN. Using a 1 μs atomistic molecular dynamics trajectory of Aβ1–40, we show that neural upscaling is able to effectively recapitulate detailed structural information for intrinsically disordered proteins, being particularly successful in recovering features such as transient secondary structure. These results suggest that scalable network-based models for protein structure and dynamics may be used in settings where atomistic detail is desired, with upscaling employed to impute atomic coordinates from PSNs.
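To make concrete what a residue-level PSN retains and discards, here is a minimal sketch assuming a common contact-based definition: two residues are linked when any pair of their atoms lies within a distance cutoff. The 4.5 Å cutoff and the helper names `build_psn` and `edge_agreement` are hypothetical illustrations, not taken from the paper, and this is not the authors' neural upscaling model. It only builds a PSN and scores how well a candidate structure's PSN agrees with a target PSN.

```python
import math
from itertools import combinations

# Hypothetical contact cutoff in angstroms; an assumption, not a value from the paper.
CUTOFF = 4.5


def build_psn(atoms, cutoff=CUTOFF):
    """Return the residue-residue edge set of a structure.

    atoms: list of (residue_index, (x, y, z)) tuples. Two residues are linked
    if any pair of their atoms lies within `cutoff` of each other.
    """
    edges = set()
    for (res_i, xyz_i), (res_j, xyz_j) in combinations(atoms, 2):
        if res_i == res_j:
            continue  # skip atom pairs within the same residue
        if math.dist(xyz_i, xyz_j) <= cutoff:
            edges.add((min(res_i, res_j), max(res_i, res_j)))
    return edges


def edge_agreement(candidate_edges, target_edges):
    """Jaccard similarity of two edge sets; 1.0 means the PSNs are identical."""
    union = candidate_edges | target_edges
    if not union:
        return 1.0
    return len(candidate_edges & target_edges) / len(union)


# Two different toy configurations (one atom per residue) that share a PSN.
extended = [(0, (0.0, 0.0, 0.0)), (1, (3.8, 0.0, 0.0)), (2, (7.6, 0.0, 0.0))]
bent = [(0, (0.0, 0.0, 0.0)), (1, (3.8, 0.0, 0.0)), (2, (3.8, 3.8, 0.0))]
print(build_psn(extended) == build_psn(bent))  # True
print(edge_agreement(build_psn(bent), build_psn(extended)))  # 1.0
```

Because the extended and bent toy chains map to the same PSN, the graph alone cannot single out one structure. This is the ambiguity that, according to the abstract, upscaling addresses by exploiting differences in the likelihood of configurations that share a PSN.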
Award ID(s): 1361425, 1826589, 1939237
NSF-PAR ID: 10376711
Journal Name: Biomolecules
Volume: 11
Issue: 12
ISSN: 2218-273X
Page Range / eLocation ID: 1788
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. We developed coarse-grained models of the SARS-CoV-2 spike proteins and the angiotensin-converting enzyme 2 (ACE2) receptor proteins to study the endocytosis of a whole coronavirus at physiologically relevant spatial and temporal scales. We first conducted all-atom explicit-solvent molecular dynamics simulations of the recently characterized structures of the spike and ACE2 proteins. We then established coarse-grained models using the shape-based coarse-graining approach based on the protein crystal structures and extracted the force field parameters from the all-atom simulation trajectories. Next, we carried out normal mode analysis of the coarse-grained models and refined the force field parameters by matching the fluctuations of the internal coordinates with those from the original all-atom simulations. Finally, we demonstrated the capability of these coarse-grained models by simulating the endocytosis of a whole coronavirus through the host cell membrane. We embedded the coarse-grained spike models on the surface of the virus envelope and anchored ACE2 receptors on the host cell membrane, which is modeled using a one-particle-thick lipid bilayer model. The coarse-grained simulations show that, owing to their unique flexibility, the spike proteins adopt bent configurations during their interaction with the ACE2 receptors, which makes it easier for them to attach to the host cell membrane than it would be for rigid spikes.
  2. Single-molecule Förster resonance energy transfer (smFRET) is an experimental methodology to track the real-time dynamics of molecules using fluorescent probes to follow one or more intramolecular distances. These distances provide a low-dimensional representation of the full atomistic dynamics. Under mild technical conditions, Takens' Delay Embedding Theorem guarantees that the full three-dimensional atomistic dynamics of a system are diffeomorphic (i.e., related by a smooth and invertible transformation) to a time-delayed embedding of one or more scalar observables. Appealing to these theoretical guarantees, we employ manifold learning, artificial neural networks, and statistical mechanics to learn from molecular simulation training data the a priori unknown transformation between the atomic coordinates and delay-embedded intramolecular distances accessible to smFRET. This learned transformation may then be used to reconstruct atomistic coordinates from smFRET time series data. We term this approach Single-molecule TAkens Reconstruction (STAR). We have previously applied STAR to reconstruct molecular configurations of a C₂₄H₅₀ polymer chain and the mini-protein Chignolin with accuracies better than 0.2 nm from simulated smFRET data under noise-free and high-time-resolution conditions. In the present work, we investigate the role of signal-to-noise ratio, data volume, and time resolution in simulated smFRET data to assess the performance of STAR under conditions more representative of experimental realities. We show that STAR can reconstruct the Chignolin and Villin mini-proteins to accuracies of 0.12 and 0.42 nm, respectively, and place bounds on these conditions for accurate reconstructions. These results demonstrate that it is possible to reconstruct dynamical trajectories of protein folding from time series in noisy, time-binned, experimentally measurable observables and lay the foundations for the application of STAR to real experimental data.
  3. Coarse-grained (CG) models have been successful in simulating the chemical properties of lipid bilayers, but accurate treatment of membrane proteins and lipid-protein molecular interactions remains a challenge. The CgProt force field, originally developed with the multiscale coarse-graining method, is assessed by comparing the potentials of mean force for sidechain insertion in a DOPC bilayer to results reported for atomistic molecular dynamics simulations. Reassignment of select CG sidechain sites from the apolar to the polar site type was found to improve the attractive interfacial behavior of tyrosine, phenylalanine, and asparagine, as well as of the charged lysine and arginine residues. The solvation energies at membrane depths of 0, 1.3, and 1.7 nm correlate with experimental partition coefficients in aqueous mixtures of cyclohexane, octanol, and POPC, respectively, for sidechain analogs and Wimley-White peptides. These experimental values serve as important anchor points in choosing between alternate CG models based on their observed permeation profiles, particularly for Arg, Lys, and Gln residues, where the all-atom OPLS solvation energy does not agree well with experiment. Available partitioning data were also used to reparameterize the representation of the peptide backbone, which needed to be made less attractive for the hydrophobic core region of the bilayer. The newly developed force field, CgProt 2.4, correctly predicts the global energy minimum in the potentials of mean force for insertion of the uncharged membrane-associated peptides LS3 and WALP23. CgProt will find application in studies of lipid-protein interactions and the conformational properties of diverse membrane protein systems.
  4. Molecular simulations with atomistic or coarse-grained force fields are a powerful approach for understanding and predicting the self-assembly phase behavior of complex molecules. Amphiphiles, block oligomers, and block polymers can form mesophases whose ordered morphologies describe the spatial distribution of the blocks, while local packing and chain conformation remain entirely amorphous. Screening block oligomer chemistry and architecture through molecular simulations to find promising candidates for functional materials is aided by effective and straightforward morphology identification techniques. Capturing 3-dimensional periodic structures, such as ordered network morphologies, is hampered by the requirement that the number of molecules in the simulated system and the shape of the periodic simulation box be commensurate with those of the resulting network phase. Common strategies for structure identification include structure factors and order parameters, but these fail to identify imperfect structures in simulations with incorrect system sizes. Building upon pioneering work by DeFever et al. [Chem. Sci. 2019, 10, 7503−7515], who implemented a PointNet (i.e., a neural network designed for computer vision applications using point clouds) to detect local structure in simulations of single-bead particles and water molecules, we present a PointNet for detection of nonlocal ordered morphologies of complex block oligomers. Our PointNet was trained using atomic coordinates from molecular dynamics simulation trajectories and synthetic point clouds for ordered network morphologies that were absent from previous simulations. In contrast to prior work on simple molecules, we observe that large point clouds with 1000 or more points are needed for the more complex block oligomers. The trained PointNet model achieves an accuracy as high as 0.99 for globally ordered morphologies formed by linear diblock, linear triblock, and 3-arm and 4-arm star-block oligomers, and it also allows for the discovery of emerging ordered patterns from nonequilibrium systems.
  5. Limitations in the applicability, accuracy, and precision of individual structure characterization methods can sometimes be overcome via an integrative modeling approach that relies on information from all available sources, including all available experimental data and prior models. The open-source Integrative Modeling Platform (IMP) is one piece of software that implements all computational aspects of integrative modeling. To maximize the impact of integrative structures, the coordinates should be made publicly available, as is already the case for structures based on X-ray crystallography, NMR spectroscopy, and electron microscopy. Moreover, the associated experimental data and modeling protocols should also be archived, such that the original results can easily be reproduced. Finally, it is essential that the integrative structures are validated as part of their publication and deposition. A number of research groups have already developed software to implement integrative modeling and have generated a number of structures, prompting the formation of an Integrative/Hybrid Methods Task Force. Following the recommendations of this task force, the existing PDBx/mmCIF data representation used for atomic PDB structures has been extended to address the requirements for archiving integrative structural models. This IHM-dictionary adds a flexible model representation, including coarse graining, models in multiple states and/or related by time or other order, and multiple input experimental information sources. A prototype archiving system called PDB-Dev ( https://pdb-dev.wwpdb.org ) has also been created to archive integrative structural models, together with a Python library to facilitate handling of integrative models in PDBx/mmCIF format. 
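Item 1 refines force field parameters by matching fluctuations of internal coordinates against all-atom simulations. In the simplest harmonic picture, equipartition for a bond potential U = ½k(r − r₀)² gives k = k_B T / ⟨δr²⟩, so a stiffer spring corresponds to smaller sampled fluctuations. The sketch below applies that relation to a hypothetical series of bond lengths; the function name `harmonic_constant` is illustrative. It is an approximation (it ignores the radial Jacobian and couplings between coordinates) and is not the authors' normal-mode refinement procedure.

```python
import statistics

# Molar Boltzmann constant (the gas constant R), in kJ/(mol K).
KB_KJ = 8.314462618e-3


def harmonic_constant(bond_lengths_nm, temperature=300.0):
    """Estimate (k, r0) from sampled bond lengths using k = kB*T / var(r).

    Returns k in kJ/(mol nm^2) and r0 (the mean length) in nm. This is the
    equipartition estimate for an isolated harmonic coordinate only.
    """
    r0 = statistics.fmean(bond_lengths_nm)
    variance = statistics.pvariance(bond_lengths_nm, mu=r0)
    if variance == 0:
        raise ValueError("bond lengths show no fluctuation")
    return KB_KJ * temperature / variance, r0


# Toy samples; real input would be lengths extracted from an all-atom trajectory.
k, r0 = harmonic_constant([0.9, 1.1])
print(round(k, 1), r0)  # 249.4 1.0
```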
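Item 2 relies on a time-delay embedding of a scalar observable, the construction that Takens' theorem is about. A minimal sketch of just the embedding step follows, assuming a uniformly sampled scalar series such as a single smFRET distance. The function name `delay_embed` and its parameters `dim` (embedding dimension) and `tau` (delay, in samples) are illustrative choices, not taken from the STAR work, and the learned mapping from delay vectors back to atomic coordinates is not shown.

```python
def delay_embed(series, dim, tau):
    """Return delay vectors (x[t], x[t + tau], ..., x[t + (dim - 1) * tau]).

    series: uniformly sampled scalar observable (e.g., one intramolecular distance).
    dim:    number of delayed copies in each vector (embedding dimension).
    tau:    delay between copies, in samples.
    """
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive integers")
    span = (dim - 1) * tau  # how far ahead the last component looks
    return [
        tuple(series[t + k * tau] for k in range(dim))
        for t in range(len(series) - span)
    ]


print(delay_embed([0, 1, 2, 3, 4, 5], dim=3, tau=2))  # [(0, 2, 4), (1, 3, 5)]
```

Each delay vector is one point on the reconstructed manifold; in practice `dim` and `tau` must be chosen from the data, which this sketch leaves to the caller.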
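Item 3 compares simulated solvation energies with experimental partition coefficients. For reference, a partition coefficient P (solute concentration in the organic phase divided by that in water) corresponds to a water-to-organic transfer free energy ΔG = −RT ln P = −RT ln(10) log₁₀ P. A small sketch of that standard conversion, assuming T = 298.15 K; the function name `transfer_free_energy` is hypothetical, and this does not reproduce the CgProt parameterization.

```python
import math

# Gas constant in kJ/(mol K).
R_KJ = 8.314462618e-3


def transfer_free_energy(log10_p, temperature=298.15):
    """Water-to-organic transfer free energy (kJ/mol) from log10 of the partition coefficient.

    Negative values mean the solute prefers the organic phase.
    """
    return -R_KJ * temperature * math.log(10) * log10_p


print(round(transfer_free_energy(1.0), 2))  # -5.71
```

Each unit of log₁₀ P therefore corresponds to roughly 5.7 kJ/mol at room temperature, which is the scale on which CG solvation profiles can be checked against partitioning data.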
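Item 4 applies PointNet to point clouds. What lets PointNet consume an unordered set of points is that every point passes through the same transformation and the per-point features are then aggregated with a symmetric function (max pooling), so the result does not depend on point order. The toy sketch below illustrates only that permutation invariance, using hypothetical fixed weights in plain Python rather than a trained network; it is not the authors' model.

```python
import random

# Hypothetical weights for one shared per-point layer: 3 coordinates -> 4 features.
WEIGHTS = [
    [0.5, -1.0, 0.2],
    [1.0, 0.3, -0.4],
    [-0.7, 0.8, 0.1],
    [0.2, 0.2, 0.9],
]
BIASES = [0.1, 0.0, -0.2, 0.05]


def point_features(point):
    """Apply the same linear layer followed by ReLU to a single (x, y, z) point."""
    return [
        max(0.0, sum(w * c for w, c in zip(row, point)) + b)
        for row, b in zip(WEIGHTS, BIASES)
    ]


def global_feature(cloud):
    """Max-pool each feature over all points: a symmetric, order-independent aggregate."""
    per_point = [point_features(p) for p in cloud]
    return [max(column) for column in zip(*per_point)]


cloud = [(0.1, 0.2, 0.3), (1.0, -0.5, 0.4), (-0.3, 0.9, 0.0)]
shuffled = cloud[:]
random.shuffle(shuffled)
print(global_feature(cloud) == global_feature(shuffled))  # True
```

In a real PointNet the shared layer is a trained multilayer perceptron and the pooled vector feeds a classifier; the invariance argument is the same.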