skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Award ID contains: 1900061

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. With the debut of AlphaFold2, we now can get a highly-accurate view of a reasonable equilibrium tertiary structure of a protein molecule. Yet, a single-structure view is insufficient and does not account for the high structural plasticity of protein molecules. Obtaining a multi-structure view of a protein molecule continues to be an outstanding challenge in computational structural biology. In tandem with methods formulated under the umbrella of stochastic optimization, we are now seeing rapid advances in the capabilities of methods based on deep learning. In recent work, we advance the capability of these models to learn from experimentally-available tertiary structures of protein molecules of varying lengths. In this work, we elucidate the important role of the composition of the training dataset on the neural network’s ability to learn key local and distal patterns in tertiary structures. To make such patterns visible to the network, we utilize a contact map-based representation of protein tertiary structure. We show interesting relationships between data size, quality, and composition on the ability of latent variable models to learn key patterns of tertiary structure. In addition, we present a disentangled latent variable model which improves upon the state-of-the-art variable autoencoder-based model in key, physically-realistic structural patterns. We believe this work opens up further avenues of research on deep learning-based models for computing multi-structure views of protein molecules. 
    more » « less
  2. Over the past decade, Markov State Models (MSM) have emerged as powerful methodologies to build discrete models of dynamics over structures obtained from Molecular Dynamics trajectories. The identification of macrostates for the MSM is a central decision that impacts the quality of the MSM but depends on both the selected representation of a structure and the clustering algorithm utilized over the featurized structures. Motivated by a large molecular system in its free and bound state, this paper investigates two directions of research, further reducing the representation dimensionality in a non-parametric, data-driven manner and including more structures in the computation. Rigorous evaluation of the quality of obtained MSMs via various statistical tests in a comparative setting firmly shows that fewer dimensions and more structures result in a better MSM. Many interesting findings emerge from the best MSM, advancing our understanding of the relationship between antibody dynamics and antibody–antigen recognition. 
    more » « less
  3. Deep learning research, from ResNet to AlphaFold2, convincingly shows that deep learning can predict the native conformation of a given protein sequence with high accu- racy. Accounting for the plasticity of protein molecules remains challenging, and powerful algorithms are needed to sample the conformation space of a given amino-acid sequence. In the complex and high-dimensional energy surface that accompanies this space, it is critical to explore a broad range of areas. In this paper, we present a novel evolutionary algorithm that guides its optimization process with a memory of the explored conformation space, so that it can avoid searching already explored regions and search in the unexplored regions. The algorithm periodically consults an evolving map that stores already sampled non- redundant conformations to enhance exploration during selection. Evaluation on diverse datasets shows superior performance of the algorithm over the state-of-the-art algorithms. 
    more » « less
  4. Many biological and biotechnological processes are controlled by protein–protein and protein–solvent interactions. In order to understand, predict, and optimize such processes, it is important to understand how solvents affect protein structure during protein–solvent interactions. In this study, all-atom molecular dynamics are used to investigate the structural dynamics and energetic properties of a C-terminal domain of the Rift Valley Fever Virus L protein solvated in glycerol and aqueous glycerol solutions in different concentrations by molecular weight. The Generalized Amber Force Field is modified by including restrained electrostatic potential atomic charges for the glycerol molecules. The peptide is considered in detail by monitoring properties like the root-mean-squared deviation, root-mean-squared fluctuation, radius of gyration, hydrodynamic radius, end-to-end distance, solvent-accessible surface area, intra-potential energy, and solvent–peptide interaction energies for hundreds of nanoseconds. Secondary structure analysis is also performed to examine the extent of conformational drift for the individual helices and sheets. We predict that the peptide helices and sheets are maintained only when the modeling strategy considers the solvent with lower glycerol concentration. We also find that the solvent-peptide becomes more cohesive with decreasing glycerol concentrations. The density and radial distribution function of glycerol solvent calculated when modeled with the modified atomic charges show a very good agreement with experimental results and other simulations at 298.15K. 
    more » « less
  5. null (Ed.)
  6. We have long known that characterizing protein structures structure is key to understanding protein function. Computational approaches have largely addressed a narrow formulation of the problem, seeking to compute one native structure from an amino-acid sequence. Now AlphaFold2 promises to reveal a high-quality native structure for possibly many proteins. However, researchers over the years have argued for broadening our view to account for the multiplicity of native structures. We now know that many protein molecules switch between different structures to regulate interactions with molecular partners in the cell. Elucidating such structures de novo is exceptionally difficult, as it requires exploration of possibly a very large structure space in search of competing, near-optimal structures. Here we report on a novel stochastic optimization method capable of revealing very different structures for a given protein from knowledge of its amino-acid sequence. The method leverages evolutionary search techniques and adapts its exploration of the search space to balance between exploration and exploitation in the presence of a computational budget. In addition to demonstrating the utility of this method for identifying multiple native structures, we additionally provide a benchmark dataset for researchers to continue work on this problem. 
    more » « less
  7. This study elucidates the conformation dynamics of the free and antigen-bound antibody. Previous work has verified that antigen binding allosterically promotes Fc receptor recognition. Analysis of extensive molecular dynamics simulations finds that the energy landscape may play a decisive role in coordinating conformation changes but does not provide connections between the various conformational states. Here we provide such a connection. To obtain a detailed understanding of the impact of antigen binding on antibody conformation dynamics, this study utilizes Markov State Models to summarize the conformation dynamics probed in silico. We additionally equip these models with the ability to directly exploit the energy landscape view of dynamics via a computational method that detects energy basins and so allows utilizing detected basins as macrostates for the Markov State Model. Our study reveals many interesting findings and suggests that the antigen-bound form with high energy may provide many dynamic processes to further enhance co-factor binding of the antibody in the next step. 
    more » « less
  8. Representation learning via deep generative models is opening a new avenue for small molecule generation in silico. Linking chemical and biological space remains a key challenge. In this paper, we debut a graph-based variational autoencoder framework to address this challenge under the umbrella of disentangled representation learning. The framework permits several inductive biases that connect the learned latent factors to molecular properties. Evaluation on diverse benchmark datasets shows that the resulting models are powerful and open up an exciting line of research on controllable molecule generation in support of cheminformatics, drug discovery, and other application settings. 
    more » « less