Title: Quantifying Unbiased Conformational Ensembles from Biased Simulations Using ShapeGMM
Quantifying the conformational ensembles of biomolecules is fundamental to describing mechanisms of processes such as protein folding, interconversion between folded states, ligand binding, and allosteric regulation. Accurate quantification of these ensembles remains a challenge for conventional molecular simulations of all but the simplest molecules due to insufficient sampling. Enhanced sampling approaches, such as metadynamics, were designed to overcome this challenge; however, the nonuniform frame weights that result from many of these approaches present an additional challenge to ensemble quantification techniques such as Markov State Modeling or structural clustering. Here, we present rigorous inclusion of nonuniform frame weights into a structural clustering method called shapeGMM. The result of frame-weighted shapeGMM is a high-dimensional probability density and generative model for the unbiased system from which we can compute important thermodynamic properties such as relative free energies and configurational entropy. The accuracy of this approach is demonstrated by the quantitative agreement between GMMs computed by Hamiltonian reweighting and direct simulation of a coarse-grained helix model system. Furthermore, the relative free energy computed from a shapeGMM probability density of alanine dipeptide reweighted from a metadynamics simulation quantitatively reproduces the underlying free energy in the basins. Finally, the method identifies hidden structures along the actin globular to filamentous-like structural transition from a metadynamics simulation on a linear discriminant analysis coordinate trained on GMM states, illustrating how structural clustering of biased data can lead to biophysical insight. Combined, these results demonstrate that frame-weighted shapeGMM is a powerful approach to quantifying biomolecular ensembles from biased simulations.
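The role of frame weights described in the abstract can be illustrated with a minimal sketch. It is not the shapeGMM implementation: it fits an ordinary one-dimensional Gaussian mixture by expectation-maximization, with each frame's contribution scaled by its weight, and omits the structural alignment that shapeGMM performs. The function names (fit_weighted_gmm, relative_free_energy), the synthetic data, and the choice of kT = 1 are all assumptions made for illustration. The synthetic "biased" sample draws both states equally often, while the assumed unbiased populations are 0.8 and 0.2; the weights are the ratio of the two densities.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Normal density, used here only to build the synthetic frame weights."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def fit_weighted_gmm(x, frame_weights, init_means, n_iter=100):
    """Fit a 1D Gaussian mixture by EM with per-frame weights (illustrative)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(frame_weights, dtype=float)
    w = w / w.sum()                                   # normalize frame weights
    means = np.asarray(init_means, dtype=float).copy()
    n_comp = means.size
    global_var = np.average((x - np.average(x, weights=w)) ** 2, weights=w)
    variances = np.full(n_comp, global_var)
    mix = np.full(n_comp, 1.0 / n_comp)
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each frame (log-space)
        log_p = (np.log(mix) - 0.5 * np.log(2 * np.pi * variances)
                 - 0.5 * (x[:, None] - means) ** 2 / variances)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: frame weights multiply the responsibilities in every average
        wr = w[:, None] * resp
        mix = wr.sum(axis=0)
        means = (wr * x[:, None]).sum(axis=0) / mix
        variances = (wr * (x[:, None] - means) ** 2).sum(axis=0) / mix
    return mix, means, variances

def relative_free_energy(mix, kT=1.0):
    """Free energy of each cluster relative to the most populated one."""
    return -kT * np.log(mix / mix.max())

# Synthetic biased sample: both states visited equally often (assumption).
rng = np.random.default_rng(0)
n_half = 2000
x = np.concatenate([rng.normal(-2.0, 0.5, n_half), rng.normal(2.0, 0.5, n_half)])

# Frame weight = assumed unbiased density / biased sampling density.
p_unbiased = 0.8 * gaussian_pdf(x, -2.0, 0.5) + 0.2 * gaussian_pdf(x, 2.0, 0.5)
p_biased = 0.5 * gaussian_pdf(x, -2.0, 0.5) + 0.5 * gaussian_pdf(x, 2.0, 0.5)
weights = p_unbiased / p_biased

mix, means, variances = fit_weighted_gmm(x, weights, init_means=[-1.0, 1.0])
dF = relative_free_energy(mix)
print("populations:", mix, "relative free energies (kT):", dF)
```

In this sketch the fitted cluster populations recover the assumed unbiased populations rather than the 50/50 split of the biased sample, and the relative free energy follows from the population ratio. In a metadynamics setting the weights would instead come from the bias potential evaluated at each frame.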
Award ID(s):
2238706
PAR ID:
10504773
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
ACS
Date Published:
Journal Name:
Journal of Chemical Theory and Computation
ISSN:
1549-9618
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Here, we present a detailed workflow for clustering and enhanced sampling of biomolecular conformations using the ShapeGMM methodology. This approach fits a probabilistic model of biomolecular conformations rooted in the idea that the free energy can be expressed in terms of local fluctuations in atomic positions around metastable states. We demonstrate using a single model system how to generate and fit equilibrium molecular dynamics simulation data. We then demonstrate how to use the resulting model to generate a reaction coordinate between two states, how to sample along that coordinate with metadynamics using our size-and-shape PLUMED module, and how to cluster those biased conformations to obtain a refined equilibrium ShapeGMM model.
  2. We consider the construction of confidence bands for survival curves under outcome‐dependent stratified sampling. A main challenge of this design is that data are a biased dependent sample due to stratification and sampling without replacement. Most literature on regression approximates this design by Bernoulli sampling, but variance is generally overestimated. Even with this approximation, the limiting distribution of the inverse probability weighted Kaplan–Meier estimator involves a general Gaussian process, and hence quantiles of its supremum are not analytically available. In this paper, we provide a rigorous asymptotic theory for the weighted Kaplan–Meier estimator accounting for dependence in the sample. We propose a novel hybrid method to both simulate and bootstrap parts of the limiting process to compute confidence bands with asymptotically correct coverage probability. A simulation study indicates that the proposed bands are appropriate for practical use. A Wilms tumor example is presented.
  3. Free energies as a function of a selected set of collective variables are commonly computed in molecular simulation and of significant value in understanding and engineering molecular behavior. These free energy surfaces are most commonly estimated using variants of histogramming techniques, but such approaches obscure two important facets of these functions. First, the empirical observations along the collective variable are defined by an ensemble of discrete observations, and the coarsening of these observations into a histogram bin incurs unnecessary loss of information. Second, the free energy surface is itself almost always a continuous function, and its representation by a histogram introduces inherent approximations due to the discretization. In this study, we relate the observed discrete observations from biased simulations to the inferred underlying continuous probability distribution over the collective variables and derive histogram-free techniques for estimating this free energy surface. We reformulate free energy surface estimation as minimization of a Kullback−Leibler divergence between a continuous trial function and the discrete empirical distribution and show that this is equivalent to likelihood maximization of a trial function given a set of sampled data. We then present a fully Bayesian treatment of this formalism, which enables the incorporation of powerful Bayesian tools such as the inclusion of regularizing priors, uncertainty quantification, and model selection techniques. We demonstrate this new formalism in the analysis of umbrella sampling simulations for the χ torsion of a valine side chain in the L99A mutant of T4 lysozyme with benzene bound in the cavity. 
  4. The National Alzheimer's Coordinating Center Uniform Data Set includes test results from a battery of cognitive exams. Motivated by the need to model the cognitive ability of low‐performing patients we create a composite score from ten tests and propose to model this score using a partially linear quantile regression model for longitudinal studies with non‐ignorable dropouts. Quantile regression allows for modeling non‐central tendencies. The partially linear model accommodates nonlinear relationships between some of the covariates and cognitive ability. The data set includes patients that leave the study prior to the conclusion. Ignoring such dropouts will result in biased estimates if the probability of dropout depends on the response. To handle this challenge, we propose a weighted quantile regression estimator where the weights are inversely proportional to the estimated probability a subject remains in the study. We prove that this weighted estimator is a consistent and efficient estimator of both linear and nonlinear effects. 
  5. Metadynamics calculations of large chemical systems with ab initio methods are computationally prohibitive due to the extensive sampling required to simulate the large number of degrees of freedom in these systems. To address this computational bottleneck, we utilized a GPU-enhanced density functional tight binding (DFTB) approach on a massively parallelized cloud computing platform to efficiently calculate the thermodynamics and metadynamics of biochemical systems. To first validate our approach, we calculated the free-energy surfaces of alanine dipeptide and showed that our GPU-enhanced DFTB calculations qualitatively agree with computationally intensive hybrid DFT benchmarks, whereas classical force fields give significant errors. Most importantly, we show that our GPU-accelerated DFTB calculations are faster than previous approaches by up to two orders of magnitude. To further extend our GPU-enhanced DFTB approach, we also carried out a 10 ns metadynamics simulation of remdesivir, which is out of reach for routine DFT-based metadynamics calculations. We find that the free-energy surfaces of remdesivir obtained from DFTB and classical force fields differ significantly, where the latter overestimates the internal energy contribution of high free-energy states. Taken together, our benchmark tests, analyses, and extensions to large biochemical systems highlight the use of GPU-enhanced DFTB simulations for efficiently predicting the free-energy surfaces/thermodynamics of large biochemical systems.