Title: Statistically optimal continuous free energy surfaces from biased simulations and multistate reweighting
Free energies as a function of a selected set of collective variables are commonly computed in molecular simulation and are of significant value in understanding and engineering molecular behavior. These free energy surfaces are most commonly estimated using variants of histogramming techniques, but such approaches obscure two important facets of these functions. First, the empirical observations along the collective variable are defined by an ensemble of discrete observations, and the coarsening of these observations into a histogram bin incurs unnecessary loss of information. Second, the free energy surface is itself almost always a continuous function, and its representation by a histogram introduces inherent approximations due to the discretization. In this study, we relate the discrete observations from biased simulations to the inferred underlying continuous probability distribution over the collective variables and derive histogram-free techniques for estimating this free energy surface. We reformulate free energy surface estimation as minimization of a Kullback−Leibler divergence between a continuous trial function and the discrete empirical distribution and show that this is equivalent to likelihood maximization of a trial function given a set of sampled data. We then present a fully Bayesian treatment of this formalism, which enables the incorporation of powerful Bayesian tools such as the inclusion of regularizing priors, uncertainty quantification, and model selection techniques. We demonstrate this new formalism in the analysis of umbrella sampling simulations for the χ torsion of a valine side chain in the L99A mutant of T4 lysozyme with benzene bound in the cavity.
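The histogram-free idea in this abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a single unbiased ensemble (no umbrella biases and no multistate reweighting), synthetic data, and a hypothetical two-parameter trial function F(x; θ) = θ0·x + θ1·x² in units of kT. For samples x_n, the Kullback−Leibler divergence from the empirical distribution to the trial density exp(−F)/Z differs from the average negative log-likelihood, mean F(x_n) + ln Z(θ), only by a constant, so minimizing one maximizes the likelihood. All names below (basis, model_averages, fit) are illustrative.

```python
import math
import random

# Assumed setup (not from the source): one collective variable x on [-4, 4],
# free energies in units of kT, and an unbiased synthetic data set.
LO, HI, N_GRID = -4.0, 4.0, 201
GRID = [LO + (HI - LO) * i / (N_GRID - 1) for i in range(N_GRID)]
DX = GRID[1] - GRID[0]


def basis(x):
    """Basis functions of the hypothetical trial function F(x) = t0*x + t1*x**2."""
    return (x, x * x)


GRID_BASIS = [basis(x) for x in GRID]


def model_averages(theta):
    """Average of each basis function under p(x) = exp(-F(x)) / Z (Riemann sum)."""
    weights = [math.exp(-(theta[0] * b[0] + theta[1] * b[1])) for b in GRID_BASIS]
    z = sum(weights) * DX
    return [sum(w * b[k] for w, b in zip(weights, GRID_BASIS)) * DX / z
            for k in range(2)]


def fit(samples, steps=2000, rate=0.04):
    """Minimize mean F(x_n) + ln Z(theta) by plain gradient descent."""
    n = len(samples)
    data_avgs = [sum(basis(x)[k] for x in samples) / n for k in range(2)]
    theta = [0.0, 0.0]
    for _ in range(steps):
        avgs = model_averages(theta)
        # Gradient component k is <phi_k>_data - <phi_k>_model.
        theta = [t - rate * (d - m) for t, d, m in zip(theta, data_avgs, avgs)]
    return theta


# Synthetic samples drawn from exp(-(x - 0.5)**2 / 2), restricted to the domain,
# so the generating parameters are theta = [-0.5, 0.5].
rng = random.Random(0)
samples = []
while len(samples) < 4000:
    x = rng.gauss(0.5, 1.0)
    if LO <= x <= HI:
        samples.append(x)

theta = fit(samples)
print("fitted theta:", theta)
print("free energy minimum at x =", -theta[0] / (2.0 * theta[1]))
```

No binning of the samples is involved: the data enter only through averages of the basis functions. Treating umbrella-sampling data would additionally require accounting for each window's bias and normalization, which this sketch deliberately omits.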
Award ID(s):
1841810
PAR ID:
10167324
Author(s) / Creator(s):
Date Published:
Journal Name:
Journal of Chemical Theory and Computation
ISSN:
1549-9618
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Collective variable (CV)‐based enhanced sampling techniques are widely used today for accelerating barrier‐crossing events in molecular simulations. A class of these methods, which includes temperature accelerated molecular dynamics (TAMD)/driven‐adiabatic free energy dynamics (d‐AFED), unified free energy dynamics (UFED), and temperature accelerated sliced sampling (TASS), uses an extended variable formalism to achieve quick exploration of conformational space. These techniques are powerful, as they enhance the sampling of a large number of CVs simultaneously compared to other techniques. Extended variables are kept at a much higher temperature than the physical temperature by ensuring adiabatic separation between the extended and physical subsystems and employing rigorous thermostatting. In this work, we present a computational platform to perform extended phase space enhanced sampling simulations using the open‐source molecular dynamics engine OpenMM. The implementation allows users to have interoperability of sampling techniques, as well as employ state‐of‐the‐art thermostats and multiple time‐stepping. This work also presents protocols for determining the critical parameters and procedures for reconstructing high‐dimensional free energy surfaces. As a demonstration, we present simulation results on the high dimensional conformational landscapes of the alanine tripeptide in vacuo, tetra‐N‐methylglycine (tetra‐sarcosine) peptoid in implicit solvent, and the Trp‐cage mini protein in explicit water.
  2. We consider nonparametric estimation of a mixed discrete‐continuous distribution under anisotropic smoothness conditions and a possibly increasing number of support points for the discrete part of the distribution. For these settings, we derive lower bounds on the estimation rates. Next, we consider a nonparametric mixture of normals model that uses continuous latent variables for the discrete part of the observations. We show that the posterior in this model contracts at rates that are equal to the derived lower bounds up to a log factor. Thus, Bayesian mixture of normals models can be used for (up to a log factor) optimal adaptive estimation of mixed discrete‐continuous distributions. The proposed model demonstrates excellent performance in simulations mimicking the first stage in the estimation of structural discrete choice models. 
  3. Variational quantum eigensolvers (VQEs) represent a promising approach to computing molecular ground states and energies on modern quantum computers. These approaches use a classical computer to optimize the parameters of a trial wave function, while the quantum computer simulates the energy by preparing and measuring a set of bitstring observations, referred to as shots, over which an expected value is computed. Although more shots improve the accuracy of the expected ground state, they also increase the simulation cost. Hence, we propose modifications to the standard Bayesian optimization algorithm to leverage few‐shot circuit observations to solve VQEs with fewer quantum resources. We demonstrate the effectiveness of our proposed approach, Bayesian optimization with priors on surface topology (BOPT), by comparing optimizers for molecular systems and demonstrate how current quantum hardware can aid in finding ground‐state energies.
  4. Gaussian accelerated molecular dynamics (GaMD) is a robust computational method for simultaneous unconstrained enhanced sampling and free energy calculations of biomolecules. It works by adding a harmonic boost potential to smooth the biomolecular potential energy surface and reduce energy barriers. GaMD accelerates biomolecular simulations by orders of magnitude. Without the need to set predefined reaction coordinates or collective variables, GaMD provides unconstrained enhanced sampling and is advantageous for simulating complex biological processes. The GaMD boost potential exhibits a Gaussian distribution, thereby allowing for energetic reweighting via cumulant expansion to the second order (i.e., “Gaussian approximation”). This leads to accurate reconstruction of free energy landscapes of biomolecules. Hybrid schemes with other enhanced sampling methods, such as the replica‐exchange GaMD (rex‐GaMD) and replica‐exchange umbrella sampling GaMD (GaREUS), have also been introduced, further improving sampling and free energy calculations. Recently, new “selective GaMD” algorithms including the Ligand GaMD (LiGaMD) and Peptide GaMD (Pep‐GaMD) enabled microsecond simulations to capture repetitive dissociation and binding of small‐molecule ligands and highly flexible peptides. The simulations then allowed highly efficient quantitative characterization of the ligand/peptide binding thermodynamics and kinetics. Taken together, GaMD and its innovative variants are applicable to simulating a wide variety of biomolecular dynamics, including protein folding, conformational changes and allostery, ligand binding, peptide binding, protein–protein/nucleic acid/carbohydrate interactions, and carbohydrate/nucleic acid interactions. In this review, we present principles of the GaMD algorithms and recent applications in biomolecular simulations and drug design.
This article is categorized under: Structure and Mechanism > Computational Biochemistry and Biophysics; Molecular and Statistical Mechanics > Molecular Dynamics and Monte‐Carlo Methods; Molecular and Statistical Mechanics > Free Energy Methods
  5. We present a surface-accelerated string method (SASM) to efficiently optimize low-dimensional reaction pathways from the sampling performed with expensive quantum mechanical/molecular mechanical (QM/MM) Hamiltonians. The SASM accelerates the convergence of the path using the aggregate sampling obtained from the current and previous string iterations, whereas approaches like the string method in collective variables (SMCV) or the modified string method in collective variables (MSMCV) update the path only from the sampling obtained from the current iteration. Furthermore, the SASM decouples the number of images used to perform sampling from the number of synthetic images used to represent the path. The path is optimized on the current best estimate of the free energy surface obtained from all available sampling, and the proposed set of new simulations is not restricted to being located along the optimized path. Instead, the umbrella potential placement is chosen to extend the range of the free energy surface and improve the quality of the free energy estimates near the path. In this manner, the SASM is shown to improve the exploration for a minimum free energy pathway in regions where the free energy surface is relatively flat. Furthermore, it improves the quality of the free energy profile when the string is discretized with too few images. We compare the SASM, SMCV, and MSMCV using 3 QM/MM applications: a ribozyme methyltransferase reaction using 2 reaction coordinates, the 2′-O-transphosphorylation reaction of Hammerhead ribozyme using 3 reaction coordinates, and a tautomeric reaction in B-DNA using 5 reaction coordinates. We show that SASM converges the paths using roughly 3 times less sampling than the SMCV and MSMCV methods. All three algorithms have been implemented in the FE-ToolKit package made freely available. 