Abstract Collective variable (CV)‐based enhanced sampling techniques are widely used today for accelerating barrier‐crossing events in molecular simulations. A class of these methods, which includes temperature accelerated molecular dynamics (TAMD)/driven‐adiabatic free energy dynamics (d‐AFED), unified free energy dynamics (UFED), and temperature accelerated sliced sampling (TASS), uses an extended variable formalism to achieve quick exploration of conformational space. These techniques are powerful, as they enhance the sampling of a large number of CVs simultaneously compared to other techniques. Extended variables are kept at a much higher temperature than the physical temperature by ensuring adiabatic separation between the extended and physical subsystems and employing rigorous thermostatting. In this work, we present a computational platform to perform extended phase space enhanced sampling simulations using the open‐source molecular dynamics engine OpenMM. The implementation allows users to have interoperability of sampling techniques, as well as employ state‐of‐the‐art thermostats and multiple time‐stepping. This work also presents protocols for determining the critical parameters and procedures for reconstructing high‐dimensional free energy surfaces. As a demonstration, we present simulation results on the high dimensional conformational landscapes of the alanine tripeptide in vacuo, tetra‐N‐methylglycine (tetra‐sarcosine) peptoid in implicit solvent, and the Trp‐cage mini protein in explicit water.
more »
« less
Statistically optimal continuous free energy surfaces from biased simulations and multistate reweighting
Free energies as a function of a selected set of collective variables are commonly computed in molecular simulation and of significant value in understanding and engineering molecular behavior. These free energy surfaces are most commonly estimated using variants of histogramming techniques, but such approaches obscure two important facets of these functions. First, the empirical observations along the collective variable are defined by an ensemble of discrete observations, and the coarsening of these observations into a histogram bin incurs unnecessary loss of information. Second, the free energy surface is itself almost always a continuous function, and its representation by a histogram introduces inherent approximations due to the discretization. In this study, we relate the observed discrete observations from biased simulations to the inferred underlying continuous probability distribution over the collective variables and derive histogram-free techniques for estimating this free energy surface. We reformulate free energy surface estimation as minimization of a Kullback−Leibler divergence between a continuous trial function and the discrete empirical distribution and show that this is equivalent to likelihood maximization of a trial function given a set of sampled data. We then present a fully Bayesian treatment of this formalism, which enables the incorporation of powerful Bayesian tools such as the inclusion of regularizing priors, uncertainty quantification, and model selection techniques. We demonstrate this new formalism in the analysis of umbrella sampling simulations for the χ torsion of a valine side chain in the L99A mutant of T4 lysozyme with benzene bound in the cavity.
more »
« less
- Award ID(s):
- 1841810
- PAR ID:
- 10167324
- Date Published:
- Journal Name:
- Journal of Chemical Theory and Computation
- ISSN:
- 1549-9618
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
We consider nonparametric estimation of a mixed discrete‐continuous distribution under anisotropic smoothness conditions and a possibly increasing number of support points for the discrete part of the distribution. For these settings, we derive lower bounds on the estimation rates. Next, we consider a nonparametric mixture of normals model that uses continuous latent variables for the discrete part of the observations. We show that the posterior in this model contracts at rates that are equal to the derived lower bounds up to a log factor. Thus, Bayesian mixture of normals models can be used for (up to a log factor) optimal adaptive estimation of mixed discrete‐continuous distributions. The proposed model demonstrates excellent performance in simulations mimicking the first stage in the estimation of structural discrete choice models.more » « less
-
Abstract Gaussian accelerated molecular dynamics (GaMD) is a robust computational method for simultaneous unconstrained enhanced sampling and free energy calculations of biomolecules. It works by adding a harmonic boost potential to smooth biomolecular potential energy surface and reduce energy barriers. GaMD greatly accelerates biomolecular simulations by orders of magnitude. Without the need to set predefined reaction coordinates or collective variables, GaMD provides unconstrained enhanced sampling and is advantageous for simulating complex biological processes. The GaMD boost potential exhibits a Gaussian distribution, thereby allowing for energetic reweighting via cumulant expansion to the second order (i.e., “Gaussian approximation”). This leads to accurate reconstruction of free energy landscapes of biomolecules. Hybrid schemes with other enhanced sampling methods, such as the replica‐exchange GaMD (rex‐GaMD) and replica‐exchange umbrella sampling GaMD (GaREUS), have also been introduced, further improving sampling and free energy calculations. Recently, new “selective GaMD” algorithms including the Ligand GaMD (LiGaMD) and Peptide GaMD (Pep‐GaMD) enabled microsecond simulations to capture repetitive dissociation and binding of small‐molecule ligands and highly flexible peptides. The simulations then allowed highly efficient quantitative characterization of the ligand/peptide binding thermodynamics and kinetics. Taken together, GaMD and its innovative variants are applicable to simulate a wide variety of biomolecular dynamics, including protein folding, conformational changes and allostery, ligand binding, peptide binding, protein–protein/nucleic acid/carbohydrate interactions, and carbohydrate/nucleic acid interactions. In this review, we present principles of the GaMD algorithms and recent applications in biomolecular simulations and drug design. This article is categorized under:Structure and Mechanism > Computational Biochemistry and BiophysicsMolecular and Statistical Mechanics > Molecular Dynamics and Monte‐Carlo MethodsMolecular and Statistical Mechanics > Free Energy Methodsmore » « less
-
We present a surface-accelerated string method (SASM) to efficiently optimize low-dimensional reaction pathways from the sampling performed with expensive quantum mechanical/molecular mechanical (QM/MM) Hamiltonians. The SASM accelerates the convergence of the path using the aggregate sampling obtained from the current and previous string iterations, whereas approaches like the string method in collective variables (SMCV) or the modified string method in collective variables (MSMCV) update the path only from the sampling obtained from the current iteration. Furthermore, the SASM decouples the number of images used to perform sampling from the number of synthetic images used to represent the path. The path is optimized on the current best estimate of the free energy surface obtained from all available sampling, and the proposed set of new simulations is not restricted to being located along the optimized path. Instead, the umbrella potential placement is chosen to extend the range of the free energy surface and improve the quality of the free energy estimates near the path. In this manner, the SASM is shown to improve the exploration for a minimum free energy pathway in regions where the free energy surface is relatively flat. Furthermore, it improves the quality of the free energy profile when the string is discretized with too few images. We compare the SASM, SMCV, and MSMCV using 3 QM/MM applications: a ribozyme methyltransferase reaction using 2 reaction coordinates, the 2′-O-transphosphorylation reaction of Hammerhead ribozyme using 3 reaction coordinates, and a tautomeric reaction in B-DNA using 5 reaction coordinates. We show that SASM converges the paths using roughly 3 times less sampling than the SMCV and MSMCV methods. All three algorithms have been implemented in the FE-ToolKit package made freely available.more » « less
-
The study of phenomena such as protein folding and conformational changes in molecules is a central theme in chemical physics. Molecular dynamics (MD) simulation is the primary tool for the study of transition processes in biomolecules, but it is hampered by a huge timescale gap between the processes of interest and atomic vibrations that dictate the time step size. Therefore, it is imperative to combine MD simulations with other techniques in order to quantify the transition processes taking place on large timescales. In this work, the diffusion map with Mahalanobis kernel, a meshless approach for approximating the Backward Kolmogorov Operator (BKO) in collective variables, is upgraded to incorporate standard enhanced sampling techniques, such as metadynamics. The resulting algorithm, which we call the target measure Mahalanobis diffusion map (tm-mmap), is suitable for a moderate number of collective variables in which one can approximate the diffusion tensor and free energy. Imposing appropriate boundary conditions allows use of the approximated BKO to solve for the committor function and utilization of transition path theory to find the reactive current delineating the transition channels and the transition rate. The proposed algorithm, tm-mmap, is tested on the two-dimensional Moro–Cardin two-well system with position-dependent diffusion coefficient and on alanine dipeptide in two collective variables where the committor, the reactive current, and the transition rate are compared to those computed by the finite element method (FEM). Finally, tm-mmap is applied to alanine dipeptide in four collective variables where the use of finite elements is infeasible.more » « less
An official website of the United States government

