Title: Vendi sampling for molecular simulations: Diversity as a force for faster convergence and better exploration
Molecular dynamics (MD) is the method of choice for understanding the structure, function, and interactions of molecules. However, MD simulations are limited by the strong metastability of many molecules, which traps them in a single conformation basin for an extended amount of time. Enhanced sampling techniques, such as metadynamics and replica exchange, have been developed to overcome this limitation and accelerate the exploration of complex free energy landscapes. In this paper, we propose Vendi Sampling, a replica-based algorithm for increasing the efficiency and efficacy of the exploration of molecular conformation spaces. In Vendi sampling, replicas are simulated in parallel and coupled via a global statistical measure, the Vendi Score, to enhance diversity. Vendi sampling allows for the recovery of unbiased sampling statistics and dramatically improves sampling efficiency. We demonstrate the effectiveness of Vendi sampling in improving molecular dynamics simulations by showing significant improvements in coverage and mixing between metastable states and convergence of free energy estimates for four common benchmarks, including Alanine Dipeptide and Chignolin.
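The abstract names the Vendi Score but does not define it. As a minimal sketch, assuming the commonly used definition (the exponential of the Shannon entropy of the eigenvalues of K/n, where K is a positive semidefinite similarity matrix with unit diagonal over n samples), the score for a set of replica configurations could be computed as follows. The function name `vendi_score`, the Gaussian (RBF) kernel, and the `gamma` parameter are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def vendi_score(configs, gamma=1.0):
    """Illustrative Vendi Score of a set of configurations.

    configs: array of shape (n, d), one row per replica (hypothetical input).
    gamma:   width parameter of an assumed Gaussian similarity kernel.
    """
    x = np.asarray(configs, dtype=float)
    n = x.shape[0]
    # Pairwise squared Euclidean distances between replicas.
    sq_dist = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    # Gaussian similarity matrix: symmetric, PSD, ones on the diagonal.
    k = np.exp(-gamma * sq_dist)
    # Eigenvalues of K/n are non-negative and sum to 1.
    eigvals = np.linalg.eigvalsh(k / n)
    # Drop numerically zero (or slightly negative) eigenvalues: 0*log(0) = 0.
    eigvals = eigvals[eigvals > 1e-12]
    entropy = -np.sum(eigvals * np.log(eigvals))
    return float(np.exp(entropy))


if __name__ == "__main__":
    identical = np.zeros((4, 2))                        # four copies of one state
    spread = np.array([[0.0, 0.0], [100.0, 0.0],
                       [0.0, 100.0], [100.0, 100.0]])   # four well-separated states
    print(vendi_score(identical), vendi_score(spread))
```

Under this definition the score ranges from 1 (all replicas identical) to n (all replicas mutually dissimilar), so it can be read as an effective number of distinct configurations.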
Award ID(s):
2118201
PAR ID:
10475934
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
AIP Publishing
Date Published:
Journal Name:
The Journal of Chemical Physics
Volume:
159
Issue:
14
ISSN:
0021-9606
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Rapid computational exploration of the free energy landscape of biological molecules remains an active area of research due to the difficulty of sampling rare state transitions in molecular dynamics (MD) simulations. In recent years, an increasing number of studies have exploited machine learning (ML) models to enhance and analyze MD simulations. Notably, unsupervised models that extract kinetic information from a set of parallel trajectories have been proposed including the variational approach for Markov processes (VAMP), VAMPNets, and time-lagged variational autoencoders (TVAE). In this work, we propose a combination of adaptive sampling with active learning of kinetic models to accelerate the discovery of the conformational landscape of biomolecules. In particular, we introduce and compare several techniques that combine kinetic models with two adaptive sampling regimes (least counts and multiagent reinforcement learning-based adaptive sampling) to enhance the exploration of conformational ensembles without introducing biasing forces. Moreover, inspired by the active learning approach of uncertainty-based sampling, we also present MaxEnt VAMPNet. This technique consists of restarting simulations from the microstates that maximize the Shannon entropy of a VAMPNet trained to perform the soft discretization of metastable states. By running simulations on two test systems, the WLALL pentapeptide and the villin headpiece subdomain, we empirically demonstrate that MaxEnt VAMPNet results in faster exploration of conformational landscapes compared with the baseline and other proposed methods.
  2. Abstract Phosphodiesterase‐5 (PDE5) is responsible for regulating the concentration of the second messenger molecule cGMP by hydrolyzing it into 5′‐GMP. PDE5 is implicated in erectile dysfunction and cardiovascular diseases. The substrate binding site in the catalytic domain of PDE5 is surrounded by several dynamic structural motifs (including the α14 helix, M‐loop, and H‐loop) that are known to switch between inactive and active conformational states via currently unresolved structural intermediates. We evaluated the conformational dynamics of these structural motifs in the apo state and upon binding of an allosteric inhibitor (evodiamine) or avanafil, a competitive inhibitor. We employed the enhanced sampling‐based replica exchange solute scaling (REST2) method, principal component analysis (PCA), time‐lagged independent component analysis (tICA), molecular dynamics (MD) simulations, and well‐tempered metadynamics simulations to probe the conformational changes in these structural motifs. Our results support a regulatory mechanism for PDE5, where the α14 helix alternates between an inward (lower activity) conformation and an outward (higher activity) conformation that is accompanied by the folding/unfolding of the α8′ and α8″ helices of the H‐loop. When the allosteric inhibitor evodiamine is bound to PDE5, the inward (inactive) state of the α14 helix is preferred, thus preventing substrate access to the catalytic site. In contrast, competitive inhibitors of PDE5 block catalysis by occupying the active site accompanied by stabilization of the outward conformation of the α14 helix. Defining the conformational dynamics underlying regulation of PDE5 activation will be helpful in rational design of next‐generation small‐molecule modulators of PDE5 activity.
  3. Abstract Gaussian accelerated molecular dynamics (GaMD) is a robust computational method for simultaneous unconstrained enhanced sampling and free energy calculations of biomolecules. It works by adding a harmonic boost potential to smooth biomolecular potential energy surface and reduce energy barriers. GaMD greatly accelerates biomolecular simulations by orders of magnitude. Without the need to set predefined reaction coordinates or collective variables, GaMD provides unconstrained enhanced sampling and is advantageous for simulating complex biological processes. The GaMD boost potential exhibits a Gaussian distribution, thereby allowing for energetic reweighting via cumulant expansion to the second order (i.e., “Gaussian approximation”). This leads to accurate reconstruction of free energy landscapes of biomolecules. Hybrid schemes with other enhanced sampling methods, such as the replica‐exchange GaMD (rex‐GaMD) and replica‐exchange umbrella sampling GaMD (GaREUS), have also been introduced, further improving sampling and free energy calculations. Recently, new “selective GaMD” algorithms including the Ligand GaMD (LiGaMD) and Peptide GaMD (Pep‐GaMD) enabled microsecond simulations to capture repetitive dissociation and binding of small‐molecule ligands and highly flexible peptides. The simulations then allowed highly efficient quantitative characterization of the ligand/peptide binding thermodynamics and kinetics. Taken together, GaMD and its innovative variants are applicable to simulate a wide variety of biomolecular dynamics, including protein folding, conformational changes and allostery, ligand binding, peptide binding, protein–protein/nucleic acid/carbohydrate interactions, and carbohydrate/nucleic acid interactions. In this review, we present principles of the GaMD algorithms and recent applications in biomolecular simulations and drug design. 
This article is categorized under: Structure and Mechanism > Computational Biochemistry and Biophysics; Molecular and Statistical Mechanics > Molecular Dynamics and Monte‐Carlo Methods; Molecular and Statistical Mechanics > Free Energy Methods
  4. Molecular dynamics (MD) simulations are fundamental computational tools for the study of proteins and their free energy landscapes. However, sampling protein conformational changes through MD simulations is challenging due to the relatively long time scales of these processes. Many enhanced sampling approaches have emerged to tackle this problem, including biased sampling and path-sampling methods. In this Perspective, we focus on adaptive sampling algorithms. These techniques differ from other approaches because the thermodynamic ensemble is preserved and the sampling is enhanced solely by restarting MD trajectories at particularly chosen seeds rather than introducing biasing forces. We begin our treatment with an overview of theoretically transparent methods, where we discuss principles and guidelines for adaptive sampling. Then, we present a brief summary of select methods that have been applied to realistic systems in the past. Finally, we discuss recent advances in adaptive sampling methodology powered by deep learning techniques, as well as their shortcomings. 
  5. Abstract Molecular dynamics (MD) simulations are immensely valuable for studying protein structure, function and dynamics. Their ability to capture atomic‐level behavior of molecules and describe their evolution over time makes it a powerful synergistic tool for biochemistry, structural biology and other life sciences. To advance research and knowledge on reasonable timescales, researchers must optimize the amount of useful information extracted from simulation data while often frugally managing computational resources. Often, this involves balancing the length of MD trajectories with the number of replicas of a given system, with the aim of maximizing sampling of the conformational landscape. However, identifying this balance is not always intuitive, and the lack of standards among researchers can produce large variability in results and predictions from MD measurements. Here, we investigate the variability in MD results when simulation length and replica numbers are varied. Using a 231‐amino acid domain, we compare measurements from independent trajectories to a benchmark trajectory of 3, 1000‐ns replicates. We perform these simulations on 27 protein‐ligand complexes, allowing us to compare ligand‐specific rankings of complexes across independent replicas. Our results reveal that some MD measurements are accurately ranked by single trajectories, while others are not. We uncover similar variability in the effects of trajectory lengths on measurements. Our findings suggest that a one‐size‐fits‐all approach to MD simulations is not necessarily the best approach, and depending on the intended measurements and research question, it may be advantageous sometimes to prioritize longer trajectories over multiple replicas. This work provides important considerations for researchers while designing simulation studies. 