We present a consensus-based framework that unifies phase space exploration with posterior-residual-based adaptive sampling for surrogate construction in high-dimensional energy landscapes. Unlike standard approximation tasks where sampling points can be freely queried, systems with complex energy landscapes such as molecular dynamics (MD) do not have direct access to arbitrary sampling regions due to the physical constraints and energy barriers; the surrogate construction further relies on the dynamical exploration of phase space, posing a significant numerical challenge. We formulate the problem as a minimax optimization that jointly adapts both the surrogate approximation and residual-enhanced sampling. The construction of free energy surfaces (FESs) for high-dimensional collective variables (CVs) of MD systems is used as a motivating example to illustrate the essential idea. Specifically, the maximization step establishes a stochastic interacting particle system to impose adaptive sampling through both exploitation of a Laplace approximation of the max-residual region and exploration of uncharted phase space via temperature control. The minimization step updates the FES surrogate with the new sample set. Numerical results demonstrate the effectiveness of the present approach for biomolecular systems with up to 30 CVs. While we focus on the FES construction, the developed framework is general for efficient surrogate construction for complex systems with high-dimensional energy landscapes.
This content will become publicly available on May 16, 2026
Generative Model-based Collective Variable Learning with Metastable State Identification in Molecular Dynamics
We propose a generative model-based framework for learning collective variables (CVs) that faithfully capture the individual metastable states of full-dimensional molecular dynamics (MD) systems. Unlike most existing approaches based on various feature extraction strategies, the new framework transfers the exhausting effort of feature selection into a generative task of reconstructing the full-dimensional probability density function (PDF) from a set of CVs under a prior distribution with pre-assigned local maxima. By pairing the CVs with a set of auxiliary Gaussian random variables, we seek an invertible mapping that recovers the full-dimensional PDF and, at the same time, preserves the correspondence between the metastable states of the MD space and the individual local maxima of the prior distribution. By identifying the metastable states within the MD space, which are generally unknown, and imposing the correspondence between the two spaces, the constructed CVs retain clear physical interpretations and provide kinetic insight into the molecular systems on the collective scale. We demonstrate the effectiveness of the proposed method on alanine dipeptide in an aqueous environment. The constructed CVs faithfully capture the essential metastable states of the full MD system and show good agreement with kinetic properties such as the transition from the ballistic to the plateau regime of the mean square displacement.
- PAR ID: 10610686
- Publisher / Repository: arXiv
- Date Published:
- Journal Name: arXiv.org
- ISSN: 2331-8422
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Small integration time steps limit molecular dynamics (MD) simulations to millisecond time scales. Markov state models (MSMs) and equation-free approaches learn low-dimensional kinetic models from MD simulation data by performing configurational or dynamical coarse-graining of the state space. The learned kinetic models enable the efficient generation of dynamical trajectories over vastly longer time scales than are accessible by MD, but the discretization of configurational space and/or absence of a means to reconstruct molecular configurations precludes the generation of continuous all-atom molecular trajectories. We propose latent space simulators (LSS) to learn kinetic models for continuous all-atom simulation trajectories by training three deep learning networks to (i) learn the slow collective variables of the molecular system, (ii) propagate the system dynamics within this slow latent space, and (iii) generatively reconstruct molecular configurations. We demonstrate the approach in an application to Trp-cage miniprotein to produce novel ultra-long synthetic folding trajectories that accurately reproduce all-atom molecular structure, thermodynamics, and kinetics at six orders of magnitude lower cost than MD. The dramatically lower cost of trajectory generation enables greatly improved sampling and greatly reduced statistical uncertainties in estimated thermodynamic averages and kinetic rates.
- Transition path theory (TPT) offers a powerful formalism for extracting the rate and mechanism of rare dynamical transitions between metastable states. Most applications of TPT either focus on systems with modestly sized state spaces or use collective variables to try to tame the curse of dimensionality. Increasingly, expressive function approximators such as neural networks and tensor networks have shown promise in computing the central object of TPT, the committor function, even in very high-dimensional systems. That progress prompts our consideration of how one could use such a high-dimensional function to extract mechanistic insights. Here, we present and illustrate a straightforward but powerful way to track how individual dynamical coordinates evolve during a reactive event. The strategy, which involves marginalizing the reactive ensemble, naturally captures the evolution of the dynamical coordinate’s distribution, not just its mean reactive behavior.
- Predicting rare DNA conformations via dynamical graphical models: a case study of the B→A transition
  Abstract: DNA exhibits local conformational preferences that affect its ability to adopt biologically relevant conformations, such as those required for binding proteins. Traditional methods, like Markov state models and molecular dynamics (MD) simulations, have advanced our understanding but often struggle to capture these rare conformational states due to high computational demands. Here, we introduce a novel AI framework based on dynamical graphical models (DGMs), a generative machine learning approach trained on equilibrium MD data, to predict DNA conformational transitions that are never seen in the MD ensembles. By leveraging local DNA interactions, DGMs generate a comprehensive transition matrix that captures both thermodynamic and kinetic properties of unsampled states, enabling accurate predictions of rare global conformations without the need for extensive sampling. Applying this model to the B→A transition, we demonstrate that DGMs can efficiently predict sequence-dependent A-DNA preferences, achieving results that align closely with replica exchange umbrella sampling simulations. DGMs provide new insights into DNA sequence–structure relationships, paving the way for applications in DNA sequence design and optimization.
- We present a unified framework for the data-driven construction of stochastic reduced models with state-dependent memory for high-dimensional Hamiltonian systems. The method addresses two key challenges: (i) accurately modeling heterogeneous non-Markovian effects where the memory function depends on the coarse-grained (CG) variables beyond the standard homogeneous kernel, and (ii) efficiently exploring the phase space to sample both equilibrium and dynamical observables for reduced model construction. Specifically, we employ a consensus-based sampling method to establish a shared sampling strategy that enables simultaneous construction of the free energy function and collection of conditional two-point correlation functions used to learn the state-dependent memory. The reduced dynamics is formulated as an extended Markovian system, where a set of auxiliary variables, interpreted as non-Markovian features, is jointly learned to systematically approximate the memory function using only two-point statistics. The constructed model yields a generalized Langevin-type formulation with an invariant distribution consistent with the full dynamics. We demonstrate the effectiveness of the proposed framework on a two-dimensional CG model of an alanine dipeptide molecule. Numerical results on the transition dynamics between metastable states show that accurately capturing state-dependent memory is essential for predicting non-equilibrium kinetic properties, whereas the standard generalized Langevin model with a homogeneous kernel exhibits significant discrepancies.
