Title: Data-driven construction of stochastic reduced dynamics encoded with non-Markovian features
One important problem in constructing the reduced dynamics of molecular systems is the accurate modeling of the non-Markovian behavior arising from the dynamics of unresolved variables. The main complication emerges from the lack of scale separations, where the reduced dynamics generally exhibits pronounced memory and non-white noise terms. We propose a data-driven approach to learn the reduced model of multi-dimensional resolved variables that faithfully retains the non-Markovian dynamics. Different from the common approaches based on the direct construction of the memory function, the present approach seeks a set of non-Markovian features that encode the history of the resolved variables and establishes a joint learning of the extended Markovian dynamics in terms of both the resolved variables and these features. The training is based on matching the evolution of the correlation functions of the extended variables that can be directly obtained from the ones of the resolved variables. The constructed model essentially approximates the multi-dimensional generalized Langevin equation and ensures numerical stability without empirical treatment. We demonstrate the effectiveness of the method by constructing the reduced models of molecular systems in terms of both one-dimensional and four-dimensional resolved variables.
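As a rough illustration of the idea in the abstract (not the authors' method or code), the sketch below embeds a scalar generalized Langevin equation in an extended Markovian system with a single auxiliary variable and evaluates the velocity autocorrelation function that such a model would be matched against. Everything specific here is an assumption made for the example: unit mass, no conservative force, a single-exponential memory kernel K(t) = c2 * exp(-t / tau), and the hypothetical names `extended_system`, `stationary_covariance`, and `vacf`.

```python
import numpy as np

def extended_system(c2, tau, kT):
    """Drift matrix A and noise vector b for d(v, z) = A (v, z) dt + b dW.

    Assumed toy model: dv = z dt, dz = (-c2 * v - z / tau) dt + sigma dW.
    Eliminating z gives a GLE with kernel K(t) = c2 * exp(-t / tau).
    """
    A = np.array([[0.0, 1.0],
                  [-c2, -1.0 / tau]])
    # Noise strength chosen so the eliminated noise has covariance kT * K(t)
    sigma = np.sqrt(2.0 * kT * c2 / tau)
    b = np.array([0.0, sigma])
    return A, b

def stationary_covariance(A, b):
    """Solve the Lyapunov equation A S + S A^T + b b^T = 0 for S."""
    n = A.shape[0]
    eye = np.eye(n)
    # Row-major vectorization: vec(A S) = (A kron I) vec(S),
    # vec(S A^T) = (I kron A) vec(S)
    lhs = np.kron(A, eye) + np.kron(eye, A)
    rhs = -np.outer(b, b).reshape(-1)
    return np.linalg.solve(lhs, rhs).reshape(n, n)

def vacf(A, S, times):
    """Velocity autocorrelation <v(t) v(0)> = [exp(A t) S]_00.

    Uses an eigendecomposition of A, so A is assumed diagonalizable.
    """
    lam, V = np.linalg.eig(A)
    V_inv = np.linalg.inv(V)
    values = []
    for t in times:
        exp_At = (V * np.exp(lam * t)) @ V_inv  # V diag(exp(lam t)) V^{-1}
        values.append((exp_At @ S)[0, 0].real)
    return np.array(values)

if __name__ == "__main__":
    A, b = extended_system(c2=4.0, tau=0.5, kT=1.5)
    S = stationary_covariance(A, b)
    print("stationary covariance of (v, z):")
    print(S)
    print("VACF samples:", vacf(A, S, np.linspace(0.0, 2.0, 5)))
```

In this toy setting the stationary covariance comes out diagonal, with the velocity variance equal to kT, which is the kind of consistency with the invariant distribution the abstract refers to. A data-driven construction would instead treat the entries of the drift and noise matrices as trainable and fit the model's correlation functions to those measured from molecular dynamics data.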
Award ID(s):
2110981
PAR ID:
10416722
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
The Journal of Chemical Physics
Volume:
158
Issue:
3
ISSN:
0021-9606
Page Range / eLocation ID:
034102
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We present a unified framework for the data-driven construction of stochastic reduced models with state-dependent memory for high-dimensional Hamiltonian systems. The method addresses two key challenges: (i) accurately modeling heterogeneous non-Markovian effects where the memory function depends on the coarse-grained (CG) variables beyond the standard homogeneous kernel, and (ii) efficiently exploring the phase space to sample both equilibrium and dynamical observables for reduced model construction. Specifically, we employ a consensus-based sampling method to establish a shared sampling strategy that enables simultaneous construction of the free energy function and collection of conditional two-point correlation functions used to learn the state-dependent memory. The reduced dynamics is formulated as an extended Markovian system, where a set of auxiliary variables, interpreted as non-Markovian features, is jointly learned to systematically approximate the memory function using only two-point statistics. The constructed model yields a generalized Langevin-type formulation with an invariant distribution consistent with the full dynamics. We demonstrate the effectiveness of the proposed framework on a two-dimensional CG model of an alanine dipeptide molecule. Numerical results on the transition dynamics between metastable states show that accurately capturing state-dependent memory is essential for predicting non-equilibrium kinetic properties, whereas the standard generalized Langevin model with a homogeneous kernel exhibits significant discrepancies. 
  2. We present a bottom-up coarse-graining (CG) method to establish implicit-solvent CG modeling for polymers in solution, which conserves the dynamic properties of the reference microscopic system. In particular, tens to hundreds of bonded polymer atoms (or Lennard-Jones beads) are coarse-grained as one CG particle, and the solvent degrees of freedom are eliminated. The dynamics of the CG system is governed by the generalized Langevin equation (GLE) derived via the Mori-Zwanzig formalism, by which the CG variables can be directly and rigorously linked to the microscopic dynamics generated by molecular dynamics (MD) simulations. The solvent-mediated dynamics of polymers is modeled by the non-Markovian stochastic dynamics in GLE, where the memory kernel can be computed from the MD trajectories. To circumvent the difficulty in direct evaluation of the memory term and generation of colored noise, we exploit the equivalence between the non-Markovian dynamics and Markovian dynamics in an extended space. To this end, the CG system is supplemented with auxiliary variables that are coupled linearly to the momentum and among themselves, subject to uncorrelated Gaussian white noise. A high-order time-integration scheme is used to solve the extended dynamics to further accelerate the CG simulations. To assess, validate, and demonstrate the established implicit-solvent CG modeling, we have applied it to study four different types of polymers in solution. The dynamic properties of polymers characterized by the velocity autocorrelation function, diffusion coefficient, and mean square displacement as functions of time are evaluated in both CG and MD simulations. Results show that the extended dynamics with auxiliary variables can construct arbitrarily high-order CG models to reproduce dynamic properties of the reference microscopic system and to characterize long-time dynamics of polymers in solution. 
  3. The development of data-informed predictive models for dynamical systems is of widespread interest in many disciplines. We present a unifying framework for blending mechanistic and machine-learning approaches to identify dynamical systems from noisily and partially observed data. We compare pure data-driven learning with hybrid models which incorporate imperfect domain knowledge, referring to the discrepancy between an assumed truth model and the imperfect mechanistic model as model error. Our formulation is agnostic to the chosen machine learning model, is presented in both continuous- and discrete-time settings, and is compatible both with model errors that exhibit substantial memory and errors that are memoryless. First, we study memoryless linear (w.r.t. parametric-dependence) model error from a learning theory perspective, defining excess risk and generalization error. For ergodic continuous-time systems, we prove that both excess risk and generalization error are bounded above by terms that diminish with the square-root of T, the time-interval over which training data is specified. Secondly, we study scenarios that benefit from modeling with memory, proving universal approximation theorems for two classes of continuous-time recurrent neural networks (RNNs): both can learn memory-dependent model error, assuming that it is governed by a finite-dimensional hidden variable and that, together, the observed and hidden variables form a continuous-time Markovian system. In addition, we connect one class of RNNs to reservoir computing, thereby relating learning of memory-dependent error to recent work on supervised learning between Banach spaces using random features. Numerical results are presented (Lorenz ’63, Lorenz ’96 Multiscale systems) to compare purely data-driven and hybrid approaches, finding hybrid methods less data-hungry and more parametrically efficient. We also find that, while a continuous-time framing allows for robustness to irregular sampling and desirable domain-interpretability, a discrete-time framing can provide similar or better predictive performance, especially when data are undersampled and the vector field defining the true dynamics cannot be identified. Finally, we demonstrate numerically how data assimilation can be leveraged to learn hidden dynamics from noisy, partially-observed data, and illustrate challenges in representing memory by this approach, and in the training of such models.
  4. Modeling a high-dimensional Hamiltonian system in reduced dimensions with respect to coarse-grained (CG) variables can greatly reduce computational cost and enable efficient bottom-up prediction of main features of the system for many applications. However, it usually experiences significantly altered dynamics due to loss of degrees of freedom upon coarse-graining. To establish CG models that can faithfully preserve dynamics, previous efforts mainly focused on equilibrium systems. In contrast, various soft matter systems are known to be out of equilibrium. Therefore, the present work concerns non-equilibrium systems and enables accurate and efficient CG modeling that preserves non-equilibrium dynamics and is generally applicable to any non-equilibrium process and any observable of interest. To this end, the dynamic equation of a CG variable is built in the form of the non-stationary generalized Langevin equation (nsGLE), where the two-time memory kernel is determined from the data of the auto-correlation function of the observable of interest. By embedding the nsGLE in an extended dynamics framework, the nsGLE can be solved efficiently to predict the non-equilibrium dynamics of the CG variable. To prove and exploit the equivalence of the nsGLE and extended dynamics, the memory kernel is parameterized in a two-time exponential expansion. A data-driven hybrid optimization process is proposed for the parameterization, which integrates the differential-evolution method with the Levenberg–Marquardt algorithm to efficiently tackle a non-convex and high-dimensional optimization problem.
  5. One essential goal of constructing coarse-grained molecular dynamics (CGMD) models is to accurately predict nonequilibrium processes beyond the atomistic scale. While a CG model can be constructed by projecting the full dynamics onto a set of resolved variables, the dynamics of the CG variables can recover the full dynamics only when the conditional distribution of the unresolved variables is close to the one associated with the particular projection operator. In particular, the model's applicability to various nonequilibrium processes is generally unwarranted due to the inconsistency in the conditional distribution. Here, we present a data-driven approach for constructing CGMD models that retain certain generalization ability for nonequilibrium processes. Unlike the conventional CG models based on preselected CG variables (e.g., the center of mass), the present CG model seeks a set of auxiliary CG variables similar to the time-lagged independent component analysis to maximize the velocity correlation. This effectively minimizes the entropy contribution of unresolved variables and ensures the distribution under a broad range of nonequilibrium conditions approaches the one under equilibrium. Numerical results of a polymer melt system demonstrate the significance of this broadly overlooked metric for the model's generalization ability, and the effectiveness of the present CG model for predicting the complex viscoelastic responses under various nonequilibrium flows. 