

Title: Data-driven coarse-grained modeling of non-equilibrium systems
Modeling a high-dimensional Hamiltonian system in reduced dimensions with respect to coarse-grained (CG) variables can greatly reduce computational cost and enable efficient bottom-up prediction of the main features of the system for many applications. However, the CG model usually exhibits significantly altered dynamics because degrees of freedom are lost upon coarse-graining. Previous efforts to establish CG models that faithfully preserve dynamics have mainly focused on equilibrium systems, whereas many soft matter systems are known to be out of equilibrium. The present work therefore concerns non-equilibrium systems and enables accurate and efficient CG modeling that preserves non-equilibrium dynamics and is generally applicable to any non-equilibrium process and any observable of interest. To this end, the dynamic equation of a CG variable is written in the form of the non-stationary generalized Langevin equation (nsGLE), where the two-time memory kernel is determined from data of the auto-correlation function of the observable of interest. By embedding the nsGLE in an extended dynamics framework, the nsGLE can be solved efficiently to predict the non-equilibrium dynamics of the CG variable. To prove and exploit the equivalence of the nsGLE and the extended dynamics, the memory kernel is parameterized in a two-time exponential expansion. A data-driven hybrid optimization process is proposed for the parameterization, which integrates the differential-evolution method with the Levenberg–Marquardt algorithm to efficiently tackle a non-convex and high-dimensional optimization problem.
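A minimal sketch of the hybrid optimization strategy named in the abstract is given below, assuming a hypothetical two-time exponential kernel form and synthetic target data in place of the kernel extracted from auto-correlation data; SciPy's differential_evolution and least_squares (Levenberg–Marquardt) stand in for the authors' implementation.

```python
# Sketch only: fit a hypothetical two-time exponential expansion of a memory
# kernel to synthetic target data using global differential evolution followed
# by local Levenberg-Marquardt refinement. Kernel form, bounds, and target
# data are illustrative assumptions, not taken from the paper.
import numpy as np
from scipy.optimize import differential_evolution, least_squares

rng = np.random.default_rng(0)
t = np.linspace(0.0, 5.0, 40)
T, S = np.meshgrid(t, t, indexing="ij")
mask = T >= S                      # kernel is only needed for t >= s

def kernel(params, T, S):
    """Hypothetical expansion: sum_k a_k * exp(-b_k*(t-s)) * exp(-c_k*s)."""
    a, b, c = np.split(np.asarray(params), 3)
    K = np.zeros_like(T)
    for ak, bk, ck in zip(a, b, c):
        K += ak * np.exp(-bk * (T - S)) * np.exp(-ck * S)
    return K

# Synthetic "target" kernel standing in for the kernel determined from
# auto-correlation data of the observable of interest.
true_params = np.array([1.0, 0.5, 2.0, 0.3, 0.8, 0.1])
K_target = kernel(true_params, T, S) + 0.01 * rng.standard_normal(T.shape)

def residuals(params):
    return (kernel(params, T, S) - K_target)[mask]

def cost(params):
    r = residuals(params)
    return float(np.dot(r, r))

# Stage 1: differential evolution explores the non-convex landscape globally.
n_modes = 2
bounds = [(-2.0, 2.0)] * n_modes + [(0.01, 5.0)] * (2 * n_modes)
de = differential_evolution(cost, bounds, seed=0, maxiter=200, tol=1e-8)

# Stage 2: Levenberg-Marquardt refines the best candidate locally.
lm = least_squares(residuals, de.x, method="lm")
print("DE cost:", de.fun, " LM cost:", cost(lm.x))
```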
Award ID(s):
1761068
NSF-PAR ID:
10286271
Author(s) / Creator(s):
Date Published:
Journal Name:
Soft Matter
Volume:
17
Issue:
26
ISSN:
1744-683X
Page Range / eLocation ID:
6404 to 6412
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    We present data-driven coarse-grained (CG) modeling for polymers in solution, which conserves the dynamic as well as structural properties of the underlying atomistic system. The CG modeling is built upon the framework of the generalized Langevin equation (GLE). The key is to determine each term in the GLE by directly linking it to atomistic data. In particular, we propose a two-stage Gaussian process-based Bayesian optimization method to infer the non-Markovian memory kernel from the data of the velocity autocorrelation function (VACF). Considering that the long-time behaviors of the VACF and memory kernel for polymer solutions can exhibit hydrodynamic scaling (algebraic decay with time), we further develop an active learning method to determine the emergence of hydrodynamic scaling, which can accelerate the inference process of the memory kernel. The proposed methods do not rely on how the mean force or CG potential in the GLE is constructed. Thus, we also compare two methods for constructing the CG potential: a deep learning method and the iterative Boltzmann inversion method. With the memory kernel and CG potential determined, the GLE is mapped onto an extended Markovian process to circumvent the expensive cost of directly solving the GLE. The accuracy and computational efficiency of the proposed CG modeling are assessed in a model star-polymer solution system at three representative concentrations. By comparing with the reference atomistic simulation results, we demonstrate that the proposed CG modeling can robustly and accurately reproduce the dynamic and structural properties of polymers in solution. 
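The sketch below illustrates, under stated assumptions, a single-stage Gaussian-process Bayesian optimization loop of the kind described above; the objective is a placeholder for the mismatch between the target VACF and the VACF produced by a CG simulation with a candidate memory kernel, and the two-stage scheme and the hydrodynamic-scaling active learning are not reproduced.

```python
# Sketch of Gaussian-process Bayesian optimization for memory-kernel inference.
# The objective below is a stand-in for the VACF mismatch of a CG/GLE model
# with candidate kernel parameters; parameterization and bounds are assumed.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(1)

def vacf_mismatch(theta):
    """Placeholder: would run the CG model with kernel parameters `theta`
    and return the L2 error against the reference (atomistic) VACF."""
    amp, tau = theta
    return (amp - 1.2) ** 2 + (tau - 0.7) ** 2 + 0.01 * np.sin(10 * amp)

bounds = np.array([[0.1, 3.0], [0.05, 2.0]])   # (amplitude, decay time)

# Initial design points.
X = rng.uniform(bounds[:, 0], bounds[:, 1], size=(5, 2))
y = np.array([vacf_mismatch(x) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

for _ in range(25):
    gp.fit(X, y)
    # Expected-improvement acquisition evaluated on random candidates.
    cand = rng.uniform(bounds[:, 0], bounds[:, 1], size=(2000, 2))
    mu, sigma = gp.predict(cand, return_std=True)
    best = y.min()
    sigma = np.maximum(sigma, 1e-12)
    z = (best - mu) / sigma
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = cand[np.argmax(ei)]
    X = np.vstack([X, x_next])
    y = np.append(y, vacf_mismatch(x_next))

print("best parameters:", X[np.argmin(y)], "mismatch:", y.min())
```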
  2. We present a bottom-up coarse-graining (CG) method to establish implicit-solvent CG modeling for polymers in solution, which conserves the dynamic properties of the reference microscopic system. In particular, tens to hundreds of bonded polymer atoms (or Lennard-Jones beads) are coarse-grained as one CG particle, and the solvent degrees of freedom are eliminated. The dynamics of the CG system is governed by the generalized Langevin equation (GLE) derived via the Mori-Zwanzig formalism, by which the CG variables can be directly and rigorously linked to the microscopic dynamics generated by molecular dynamics (MD) simulations. The solvent-mediated dynamics of polymers is modeled by the non-Markovian stochastic dynamics in GLE, where the memory kernel can be computed from the MD trajectories. To circumvent the difficulty in direct evaluation of the memory term and generation of colored noise, we exploit the equivalence between the non-Markovian dynamics and Markovian dynamics in an extended space. To this end, the CG system is supplemented with auxiliary variables that are coupled linearly to the momentum and among themselves, subject to uncorrelated Gaussian white noise. A high-order time-integration scheme is used to solve the extended dynamics to further accelerate the CG simulations. To assess, validate, and demonstrate the established implicit-solvent CG modeling, we have applied it to study four different types of polymers in solution. The dynamic properties of polymers characterized by the velocity autocorrelation function, diffusion coefficient, and mean square displacement as functions of time are evaluated in both CG and MD simulations. Results show that the extended dynamics with auxiliary variables can construct arbitrarily high-order CG models to reproduce dynamic properties of the reference microscopic system and to characterize long-time dynamics of polymers in solution. 
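A minimal sketch of the extended-Markovian embedding is shown below, assuming a single CG particle, unit mass and temperature, a hypothetical harmonic mean force, and a two-mode exponential memory kernel; it illustrates the auxiliary-variable construction, not the paper's code, and uses a simple Euler–Maruyama step rather than the high-order integrator mentioned above.

```python
# Auxiliary variables s_k, coupled linearly to the momentum and driven by
# uncorrelated Gaussian white noise, reproduce the non-Markovian memory term
# of a GLE with kernel K(t) = sum_k c_k exp(-t/tau_k). All parameters are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
kBT, m = 1.0, 1.0
c = np.array([2.0, 0.5])          # kernel amplitudes (illustrative)
tau = np.array([0.2, 1.5])        # kernel relaxation times (illustrative)
kspring = 1.0                     # hypothetical harmonic mean-force constant
dt, nsteps = 1e-3, 100_000

x, p = 0.0, 0.0
s = rng.normal(0.0, np.sqrt(kBT * c))   # auxiliary variables at equilibrium

traj_v = np.empty(nsteps)
for i in range(nsteps):
    force = -kspring * x + s.sum()      # mean force plus memory/noise via s_k
    p += dt * force
    x += dt * p / m
    # Ornstein-Uhlenbeck update of the auxiliary variables (Euler-Maruyama).
    noise = rng.standard_normal(2) * np.sqrt(2.0 * kBT * c / tau * dt)
    s += dt * (-s / tau - c * p / m) + noise
    traj_v[i] = p / m

# Velocity autocorrelation function of the CG variable, to be compared with
# the reference (e.g. atomistic MD) VACF.
lags = 1000
vacf = np.array([np.mean(traj_v[: nsteps - k] * traj_v[k:]) for k in range(lags)])
print("VACF(0) ~ kBT/m:", vacf[0])
```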
  3.
    The present work concerns the transferability of coarse-grained (CG) modeling in reproducing the dynamic properties of the reference atomistic systems across a range of parameters. In particular, we focus on implicit-solvent CG modeling of polymer solutions. The CG model is based on the generalized Langevin equation, where the memory kernel plays the critical role in determining the dynamics on all time scales. Thus, we propose methods for transfer learning of memory kernels. The key ingredient of our methods is Gaussian process regression. By integrating it with model-order reduction via proper orthogonal decomposition and an active learning technique, the transfer learning can be made practically efficient and requires minimal training data. Through two example polymer solution systems, we demonstrate the accuracy and efficiency of the proposed transfer learning methods in the construction of transferable memory kernels. The transferability allows for out-of-sample predictions, even in the extrapolated domain of parameters. Built on the transferable memory kernels, the CG models can reproduce the dynamic properties of polymers on all time scales at different thermodynamic conditions (such as temperature and solvent viscosity) and for different systems with varying concentrations and lengths of polymers.
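The sketch below illustrates the POD-plus-Gaussian-process idea on a synthetic family of memory kernels; the kernel family, parameter values, and number of retained modes are assumptions for illustration, and the active-learning component is omitted.

```python
# Transfer learning of memory kernels, sketched: kernels tabulated at a few
# training parameter values are compressed onto a POD basis (via SVD), and a
# Gaussian process maps the parameter to the POD coefficients so the kernel
# can be predicted at unseen parameter values.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

t = np.linspace(0.0, 10.0, 200)

def memory_kernel(theta):
    """Synthetic stand-in for a kernel extracted from MD at parameter theta."""
    return np.exp(-t * theta) * np.cos(2.0 * t) + 0.2 * np.exp(-0.3 * t * theta)

theta_train = np.array([0.5, 0.8, 1.1, 1.4, 1.7])
K_train = np.stack([memory_kernel(th) for th in theta_train])   # (n_params, n_t)

# POD via SVD of the snapshot matrix; keep the dominant modes.
U, S, Vt = np.linalg.svd(K_train, full_matrices=False)
n_modes = 3
basis = Vt[:n_modes]                        # (n_modes, n_t)
coeffs = K_train @ basis.T                  # training POD coefficients

# One GP per POD coefficient, as a function of the parameter theta.
gps = []
for j in range(n_modes):
    gp = GaussianProcessRegressor(ConstantKernel() * RBF(), normalize_y=True)
    gp.fit(theta_train.reshape(-1, 1), coeffs[:, j])
    gps.append(gp)

theta_new = 2.0                              # outside the training range
coeff_new = np.array([gp.predict([[theta_new]])[0] for gp in gps])
K_pred = coeff_new @ basis
err = np.linalg.norm(K_pred - memory_kernel(theta_new)) / np.linalg.norm(memory_kernel(theta_new))
print("relative prediction error at theta = 2.0:", err)
```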
  4. Optimization of mixing in microfluidic devices is a popular application of computational fluid dynamics software packages, such as COMSOL Multiphysics, with an increasing number of studies being published on the topic. On one hand, the laminar nature of the flow and the lack of turbulence in this type of device can enable very accurate numerical modeling of the fluid motion and reactant/particle distribution, even in complex channel geometries. On the other hand, the same laminar nature of the flow makes mixing, which is fundamental to the functionality of any microfluidic reactor or assay system, hard to achieve, as it forces reliance on slow molecular diffusion rather than on turbulence. This in turn forces designers of microfluidic systems to develop a broad set of strategies to enable mixing on the microscale, targeted to the specific applications of interest. In this context, numerical modeling can enable efficient exploration of a large set of parameters affecting mixing, such as geometrical characteristics and flow rates, to identify optimal designs. However, it has to be noted that even very performant mixing topologies, such as the use of groove-ridge surface features, require multiple mixing units. This in turn requires very high-resolution meshing, in particular when looking for solutions for the convection-diffusion equation governing the reactant or chemical species distribution. For the typical length of microfluidic mixing channels analyzed using finite element analysis, this becomes computationally challenging due to the large number of elements that need to be handled. In this work we describe a methodology using the COMSOL Computational Fluid Dynamics and Chemical Reaction Engineering modules, in which large geometries are split into subunits. The Navier-Stokes and convection-diffusion equations are then solved in each subunit separately, with the solutions obtained being transferred between them to map the flow field and concentration through the entire geometry of the channel. As validation, the model is tested against data from mixers using periodic systems of groove-ridge features in order to engineer transverse mixing flows, showing a high degree of correlation with the experimental results. It is also shown that the methodology can be extended to long mixing channels that lack periodicity and in which each geometrical mixing subunit is distinct.
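The COMSOL workflow itself is not reproduced here, but the toy finite-difference model below (plug flow, constant diffusivity, straight channel with no groove-ridge features; all values illustrative) conveys the chaining idea: each subunit is solved independently and its outlet concentration profile is passed on as the inlet condition of the next subunit.

```python
# Conceptual sketch only, not a COMSOL model: steady advection-diffusion in a
# straight channel, marched subunit by subunit, with the outlet concentration
# profile of one subunit used as the inlet profile of the next.
import numpy as np

W = 100e-6          # channel width [m] (illustrative)
D = 1.0e-9          # diffusivity [m^2/s] (illustrative)
u = 1.0e-3          # mean streamwise velocity [m/s], plug-flow assumption
L_sub = 1.0e-3      # length of one mixing subunit [m]
ny = 51
dy = W / (ny - 1)
dx = 0.4 * u * dy**2 / D          # explicit-marching stability limit

def solve_subunit(c_in):
    """March u dc/dx = D d2c/dy2 through one subunit; return outlet profile."""
    c = c_in.copy()
    nsteps = int(np.ceil(L_sub / dx))
    for _ in range(nsteps):
        lap = np.zeros_like(c)
        lap[1:-1] = (c[2:] - 2.0 * c[1:-1] + c[:-2]) / dy**2
        c = c + dx * D / u * lap
        c[0], c[-1] = c[1], c[-2]          # no-flux walls
    return c

# Inlet: two co-flowing streams, one carrying the species.
y = np.linspace(0.0, W, ny)
c = np.where(y < W / 2, 1.0, 0.0)

for k in range(5):                          # chain five subunits
    c = solve_subunit(c)
    mixing_index = 1.0 - c.std() / 0.5      # 1 = fully mixed, 0 = unmixed
    print(f"after subunit {k + 1}: mixing index = {mixing_index:.3f}")
```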
  5. The development of data-informed predictive models for dynamical systems is of widespread interest in many disciplines. We present a unifying framework for blending mechanistic and machine-learning approaches to identify dynamical systems from noisily and partially observed data. We compare pure data-driven learning with hybrid models that incorporate imperfect domain knowledge, referring to the discrepancy between an assumed truth model and the imperfect mechanistic model as model error. Our formulation is agnostic to the chosen machine learning model, is presented in both continuous- and discrete-time settings, and is compatible with both model errors that exhibit substantial memory and errors that are memoryless. First, we study memoryless model error that is linear in its parametric dependence from a learning theory perspective, defining excess risk and generalization error. For ergodic continuous-time systems, we prove that both excess risk and generalization error are bounded above by terms that diminish with the square root of T, the time interval over which training data are specified. Second, we study scenarios that benefit from modeling with memory, proving universal approximation theorems for two classes of continuous-time recurrent neural networks (RNNs): both can learn memory-dependent model error, assuming that it is governed by a finite-dimensional hidden variable and that, together, the observed and hidden variables form a continuous-time Markovian system. In addition, we connect one class of RNNs to reservoir computing, thereby relating learning of memory-dependent error to recent work on supervised learning between Banach spaces using random features. Numerical results are presented (Lorenz ’63 and Lorenz ’96 multiscale systems) to compare purely data-driven and hybrid approaches, finding hybrid methods less data-hungry and more parametrically efficient. We also find that, while a continuous-time framing allows for robustness to irregular sampling and desirable domain interpretability, a discrete-time framing can provide similar or better predictive performance, especially when data are undersampled and the vector field defining the true dynamics cannot be identified. Finally, we demonstrate numerically how data assimilation can be leveraged to learn hidden dynamics from noisy, partially observed data, and illustrate challenges in representing memory by this approach, and in the training of such models.
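A toy sketch of the hybrid idea for memoryless model error is given below: an imperfect mechanistic Lorenz '63 model is corrected by a state-dependent error term learned from trajectory data, with random Fourier features and ridge regression standing in for the machine-learning component; the setting, model misspecification, and hyperparameters are illustrative assumptions, not the paper's experiments.

```python
# Hybrid mechanistic + data-driven model: learn the residual between the true
# vector field and an imperfect mechanistic model, then add it back as a
# correction. All choices below are illustrative.
import numpy as np

rng = np.random.default_rng(3)
sigma, rho, beta = 10.0, 28.0, 8.0 / 3.0

def f_true(u):
    x, y, z = u
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def f_model(u):
    """Imperfect mechanistic model: the nonlinear term in dz/dt is missing."""
    x, y, z = u
    return np.array([sigma * (y - x), x * (rho - z) - y, -beta * z])

# Training trajectory from the "truth"; time derivatives by finite differences.
dt, nsteps = 1e-3, 50_000
traj = np.empty((nsteps, 3))
u = np.array([1.0, 1.0, 1.0])
for i in range(nsteps):
    traj[i] = u
    u = u + dt * f_true(u)                       # forward Euler for brevity
du_dt = np.gradient(traj, dt, axis=0)
residual = du_dt - np.array([f_model(v) for v in traj])   # model-error samples

# Random Fourier features + ridge regression approximate the error term m(u).
n_feat = 300
Wf = rng.normal(scale=0.1, size=(3, n_feat))
bf = rng.uniform(0.0, 2.0 * np.pi, n_feat)
def features(U):
    return np.cos(U @ Wf + bf)
Phi = features(traj)
coef = np.linalg.solve(Phi.T @ Phi + 1e-6 * np.eye(n_feat), Phi.T @ residual)

def f_hybrid(u):
    return f_model(u) + features(u[None, :])[0] @ coef

# Short-term forecast comparison from the end of the training trajectory.
u_true = u_model = u_hyb = traj[-1].copy()
for _ in range(1000):
    u_true = u_true + dt * f_true(u_true)
    u_model = u_model + dt * f_model(u_model)
    u_hyb = u_hyb + dt * f_hybrid(u_hyb)
print("imperfect-model error:", np.linalg.norm(u_model - u_true))
print("hybrid-model error:   ", np.linalg.norm(u_hyb - u_true))
```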