Title: Deep learning to decompose macromolecules into independent Markovian domains
Abstract: The increasing interest in modeling the dynamics of ever larger proteins has revealed a fundamental problem with models that describe the molecular system as being in a global configuration state. This notion limits our ability to gather sufficient statistics of state probabilities or state-to-state transitions because for large molecular systems the number of metastable states grows exponentially with size. In this manuscript, we approach this challenge by introducing a method that combines our recent progress on independent Markov decomposition (IMD) with VAMPnets, a deep learning approach to Markov modeling. We establish a training objective that quantifies how well a given decomposition of the molecular system into independent subdomains with Markovian dynamics approximates the overall dynamics. By constructing an end-to-end learning framework, the decomposition into such subdomains and their individual Markov state models are simultaneously learned, providing a data-efficient and easily interpretable summary of the complex system dynamics. While learning the dynamical coupling between Markovian subdomains is still an open issue, the present results are a significant step towards learning Ising models of large molecular complexes from simulation data.
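The idea behind a decomposition into independent Markovian subdomains can be illustrated with a toy calculation. The sketch below is not the authors' method or code: it involves no VAMPnets and no learning, and the transition matrices `T_a` and `T_b` are made-up numbers. It only shows the standard fact that, when two subdomains evolve independently, the global transition matrix is the Kronecker product of the subdomain matrices, so the global state count multiplies while the parameters to estimate stay per-subdomain.

```python
import numpy as np

def stationary(T):
    """Stationary distribution of a row-stochastic matrix (left eigenvector for eigenvalue 1)."""
    evals, evecs = np.linalg.eig(T.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    return pi / pi.sum()  # normalizing by the sum also fixes the eigenvector's sign

def n_global_states(n_subdomains, states_per_subdomain=2):
    """Number of global states when every subdomain has the same number of local states."""
    return states_per_subdomain ** n_subdomains

# Two hypothetical two-state subdomains (illustrative values, not from the paper).
T_a = np.array([[0.9, 0.1],
                [0.2, 0.8]])
T_b = np.array([[0.7, 0.3],
                [0.4, 0.6]])

# Under independence, the joint dynamics factorize as a Kronecker product.
T_global = np.kron(T_a, T_b)

# The global stationary distribution is the product of the subdomain distributions.
pi_global = stationary(T_global)
pi_product = np.kron(stationary(T_a), stationary(T_b))

print(T_global.shape)                      # global model has 2 * 2 states
print(np.allclose(pi_global, pi_product))  # factorization check
print(n_global_states(10))                 # ten two-state subdomains
```

Under these assumptions, ten two-state subdomains correspond to 1024 global states but only ten small transition matrices, which is the data-efficiency argument the abstract makes.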
Award ID(s):
2019745
PAR ID:
10380925
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Nature Publishing Group
Date Published:
Journal Name:
Nature Communications
Volume:
13
Issue:
1
ISSN:
2041-1723
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper establishes Hoeffding’s lemma and inequality for bounded functions of general-state-space and not necessarily reversible Markov chains. The sharpness of these results is characterized by the optimality of the ratio between variance proxies in the Markov-dependent and independent settings. The boundedness of functions is shown necessary for such results to hold in general. To showcase the usefulness of the new results, we apply them for non-asymptotic analyses of MCMC estimation, respondent-driven sampling and high-dimensional covariance matrix estimation on time series data with a Markovian nature. In addition to statistical problems, we also apply them to study the time-discounted rewards in econometric models and the multi-armed bandit problem with Markovian rewards arising from the field of machine learning.
  2. One important problem in constructing the reduced dynamics of molecular systems is the accurate modeling of the non-Markovian behavior arising from the dynamics of unresolved variables. The main complication emerges from the lack of scale separations, where the reduced dynamics generally exhibits pronounced memory and non-white noise terms. We propose a data-driven approach to learn the reduced model of multi-dimensional resolved variables that faithfully retains the non-Markovian dynamics. Different from the common approaches based on the direct construction of the memory function, the present approach seeks a set of non-Markovian features that encode the history of the resolved variables and establishes a joint learning of the extended Markovian dynamics in terms of both the resolved variables and these features. The training is based on matching the evolution of the correlation functions of the extended variables that can be directly obtained from the ones of the resolved variables. The constructed model essentially approximates the multi-dimensional generalized Langevin equation and ensures numerical stability without empirical treatment. We demonstrate the effectiveness of the method by constructing the reduced models of molecular systems in terms of both one-dimensional and four-dimensional resolved variables. 
  3. The path-tracking control performance of an autonomous vehicle (AV) is crucially dependent upon modeling choices and subsequent system-identification updates. Traditionally, automotive engineering has built upon increasing fidelity of white- and gray-box models coupled with system identification. While these models offer explainability, they suffer from modeling inaccuracies, non-linearities, and parameter variation. On the other end, end-to-end black-box methods like behavior cloning and reinforcement learning provide increased adaptability but at the expense of explainability and generalizability, and they are subject to the sim2real gap. In this regard, hybrid data-driven techniques like Koopman Extended Dynamic Mode Decomposition (KEDMD) can achieve linear embedding of non-linear dynamics through a selection of “lifting functions”. However, the success of this method is primarily predicated on the choice of lifting function(s) and optimization parameters. In this study, we present an analytical approach to construct these lifting functions using the iterative Lie bracket vector fields considering holonomic and non-holonomic constraints on the configuration manifold of our Ackermann-steered autonomous mobile robot. The prediction and control capabilities of the obtained linear KEDMD model are showcased using trajectory tracking of standard vehicle dynamics maneuvers and along a closed-loop racetrack.
  4. Inferring underlying microscopic dynamics from low-dimensional experimental signals is a central problem in physics, chemistry, and biology. As a trade-off between molecular complexity and the low-dimensional nature of experimental data, mesoscopic descriptions such as the Markovian master equation are commonly used. The states in such descriptions usually include multiple microscopic states, and the ensuing coarse-grained dynamics are generally non-Markovian. It is frequently assumed that such dynamics can nevertheless be described as a Markov process because of the timescale separation between slow transitions from one observed coarse state to another and the fast interconversion within such states. Here, we use a simple model of a molecular motor with unobserved internal states to highlight that (1) dissipation estimated from the observed coarse dynamics may significantly underestimate microscopic dissipation even in the presence of timescale separation and even when mesoscopic states do not contain dissipative cycles and (2) timescale separation is not necessarily required for the Markov approximation to give the exact entropy production, provided that certain constraints on the microscopic rates are satisfied. When the Markov approximation is inadequate, we discuss whether including memory effects can improve the estimate. Surprisingly, when we do so in a “model-free” way by computing the Kullback–Leibler divergence between the observed probability distributions of forward trajectories and their time reverses, this leads to poorer estimates of entropy production. Finally, we argue that alternative approaches, such as hidden Markov models, may uncover the dissipative nature of the microscopic dynamics even when the observed coarse trajectories are completely time-reversible. 
  5. MLMOD is a software package for incorporating machine learning approaches and models into simulations of microscale mechanics and molecular dynamics in LAMMPS. Recent machine learning methods provide promising data-driven approaches for learning representations of system behaviors from experimental data and high-fidelity simulations. The package facilitates learning and using data-driven models for (i) dynamics of the system at larger spatial-temporal scales, (ii) interactions between system components, (iii) features yielding coarser degrees of freedom, and (iv) features for new quantities of interest characterizing system behaviors. MLMOD provides hooks in LAMMPS for (i) modeling dynamics and time-step integration, (ii) modeling interactions, and (iii) computing quantities of interest characterizing system states. The package allows for use of machine learning methods with general model classes including Neural Networks, Gaussian Process Regression, Kernel Models, and other approaches. Here we discuss our prototype C++/Python package, aims, and example usage. The package is currently integrated with the mesoscale and molecular dynamics simulation package LAMMPS and with PyTorch.