Title: Computing transition path theory quantities with trajectory stratification
Transition path theory computes statistics from ensembles of reactive trajectories. A common strategy for sampling reactive trajectories is to control the branching and pruning of trajectories so as to enhance the sampling of low probability segments. However, it can be challenging to apply transition path theory to data from such methods because determining whether configurations and trajectory segments are part of reactive trajectories requires looking backward and forward in time. Here, we show how this issue can be overcome efficiently by introducing simple data structures. We illustrate the approach in the context of nonequilibrium umbrella sampling, but the strategy is general and can be used to obtain transition path theory statistics from other methods that sample segments of unbiased trajectories.
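To make the "looking backward and forward in time" requirement concrete, here is a minimal sketch for the simplest case: one unbroken, unbiased trajectory whose frames have already been labeled as lying in the reactant set A, the product set B, or neither. It is not the data structure introduced in the paper and does not handle branched or pruned segments; the function name `reactive_mask` and the boolean inputs `in_a` and `in_b` are hypothetical, chosen for illustration only.

```python
def reactive_mask(in_a, in_b):
    """Flag frames that lie on A-to-B reactive segments.

    in_a[t] / in_b[t] are truthy when frame t is inside set A / set B.
    A frame is reactive when it is outside both sets, the most recently
    visited set is A, and the next set to be visited is B.
    """
    n = len(in_a)
    last = [None] * n  # set most recently visited at or before frame t
    nxt = [None] * n   # set visited next at or after frame t

    # Sweep forward in time to record where each frame came from.
    seen = None
    for t in range(n):
        if in_a[t]:
            seen = "A"
        elif in_b[t]:
            seen = "B"
        last[t] = seen

    # Sweep backward in time to record where each frame is going.
    seen = None
    for t in range(n - 1, -1, -1):
        if in_a[t]:
            seen = "A"
        elif in_b[t]:
            seen = "B"
        nxt[t] = seen

    return [
        (not in_a[t]) and (not in_b[t]) and last[t] == "A" and nxt[t] == "B"
        for t in range(n)
    ]


# Toy labels: "A" = reactant set, "B" = product set, "I" = neither.
labels = "AIIAIIBIIAI"
mask = reactive_mask([c == "A" for c in labels], [c == "B" for c in labels])
print([t for t, flag in enumerate(mask) if flag])  # prints [4, 5]
```

Frames whose origin or destination is unknown (for example, the final frame in the toy labels, which never reaches A or B again) are left unflagged. With data from stratified sampling, a single segment does not carry this information on its own, which is the difficulty the abstract describes.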
Award ID(s):
2054306
PAR ID:
10444865
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
The Journal of Chemical Physics
Volume:
157
Issue:
3
ISSN:
0021-9606
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. An issue for molecular dynamics simulations is that events of interest often involve timescales that are much longer than the simulation time step, which is set by the fastest timescales of the model. Because of this timescale separation, direct simulation of many events is prohibitively computationally costly. This issue can be overcome by aggregating information from many relatively short simulations that sample segments of trajectories involving events of interest. This is the strategy of Markov state models (MSMs) and related approaches, but such methods suffer from approximation error because the variables defining the states generally do not capture the dynamics fully. By contrast, once converged, the weighted ensemble (WE) method aggregates information from trajectory segments so as to yield unbiased estimates of both thermodynamic and kinetic statistics. Unfortunately, errors decay no faster than unbiased simulation in WE as originally formulated and commonly deployed. Here, we introduce a theoretical framework for describing WE that shows that the introduction of an approximate stationary distribution on top of the stratification, as in nonequilibrium umbrella sampling (NEUS), accelerates convergence. Building on ideas from MSMs and related methods, we generalize the NEUS approach in such a way that the approximation error can be reduced systematically. We show that the improved algorithm can decrease the simulation time required to achieve the desired precision by orders of magnitude. 
  2. Abstract In this note, we apply transition path theory (TPT) from Markov chains to shed light on the problem of Iceland–Scotland Overflow Water (ISOW) equatorward export. A recent analysis of observed trajectories of submerged floats demanded revision of the traditional abyssal circulation theory, which postulates that ISOW should steadily flow along a deep boundary current (DBC) around the subpolar North Atlantic prior to exiting it. The TPT analyses carried out here allow attention to be focused on the portions of flow from the origin of ISOW to the region where ISOW exits the subpolar North Atlantic and suggest that insufficient sampling may be biasing the aforementioned demand. The analyses, appropriately adapted to represent a continuous input of ISOW, are carried out on three time-homogeneous Markov chains modeling the ISOW flow. One is constructed using a high number of simulated trajectories homogeneously covering the flow domain. The other two use much fewer trajectories which heterogeneously cover the domain. The trajectories in the latter two chains are observed trajectories or simulated trajectories subsampled at the observed frequency. While the densely sampled chain supports a well-defined DBC, whether this is a peculiarity of the simulation considered or not, the more heterogeneously sampled chains do not, irrespective of the nature of the trajectories used, i.e., observed or simulated. Studying the sampling sensitivity of the Markov chains, we can give recommendations for enlarging the existing float dataset to improve the significance of conclusions about long-time-asymptotic aspects of the ISOW circulation. 
  3. Transition path theory provides a statistical description of the dynamics of a reaction in terms of local spatial quantities. In its original formulation, it is limited to reactions that consist of trajectories flowing from a reactant set A to a product set B. We extend the basic concepts and principles of transition path theory to reactions in which trajectories exhibit a specified sequence of events and illustrate the utility of this generalization on examples. 
  4. Selection bias is inevitable in manually curated computational reaction databases but can have a significant impact on generalizability of quantum chemical methods and machine learning models derived from these data sets. Here, we propose quasireaction subgraphs as a discrete, graph-based representation of reaction mechanisms that has a well-defined associated probability space and admits a similarity function using graph kernels. Quasireaction subgraphs are thus well suited for constructing representative or diverse data sets of reactions. Quasireaction subgraphs are defined as subgraphs of a network of formal bond breaks and bond formations (transition network) composed of all shortest paths between reactant and product nodes. However, due to their purely geometric construction, they do not guarantee that the corresponding reaction mechanisms are thermodynamically and kinetically feasible. As a result, a binary classification of feasible (reaction subgraphs) and infeasible (non-reactive subgraphs) must be applied after sampling. In this paper, we describe the construction and properties of quasireaction subgraphs and characterize the statistics of quasireaction subgraphs from CHO transition networks with up to six nonhydrogen atoms. We explore their clustering using Weisfeiler–Lehman graph kernels. 
  5. The ability to distinguish between stochastic systems based on their trajectories is crucial in thermodynamics, chemistry, and biophysics. The Kullback–Leibler (KL) divergence, $D_{\mathrm{KL}}^{AB}(0,\tau)$, quantifies the distinguishability between the two ensembles of length-τ trajectories from Markov processes A and B. However, evaluating $D_{\mathrm{KL}}^{AB}(0,\tau)$ from histograms of trajectories faces sufficient sampling difficulties, and no theory explicitly reveals what dynamical features contribute to the distinguishability. This work provides a general formula that decomposes $D_{\mathrm{KL}}^{AB}(0,\tau)$ in space and time for any Markov processes, arbitrarily far from equilibrium or steady state. It circumvents the sampling difficulty of evaluating $D_{\mathrm{KL}}^{AB}(0,\tau)$. Furthermore, it explicitly connects trajectory KL divergence with individual transition events and their waiting time statistics. The results provide insights into understanding distinguishability between Markov processes, leading to new theoretical frameworks for designing biological sensors and optimizing signal transduction.