Title: The fast committor machine: Interpretable prediction with kernels
Abstract:
In the study of stochastic systems, the committor function describes the probability that a system starting from an initial configuration x will reach a set B before a set A. This paper introduces an efficient and interpretable algorithm for approximating the committor, called the “fast committor machine” (FCM). The FCM uses simulated trajectory data to build a kernel-based model of the committor. The kernel function is constructed to emphasize low-dimensional subspaces that optimally describe the A-to-B transitions. The coefficients in the kernel model are determined using randomized linear algebra, leading to a runtime that scales linearly with the number of data points. In numerical experiments involving a triple-well potential and alanine dipeptide, the FCM yields higher accuracy and trains more quickly than a neural network with the same number of parameters. The FCM is also more interpretable than the neural net.
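As a rough, hedged illustration of the kind of computation described in the abstract (not the FCM implementation itself), the sketch below fits a kernel-style committor model with random Fourier features, so that training cost scales linearly in the number of samples N (roughly O(N m^2) for m features). The binary labels y, marking whether the trajectory launched from each configuration reached B before A, and all function names are illustrative assumptions.

import numpy as np

def fit_committor_rff(X, y, m=512, bandwidth=1.0, ridge=1e-3, seed=0):
    """Fit q(x) ~ w . phi(x) with m random Fourier features (Gaussian kernel). Illustrative only."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    W = rng.normal(scale=1.0 / bandwidth, size=(d, m))   # random frequencies
    b = rng.uniform(0.0, 2.0 * np.pi, size=m)            # random phases
    Phi = np.sqrt(2.0 / m) * np.cos(X @ W + b)            # N x m feature matrix
    A = Phi.T @ Phi + ridge * np.eye(m)                   # m x m normal equations: O(N m^2) to form
    w = np.linalg.solve(A, Phi.T @ y)
    def q(X_new):
        Q = np.sqrt(2.0 / m) * np.cos(X_new @ W + b) @ w
        return np.clip(Q, 0.0, 1.0)                       # committor values lie in [0, 1]
    return q

Ridge regression on random features plays the role of the kernel fit here; the actual FCM additionally learns a kernel that emphasizes the low-dimensional subspaces describing the A-to-B transition.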
Award ID(s):
2111278 2111277
PAR ID:
10537756
Author(s) / Creator(s):
Publisher / Repository:
AIP
Date Published:
Journal Name:
The Journal of Chemical Physics
Volume:
161
Issue:
8
ISSN:
0021-9606
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Oh, A; Naumann, T; Globerson, A; Saenko, K; Hardt, M; Levine, S (Ed.)
    We consider the problem of learning a single-index target function f∗ : R^d → R under spiked covariance data: f∗(x) = σ∗(⟨x, μ⟩ / √(1+θ)), x ∼ N(0, I_d + θμμ⊤), θ ≍ d^β for β ∈ [0, 1), where the link function σ∗ : R → R is a degree-p polynomial with information exponent k (defined as the lowest degree in the Hermite expansion of σ∗), and the target depends on the projection of the input x onto the spike (signal) direction μ ∈ R^d. In the proportional asymptotic limit where the number of training examples n and the dimensionality d jointly diverge, n, d → ∞ with n/d → ψ ∈ (0, ∞), we ask the following question: how large should the spike magnitude θ be in order for (i) kernel methods and (ii) neural networks optimized by gradient descent to learn f∗? We show that for kernel ridge regression, β ≥ 1 − 1/p is both sufficient and necessary, whereas for two-layer neural networks trained with gradient descent, β > 1 − 1/k suffices. Our results demonstrate that both kernel methods and neural networks benefit from low-dimensional structure in the data. Further, since k ≤ p by definition, neural networks can adapt to such structure more effectively.
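For concreteness, here is a minimal sketch of drawing data from the spiked-covariance single-index model stated above; sigma_star stands for whatever degree-p link one wishes to study, and all names and values are illustrative assumptions.

import numpy as np

def sample_spiked_single_index(n, d, beta, sigma_star, seed=0):
    """Draw (x, y) with x ~ N(0, I_d + theta mu mu^T), theta = d**beta, y = sigma_star(<x, mu>/sqrt(1+theta))."""
    rng = np.random.default_rng(seed)
    theta = d ** beta                                       # spike magnitude theta ~ d^beta
    mu = rng.normal(size=d)
    mu /= np.linalg.norm(mu)                                # unit spike (signal) direction
    z = rng.normal(size=(n, d))                             # isotropic component
    x = z + np.sqrt(theta) * rng.normal(size=(n, 1)) * mu   # covariance I_d + theta * mu mu^T
    y = sigma_star(x @ mu / np.sqrt(1.0 + theta))           # single-index target
    return x, y

# Example (illustrative): a degree-3 link with information exponent k = 3.
# x, y = sample_spiked_single_index(n=2000, d=500, beta=0.5, sigma_star=lambda t: t**3 - 3*t)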
  2.
    We develop a convex analytic framework for ReLU neural networks that elucidates the inner workings of hidden neurons and their function-space characteristics. We show that neural networks with rectified linear units act as convex regularizers, where simple solutions are encouraged via extreme points of a certain convex set. For one-dimensional regression and classification, as well as rank-one data matrices, we prove that finite two-layer ReLU networks with norm regularization yield linear spline interpolation. We characterize the classification decision regions in terms of a closed-form kernel matrix and minimum ℓ1-norm solutions. This is in contrast to the Neural Tangent Kernel, which is unable to explain neural network predictions with finitely many neurons. Our convex geometric description also provides intuitive explanations of hidden neurons as autoencoders. In higher dimensions, we show that the training problem for two-layer networks can be cast as a finite-dimensional convex optimization problem with infinitely many constraints. We then provide a family of convex relaxations to approximate the solution, and a cutting-plane algorithm to improve the relaxations. We derive conditions for the exactness of the relaxations and provide simple closed-form formulas for the optimal neural network weights in certain cases. We also establish a connection to ℓ0-ℓ1 equivalence for neural networks, analogous to minimal-cardinality solutions in compressed sensing. Extensive experimental results show that the proposed approach yields interpretable and accurate models.
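As an illustrative analogue of the one-dimensional result above (not the paper's exact convex program), the sketch below fits an l1-regularized model over ReLU features with one breakpoint per training point; the sparsity-inducing norm stands in for the network's weight regularization, and the resulting fit is a piecewise-linear spline. All names and the alpha value are assumptions.

import numpy as np
from sklearn.linear_model import Lasso

def fit_relu_spline(x, y, alpha=1e-3):
    """l1-regularized fit over ReLU features max(x - knot, 0), with knots at the data points."""
    x = np.asarray(x, dtype=float)
    Phi = np.maximum(x[:, None] - x[None, :], 0.0)        # ReLU dictionary, one knot per sample
    Phi = np.column_stack([Phi, x])                       # plus a global linear term
    model = Lasso(alpha=alpha, fit_intercept=True, max_iter=50_000).fit(Phi, y)

    def predict(x_query):
        x_query = np.asarray(x_query, dtype=float)
        P = np.maximum(x_query[:, None] - x[None, :], 0.0)
        P = np.column_stack([P, x_query])
        return model.predict(P)

    return predict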
    The study of phenomena such as protein folding and conformational changes in molecules is a central theme in chemical physics. Molecular dynamics (MD) simulation is the primary tool for the study of transition processes in biomolecules, but it is hampered by a huge timescale gap between the processes of interest and the atomic vibrations that dictate the time step size. It is therefore imperative to combine MD simulations with other techniques in order to quantify transition processes taking place on large timescales. In this work, the diffusion map with Mahalanobis kernel, a meshless approach for approximating the Backward Kolmogorov Operator (BKO) in collective variables, is upgraded to incorporate standard enhanced sampling techniques, such as metadynamics. The resulting algorithm, which we call the target measure Mahalanobis diffusion map (tm-mmap), is suitable for a moderate number of collective variables in which one can approximate the diffusion tensor and free energy. Imposing appropriate boundary conditions allows one to use the approximated BKO to solve for the committor function and to apply transition path theory to find the reactive current delineating the transition channels and the transition rate. The proposed algorithm, tm-mmap, is tested on the two-dimensional Moro–Cardin two-well system with position-dependent diffusion coefficient and on alanine dipeptide in two collective variables, where the committor, the reactive current, and the transition rate are compared to those computed by the finite element method (FEM). Finally, tm-mmap is applied to alanine dipeptide in four collective variables, where the use of finite elements is infeasible.
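A minimal sketch of the boundary-value step described above, assuming a discrete approximation L of the backward Kolmogorov operator on the sample points is already in hand (for instance from a diffusion-map construction); this is not the tm-mmap code itself, and the dense solve is only for illustration.

import numpy as np

def solve_committor(L, in_A, in_B):
    """Solve L q = 0 away from A and B, with q = 0 on A and q = 1 on B."""
    n = L.shape[0]
    interior = ~(in_A | in_B)                             # boolean masks over the n sample points
    q = np.zeros(n)
    q[in_B] = 1.0
    L_ii = L[np.ix_(interior, interior)]                  # operator restricted to interior nodes
    rhs = -L[np.ix_(interior, in_B)] @ q[in_B]            # boundary values moved to the right-hand side
    q[interior] = np.linalg.solve(L_ii, rhs)
    return q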
    Many processes in nature, such as conformational changes in biomolecules and clusters of interacting particles, genetic switches, and mechanical or electromechanical oscillators with added noise, are modeled using stochastic differential equations with small white noise. The study of rare transitions between metastable states in such systems is of great interest and importance. The direct simulation of rare transitions is difficult due to long waiting times. Transition path theory is a mathematical framework for the quantitative description of rare events. Its crucial component is the committor function, the solution to a boundary value problem for the backward Kolmogorov equation. The key fact exploited in this work is that the optimal controller constructed from the committor leads to the generation of transition trajectories exclusively. We prove this fact for a broad class of stochastic differential equations. Moreover, we demonstrate that the committor computed for a dimensionally reduced system and then lifted to the original phase space still allows us to construct an effective controller and estimate the transition rate with reasonable accuracy. Furthermore, we propose an all-the-way-through scheme for computing the committor via neural networks, sampling the transition trajectories, and estimating the transition rate without meshing the space. We apply the proposed methodology to four test problems: the overdamped Langevin dynamics with Mueller’s potential and the rugged Mueller potential in 10D, the noisy bistable Duffing oscillator, and Lennard-Jones-7 in 2D.
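A hedged Euler–Maruyama sketch of the controller idea above for overdamped Langevin dynamics dX = −∇V dt + √(2/β) dW: adding the drift (2/β) ∇ log q steers trajectories from A toward B. The callables grad_V, q, and grad_q, and all parameter values, are illustrative assumptions rather than the paper's neural-network pipeline.

import numpy as np

def controlled_trajectory(x0, grad_V, q, grad_q, beta=1.0, dt=1e-4, n_steps=100_000, seed=0):
    """Sample one committor-controlled path with Euler--Maruyama."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for _ in range(n_steps):
        q_x = max(float(q(x)), 1e-12)                     # guard: the committor vanishes on A
        drift = -grad_V(x) + (2.0 / beta) * grad_q(x) / q_x
        x = x + drift * dt + np.sqrt(2.0 * dt / beta) * rng.normal(size=x.shape)
        path.append(x.copy())
    return np.array(path)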
  5. Despite the recent popularity of attention-based neural architectures in core AI fields like natural language processing (NLP) and computer vision (CV), their potential in modeling complex physical systems remains underexplored. Learning problems in physical systems are often characterized as discovering operators that map between function spaces based on a few instances of function pairs. This task frequently presents a severely ill-posed PDE inverse problem. In this work, we propose a novel neural operator architecture based on the attention mechanism, which we refer to as the Nonlocal Attention Operator (NAO), and explore its capability in developing a foundation physical model. In particular, we show that the attention mechanism is equivalent to a double integral operator that enables nonlocal interactions among spatial tokens, with a data-dependent kernel characterizing the inverse mapping from data to the hidden parameter field of the underlying operator. As such, the attention mechanism extracts global prior information from training data generated by multiple systems, and suggests the exploratory space in the form of a nonlinear kernel map. Consequently, NAO can address ill-posedness and rank deficiency in inverse PDE problems by encoding regularization and achieving generalizability. We empirically demonstrate the advantages of NAO over baseline neural models in terms of generalizability to unseen data resolutions and system states. Our work not only suggests a novel neural operator architecture for learning interpretable foundation models of physical systems, but also offers a new perspective towards understanding the attention mechanism. Our code and data accompanying this paper are available at https://github.com/fishmoon1234/NAO. 
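To make the "attention as a data-dependent kernel" reading above concrete, here is a minimal single-head attention written as a row-normalized kernel acting nonlocally on spatial tokens. The random weight matrices are placeholders; this is an illustration of the viewpoint, not the NAO architecture.

import numpy as np

def kernel_attention(tokens, d_k=32, seed=0):
    """One attention head viewed as a discretized double integral: out_i = sum_j K(i, j) V_j."""
    rng = np.random.default_rng(seed)
    n, d = tokens.shape
    Wq, Wk, Wv = [rng.normal(scale=d ** -0.5, size=(d, d_k)) for _ in range(3)]
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = Q @ K.T / np.sqrt(d_k)                       # data-dependent pairwise couplings
    kernel = np.exp(scores - scores.max(axis=1, keepdims=True))
    kernel /= kernel.sum(axis=1, keepdims=True)           # softmax rows: a normalized kernel over tokens
    return kernel @ V                                     # nonlocal mixing of token features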