skip to main content


The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 11:00 PM ET on Friday, July 12 until 9:00 AM ET on Saturday, July 13 due to maintenance. We apologize for the inconvenience.

Title: Learning molecular dynamics with simple language model built upon long short-term memory neural network
Abstract Recurrent neural networks have led to breakthroughs in natural language processing and speech recognition. Here we show that recurrent networks, specifically long short-term memory networks can also capture the temporal evolution of chemical/biophysical trajectories. Our character-level language model learns a probabilistic model of 1-dimensional stochastic trajectories generated from higher-dimensional dynamics. The model captures Boltzmann statistics and also reproduces kinetics across a spectrum of timescales. We demonstrate how training the long short-term memory network is equivalent to learning a path entropy, and that its embedding layer, instead of representing contextual meaning of characters, here exhibits a nontrivial connectivity between different metastable states in the underlying physical system. We demonstrate our model’s reliability through different benchmark systems and a force spectroscopy trajectory for multi-state riboswitch. We anticipate that our work represents a stepping stone in the understanding and use of recurrent neural networks for understanding the dynamics of complex stochastic molecular systems.  more » « less
Award ID(s):
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Nature Communications
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Recurrent neural networks have seen widespread use in modeling dynamical systems in varied domains such as weather prediction, text prediction and several others. Often one wishes to supplement the experimentally observed dynamics with prior knowledge or intuition about the system. While the recurrent nature of these networks allows them to model arbitrarily long memories in the time series used in training, it makes it harder to impose prior knowledge or intuition through generic constraints. In this work, we present a path sampling approach based on principle of Maximum Caliber that allows us to include generic thermodynamic or kinetic constraints into recurrent neural networks. We show the method here for a widely used type of recurrent neural network known as long short-term memory network in the context of supplementing time series collected from different application domains. These include classical Molecular Dynamics of a protein and Monte Carlo simulations of an open quantum system continuously losing photons to the environment and displaying Rabi oscillations. Our method can be easily generalized to other generative artificial intelligence models and to generic time series in different areas of physical and social sciences, where one wishes to supplement limited data with intuition or theory based corrections. 
    more » « less
  2. For emerging edge and near-sensor systems to perform hard classification tasks locally, they must avoid costly communication with the cloud. This requires the use of compact classifiers such as recurrent neural networks of the long short term memory (LSTM) type, as well as a low-area hardware technology such as stochastic computing (SC). We study the benefits and costs of applying SC to LSTM design. We consider a design space spanned by fully binary (non-stochastic), fully stochastic, and several hybrid (mixed) LSTM architectures, and design and simulate examples of each. Using standard classification benchmarks, we show that area and power can be reduced up to 47% and 86% respectively with little or no impact on classification accuracy. We demonstrate that fully stochastic LSTMs can deliver acceptable accuracy despite accumulated errors. Our results also suggest that ReLU is preferable to tanh as an activation function in stochastic LSTMs 
    more » « less
  3. In this study, we predicted the log returns of the top 10 cryptocurrencies based on market cap, using univariate and multivariate machine learning methods such as recurrent neural networks, deep learning neural networks, Holt’s exponential smoothing, autoregressive integrated moving average, ForecastX, and long short-term memory networks. The multivariate long short-term memory networks performed better than the univariate machine learning methods in terms of the prediction error measures. 
    more » « less
  4. We developed an integrated recurrent neural network and nonlinear regression spatio-temporal model for vector-borne disease evolution. We take into account climate data and seasonality as external factors that correlate with disease transmitting insects (e.g. flies), also spill-over infections from neighboring regions surrounding a region of interest. The climate data is encoded to the model through a quadratic embedding scheme motivated by recommendation systems. The neighboring regions’ influence is modeled by a long short-term memory neural network. The integrated model is trained by stochastic gradient descent and tested on leishmaniasis data in Sri Lanka from 2013-2018 where infection outbreaks occurred. Our model out-performed ARIMA models across a number of regions with high infections, and an associated ablation study renders support to our modeling hypothesis and ideas. 
    more » « less
  5. To predict rare extreme events using deep neural networks, one encounters the so-called small data problem because even long-term observations often contain few extreme events. Here, we investigate a model-assisted framework where the training data are obtained from numerical simulations, as opposed to observations, with adequate samples from extreme events. However, to ensure the trained networks are applicable in practice, the training is not performed on the full simulation data; instead, we only use a small subset of observable quantities, which can be measured in practice. We investigate the feasibility of this model-assisted framework on three different dynamical systems (Rössler attractor, FitzHugh–Nagumo model, and a turbulent fluid flow) and three different deep neural network architectures (feedforward, long short-term memory, and reservoir computing). In each case, we study the prediction accuracy, robustness to noise, reproducibility under repeated training, and sensitivity to the type of input data. In particular, we find long short-term memory networks to be most robust to noise and to yield relatively accurate predictions, while requiring minimal fine-tuning of the hyperparameters.

    more » « less