
Title: Efficient Parameter Estimation for DNA Kinetics Modeled as Continuous-Time Markov Chains
Nucleic acid kinetic simulators aim to predict the kinetics of interacting nucleic acid strands. Many simulators model the kinetics of interacting nucleic acid strands as continuous-time Markov chains (CTMCs). States of the CTMCs represent a collection of secondary structures, and transitions between the states correspond to the forming or breaking of base pairs and are determined by a nucleic acid kinetic model. The number of states these CTMCs can form may be exponentially large in the length of the strands, making two important tasks challenging, namely, mean first passage time (MFPT) estimation and parameter estimation for kinetic models based on MFPTs. Gillespie’s stochastic simulation algorithm (SSA) is widely used to analyze nucleic acid folding kinetics, but could be computationally expensive for reactions whose CTMC has a large state space or for slow reactions. It could also be expensive for arbitrary parameter sets that occur in parameter estimation. Our work addresses these two challenging tasks, in the full state space of all non-pseudoknotted secondary structures of each reaction. In the first task, we show how to use a reduced variance stochastic simulation algorithm (RVSSA), which is adapted from SSA, to estimate the MFPT of a reaction’s CTMC. In the second task, we estimate model parameters based on MFPTs. To this end, first, we show how to use a generalized method of moments (GMM) approach, where we minimize a squared norm of moment functions that we formulate based on experimental and estimated MFPTs. Second, to speed up parameter estimation, we introduce a fixed path ensemble inference (FPEI) approach, that we adapt from RVSSA. We implement and evaluate RVSSA and FPEI using the Multistrand kinetic simulator. In our experiments on a dataset of DNA reactions, FPEI speeds up parameter estimation compared to inference using SSA, by more than a factor of three for slow reactions. Also, for reactions with large state spaces, it speeds up parameter estimation by more than a factor of two.
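As a rough illustration of MFPT estimation by simulation, the sketch below runs Gillespie-style trajectories on a hypothetical three-state CTMC and averages the first passage times. The chain, its rates, and the function names are invented for illustration and are not taken from the paper or from Multistrand. The `expected_holding=True` option replaces each sampled holding time with its mean; this is one way to reduce variance and is an assumption about how a reduced-variance variant might work, not a statement of the authors' RVSSA.

```python
import random

# Hypothetical toy CTMC: state -> list of (next_state, rate).
# State 2 is the absorbing target; the exact MFPT from state 0 is 2.0.
RATES = {
    0: [(1, 2.0)],
    1: [(0, 1.0), (2, 1.0)],
}
TARGET = 2


def first_passage_time(rng, start=0, expected_holding=False):
    """Simulate one trajectory from `start` until it reaches TARGET."""
    state, t = start, 0.0
    while state != TARGET:
        moves = RATES[state]
        total = sum(rate for _, rate in moves)
        # Plain SSA samples an exponential holding time with rate `total`.
        # The assumed variance-reduced variant adds the mean 1/total instead.
        t += 1.0 / total if expected_holding else rng.expovariate(total)
        # Pick the next state with probability proportional to its rate.
        r = rng.random() * total
        acc = 0.0
        for nxt, rate in moves:
            acc += rate
            if r < acc:
                state = nxt
                break
        else:
            state = moves[-1][0]  # guard against floating-point rounding
    return t


def estimate_mfpt(n_paths, seed=0, expected_holding=False):
    """Average the first passage time over `n_paths` simulated trajectories."""
    rng = random.Random(seed)
    times = [
        first_passage_time(rng, expected_holding=expected_holding)
        for _ in range(n_paths)
    ]
    return sum(times) / n_paths
```

For this toy chain both estimators target the same mean, since the expected holding time is an unbiased substitute for the sampled one. The cost of each trajectory still grows with the number of transitions, which is why slow reactions and large state spaces are expensive to simulate.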
Award ID(s):
1643606
NSF-PAR ID:
10112046
Journal Name:
DNA Computing and Molecular Programming
Volume:
11648
Page Range or eLocation-ID:
80-99
Sponsoring Org:
National Science Foundation
More Like this
  1. Models of nucleic acid thermal stability are calibrated to a wide range of experimental observations, and typically predict equilibrium probabilities of nucleic acid secondary structures with reasonable accuracy. By comparison, a similar calibration and evaluation of nucleic acid kinetic models to a broad range of measurements has not been attempted so far. We introduce an Arrhenius model of interacting nucleic acid kinetics that relates the activation energy of a state transition with the immediate local environment of the affected base pair. Our model can be used in stochastic simulations to estimate kinetic properties and is consistent with existing thermodynamic models. We infer parameters for our model using an ensemble Markov chain Monte Carlo (MCMC) approach on a training dataset with 320 kinetic measurements of hairpin closing and opening, helix association and dissociation, bubble closing and toehold-mediated strand exchange. Our new model surpasses the performance of the previously established Metropolis model both on the training set and on a testing set of size 56 composed of toehold-mediated 3-way strand displacement with mismatches and hairpin opening and closing rates: reaction rates are predicted to within a factor of three for 93.4% and 78.5% of reactions for the training and testing sets, respectively.
  2. Although the governing equations of many systems, when derived from first principles, may be viewed as known, it is often too expensive to numerically simulate all the interactions they describe. Therefore, researchers often seek simpler descriptions that describe complex phenomena without numerically resolving all the interacting components. Stochastic differential equations (SDEs) arise naturally as models in this context. The growth in data acquisition, both through experiment and through simulations, provides an opportunity for the systematic derivation of SDE models in many disciplines. However, inconsistencies between SDEs and real data at short time scales often cause problems, when standard statistical methodology is applied to parameter estimation. The incompatibility between SDEs and real data can be addressed by deriving sufficient statistics from the time-series data and learning parameters of SDEs based on these. Here, we study sufficient statistics computed from time averages, an approach that we demonstrate to lead to sufficient statistics on a variety of problems and that has the secondary benefit of obviating the need to match trajectories. Following this approach, we formulate the fitting of SDEs to sufficient statistics from real data as an inverse problem and demonstrate that this inverse problem can be solved by using ensemble Kalman inversion. Furthermore, we create a framework for non-parametric learning of drift and diffusion terms by introducing hierarchical, refinable parameterizations of unknown functions, using Gaussian process regression. We demonstrate the proposed methodology for the fitting of SDE models, first in a simulation study with a noisy Lorenz ’63 model, and then in other applications, including dimension reduction in deterministic chaotic systems arising in the atmospheric sciences, large-scale pattern modeling in climate dynamics and simplified models for key observables arising in molecular dynamics. The results confirm that the proposed methodology provides a robust and systematic approach to fitting SDE models to real data.
  3. Motivation

    Advances in experimental and imaging techniques have allowed for unprecedented insights into the dynamical processes within individual cells. However, many facets of intracellular dynamics remain hidden, or can be measured only indirectly. This makes it challenging to reconstruct the regulatory networks that govern the biochemical processes underlying various cell functions. Current estimation techniques for inferring reaction rates frequently rely on marginalization over unobserved processes and states. Even in simple systems this approach can be computationally challenging, and can lead to large uncertainties and lack of robustness in parameter estimates. Therefore we will require alternative approaches to efficiently uncover the interactions in complex biochemical networks.

    Results

    We propose a Bayesian inference framework based on replacing uninteresting or unobserved reactions with time delays. Although the resulting models are non-Markovian, recent results on stochastic systems with random delays allow us to rigorously obtain expressions for the likelihoods of model parameters. In turn, this allows us to extend MCMC methods to efficiently estimate reaction rates, and delay distribution parameters, from single-cell assays. We illustrate the advantages, and potential pitfalls, of the approach using a birth–death model with both synthetic and experimental data, and show that we can robustly infer model parameters using a relatively small number of measurements. We demonstrate how to do so even when only the relative molecule count within the cell is measured, as in the case of fluorescence microscopy.

    Availability and implementation

    Accompanying code in R is available at https://github.com/cbskust/DDE_BD.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

  4. The rapidly increasing congestion in the low Earth environment makes the modeling of uncertainty in atmospheric drag force a critical task, affecting space situational awareness (SSA) activities like the probability of collision estimation. A key element in atmospheric drag modeling is the assessment of uncertainty in the atmospheric drag coefficient estimate. While atmospheric drag coefficients for space objects with known characteristics can be computed numerically, they suffer from large computational costs for practical applications. In this work, we use cost-effective data-driven stochastic methods for modeling the drag coefficients of objects in the low Earth orbit (LEO) region. The training data is generated using the numerical Test Particle Monte Carlo (TPMC) method. TPMC is simulated with Cercignani–Lampis–Lord (CLL) gas-surface interaction (GSI) model. Mehta et al. [1] use a Gaussian process regression (GPR) model to predict satellite drag coefficient, but the authors did not estimate the predictive uncertainty. The first part of this research extends the work by Mehta et al. [1] by fitting a GPR model to the training data and performing predictive uncertainty estimation. The results of the Gaussian fit are then compared against a deep neural network (DNN) model aided by the Monte Carlo dropout approach. To the best of our knowledge, this is the first study to use the aforementioned stochastic deep learning algorithm to perform predictive uncertainty estimation of the estimated satellite drag coefficient. Apart from the accuracy of the models, we also undertake the task of calibrating the models. Simulations are carried out for a spherical satellite followed by the Champ satellite. Finally, quantification of the effect of drag coefficient uncertainty on orbit prediction is carried out for different solar activity and geomagnetic activity levels.
  5. Mass accommodation is an essential process for gas–particle partitioning of organic compounds in secondary organic aerosols (SOA). The mass accommodation coefficient is commonly described as the probability of a gas molecule colliding with the surface to enter the particle phase. It is often applied, however, without specifying if and how deep a molecule has to penetrate beneath the surface to be regarded as being incorporated into the condensed phase (adsorption vs. absorption). While this aspect is usually not critical for liquid particles with rapid surface–bulk exchange, it can be important for viscous semi-solid or glassy solid particles to distinguish and resolve the kinetics of accommodation at the surface, transfer across the gas–particle interface, and further transport into the particle bulk. For this purpose, we introduce a novel parameter: an effective mass accommodation coefficient α_eff that depends on penetration depth and is a function of surface accommodation coefficient, volatility, bulk diffusivity, and particle-phase reaction rate coefficient. Application of α_eff in the traditional Fuchs–Sutugin approximation of mass-transport kinetics at the gas–particle interface yields SOA partitioning results that are consistent with a detailed kinetic multilayer model (kinetic multilayer model of gas–particle interactions in aerosols and clouds, KM-GAP; Shiraiwa et al., 2012) and two-film model solutions (Model for Simulating Aerosol Interactions and Chemistry, MOSAIC; Zaveri et al., 2014) but deviate substantially from earlier modeling approaches not considering the influence of penetration depth and related parameters. For highly viscous or semi-solid particles, we show that the effective mass accommodation coefficient remains similar to the surface accommodation coefficient in the case of low-volatility compounds, whereas it can decrease by several orders of magnitude in the case of semi-volatile compounds. Such effects can explain apparent inconsistencies between earlier studies deriving mass accommodation coefficients from experimental data or from molecular dynamics simulations. Our findings challenge the approach of traditional SOA models using the Fuchs–Sutugin approximation of mass transfer kinetics with a fixed mass accommodation coefficient, regardless of particle phase state and penetration depth. The effective mass accommodation coefficient introduced in this study provides an efficient new way of accounting for the influence of volatility, diffusivity, and particle-phase reactions on SOA partitioning in process models as well as in regional and global air quality models. While kinetic limitations may not be critical for partitioning into liquid SOA particles in the planetary boundary layer (PBL), the effects are likely important for amorphous semi-solid or glassy SOA in the free and upper troposphere (FT–UT) as well as in the PBL at low relative humidity and low temperature.