
Title: Data‐driven Evolution Equation Reconstruction for Parameter‐Dependent Nonlinear Dynamical Systems

When studying observations of chemical reaction dynamics, closed form equations based on a putative mechanism may not be available. Yet when sufficient data from experimental observations can be obtained, even without knowing exactly what the physical meaning of the parameter settings or recorded variables is, data‐driven methods can be used to construct minimal (and in a sense, robust) realizations of the system. The approach attempts, in a sense, to circumvent physical understanding by building intrinsic “information geometries” of the observed data, thus enabling prediction without physical/chemical knowledge. Here we use such an approach to obtain evolution equations for a data‐driven realization of the original system, in effect allowing prediction based on the informed interrogation of the agnostically organized observation database. We illustrate the approach on observations of (a) the normal form for the cusp singularity, (b) a cusp singularity for the nonisothermal CSTR, and (c) a random invertible transformation of the nonisothermal CSTR, showing that one can predict even when the observables are not “simply explainable” physical quantities. We discuss current limitations and possible extensions of the procedure.
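
For orientation, example (a) can be sketched as follows. This is only an illustration, assuming the standard textbook cusp normal form dx/dt = b1 + b2·x − x³; the abstract does not state the sign convention or parameter names the authors use, so `b1`, `b2`, and the function names below are hypothetical. The sketch counts steady states in two ways: by solving the cubic numerically, and by the sign of its discriminant 4·b2³ − 27·b1², which is positive inside the cusp-shaped region of three steady states.

```python
import numpy as np

def cusp_rhs(x, b1, b2):
    """Right-hand side of the assumed cusp normal form dx/dt = b1 + b2*x - x**3."""
    return b1 + b2 * x - x**3

def equilibria(b1, b2, tol=1e-9):
    """Real steady states, i.e. real roots of -x**3 + b2*x + b1 = 0, sorted ascending."""
    roots = np.roots([-1.0, 0.0, b2, b1])
    real = roots[np.abs(roots.imag) < tol].real
    return np.sort(real)

def count_by_discriminant(b1, b2):
    """Number of distinct steady states away from the fold curve.

    The cubic x**3 - b2*x - b1 has discriminant 4*b2**3 - 27*b1**2:
    positive means three real roots, negative means one.
    """
    disc = 4.0 * b2**3 - 27.0 * b1**2
    return 3 if disc > 0 else 1

if __name__ == "__main__":
    for b1, b2 in [(0.0, 1.0), (1.0, 1.0), (0.0, -1.0)]:
        eq = equilibria(b1, b2)
        print(f"b1={b1}, b2={b2}: {len(eq)} steady state(s) at {eq}")
```

Sweeping `b1` and `b2` over a grid and recording the steady states gives the kind of parameter-dependent observation set the abstract refers to, although the paper's own data-driven reconstruction is not reproduced here.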

Journal Name:
Israel Journal of Chemistry
Page Range or eLocation-ID:
p. 787-794
Wiley Blackwell (John Wiley & Sons)
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Monod and Logistic growth models have been widely used as basic equations to describe cell growth in bioprocess engineering. In the case of the Monod equation, the specific growth rate is governed by a limiting nutrient, with a mathematical form similar to the Michaelis–Menten equation. In the case of the Logistic equation, the specific growth rate is determined by the carrying capacity of the system, which could reflect growth‐inhibiting factors (e.g., toxic chemical accumulation) other than the nutrient level. Both equations have proved valuable in guiding the construction of unstructured kinetic models to analyze the fermentation process and understand cell physiology. In this work, we present a hybrid Logistic‐Monod growth model, which accounts for multiple growth‐dependent factors including both the limiting nutrient and the carrying capacity of the system. Coupled with substrate consumption and yield coefficient, we present the analytical solutions for this hybrid Logistic‐Monod model in both batch and continuous stirred tank reactor (CSTR) culture. Under high biomass yield (Yx/s) conditions, the analytical solution for this hybrid model approaches the Logistic equation; under low biomass yield conditions, the analytical solution for this hybrid model converges to the Monod equation. This hybrid Logistic‐Monod equation represents the cell growth transition from substrate‐limiting conditions to growth‐inhibiting conditions, which could be adopted to accurately describe the multiple phases of cell growth and may facilitate kinetic model construction, bioprocess optimization, and scale‐up in industrial biotechnology.
  2. Data from the cellular network has proved to be one of the most promising ways to understand large-scale human mobility for various ubiquitous computing applications, due to the high penetration of cellphones and low collection cost. Existing mobility models driven by cellular network data suffer from sparse spatial-temporal observations because user locations are recorded only with cellphone activities, e.g., calls, texts, or internet access. In this paper, we design a human mobility recovery system called CellSense that takes sparse cellular billing data (CBR) as input and outputs dense continuous records, recovering the sensing gap that arises when cellular networks are used as sensing systems for human mobility. There is limited work on this kind of recovery system at large scale because, even though it is straightforward to design a recovery system based on regression models, it is very challenging to evaluate these models at large scale due to the lack of ground truth data. In this paper, we explore a new opportunity based on the upgrade of cellular infrastructures to obtain cellular network signaling data as the ground truth data, which log the interaction between cellphones and cellular towers at the signal level (e.g., attaching, detaching, paging) even without billable activities. Based on the signaling data, we design the CellSense system for human mobility recovery by integrating collective mobility patterns with individual mobility modeling, which achieves a 35.3% improvement over state-of-the-art models. The key application of our recovery model is to take regular sparse CBR data that a researcher already has and recover the data missing due to sensing gaps, producing dense cellular data with which to train a machine learning model for their use cases, e.g., next location prediction.
  3. Abstract

    Predictions of hydrologic variables across the entire water cycle have significant value for water resources management as well as downstream applications such as ecosystem and water quality modeling. Recently, purely data‐driven deep learning models like long short‐term memory (LSTM) showed seemingly insurmountable performance in modeling rainfall runoff and other geoscientific variables, yet they cannot predict untrained physical variables and remain challenging to interpret. Here, we show that differentiable, learnable, process‐based models (called δ models here) can approach the performance level of LSTM for the intensively observed variable (streamflow) with regionalized parameterization. We use a simple hydrologic model HBV as the backbone and use embedded neural networks, which can only be trained in a differentiable programming framework, to parameterize, enhance, or replace the process‐based model's modules. Without using an ensemble or post‐processor, δ models can obtain a median Nash‐Sutcliffe efficiency of 0.732 for 671 basins across the USA for the Daymet forcing data set, compared to 0.748 from a state‐of‐the‐art LSTM model with the same setup. For another forcing data set, the difference is even smaller: 0.715 versus 0.722. Meanwhile, the resulting learnable process‐based models can output a full set of untrained variables, for example, soil and groundwater storage, snowpack, evapotranspiration, and baseflow, and can later be constrained by their observations. Both simulated evapotranspiration and fraction of discharge from baseflow agreed decently with alternative estimates. The general framework can work with models with various process complexity and opens up the path for learning physics from big data.
  4. Abstract

    We model lower band chorus observations from the DEMETER satellite using daily and hourly autoregressive‐moving average transfer function (ARMAX) equations. ARMAX models can account for serial autocorrelation between observations that are measured close together in time and can be used to predict a response variable based on its past behavior without the need for recent data. Unstable distributions of radiation belt source electrons (tens of keV) and the substorm activity (SMEd from the SuperMAG array) that is thought to inject these electrons were both statistically significant explanatory variables in a daily ARMAX model describing chorus. Predictions from this model correlated well with observations in a hold‐out test data set (validation correlation of 0.675). Source electron flux was most influential when observations came from the same day or the day before the chorus measurement, with effects decaying rapidly over time. Substorms were more influential when they occurred on previous days, presumably due to their injecting source electrons from the plasma sheet. A daily ARMAX model with interplanetary magnetic field (IMF) |B|, IMF Bz, and solar wind pressure as inputs instead of those given above was somewhat less predictive of chorus (r = 0.611). An hourly ARMAX model with only solar wind and IMF inputs was even less successful, with a validation correlation of 0.502.
  5. Abstract

    There is an opportunity for deep learning to revolutionize science and technology by revealing its findings in a human interpretable manner. To do this, we develop a novel data-driven approach for creating a human–machine partnership to accelerate scientific discovery. By collecting physical system responses under excitations drawn from a Gaussian process, we train rational neural networks to learn Green’s functions of hidden linear partial differential equations. These functions reveal human-understandable properties and features, such as linear conservation laws and symmetries, along with shock and singularity locations, boundary effects, and dominant modes. We illustrate the technique on several examples and capture a range of physics, including advection–diffusion, viscous shocks, and Stokes flow in a lid-driven cavity.