
This content will become publicly available on May 1, 2023

Title: Empirical mode modeling: A data-driven approach to recover and forecast nonlinear dynamics from noisy data
Abstract: Data-driven, model-free analytics are natural choices for discovery and forecasting of complex, nonlinear systems. Methods that operate in the system state-space require either an explicit multidimensional state-space or one approximated from available observations. Because observational data are frequently sampled with noise, the noise can corrupt the state-space representation and degrade analytical performance. Here, we evaluate the synthesis of empirical mode decomposition with empirical dynamic modeling, which we term empirical mode modeling, to increase the information content of state-space representations in the presence of noise. Evaluation of a mathematical application and an ecologically important geophysical application across three different state-space representations suggests that empirical mode modeling may be a useful technique for data-driven, model-free, state-space analysis in the presence of noise.
Award ID(s):
1660584 1655203
Journal Name:
Nonlinear Dynamics
Page Range or eLocation-ID:
2147 to 2160
Sponsoring Org:
National Science Foundation
More Like this
  1. The Sun emits a stream of charged particles called the solar wind, which is the primary driver of space weather and geomagnetic disturbances. Modeling and observations complement each other to help us identify and understand the physical processes governing solar wind dynamics on different scales. Numerical models of the solar wind have greatly improved in recent years with advances in computational infrastructure and by employing data-driven or data-assimilative approaches. Designed primarily for modeling partially ionized space plasma using an adaptive mesh refinement technique on Cartesian or spherical grids, the Multi-scale Fluid-kinetic Simulation Suite (MS-FLUKSS) is arguably one of the most sophisticated numerical codes for simulating the solar wind flow. To inform potential users and interested members of the space weather community, we present a brief summary of the current state of the solar wind models developed in the MS-FLUKSS framework, with an emphasis on the 3D heliospheric MHD models driven and constrained by remote/in situ observations and empirical coronal models such as the Wang-Sheeley-Arge model. We also discuss potential scientific and operational applications of our solar wind models to the prediction of space weather (e.g., high speed streams, coronal mass ejections, and interplanetary shocks) throughout the solar system.
  2. Transient growth and resolvent analyses are routinely used to assess nonasymptotic properties of fluid flows. In particular, resolvent analysis can be interpreted as a special case of viewing flow dynamics as an open system in which free-stream turbulence, surface roughness, and other irregularities provide sources of input forcing. We offer a comprehensive summary of the tools that can be employed to probe the dynamics of fluctuations around a laminar or turbulent base flow in the presence of such stochastic or deterministic input forcing and describe how input–output techniques enhance resolvent analysis. Specifically, physical insights that may remain hidden in the resolvent analysis are gained by detailed examination of input–output responses between spatially localized body forces and selected linear combinations of state variables. This differentiating feature plays a key role in quantifying the importance of different mechanisms for bypass transition in wall-bounded shear flows and in explaining how turbulent jets generate noise. We highlight the utility of a stochastic framework, with white or colored inputs, in addressing a variety of open challenges including transition in complex fluids, flow control, and physics-aware data-driven turbulence modeling. Applications with temporally or spatially periodic base flows are discussed and future research directions are outlined.
  3. Models of many engineering and natural systems are imperfect. The discrepancy between the mathematical representations of a true physical system and its imperfect model is called the model error. These model errors can lead to substantial differences between the numerical solutions of the model and the state of the system, particularly in those involving nonlinear, multi-scale phenomena. Thus, there is increasing interest in reducing model errors, particularly by leveraging the rapidly growing observational data to understand their physics and sources. Here, we introduce a framework named MEDIDA: Model Error Discovery with Interpretability and Data Assimilation. MEDIDA only requires a working numerical solver of the model and a small number of noise-free or noisy sporadic observations of the system. In MEDIDA, first, the model error is estimated from differences between the observed states and model-predicted states (the latter are obtained from a number of one-time-step numerical integrations from the previous observed states). If observations are noisy, a data assimilation technique, such as the ensemble Kalman filter, is employed to provide the analysis state of the system, which is then used to estimate the model error. Finally, an equation-discovery technique, here the relevance vector machine, a sparsity-promoting Bayesian method, is used to identify an interpretable, parsimonious, and closed-form representation of the model error. Using the chaotic Kuramoto–Sivashinsky system as the test case, we demonstrate the excellent performance of MEDIDA in discovering different types of structural/parametric model errors, representing different types of missing physics, using noise-free and noisy observations.
  4. Abstract Motivation Single-cell sequencing assay for transposase-accessible chromatin (scATAC-seq) provides new opportunities to dissect epigenomic heterogeneity and elucidate transcriptional regulatory mechanisms. However, computational modeling of scATAC-seq data is challenging due to its high dimension, extreme sparsity, complex dependencies and high sensitivity to confounding factors from various sources. Results Here, we propose a new deep generative model framework, named SAILER, for analyzing scATAC-seq data. SAILER aims to learn a low-dimensional nonlinear latent representation of each cell that defines its intrinsic chromatin state, invariant to extrinsic confounding factors like read depth and batch effects. SAILER adopts the conventional encoder-decoder framework to learn the latent representation but imposes additional constraints to ensure the independence of the learned representations from the confounding factors. Experimental results on both simulated and real scATAC-seq datasets demonstrate that SAILER learns better and biologically more meaningful representations of cells than other methods. Its noise-free cell embeddings bring in significant benefits in downstream analyses: clustering and imputation based on SAILER result in 6.9% and 18.5% improvements over existing methods, respectively. Moreover, because no matrix factorization is involved, SAILER can easily scale to process millions of cells. We implemented SAILER into a software package, freely available to all for large-scale scATAC-seq data analysis. Availability and implementation The software is publicly available at Supplementary information Supplementary data are available at Bioinformatics online.
  5. Spectrum cartography (SC) aims at estimating the multi-aspect (e.g., space, frequency, and time) interference level caused by multiple emitters from limited measurements. Early SC approaches rely on model assumptions about the radio map, e.g., sparsity and smoothness, which may be grossly violated under critical scenarios, e.g., in the presence of severe shadowing. More recent data-driven methods train deep generative networks to distill parsimonious representations of complex scenarios, in order to enhance performance of SC. The challenge is that the state space of this learning problem is extremely large, induced by different combinations of key problem constituents, e.g., the number of emitters, the emitters’ carrier frequencies, and the emitter locations. Learning over such a huge space can be costly in terms of sample complexity and training time; it also frequently leads to generalization problems. Our method integrates the favorable traits of model and data-driven approaches, which substantially ‘shrinks’ the state space. Specifically, the proposed learning paradigm only needs to learn a generative model for the radio map of a single emitter (as opposed to numerous combinations of multiple emitters), leveraging a nonnegative matrix factorization (NMF)-based emitter disaggregation process. Numerical evidence shows that the proposed method outperforms state-of-the-art purely model-driven and purely data-driven approaches.