Title: A surrogate-based approach to nonlinear, non-Gaussian joint state-parameter data assimilation
Many recent advances in sequential assimilation of data into nonlinear high-dimensional models are modifications to particle filters which employ efficient searches of a high-dimensional state space. In this work, we present a complementary strategy that combines statistical emulators and particle filters. The emulators are used to learn and offer a computationally cheap approximation to the forward dynamic mapping. This emulator-particle filter (Emu-PF) approach requires a modest number of forward-model runs, but yields well-resolved posterior distributions even in non-Gaussian cases. We explore several modifications to the Emu-PF that utilize mechanisms for dimension reduction to efficiently fit the statistical emulator, and present a series of simulation experiments on an atypical Lorenz-96 system to demonstrate their performance. We conclude with a discussion on how the Emu-PF can be paired with modern particle filtering algorithms.
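The abstract's general idea of running a particle filter on top of a cheap forward map can be illustrated with a minimal sketch. This is a generic bootstrap particle filter for a scalar state, not the authors' Emu-PF algorithm: the function names, the linear `cheap_forward` map (a stand-in for a fitted emulator), the N(0, 1) prior, and the Gaussian noise levels are all assumptions made for illustration.

```python
import numpy as np

def bootstrap_particle_filter(observations, forward, n_particles,
                              proc_std, obs_std, rng):
    """Return the filtered posterior mean of a scalar state at each step."""
    particles = rng.normal(0.0, 1.0, size=n_particles)  # assumed N(0, 1) prior
    means = np.empty(len(observations))
    for t, y in enumerate(observations):
        # Forecast: push every particle through the cheap forward map.
        particles = forward(particles) + rng.normal(0.0, proc_std, size=n_particles)
        # Weight: Gaussian likelihood of the observation y given each particle.
        log_w = -0.5 * ((y - particles) / obs_std) ** 2
        w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
        w /= w.sum()
        means[t] = np.sum(w * particles)
        # Resample: draw particles in proportion to their weights.
        particles = particles[rng.choice(n_particles, size=n_particles, p=w)]
    return means

def cheap_forward(x):
    """Hypothetical stand-in for an emulator of an expensive forward model."""
    return 0.9 * x

# Synthetic experiment: simulate a truth trajectory and noisy observations.
rng = np.random.default_rng(0)
truth, obs = [], []
x = 0.0
for _ in range(50):
    x = cheap_forward(x) + rng.normal(0.0, 0.5)
    truth.append(x)
    obs.append(x + rng.normal(0.0, 0.5))
truth, obs = np.array(truth), np.array(obs)

filtered = bootstrap_particle_filter(obs, cheap_forward, 500, 0.5, 0.5, rng)
rmse = np.sqrt(np.mean((filtered - truth) ** 2))
```

Because the forward map is called once per particle per step, replacing an expensive model with a cheap approximation is what makes large particle counts affordable in a scheme of this kind.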
Award ID(s):
1821338
PAR ID:
10289563
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Foundations of Data Science
Volume:
0
Issue:
0
ISSN:
2639-8001
Page Range / eLocation ID:
0
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We present a coherent, re-usable Python framework building on the CosmoPower emulator code for high-accuracy calculations of cosmological observables with Einstein–Boltzmann codes. For detailed statistical analyses, such codes require high computing power, making parameter space exploration costly, especially for beyond-$\Lambda$CDM analyses. Machine learning-enabled emulators of Einstein–Boltzmann codes are becoming an increasingly popular solution to this problem. To enable generation, sharing, and use of emulators for inference, we define standards for robustly describing, packaging, and distributing them. We present software for easily performing these tasks in an automated and replicable manner and provide examples and guidelines for generating emulators and wrappers for using them in popular cosmological inference codes. We demonstrate our framework with a suite of high-accuracy emulators for the CAMB code’s calculations of CMB $C_\ell$, $P(k)$, background evolution, and derived parameter quantities. We show these emulators are accurate enough for analysing both $\Lambda$CDM and a set of extension models ($N_{\rm eff}$, $\sum m_\nu$, $w_0 w_a$) with stage-IV observatories, recovering the original high-accuracy spectra to tolerances well within the cosmic variance uncertainties. We show our emulators also recover cosmological parameters in a simulated cosmic-variance limited experiment, finding results well within $0.1 \sigma$ of the input cosmology, while requiring $\lesssim 1/50$ of the evaluation time.
  2. Statistical emulators are a key tool for rapidly producing probabilistic hazard analysis of geophysical processes. Given output data computed for a relatively small number of parameter inputs, an emulator interpolates the data, providing the expected value of the output at untried inputs and an estimate of error at that point. In this work, we propose to fit Gaussian Process emulators to the output from a volcanic ash transport model, Ash3d. Our goal is to predict the simulated volcanic ash thickness from Ash3d at a location of interest using the emulator. Our approach is motivated by two challenges to fitting emulators—characterizing the input wind field and interactions between that wind field and variable grain sizes. We resolve these challenges by using physical knowledge on tephra dispersal. We propose new physically motivated variables as inputs and use normalized output as the response for fitting the emulator. Subsetting based on the initial conditions is also critical in our emulator construction. Simulation studies characterize the accuracy and efficiency of our emulator construction and also reveal its current limitations. Our work represents the first emulator construction for volcanic ash transport models with considerations of the simulated physical process.
  3. A statistical emulator can be used as a surrogate of complex physics-based calculations to drastically reduce the computational cost. Its successful implementation hinges on an accurate representation of the nonlinear response surface with a high-dimensional input space. Conventional “space-filling” designs, including random sampling and Latin hypercube sampling, become inefficient as the dimensionality of the input variables increases, and the predictive accuracy of the emulator can degrade substantially for a test input distant from the training input set. To address this fundamental challenge, we develop a reliable emulator for predicting complex functionals by active learning with error control (ALEC). The algorithm is applicable to infinite-dimensional mapping with high-fidelity predictions and a controlled predictive error. The computational efficiency has been demonstrated by emulating classical density functional theory (cDFT) calculations, a statistical-mechanical method widely used in modeling the equilibrium properties of complex molecular systems. We show that ALEC is much more accurate than conventional emulators based on Gaussian processes with “space-filling” designs and alternative active learning methods. In addition, it is computationally more efficient than direct cDFT calculations. ALEC can be a reliable building block for emulating expensive functionals owing to its minimal computational cost, controllable predictive error, and fully automatic features.
  4. Bayesian particle filters (PFs) are a viable alternative to sampling methods such as Markov chain Monte Carlo methods to estimate model parameters and related uncertainties when the forward model is a dynamical system, and the data are time series that depend on the state vector. PF techniques are particularly attractive when the dimensionality of the state space is large and the numerical solution of the dynamical system over the time interval corresponding to the data is time consuming. Moreover, information contained in the PF solution can be used to infer the sensitivity of the unknown parameters to different temporal segments of the data. This, in turn, can guide the design of more efficient and effective data collection procedures. In this article the PF method is applied to the problem of estimating cell membrane permeability to gases from pH measurements on or near the cell membrane. The forward model in this case comprises a spatially distributed system of coupled reaction–diffusion differential equations. The high dimensionality of the state space and the need to account for the micro-environment created by the pH electrode measurement device are additional challenges that are addressed by the solution method.
  5. Iterative ensemble filters and smoothers are now commonly used for geophysical models. Some of these methods rely on a factorization of the observation likelihood function to sample from a posterior density through a set of “tempered” transitions to ensemble members. For Gaussian-based data assimilation methods, tangent linear versions of nonlinear operators can be relinearized between iterations, thus leading to a solution that is less biased than a single-step approach. This study adopts similar iterative strategies for a localized particle filter (PF) that relies on the estimation of moments to adjust unobserved variables based on importance weights. This approach builds off a “regularization” of the local PF, which forces weights to be more uniform through heuristic means. The regularization then leads to an adaptive tempering, which can also be combined with filter updates from parametric methods, such as ensemble Kalman filters. The role of iterations is analyzed by deriving the localized posterior probability density assumed by current local PF formulations and then examining how single-step and tempered PFs sample from this density. From experiments performed with a low-dimensional nonlinear system, the iterative and hybrid strategies show the largest benefits in observation-sparse regimes, where only a few particles contain high likelihoods and prior errors are non-Gaussian. This regime mimics specific applications in numerical weather prediction, where small ensemble sizes, unresolved model error, and highly nonlinear dynamics lead to prior uncertainty that is larger than measurement uncertainty.
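Item 2 above describes an emulator as something that interpolates a small set of simulator runs, returning an expected output and an error estimate at untried inputs. A minimal sketch of that idea with a Gaussian process follows. It is not taken from any of the listed works: the squared-exponential kernel, the fixed hyperparameters, and the toy `simulator` function are assumptions chosen only to keep the example short.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_emulate(x_train, y_train, x_new, length_scale=1.0, variance=1.0,
               jitter=1e-8):
    """Return the GP posterior mean and standard deviation at x_new."""
    K = rbf_kernel(x_train, x_train, length_scale, variance)
    K += jitter * np.eye(len(x_train))  # small diagonal term for stability
    K_s = rbf_kernel(x_train, x_new, length_scale, variance)
    K_ss = rbf_kernel(x_new, x_new, length_scale, variance)
    L = np.linalg.cholesky(K)
    # Solve K alpha = y_train using the Cholesky factor.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.clip(np.diag(K_ss) - np.sum(v * v, axis=0), 0.0, None)
    return mean, np.sqrt(var)

def simulator(x):
    """Hypothetical stand-in for an expensive simulation code."""
    return np.sin(3.0 * x) + 0.5 * x

# Eight "expensive" runs form the training design.
x_train = np.linspace(0.0, 2.0, 8)
y_train = simulator(x_train)

# Emulate at untried inputs, one of them far outside the design.
x_new = np.array([0.5, 1.3, 3.5])
mean, std = gp_emulate(x_train, y_train, x_new, length_scale=0.3)
```

In this sketch the reported standard deviation grows for the input far from the training design, which is the kind of degradation away from the training set that item 3 describes.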