Title: Conditional Gaussian nonlinear system: A fast preconditioner and a cheap surrogate model for complex nonlinear systems
Developing suitable approximate models for analyzing and simulating complex nonlinear systems is practically important. This paper explores the skill of a rich class of nonlinear stochastic models, known as the conditional Gaussian nonlinear system (CGNS), as both a cheap surrogate model and a fast preconditioner for facilitating many computationally challenging tasks. The CGNS preserves the underlying physics to a large extent and can reproduce intermittency, extreme events, and other non-Gaussian features in many complex systems arising from practical applications. Three interrelated topics are studied. First, the closed analytic formulas for the conditional statistics provide an efficient and accurate data assimilation scheme. It is shown that the data assimilation skill of a suitable CGNS approximate forecast model exceeds that of applying an ensemble method to the perfect model with strong nonlinearity, where the latter suffers from filter divergence. Second, the CGNS allows the development of a fast algorithm for simultaneously estimating the parameters and the unobserved variables, with uncertainty quantification, in the presence of only partial observations. Utilizing an appropriate CGNS as a preconditioner significantly reduces the computational cost of accurately estimating the parameters in the original complex system. Finally, the CGNS advances rapid and statistically accurate algorithms for computing the probability density function and sampling the trajectories of the unobserved state variables. These fast algorithms facilitate the development of an efficient and accurate data-driven method for predicting the linear response of the original system to parameter perturbations based on a suitable CGNS preconditioner.
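For orientation, the closed analytic conditional statistics mentioned above follow standard conditional Gaussian filtering theory. The sketch below states the generic conditional Gaussian form and its filtering formulas; the symbols (u_I, u_II, A_0, A_1, a_0, a_1, B_1, b_2) follow the common convention in the literature and are illustrative rather than quoted from the paper.

```latex
% Generic conditional Gaussian nonlinear system: u_I observed, u_II unobserved.
\begin{align}
  \mathrm{d}\mathbf{u}_{I}  &= \bigl[\mathbf{A}_0(t,\mathbf{u}_I) + \mathbf{A}_1(t,\mathbf{u}_I)\,\mathbf{u}_{II}\bigr]\,\mathrm{d}t + \mathbf{B}_1(t,\mathbf{u}_I)\,\mathrm{d}\mathbf{W}_{I},\\
  \mathrm{d}\mathbf{u}_{II} &= \bigl[\mathbf{a}_0(t,\mathbf{u}_I) + \mathbf{a}_1(t,\mathbf{u}_I)\,\mathbf{u}_{II}\bigr]\,\mathrm{d}t + \mathbf{b}_2(t,\mathbf{u}_I)\,\mathrm{d}\mathbf{W}_{II}.
\end{align}
% Conditioned on the observed path, u_II stays Gaussian, N(mu, R), with closed evolution equations:
\begin{align}
  \mathrm{d}\boldsymbol{\mu} &= (\mathbf{a}_0 + \mathbf{a}_1\boldsymbol{\mu})\,\mathrm{d}t
    + \mathbf{R}\mathbf{A}_1^{*}\,(\mathbf{B}_1\mathbf{B}_1^{*})^{-1}\bigl[\mathrm{d}\mathbf{u}_I - (\mathbf{A}_0 + \mathbf{A}_1\boldsymbol{\mu})\,\mathrm{d}t\bigr],\\
  \mathrm{d}\mathbf{R} &= \bigl[\mathbf{a}_1\mathbf{R} + \mathbf{R}\mathbf{a}_1^{*} + \mathbf{b}_2\mathbf{b}_2^{*}
    - \mathbf{R}\mathbf{A}_1^{*}(\mathbf{B}_1\mathbf{B}_1^{*})^{-1}\mathbf{A}_1\mathbf{R}\bigr]\,\mathrm{d}t.
\end{align}
```

Because these are closed matrix-valued equations, the conditional mean and covariance can be advanced directly, without running an ensemble, which is what the abstract refers to as the closed analytic data assimilation formulas.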
Award ID(s):
2108856
PAR ID:
10329149
Date Published:
Journal Name:
Chaos
Volume:
32
ISSN:
1089-7682
Page Range / eLocation ID:
053122
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Discovering the underlying dynamics of complex systems from data is an important practical topic. Constrained optimization algorithms are widely utilized and lead to many successes. Yet, such purely data-driven methods may introduce incorrect physics in the presence of random noise and cannot easily handle situations with incomplete data. In this paper, a new iterative learning algorithm for complex turbulent systems with partial observations is developed that alternates between identifying model structures, recovering unobserved variables, and estimating parameters. First, a causality-based learning approach is utilized for the sparse identification of model structures, which takes into account certain physics knowledge that is pre-learned from data. It has unique advantages in coping with indirect coupling between features and is robust to stochastic noise. A practical algorithm is designed to facilitate causal inference for high-dimensional systems. Next, a systematic nonlinear stochastic parameterization is built to characterize the time evolution of the unobserved variables. Closed analytic formulas via efficient nonlinear data assimilation are exploited to sample the trajectories of the unobserved variables, which are then treated as synthetic observations to advance rapid parameter estimation. Furthermore, the localization of the state variable dependence and the physics constraints are incorporated into the learning procedure. This mitigates the curse of dimensionality and prevents the finite-time blow-up issue. Numerical experiments show that the new algorithm identifies the model structure and provides suitable stochastic parameterizations for many complex nonlinear systems with chaotic dynamics, spatiotemporal multiscale structures, intermittency, and extreme events.
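As a loose illustration of the parameter-estimation step above (sampled trajectories of the unobserved variables used as synthetic observations), a minimal sketch for a toy model whose right-hand side is linear in the unknown parameters is given below; the model, variable names, and coefficients are illustrative assumptions, not the systems treated in the paper.

```python
import numpy as np

def estimate_parameters(u_obs, u_sampled, dt):
    """Least-squares parameter estimation for a model linear in its parameters,
    treating a sampled trajectory of the unobserved variable as a synthetic
    observation.

    Illustrative model (an assumption, not the paper's system):
        du/dt = theta_1 * u + theta_2 * v + theta_3 * u * v,
    where u is observed and v is unobserved.
    """
    u, v = u_obs[:-1], u_sampled[:-1]
    dudt = np.diff(u_obs) / dt                      # finite-difference derivative
    library = np.column_stack([u, v, u * v])        # terms the model is linear in
    theta, *_ = np.linalg.lstsq(library, dudt, rcond=None)
    return theta

# Purely illustrative usage with a synthetic trajectory.
rng = np.random.default_rng(0)
dt, n = 0.01, 2000
u, v = np.empty(n), np.empty(n)
u[0], v[0] = 1.0, 0.5
for k in range(n - 1):
    u[k + 1] = u[k] + dt * (-1.0 * u[k] + 0.8 * v[k] + 0.3 * u[k] * v[k])
    v[k + 1] = v[k] - dt * 0.5 * v[k] + 0.05 * np.sqrt(dt) * rng.standard_normal()
print(estimate_parameters(u, v, dt))                # recovers roughly [-1.0, 0.8, 0.3]
```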
  2. We present a non‐Gaussian ensemble data assimilation method based on the maximum‐likelihood ensemble filter, which allows for any combination of Gaussian, lognormal, and reverse lognormal errors in both the background and the observations. The technique is fully nonlinear, does not require a tangent linear model, and uses a Hessian preconditioner to minimise the cost function efficiently in ensemble space. When the Gaussian assumption is relaxed, the results show significant improvements in the analysis skill within two atmospheric toy models, and the performance of data assimilation systems for (semi)bounded variables is expected to improve. 
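To make the mixed error-model idea concrete, the sketch below minimizes a cost function that combines a Gaussian background term, one Gaussian observation term, and one lognormal observation term (keeping only the state-dependent parts of the negative log-likelihoods). It is a simplified variational stand-in, not the maximum-likelihood ensemble filter itself, and all names and values are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative observation operators: y_gauss has additive Gaussian error;
# y_logn is a positive quantity with lognormal error, ln y_logn ~ N(ln h(x), sigma^2).
def h_gauss(x):
    return x[0]                      # e.g., a directly observed component

def h_logn(x):
    return np.exp(x[1])              # e.g., a positive (bounded-below) quantity

def cost(x, y_gauss, y_logn, xb, B_inv, sigma_g, sigma_l):
    """State-dependent part of the negative log posterior with a Gaussian
    background term, one Gaussian and one lognormal observation term."""
    jb = 0.5 * (x - xb) @ B_inv @ (x - xb)
    jo_g = 0.5 * ((y_gauss - h_gauss(x)) / sigma_g) ** 2
    jo_l = 0.5 * ((np.log(y_logn) - np.log(h_logn(x))) / sigma_l) ** 2
    return jb + jo_g + jo_l

xb = np.array([0.2, -0.1])                       # background state (toy values)
B_inv = np.linalg.inv(np.diag([0.5, 0.5]))       # inverse background covariance
res = minimize(cost, xb, args=(1.0, 1.3, xb, B_inv, 0.3, 0.2))
print(res.x)                                     # analysis state
```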
  3. A hybrid data assimilation algorithm is developed for complex dynamical systems with partial observations. The method starts with applying a spectral decomposition to the entire spatiotemporal fields, followed by creating a machine learning model that builds a nonlinear map between the coefficients of observed and unobserved state variables for each spectral mode. A cheap low-order nonlinear stochastic parameterized extended Kalman filter (SPEKF) model is employed as the forecast model in the ensemble Kalman filter to deal with each mode associated with the observed variables. The resulting ensemble members are then fed into the machine learning model to create an ensemble of the corresponding unobserved variables. In addition to the ensemble spread, the training residual in the machine learning-induced nonlinear map is further incorporated into the state estimation, advancing the diagnostic quantification of the posterior uncertainty. The hybrid data assimilation algorithm is applied to a precipitating quasi-geostrophic (PQG) model, which includes the effects of water vapor, clouds, and rainfall beyond the classical two-level QG model. The complicated nonlinearities in the PQG equations prevent traditional methods from building simple and accurate reduced-order forecast models. In contrast, the SPEKF forecast model is skillful in recovering the intermittent observed states, and the machine learning model effectively estimates the chaotic unobserved signals. Utilizing the calibrated SPEKF and machine learning models under a moderate cloud fraction, the resulting hybrid data assimilation remains reasonably accurate when applied to other geophysical scenarios with nearly clear skies or relatively heavy rainfall, implying the robustness of the algorithm for extrapolation.
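For readers unfamiliar with SPEKF-type forecast models, the sketch below simulates one commonly used form: a single complex-valued mode driven by stochastically parameterized damping and bias corrections. The coefficients are illustrative assumptions rather than the values calibrated in the study.

```python
import numpy as np

def simulate_spekf(n_steps, dt, seed=0):
    """Euler-Maruyama simulation of a SPEKF-type low-order model: one complex
    mode u with stochastically parameterized damping gamma and additive bias b.
    Coefficients are illustrative assumptions, not calibrated values."""
    rng = np.random.default_rng(seed)
    omega, sigma_u = 1.0, 0.5
    d_gamma, gamma_hat, sigma_gamma = 1.2, 1.0, 0.8
    d_b, b_hat, sigma_b = 0.8, 0.0, 0.4
    u = np.zeros(n_steps, dtype=complex)
    gamma = np.full(n_steps, gamma_hat)
    b = np.zeros(n_steps, dtype=complex)
    for k in range(n_steps - 1):
        dW = np.sqrt(dt) * rng.standard_normal(5)
        # Intermittent bursts in |u| occur when gamma transiently becomes negative.
        u[k + 1] = u[k] + ((-gamma[k] + 1j * omega) * u[k] + b[k]) * dt \
            + sigma_u * (dW[0] + 1j * dW[1]) / np.sqrt(2)
        gamma[k + 1] = gamma[k] - d_gamma * (gamma[k] - gamma_hat) * dt + sigma_gamma * dW[2]
        b[k + 1] = b[k] - d_b * (b[k] - b_hat) * dt + sigma_b * (dW[3] + 1j * dW[4]) / np.sqrt(2)
    return u, gamma, b

u, gamma, b = simulate_spekf(50_000, 0.005)
print(np.abs(u).mean(), np.abs(u).max())   # bursts well above the mean amplitude
```

The stochastic damping is what allows such a cheap forecast model to track intermittent observed modes: whenever gamma dips below zero for a short time, the mode amplitude grows transiently and then decays once the damping recovers.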

     
  4. Parameter estimation for nonlinear dynamic system models, represented by ordinary differential equations (ODEs), using noisy and sparse data, is a vital task in many fields. We propose a fast and accurate method, manifold-constrained Gaussian process inference (MAGI), for this task. MAGI uses a Gaussian process model over time series data, explicitly conditioned on the manifold constraint that derivatives of the Gaussian process must satisfy the ODE system. By doing so, we completely bypass the need for numerical integration and achieve substantial savings in computational time. MAGI is also suitable for inference with unobserved system components, which often occur in real experiments. MAGI is distinct from existing approaches as we provide a principled statistical construction under a Bayesian framework, which incorporates the ODE system through the manifold constraint. We demonstrate the accuracy and speed of MAGI using realistic examples based on physical experiments.
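A compact way to restate the manifold constraint described above, in generic notation that sketches the idea rather than reproducing the paper's exact formulation:

```latex
% Sketch of the manifold constraint behind MAGI (generic notation).
\begin{align}
  W \;=\; \sup_{t \in [0,T]} \bigl\| \dot{x}(t) - f\bigl(x(t), \theta, t\bigr) \bigr\|,
  \qquad
  \text{inference target:}\;\; p\bigl(\theta,\, x(I) \,\big|\, W = 0,\; y(\tau)\bigr),
\end{align}
% where x(t) is modeled as a Gaussian process, y(\tau) are the noisy observations,
% and I is a finite discretization of [0, T]; conditioning on W = 0 forces the GP
% derivative to agree with the ODE field, so no numerical integration is needed.
```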
    more » « less
  5. Traditional ensemble Kalman filter data assimilation methods make implicit assumptions of Gaussianity and linearity that are strongly violated by many important Earth system applications. For instance, bounded quantities like the amount of a tracer and sea ice fractional coverage cannot be accurately represented by a Gaussian that is unbounded by definition. Nonlinear relations between observations and model state variables abound. Examples include the relation between a remotely sensed radiance and the column of atmospheric temperatures, or the relation between cloud amount and water vapor quantity. Part I of this paper described a very general data assimilation framework for computing observation increments for non-Gaussian prior distributions and likelihoods. These methods can respect bounds and other non-Gaussian aspects of observed variables. However, these benefits can be lost when observation increments are used to update state variables using the linear regression that is part of standard ensemble Kalman filter algorithms. Here, regression of observation increments is performed in a space where variables are transformed by the probit and probability integral transforms, a specific type of Gaussian anamorphosis. This method can enforce appropriate bounds for all quantities and deal much more effectively with nonlinear relations between observations and state variables. Important enhancements like localization and inflation can be performed in the transformed space. Results are provided for idealized bivariate distributions and for cycling assimilation in a low-order dynamical system. Implications for improved data assimilation across Earth system applications are discussed.
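As a rough illustration of the transform-then-regress idea (not the exact algorithm of the paper), the sketch below pushes ensemble members of a bounded state variable through a fitted parametric CDF (the probability integral transform) and then the standard-normal inverse CDF (the probit transform), carries out the usual linear regression of observation increments in that transformed space, and maps the updated members back. The beta-distributed prior and all function names are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def probit_regression_update(state_ens, obs_prior_ens, obs_increments):
    """Regress observation-space increments onto a state variable bounded in
    (0, 1) after a probability-integral + probit transform, then map back.
    A simplified sketch of Gaussian-anamorphosis regression."""
    # Probability integral transform via a fitted Beta CDF (illustrative choice
    # for a doubly bounded variable), followed by the probit transform.
    a, b, loc, scale = stats.beta.fit(state_ens, floc=0.0, fscale=1.0)
    def to_gauss_state(x):
        return stats.norm.ppf(stats.beta.cdf(x, a, b, loc, scale))
    def from_gauss_state(z):
        return stats.beta.ppf(stats.norm.cdf(z), a, b, loc, scale)
    # The observed quantity is treated as approximately Gaussian here.
    mu_o, sd_o = obs_prior_ens.mean(), obs_prior_ens.std(ddof=1)
    zs = to_gauss_state(state_ens)
    zo = (obs_prior_ens - mu_o) / sd_o
    zo_inc = obs_increments / sd_o
    # Standard EnKF-style linear regression, performed in the transformed space.
    slope = np.cov(zs, zo, ddof=1)[0, 1] / np.var(zo, ddof=1)
    # Back-transform: updated members respect the (0, 1) bounds by construction.
    return from_gauss_state(zs + slope * zo_inc)

# Purely illustrative usage.
rng = np.random.default_rng(1)
x_prior = rng.beta(2.0, 5.0, size=40)                  # bounded state variable
y_prior = 2.0 * x_prior + 0.1 * rng.standard_normal(40)
y_increments = 0.05 * np.ones(40)                      # toy observation increments
x_post = probit_regression_update(x_prior, y_prior, y_increments)
print(x_post.min(), x_post.max())                      # stays inside (0, 1)
```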

     