Title: A Statistical Hypothesis Testing Strategy for Adaptively Blending Particle Filters and Ensemble Kalman Filters for Data Assimilation
Abstract Particle filters avoid parametric estimates for Bayesian posterior densities, which alleviates Gaussian assumptions in nonlinear regimes. These methods, however, are more sensitive to sampling errors than Gaussian-based techniques such as ensemble Kalman filters. A recent study by the authors introduced an iterative strategy for particle filters that match posterior moments, in which iterations improve the filter’s ability to draw samples from non-Gaussian posterior densities. The iterations follow from a factorization of particle weights, providing a natural framework for combining particle filters with alternative filters to mitigate the impact of sampling errors. The current study introduces a novel approach to forming an adaptive hybrid data assimilation methodology, exploiting the theoretical strengths of nonparametric and parametric filters. At each data assimilation cycle, the iterative particle filter performs a sequence of updates while the prior sample distribution is non-Gaussian, then an ensemble Kalman filter provides the final adjustment when Gaussian distributions for marginal quantities are detected. The method employs the Shapiro–Wilk test, which has outstanding power for detecting departures from normality, to determine when to make the transition between filter algorithms. Experiments using low-dimensional models demonstrate that the approach has significant value, especially for nonhomogeneous observation networks and unknown model process errors. Moreover, hybrid factors are extended to consider marginals of more than one collocated variable using a test for multivariate normality. Findings from this study motivate the use of the proposed method for geophysical problems characterized by diverse observation networks and various dynamic instabilities, such as numerical weather prediction models.
Significance Statement Data assimilation statistically processes observation errors and model forecast errors to provide optimal initial conditions for the forecast, playing a critical role in numerical weather forecasting. The ensemble Kalman filter, which has been widely adopted and developed in many operational centers, assumes Gaussianity of the prior distribution and solves a linear system of equations, leading to bias in strongly nonlinear regimes. On the other hand, particle filters avoid many of those assumptions but are sensitive to sampling errors and are computationally expensive. We propose an adaptive hybrid strategy that combines the advantages of the two methods and minimizes their disadvantages. The hybrid particle filter–ensemble Kalman filter is achieved with the Shapiro–Wilk test to detect the Gaussianity of the ensemble members and determine the timing of the transition between these filter updates. Demonstrations in this study show that the proposed method is advantageous when observations are heterogeneous and when the model has an unknown bias. Furthermore, by extending the statistical hypothesis test to the test for multivariate normality, we consider marginals of more than one collocated variable. These results encourage further testing for real geophysical problems characterized by various dynamic instabilities, such as real numerical weather prediction models.
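The switching rule described in the abstract can be illustrated with a minimal sketch. It assumes the test is applied to a one-dimensional ensemble sample of a marginal quantity and that failing to reject normality triggers the ensemble Kalman filter step. The function name `choose_update`, the significance level `alpha = 0.05`, and the synthetic samples are hypothetical choices for illustration and are not taken from the paper; only `scipy.stats.shapiro` is an existing library call.

```python
import numpy as np
from scipy import stats


def choose_update(marginal_sample, alpha=0.05):
    """Pick the next filter update from a Shapiro-Wilk test (illustrative only).

    Returns "enkf" when normality is not rejected at level `alpha`,
    otherwise "pf" to signal that another particle filter iteration
    would be taken. Both labels and `alpha` are assumptions.
    """
    sample = np.asarray(marginal_sample, dtype=float)
    _, p_value = stats.shapiro(sample)  # returns (W statistic, p-value)
    return "enkf" if p_value >= alpha else "pf"


# Synthetic marginals: evenly spaced normal quantiles give a sample that
# looks Gaussian; two well-separated copies give a bimodal sample.
n = 40
u = (np.arange(1, n + 1) - 0.5) / n
gaussian_like = stats.norm.ppf(u)
bimodal = np.concatenate([gaussian_like - 5.0, gaussian_like + 5.0])

print(choose_update(gaussian_like))
print(choose_update(bimodal))
```

In a cycling system such a check would presumably be repeated after each particle filter iteration, so that the transition happens as soon as the intermediate sample is judged close enough to Gaussian.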
Journal Name: Monthly Weather Review
Page Range / eLocation ID: 105 to 125
Sponsoring Org: National Science Foundation
More Like this
  1. Hoteit, Ibrahim (Ed.)
    A hybrid particle ensemble Kalman filter is developed for problems with medium non-Gaussianity, i.e., problems where the prior is very non-Gaussian but the posterior is approximately Gaussian. Such situations arise, e.g., when nonlinear dynamics produce a non-Gaussian forecast but a tight Gaussian likelihood leads to a nearly Gaussian posterior. The hybrid filter starts by factoring the likelihood. First the particle filter assimilates the observations with one factor of the likelihood to produce an intermediate prior that is close to Gaussian, and then the ensemble Kalman filter completes the assimilation with the remaining factor. How the likelihood gets split between the two stages is determined in such a way as to ensure that the particle filter avoids collapse, and particle degeneracy is broken by a mean-preserving random orthogonal transformation. The hybrid is tested in a simple two-dimensional (2D) problem and a multiscale system of ODEs motivated by the Lorenz-‘96 model. In the 2D problem it outperforms both a pure particle filter and a pure ensemble Kalman filter, and in the multiscale Lorenz-‘96 model it is shown to outperform a pure ensemble Kalman filter, provided that the ensemble size is large enough.
  2. Abstract

    Iterative ensemble filters and smoothers are now commonly used for geophysical models. Some of these methods rely on a factorization of the observation likelihood function to sample from a posterior density through a set of “tempered” transitions to ensemble members. For Gaussian‐based data assimilation methods, tangent linear versions of nonlinear operators can be relinearized between iterations, thus leading to a solution that is less biased than a single‐step approach. This study adopts similar iterative strategies for a localized particle filter (PF) that relies on the estimation of moments to adjust unobserved variables based on importance weights. This approach builds off a “regularization” of the local PF, which forces weights to be more uniform through heuristic means. The regularization then leads to an adaptive tempering, which can also be combined with filter updates from parametric methods, such as ensemble Kalman filters. The role of iterations is analyzed by deriving the localized posterior probability density assumed by current local PF formulations and then examining how single‐step and tempered PFs sample from this density. From experiments performed with a low‐dimensional nonlinear system, the iterative and hybrid strategies show the largest benefits in observation‐sparse regimes, where only a few particles contain high likelihoods and prior errors are non‐Gaussian. This regime mimics specific applications in numerical weather prediction, where small ensemble sizes, unresolved model error, and highly nonlinear dynamics lead to prior uncertainty that is larger than measurement uncertainty.
  3. Abstract Weather prediction models currently operate within a probabilistic framework for generating forecasts conditioned on recent measurements of Earth’s atmosphere. This framework can be conceptualized as one that approximates parts of a Bayesian posterior density estimated under assumptions of Gaussian errors. Gaussian error approximations are appropriate for synoptic-scale atmospheric flow, which experiences quasi-linear error evolution over time scales depicted by measurements, but are often hypothesized to be inappropriate for highly nonlinear, sparsely observed mesoscale processes. The current study adopts an experimental regional modeling system to examine the impact of Gaussian prior error approximations, which are adopted by ensemble Kalman filters (EnKFs) to generate probabilistic predictions. The analysis is aided by results obtained using recently introduced particle filter (PF) methodology that relies on an implicit non-parametric representation of prior probability densities, but with added computational expense. The investigation focuses on EnKF and PF comparisons over month-long experiments performed using an extensive domain, which features the development and passage of numerous extratropical and tropical cyclones. The experiments reveal spurious small-scale corrections in EnKF members, which come about from inappropriate Gaussian approximations for priors dominated by alignment uncertainty in mesoscale weather systems. Similar behavior is found in PF members, owing to the use of a localization operator, but to a much lesser extent. This result is reproduced and studied using a low-dimensional model, which permits the use of large sample estimates of the Bayesian posterior distribution. Findings from this study motivate the use of data assimilation techniques that provide a more appropriate specification of multivariate non-Gaussian prior densities or a multi-scale treatment of alignment errors during data assimilation.
  4. Abstract

    Traditional ensemble Kalman filter data assimilation methods make implicit assumptions of Gaussianity and linearity that are strongly violated by many important Earth system applications. For instance, bounded quantities like the amount of a tracer and sea ice fractional coverage cannot be accurately represented by a Gaussian that is unbounded by definition. Nonlinear relations between observations and model state variables abound. Examples include the relation between a remotely sensed radiance and the column of atmospheric temperatures, or the relation between cloud amount and water vapor quantity. Part I of this paper described a very general data assimilation framework for computing observation increments for non-Gaussian prior distributions and likelihoods. These methods can respect bounds and other non-Gaussian aspects of observed variables. However, these benefits can be lost when observation increments are used to update state variables using the linear regression that is part of standard ensemble Kalman filter algorithms. Here, regression of observation increments is performed in a space where variables are transformed by the probit and probability integral transforms, a specific type of Gaussian anamorphosis. This method can enforce appropriate bounds for all quantities and deal much more effectively with nonlinear relations between observations and state variables. Important enhancements like localization and inflation can be performed in the transformed space. Results are provided for idealized bivariate distributions and for cycling assimilation in a low-order dynamical system. Implications for improved data assimilation across Earth system applications are discussed.
  5. Abstract

    Linear transformations are widely used in data assimilation for covariance modeling, for reducing dimensionality (such as averaging dense observations to form “superobs”), and for managing sampling error in ensemble data assimilation. Here we describe a linear transformation that is optimal in the sense that, in the transformed space, the state variables and observations have uncorrelated errors and the gain matrix in the update step is diagonal. We conjecture, and provide numerical evidence, that the transformation is the best possible to precede covariance localization in an ensemble Kalman filter. A central feature of this transformation in the update step is a set of scalars, which we term canonical observation operators (COOs), that relate pairs of transformed observations and state variables and rank‐order those pairs by their influence in the update. We show for an idealized problem that sample‐based estimates of the COOs, in conjunction with covariance localization for the sample covariance, can approximate well the true values, but a practical implementation of the transformation for high‐dimensional applications remains a subject for future research. The COOs also completely describe important properties of the update step, such as observation‐state mutual information, signal‐to‐noise and degrees of freedom for signal, and so give new insights, including relations among reduced‐rank approximations to variational schemes, particle‐filter weight degeneracy, and the local ensemble transform Kalman filter.