
Title: An Optimal Linear Transformation for Data Assimilation

Linear transformations are widely used in data assimilation for covariance modeling, for reducing dimensionality (such as averaging dense observations to form “superobs”), and for managing sampling error in ensemble data assimilation. Here we describe a linear transformation that is optimal in the sense that, in the transformed space, the state variables and observations have uncorrelated errors and the gain matrix in the update step is diagonal. We conjecture, and provide numerical evidence, that the transformation is the best possible to precede covariance localization in an ensemble Kalman filter. Central to this transformation in the update step are scalars, which we term canonical observation operators (COOs), that relate pairs of transformed observations and state variables and rank‐order those pairs by their influence in the update. We show for an idealized problem that sample‐based estimates of the COOs, in conjunction with covariance localization for the sample covariance, can approximate the true values well, but a practical implementation of the transformation for high‐dimensional applications remains a subject for future research. The COOs also completely describe important properties of the update step, such as observation‐state mutual information, signal‐to‐noise, and degrees of freedom for signal. They therefore give new insights, including relations among reduced‐rank approximations to variational schemes, particle‐filter weight degeneracy, and the local ensemble transform Kalman filter.
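One way to see how a diagonal gain can arise is the standard whitening-plus-SVD construction sketched below. It assumes linear-Gaussian statistics with a known prior covariance B, observation operator H, and observation-error covariance R (our notation). It whitens both error covariances and takes a singular value decomposition of R^{-1/2} H B^{1/2}. We treat the singular values s_i as stand-ins for the COOs; this is an assumption about the paper's definition, not a statement of it. The helper names `spd_sqrt` and `coo_transform` are hypothetical. In the transformed variables the gain becomes diag(s_i / (1 + s_i^2)). The degrees of freedom for signal and the Gaussian mutual information then depend only on the s_i.

```python
import numpy as np

def spd_sqrt(A):
    """Symmetric square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(w)) @ V.T

def coo_transform(B, H, R):
    """SVD of the whitened observation operator R^{-1/2} H B^{1/2}.

    Assumption (hypothetical helper): the singular values s stand in for the
    COOs; transformed state = Vt @ B^{-1/2} x, transformed obs = U.T @ R^{-1/2} y.
    """
    B_half = spd_sqrt(B)
    R_inv_half = np.linalg.inv(spd_sqrt(R))
    H_white = R_inv_half @ H @ B_half
    U, s, Vt = np.linalg.svd(H_white, full_matrices=False)
    return U, s, Vt, B_half, R_inv_half

# Small illustrative problem: n = 4 state variables, m = 3 observations.
rng = np.random.default_rng(0)
n, m = 4, 3
A = rng.standard_normal((n, n))
B = A @ A.T + n * np.eye(n)          # prior error covariance (SPD)
H = rng.standard_normal((m, n))      # linear observation operator
R = np.diag([0.5, 1.0, 2.0])         # observation error covariance

U, s, Vt, B_half, R_inv_half = coo_transform(B, H, R)

# Kalman gain in the original variables.
K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)

# The same gain expressed in the transformed variables (m <= n here).
K_t = Vt @ np.linalg.inv(B_half) @ K @ np.linalg.inv(R_inv_half) @ U

dfs = np.sum(s**2 / (1 + s**2))              # degrees of freedom for signal
mutual_info = 0.5 * np.sum(np.log1p(s**2))   # Gaussian mutual information (nats)
```

With m <= n and a full-rank whitened operator, each transformed pair i has a posterior-to-prior variance ratio of 1 / (1 + s_i^2). Pairs with larger s_i therefore receive a larger fractional variance reduction, s_i^2 / (1 + s_i^2).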

DOI PREFIX: 10.1029
Journal Name: Journal of Advances in Modeling Earth Systems
Sponsoring Org: National Science Foundation
More like this
  1. Abstract

    Traditional ensemble Kalman filter data assimilation methods make implicit assumptions of Gaussianity and linearity that are strongly violated by many important Earth system applications. For instance, bounded quantities like the amount of a tracer and sea ice fractional coverage cannot be represented accurately by a Gaussian, which is unbounded by definition. Nonlinear relations between observations and model state variables abound. Examples include the relation between a remotely sensed radiance and the column of atmospheric temperatures, or the relation between cloud amount and water vapor quantity. Part I of this paper described a very general data assimilation framework for computing observation increments for non-Gaussian prior distributions and likelihoods. These methods can respect bounds and other non-Gaussian aspects of observed variables. However, these benefits can be lost when observation increments are used to update state variables through the linear regression that is part of standard ensemble Kalman filter algorithms. Here, regression of observation increments is performed in a space where variables are transformed by the probit and probability integral transforms, a specific type of Gaussian anamorphosis. This method can enforce appropriate bounds for all quantities and deal much more effectively with nonlinear relations between observations and state variables. Important enhancements like localization and inflation can be performed in the transformed space. Results are provided for idealized bivariate distributions and for cycling assimilation in a low-order dynamical system. Implications for improved data assimilation across Earth system applications are discussed.

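The regression step described above can be sketched for a single state variable and a single observed variable. This is a minimal illustration under our own assumptions, not the paper's algorithm:

- The probability integral transform uses a piecewise-linear empirical CDF through the sorted prior members, a simplification we chose.
- The observation-space posterior ensemble `y_post` is assumed to have been computed already (the role Part I plays).
- All helper names are hypothetical.

Because the back-transform interpolates within the sorted prior members, updated values cannot leave the prior ensemble's range. A positive quantity therefore stays positive.

```python
import numpy as np
from statistics import NormalDist

_N01 = NormalDist()  # standard normal distribution

def _probit(p):
    """Inverse standard-normal CDF applied elementwise (requires 0 < p < 1)."""
    return np.array([_N01.inv_cdf(float(q)) for q in p])

def _norm_cdf(z):
    """Standard-normal CDF applied elementwise."""
    return np.array([_N01.cdf(float(q)) for q in z])

def _ecdf_pair(ens):
    """Hypothetical piecewise-linear CDF and inverse through the sorted members
    at plotting positions (i+1)/(N+1); np.interp clamps outside the ensemble range."""
    xs = np.sort(ens)
    ps = np.arange(1, ens.size + 1) / (ens.size + 1)
    return (lambda v: np.interp(v, xs, ps)), (lambda p: np.interp(p, ps, xs))

def probit_regression_update(x_prior, y_prior, y_post):
    """Regress observation increments onto a state variable in probit space."""
    cdf_x, inv_x = _ecdf_pair(x_prior)
    cdf_y, _ = _ecdf_pair(y_prior)
    zx = _probit(cdf_x(x_prior))        # state members in probit space
    zy = _probit(cdf_y(y_prior))        # prior observed-variable members in probit space
    dzy = _probit(cdf_y(y_post)) - zy   # observation increments in probit space
    beta = np.cov(zx, zy)[0, 1] / np.var(zy, ddof=1)  # linear regression slope
    return inv_x(_norm_cdf(zx + beta * dzy))          # back to physical space

# Synthetic positive-valued state and a nonlinear (log) observed quantity.
rng = np.random.default_rng(4)
x_prior = rng.gamma(2.0, 1.0, size=20)
y_prior = np.log(x_prior)
x_same = probit_regression_update(x_prior, y_prior, y_prior)         # zero increments
x_shift = probit_regression_update(x_prior, y_prior, y_prior - 3.0)  # large negative increments
```

Clamping to the prior range is a blunt way to respect bounds. A parametric or tail-extended CDF would allow updates outside the ensemble range.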
  2. Abstract

    For data assimilation to provide faithful state estimates for dynamical models, specifications of observation uncertainty need to be as accurate as possible. Innovation-based methods built on Desroziers diagnostics are commonly used to estimate observation uncertainty, but such methods can depend greatly on the prescribed background uncertainty. For ensemble data assimilation, this uncertainty comes from statistics calculated from ensemble forecasts, which require inflation and localization to address undersampling. In this work, we use an ensemble Kalman filter (EnKF) with a low-dimensional Lorenz model to investigate the interplay between the Desroziers method and inflation. Two inflation techniques are used for this purpose: 1) a rigorously tuned fixed multiplicative scheme and 2) an adaptive state-space scheme. We document how inaccuracies in observation uncertainty affect errors in EnKF posteriors and study the combined impacts of misspecified initial observation uncertainty, sampling error, and model error on Desroziers estimates. We find that whether observation uncertainty is over- or underestimated greatly affects the stability of data assimilation and the accuracy of Desroziers estimates, and that preference should be given to initial overestimates. Inline Desroziers estimates tend to remove the dependence between ensemble spread–skill and the initially prescribed observation error. In addition, we find that the inclusion of model error introduces spurious correlations in observation uncertainty estimates. Further, we note that the adaptive inflation scheme is less robust than fixed inflation at mitigating multiple sources of error. Last, sampling error strongly exacerbates existing sources of error and greatly degrades EnKF estimates, which translates into biased Desroziers estimates of observation error covariance.

    Significance Statement

    To generate accurate predictions of various components of the Earth system, numerical models require an accurate specification of state variables at the current time. This step weighs the current state estimate probabilistically against information provided by environmental measurements of the true state. Various strategies exist for estimating uncertainty in observations within this framework, but they are sensitive to a host of assumptions, which are investigated in this study.

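The Desroziers diagnostic discussed above estimates R from the expected product of analysis and background innovations: R ≈ E[d_a d_b^T], with d_b = y - H x_b and d_a = y - H x_a. The sketch below checks this on a small synthetic linear-Gaussian problem; all matrices, sizes, and helper names are hypothetical. When the R used in the gain equals the true R, the sample estimate recovers it. When R is overestimated in the gain, the expected estimate becomes R_a (H B H^T + R_a)^{-1} (H B H^T + R_true), which differs from R_true. This illustrates how the diagnostic depends on the prescribed uncertainties.

```python
import numpy as np

def desroziers_R(d_b, d_a):
    """Desroziers estimate R ~ E[d_a d_b^T]; rows of d_b and d_a are cases."""
    return d_a.T @ d_b / d_b.shape[0]

rng = np.random.default_rng(1)
n, m, N = 3, 2, 200_000
B = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.5],
              [0.2, 0.5, 1.0]])        # background error covariance (hypothetical)
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.5, 0.5]])        # linear observation operator (hypothetical)
R_true = np.diag([0.3, 0.6])           # true observation error covariance

def innovations(R_assumed):
    """Sample innovations for an analysis whose gain uses R_assumed."""
    e_b = rng.multivariate_normal(np.zeros(n), B, size=N)       # background errors
    e_o = rng.multivariate_normal(np.zeros(m), R_true, size=N)  # observation errors
    K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R_assumed)
    d_b = e_o - e_b @ H.T               # y - H x_b (the truth cancels)
    d_a = d_b @ (np.eye(m) - H @ K).T   # y - H x_a = (I - H K) d_b
    return d_b, d_a

R_est = desroziers_R(*innovations(R_true))   # correctly specified gain

# Expected estimate when R is overestimated by a factor of 2 in the gain.
R_over = 2.0 * R_true
HBHt = H @ B @ H.T
expected_over = R_over @ np.linalg.inv(HBHt + R_over) @ (HBHt + R_true)
```

Iterating the estimate (feeding it back as the next R_assumed) is one common way to use the diagnostic. The study's inline approach may differ.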
  3. Abstract

    A regional‐scale fully coupled data assimilation (DA) system based on the ensemble Kalman filter is developed for a high‐resolution coupled atmosphere‐ocean model. Through the flow‐dependent covariance both within and across the oceanic and atmospheric domains, the fully coupled DA system is capable of updating both atmospheric and oceanic state variables simultaneously by assimilating atmospheric and/or oceanic observations. Observing system simulation experiments of Hurricane Florence (2018) examine the potential impacts on tropical cyclone analysis and prediction of oceanic observations (sea‐surface temperature, sea‐surface height anomaly, and sea‐surface current), in addition to the observation of the minimum surface pressure at the storm center (HPI). Results show that assimilation of oceanic observations not only results in better analyses and forecasts of the oceanic variables but also considerably reduces analysis and forecast errors in the atmospheric fields, including the intensity and structure of Florence. Compared to weakly coupled DA, in which the analysis update is performed separately for the atmospheric and oceanic domains, fully coupled DA reduces the forecast errors of tropical cyclone track and intensity. The results suggest that tropical cyclone prediction could be improved further by assimilating both atmospheric and oceanic observations with the ensemble‐based fully coupled DA system.

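The difference between fully and weakly coupled updates can be illustrated with a serial ensemble square root filter (EnSRF) update for one observation. The EnSRF is our choice of EnKF variant; the abstract does not specify one. In the fully coupled case, the ensemble cross-covariance between an oceanic observation (here a hypothetical SST value) and the atmospheric variables produces atmospheric increments. The weakly coupled case is mimicked by zeroing the gain outside the ocean domain. The four-variable state, its covariance, and the function name are hypothetical.

```python
import numpy as np

def ensrf_single_obs(X, obs_index, y_obs, r, update_mask=None):
    """Serial EnSRF update for one observation of state element obs_index.

    X has shape (n_state, n_ens); r is the observation-error variance.
    Elements where update_mask is False receive no increment, which mimics
    a weakly coupled update restricted to one domain.
    """
    n_ens = X.shape[1]
    xbar = X.mean(axis=1)
    Xp = X - xbar[:, None]                 # prior perturbations
    yp = Xp[obs_index]                     # H simply selects one element here
    var_y = yp @ yp / (n_ens - 1)
    cov_xy = Xp @ yp / (n_ens - 1)         # covariance of every state element with y
    K = cov_xy / (var_y + r)
    if update_mask is not None:
        K = np.where(update_mask, K, 0.0)  # drop cross-domain gain
    alpha = 1.0 / (1.0 + np.sqrt(r / (var_y + r)))  # square-root reduction factor
    xbar_a = xbar + K * (y_obs - xbar[obs_index])
    Xp_a = Xp - alpha * np.outer(K, yp)
    return xbar_a[:, None] + Xp_a

# Hypothetical coupled state: [atm_u, atm_T, ocn_SST, ocn_SSH].
rng = np.random.default_rng(2)
cov = np.array([[1.0, 0.3, 0.4, 0.1],
                [0.3, 1.0, 0.6, 0.2],
                [0.4, 0.6, 1.0, 0.5],
                [0.1, 0.2, 0.5, 1.0]])
X = rng.multivariate_normal(np.zeros(4), cov, size=40).T   # shape (4, 40)
r, y_obs, sst = 0.25, 1.5, 2

X_full = ensrf_single_obs(X, sst, y_obs, r)
X_weak = ensrf_single_obs(X, sst, y_obs, r,
                          update_mask=np.array([False, False, True, True]))
```

In both cases the observed SST variance shrinks to var_y * r / (var_y + r). Only the fully coupled update also moves the atmospheric members.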
  4. Abstract

    Localization is essential to effectively assimilate satellite radiances in ensemble Kalman filters. However, the vertical location of a radiance observation, and hence its separation from a model grid point variable, is not well defined, which complicates localizing the impact of radiance observations. An adaptive method is proposed to estimate an effective vertical localization independently for each assimilated channel of every satellite platform. It uses sample correlations between ensemble priors of observations and state variables from a cycling data assimilation to estimate the localization function that minimizes the sampling error. The estimated localization functions are approximated by three localization parameters: the localization width, the maximum value, and the vertical location of the radiance observations. Adaptively estimated localization parameters are used in assimilation experiments with the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS) model and the National Oceanic and Atmospheric Administration (NOAA) operational ensemble Kalman filter (EnKF). Results show that using the adaptive localization width and vertical location for radiance observations is more beneficial than also including the maximum localization value. The experiment using the adaptively estimated localization width and vertical location performs better than the default Gaspari and Cohn (GC) experiment and produces errors similar to those of the optimal GC experiment. Because the adaptive localization parameters can be computed during the assimilation procedure, the computational cost of tuning the optimal GC localization width is avoided.

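The three-parameter approximation described above can be sketched as fitting a scaled, shifted Gaspari-Cohn (GC) function to an estimated localization profile. The sketch makes several assumptions:

- The profile has already been estimated, for example from sample correlations in cycling assimilation as the abstract describes.
- The GC width is treated as the half-width c (support 2c), one common convention.
- The fit uses a grid search over width and location, with a closed-form least-squares maximum value.

The vertical coordinate, grids, and function names are hypothetical. The demonstration profile is synthetic, not data from the study.

```python
import numpy as np

def gaspari_cohn(dist, c):
    """Gaspari-Cohn 5th-order piecewise rational function; c is the half-width."""
    r = np.abs(dist) / c
    out = np.zeros_like(r, dtype=float)
    near = r <= 1.0
    far = (r > 1.0) & (r < 2.0)
    rn, rf = r[near], r[far]
    out[near] = -0.25 * rn**5 + 0.5 * rn**4 + 0.625 * rn**3 - (5 / 3) * rn**2 + 1.0
    out[far] = (rf**5 / 12 - 0.5 * rf**4 + 0.625 * rf**3 + (5 / 3) * rf**2
                - 5 * rf + 4 - 2 / (3 * rf))
    return out

def fit_three_params(levels, profile, widths, locations):
    """Grid-search width and location; solve the maximum value by least squares."""
    best = None
    for c in widths:
        for z0 in locations:
            g = gaspari_cohn(levels - z0, c)
            gg = g @ g
            if gg == 0.0:
                continue
            amp = (g @ profile) / gg   # least-squares maximum value for this shape
            err = np.sum((profile - amp * g) ** 2)
            if best is None or err < best[0]:
                best = (err, c, z0, amp)
    return best[1:]                    # (width, location, maximum value)

# Synthetic, noise-free profile on a hypothetical vertical coordinate.
levels = np.linspace(0.0, 20.0, 41)
profile = 0.7 * gaspari_cohn(levels - 8.5, 4.0)
widths = np.arange(1.0, 10.01, 0.5)
locations = np.arange(0.0, 20.01, 0.5)
c_fit, z_fit, amp_fit = fit_three_params(levels, profile, widths, locations)
```

Fixing the maximum value at 1 and fitting only width and location corresponds to the two-parameter variant that the abstract reports as more beneficial.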
  5. Abstract

    Iterative ensemble filters and smoothers are now commonly used for geophysical models. Some of these methods rely on a factorization of the observation likelihood function to sample from a posterior density through a set of “tempered” transitions to ensemble members. For Gaussian‐based data assimilation methods, tangent linear versions of nonlinear operators can be relinearized between iterations, thus leading to a solution that is less biased than a single‐step approach. This study adopts similar iterative strategies for a localized particle filter (PF) that relies on the estimation of moments to adjust unobserved variables based on importance weights. This approach builds on a “regularization” of the local PF, which forces weights to be more uniform through heuristic means. The regularization then leads to an adaptive tempering, which can also be combined with filter updates from parametric methods, such as ensemble Kalman filters. The role of iterations is analyzed by deriving the localized posterior probability density assumed by current local PF formulations and then examining how single‐step and tempered PFs sample from this density. In experiments with a low‐dimensional nonlinear system, the iterative and hybrid strategies show the largest benefits in observation‐sparse regimes, where only a few particles contain high likelihoods and prior errors are non‐Gaussian. This regime mimics specific applications in numerical weather prediction, where small ensemble sizes, unresolved model error, and highly nonlinear dynamics lead to prior uncertainty that is larger than measurement uncertainty.

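Adaptive tempering of the kind mentioned above is commonly implemented by choosing each likelihood exponent so that the effective sample size (ESS) of the tempered weights stays above a target. The sketch below shows only that generic exponent selection. It bisects on the exponent, which works because ESS is nonincreasing in the exponent. It is not the paper's regularization scheme; the function names, the 50% ESS target, and the synthetic particles are assumptions.

```python
import numpy as np

def ess(logw):
    """Effective sample size 1 / sum(w^2) of normalized weights given log-weights."""
    w = np.exp(logw - logw.max())   # subtract the max for numerical stability
    w /= w.sum()
    return 1.0 / np.sum(w**2)

def next_exponent(loglik, remaining=1.0, target_frac=0.5, tol=1e-10):
    """Largest exponent in (0, remaining] whose tempered weights keep
    ESS >= target_frac * N, found by bisection (hypothetical helper)."""
    target = target_frac * loglik.size
    if ess(remaining * loglik) >= target:
        return remaining            # the full remaining step is already acceptable
    lo, hi = 0.0, remaining         # invariant: ess(lo) >= target > ess(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ess(mid * loglik) >= target:
            lo = mid
        else:
            hi = mid
    return lo

# Synthetic particles with a sharp Gaussian likelihood (observation 0.8, std 0.05).
rng = np.random.default_rng(3)
particles = rng.standard_normal(100)
loglik = -0.5 * ((0.8 - particles) / 0.05) ** 2
beta = next_exponent(loglik, remaining=1.0, target_frac=0.5)
```

In a tempered filter, the particles would then be updated with weights proportional to the likelihood raised to `beta`, using either a PF adjustment or an EnKF step. The log-likelihoods would be recomputed and the selection repeated with `remaining - beta` until the exponents sum to one.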