Title: Foundations for Universal Non‐Gaussian Data Assimilation
Abstract: In many applications of data assimilation, especially when the problem is large, a substantial assumption is made: that all variables are well described by Gaussian error statistics. This assumption makes calculations considerably simpler, but it is often not valid, leading to biases in forecasts or, worse, unphysical predictions. We propose a simple but effective way of replacing this assumption with transforming functions, while remaining consistent with Bayes' theorem. The method allows the errors to take any value of skewness and kurtosis and permits physical bounds on the variables, so the error distribution can conform better to the underlying statistics, reducing the biases introduced by the Gaussian assumption. We apply this framework to a 3D variational data assimilation method and find improved performance in a simple atmospheric toy model (Lorenz‐63) compared with an all‐Gaussian technique.
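To make the transform idea concrete, here is a minimal sketch (illustrative only, not the authors' implementation) for a single positive variable, assuming a lognormal background error: a log transform makes the error Gaussian in the transformed space, the 3D-Var cost is minimized there, and the back-transformed analysis automatically respects the physical bound x > 0. All names and values are hypothetical.

    import numpy as np
    from scipy.optimize import minimize

    def threedvar_cost(z, z_b, sigma_b, y, sigma_o):
        # Cost in transformed space z = log(x): the background term is
        # Gaussian in z (i.e., lognormal in x); the observation term is
        # evaluated in physical space after back-transforming.
        z = np.atleast_1d(z)[0]          # minimize passes a length-1 array
        x = np.exp(z)                    # back-transform; always x > 0
        J_b = 0.5 * ((z - z_b) / sigma_b) ** 2
        J_o = 0.5 * ((y - x) / sigma_o) ** 2
        return J_b + J_o

    z_b = np.log(2.0)                    # background guess x_b = 2.0
    res = minimize(threedvar_cost, x0=np.array([z_b]),
                   args=(z_b, 0.3, 3.1, 0.5))    # observation y = 3.1
    print(f"analysis = {np.exp(res.x[0]):.3f}")  # bounded: always > 0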
Award ID(s):
2033405
PAR ID:
10479350
Author(s) / Creator(s):
Publisher / Repository:
DOI PREFIX: 10.1029
Date Published:
Journal Name:
Geophysical Research Letters
Volume:
50
Issue:
23
ISSN:
0094-8276
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract: Many data assimilation methods require knowledge of the first two moments of the background and observation errors to function optimally. To ensure the effective performance of such methods, it is often advantageous to estimate the second moment of the observation errors directly. We examine three strategies for doing so, focusing on the case of a single scalar observation error variance parameter r. The first is the well-known Desroziers et al. “diagnostic check” iteration (DBCP). The second, described in Karspeck, adapts the “spread–error” diagnostic (used for assessing ensemble reliability) to observations and generates a point estimate of r by taking the expectation of various observation-space statistics, using an ensemble to model background error statistics explicitly. The third is an approximate Bayesian scheme that uses an inverse-gamma prior and a modified Gaussian likelihood. All three methods can recover the correct observation error variance when both the background and observation errors are Gaussian and the background error variance is well specified. We also demonstrate that it is often possible to estimate r even when the observation error is not Gaussian or when the forward operator mapping model states into observation space is nonlinear. The DBCP method is found to be the most robust to these complications; however, the other two methods perform similarly well in most cases and have the added benefit that they can be used to estimate r before data assimilation. We conclude that further investigation into the latter two methods is warranted, specifically into how they perform when extended to the multivariate case. Significance Statement: Observations of the Earth system (e.g., from satellites, radiosondes, or aircraft) each carry some associated uncertainty. To use observations to improve model forecasts, it is important to understand the size of that uncertainty. This study compares three statistical methods for estimating observation errors, all of which can be implemented continuously whenever new observations are used to correct a model. Our results suggest that all three methods can improve forecast outcomes, but if observations are believed to have highly biased or skewed errors, care should be taken in choosing which method to use and in interpreting its results. Future studies should investigate robust methods for estimating more complicated types of errors.
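For intuition, the scalar diagnostic-check iteration can be sketched in a few lines with a scalar Kalman update and synthetic data. This is a hedged toy, not the study's code; it relies on the standard Desroziers identity E[d_oa * d_ob] = r, where d_ob and d_oa are the background and analysis innovations, and all values are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    n, b_true, r_true = 100_000, 1.0, 0.5
    truth = rng.normal(0.0, 1.0, n)
    x_b = truth + rng.normal(0.0, np.sqrt(b_true), n)   # background
    y = truth + rng.normal(0.0, np.sqrt(r_true), n)     # observations

    r = 2.0                                  # deliberately poor first guess
    for it in range(8):
        K = b_true / (b_true + r)            # scalar Kalman gain, current r
        d_ob = y - x_b                       # background innovations
        d_oa = (1.0 - K) * d_ob              # analysis residuals, y - x_a
        r = np.mean(d_oa * d_ob)             # diagnostic: E[d_oa * d_ob] = r
        print(f"iteration {it}: r = {r:.3f}")
    # converges toward r_true = 0.5 when b_true is well specified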
  2. Abstract: Obtaining a faithful probabilistic depiction of moist convection is complicated by unknown errors in subgrid-scale physical parameterization schemes, invalid assumptions made by data assimilation (DA) techniques, and high system dimensionality. As an initial step toward untangling sources of uncertainty in convective weather regimes, we evaluate a novel Bayesian data assimilation methodology based on particle filtering within a WRF ensemble analysis and forecasting system. Unlike most geophysical DA methods, the particle filter (PF) represents prior and posterior error distributions nonparametrically rather than assuming a Gaussian distribution and can accept any type of likelihood function. This approach is known to reduce bias introduced by Gaussian approximations in low-dimensional and idealized contexts. The form of PF used in this research adopts a dimension-reduction strategy, making it affordable for typical weather applications. The present study examines posterior ensemble members and forecasts for select severe weather events between 2019 and 2020, comparing results from the PF with those from an ensemble Kalman filter (EnKF). We find that assimilating with a PF produces posterior quantities for microphysical variables that are more consistent with model climatology than comparable quantities from an EnKF, which we attribute to a reduction in DA bias. These differences are significant enough to impact the dynamic evolution of convective systems via cold pool strength and propagation, with impacts to forecast verification scores depending on the particular microphysics scheme. Our findings have broad implications for future approaches to the selection of physical parameterization schemes and parameter estimation within preexisting data assimilation frameworks. Significance Statement: The accurate prediction of severe storms using numerical weather models depends on effective parameterization schemes for small-scale processes and the assimilation of incomplete observational data in a manner that faithfully represents the probabilistic state of the atmosphere. Current generation methods for data assimilation typically assume a standard form for the error distributions of relevant quantities, which can introduce bias that not only hinders numerical prediction, but that can also confound the characterization of errors from the model itself. The current study performs data assimilation using a novel method that does not make such assumptions and explores characteristics of resulting model fields and forecasts that might make such a method useful for improving model parameterization schemes.
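As a point of contrast with the EnKF, a bootstrap particle filter on a scalar random walk can be sketched as follows. This idealized stand-in (with hypothetical names and noise levels) is not the dimension-reduced WRF implementation described above; the weight step would accept any likelihood, with a Gaussian one used here only for simplicity.

    import numpy as np

    rng = np.random.default_rng(1)
    n_particles, n_steps, obs_err, model_err = 500, 50, 0.5, 0.3
    particles = rng.normal(0.0, 1.0, n_particles)
    truth = 0.0

    for _ in range(n_steps):
        truth += rng.normal(0.0, model_err)                 # true state evolves
        y = truth + rng.normal(0.0, obs_err)                # noisy observation
        particles += rng.normal(0.0, model_err, n_particles)  # forecast step
        # nonparametric update: weight each particle by its likelihood
        w = np.exp(-0.5 * ((y - particles) / obs_err) ** 2)
        w /= w.sum()
        # multinomial resampling restores equal weights
        idx = rng.choice(n_particles, size=n_particles, p=w)
        particles = particles[idx]

    print(f"truth = {truth:.2f}, posterior mean = {particles.mean():.2f}")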
  3. We present a non‐Gaussian ensemble data assimilation method based on the maximum‐likelihood ensemble filter, which allows for any combination of Gaussian, lognormal, and reverse lognormal errors in both the background and the observations. The technique is fully nonlinear, does not require a tangent linear model, and uses a Hessian preconditioner to minimise the cost function efficiently in ensemble space. When the Gaussian assumption is relaxed, the results show significant improvements in the analysis skill within two atmospheric toy models, and the performance of data assimilation systems for (semi)bounded variables is expected to improve. 
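The mixed-distribution idea can be illustrated with the per-observation cost terms below. This is a sketch under simplified assumptions, not the maximum-likelihood ensemble filter code: the full lognormal likelihoods include mode and normalization terms omitted here, and `upper` is a hypothetical upper bound for the reverse-lognormal case.

    import numpy as np

    def obs_term(y, hx, sigma, kind="gaussian", upper=None):
        # Simplified negative log-likelihood contribution of one observation.
        if kind == "gaussian":
            return 0.5 * ((y - hx) / sigma) ** 2
        if kind == "lognormal":            # y > 0, right-skewed errors
            return 0.5 * ((np.log(y) - np.log(hx)) / sigma) ** 2
        if kind == "reverse_lognormal":    # y < upper, left-skewed errors
            return 0.5 * ((np.log(upper - y) - np.log(upper - hx)) / sigma) ** 2
        raise ValueError(kind)

    # One cost function mixing all three error models:
    J = (obs_term(1.2, 1.0, 0.3)
         + obs_term(0.4, 0.5, 0.2, kind="lognormal")
         + obs_term(0.9, 0.8, 0.2, kind="reverse_lognormal", upper=1.0))
    print(f"J_obs = {J:.3f}")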
  4. Abstract: Imputation is a popular technique for handling item nonresponse. Parametric imputation is based on a parametric model and is not robust against failure of the imputation model. Nonparametric imputation is fully robust but is not applicable when the dimension of covariates is large, owing to the curse of dimensionality. Semiparametric imputation is another robust approach, based on a flexible model in which the number of model parameters can increase with the sample size. In this paper, we propose a new semiparametric imputation method based on a more flexible model assumption than the Gaussian mixture model. In the proposed mixture model, we assume a conditional Gaussian model for the study variable given the auxiliary variables, but the marginal distribution of the auxiliary variables is not necessarily Gaussian. The proposed mixture model is more flexible and achieves a better approximation than Gaussian mixture models. The proposed method is applicable to high‐dimensional covariate problems by including a penalty function in the conditional log‐likelihood function. The proposed method is applied to the 2017 Korean Household Income and Expenditure Survey conducted by Statistics Korea.
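A toy sketch of the modeling idea, with illustrative names and simulated data rather than the paper's estimator: the auxiliary variable has a non-Gaussian (bimodal) marginal handled by a two-component mixture, the study variable is conditionally Gaussian given the auxiliary, and missing values are imputed from per-component regressions fitted on respondents.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(2)
    n = 2000
    x = np.where(rng.random(n) < 0.5,               # bimodal auxiliary variable
                 rng.normal(-2.0, 0.7, n), rng.normal(3.0, 1.0, n))
    y = 1.5 + 0.8 * x + rng.normal(0.0, 0.5, n)     # conditionally Gaussian y|x
    miss = rng.random(n) < 0.3                      # 30% item nonresponse in y

    # The mixture handles the non-Gaussian marginal of x (2 components assumed).
    gmm = GaussianMixture(n_components=2, random_state=0)
    comp_r = gmm.fit_predict(x[~miss].reshape(-1, 1))

    # Conditionally Gaussian part: per-component linear fits on respondents.
    betas = [np.polyfit(x[~miss][comp_r == k], y[~miss][comp_r == k], deg=1)
             for k in range(2)]

    comp_m = gmm.predict(x[miss].reshape(-1, 1))    # nonrespondents' components
    y_imp = np.array([np.polyval(betas[k], xi)
                      for k, xi in zip(comp_m, x[miss])])
    rmse = np.sqrt(np.mean((y_imp - y[miss]) ** 2))
    print(f"conditional-mean imputation RMSE: {rmse:.3f}")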
  5. Abstract: For data assimilation to provide faithful state estimates for dynamical models, specifications of observation uncertainty need to be as accurate as possible. Innovation-based methods built on the Desroziers diagnostics are commonly used to estimate observation uncertainty, but such methods can depend greatly on the prescribed background uncertainty. For ensemble data assimilation, this uncertainty comes from statistics calculated from ensemble forecasts, which require inflation and localization to address undersampling. In this work, we use an ensemble Kalman filter (EnKF) with a low-dimensional Lorenz model to investigate the interplay between the Desroziers method and inflation. Two inflation techniques are used for this purpose: 1) a rigorously tuned, fixed multiplicative scheme and 2) an adaptive state-space scheme. We document how inaccuracies in observation uncertainty affect errors in EnKF posteriors and study the combined impacts of misspecified initial observation uncertainty, sampling error, and model error on Desroziers estimates. We find that whether observation uncertainty is over- or underestimated greatly affects the stability of data assimilation and the accuracy of Desroziers estimates, and that preference should be given to initial overestimates. Inline Desroziers estimates tend to remove the dependence between ensemble spread–skill and the initially prescribed observation error. In addition, we find that the inclusion of model error introduces spurious correlations in observation uncertainty estimates. Further, we note that the adaptive inflation scheme is less robust than fixed inflation at mitigating multiple sources of error. Last, sampling error strongly exacerbates existing sources of error and greatly degrades EnKF estimates, which translates into biased Desroziers estimates of the observation error covariance. Significance Statement: To generate accurate predictions of various components of the Earth system, numerical models require an accurate specification of state variables at the current time. This step adopts a probabilistic comparison of our current state estimate against information provided by environmental measurements of the true state. Various strategies exist for estimating uncertainty in observations within this framework, but they are sensitive to a host of assumptions, which are investigated in this study.
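For reference, the fixed multiplicative inflation mentioned above amounts to scaling ensemble perturbations about their mean before the update, compensating for spread lost to sampling error. A minimal sketch with hypothetical values, not the study's code, follows.

    import numpy as np

    def inflate(ensemble, lam):
        # Scale perturbations about the ensemble mean by a factor lam > 1.
        mean = ensemble.mean(axis=0)
        return mean + lam * (ensemble - mean)

    rng = np.random.default_rng(3)
    ens = rng.normal(0.0, 1.0, size=(20, 3))       # 20 members, 3 variables
    inflated = inflate(ens, lam=1.1)
    print(ens.std(axis=0, ddof=1))                 # spread before
    print(inflated.std(axis=0, ddof=1))            # spread after, ~10% larger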