Hierarchical sparse Cholesky decomposition with applications to high-dimensional spatio-temporal filtering
Abstract

Spatial statistics often involves Cholesky decomposition of covariance matrices. To ensure scalability to high dimensions, several recent approximations have assumed a sparse Cholesky factor of the precision matrix. We propose a hierarchical Vecchia approximation, whose conditional-independence assumptions imply sparsity in the Cholesky factors of both the precision and the covariance matrix. This remarkable property is crucial for applications to high-dimensional spatiotemporal filtering. We present a fast and simple algorithm to compute our hierarchical Vecchia approximation, and we provide extensions to nonlinear data assimilation with non-Gaussian data based on the Laplace approximation. In several numerical comparisons, including a filtering analysis of satellite data, our methods strongly outperformed alternative approaches.
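To make the key idea concrete, the following is a minimal sketch of a generic Vecchia-type approximation: each ordered variable conditions only on a small set of previously ordered neighbors, which makes the resulting Cholesky factor of the (approximate) precision matrix sparse. This is an illustrative textbook construction, not the paper's hierarchical algorithm; the function name, the exponential covariance, and the nearest-previous-neighbor conditioning sets are all assumptions for the example.

```python
import numpy as np

def vecchia_precision_cholesky(K, neighbors):
    """Sparse upper-triangular U with K^{-1} approximated by U @ U.T.

    K         : (n, n) covariance matrix of the ordered variables.
    neighbors : list where neighbors[i] is the conditioning set for
                variable i, a subset of the indices {0, ..., i-1}.
    Column i of U has nonzeros only at i and neighbors[i], so the
    implied precision matrix U @ U.T is sparse.
    """
    n = K.shape[0]
    U = np.zeros((n, n))
    for i in range(n):
        g = list(neighbors[i])
        if g:
            # kriging weights b and conditional variance d of y_i
            # given its conditioning set y_g
            b = np.linalg.solve(K[np.ix_(g, g)], K[g, i])
            d = K[i, i] - K[g, i] @ b
            U[g, i] = -b / np.sqrt(d)
        else:
            d = K[i, i]
        U[i, i] = 1.0 / np.sqrt(d)
    return U

# Example: exponential covariance on a 1-D grid; condition on at most
# m = 3 nearest previously ordered points (a hypothetical choice).
x = np.linspace(0.0, 1.0, 30)
K = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.3)
nbrs = [list(range(max(0, i - 3), i)) for i in range(len(x))]
U = vecchia_precision_cholesky(K, nbrs)
Q_approx = U @ U.T  # sparse approximation to K^{-1}
```

With full conditioning sets (every variable conditions on all predecessors) the construction is exact and recovers `K^{-1}`; shrinking the conditioning sets trades accuracy for the sparsity that enables scaling to high dimensions.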

NSF-PAR ID:
10363360
Journal Name:
Statistics and Computing
Volume:
32
Issue:
1
ISSN:
0960-3174