We synthesize knowledge from numerical weather prediction, inverse theory, and statistics to address the problem of estimating a high‐dimensional covariance matrix from a small number of samples. This problem is fundamental in statistics, machine learning/artificial intelligence, and in modern Earth science. We create several new adaptive methods for high‐dimensional covariance estimation, but one method, which we call Noise‐Informed Covariance Estimation (NICE), stands out because it has three important properties: (a) NICE is conceptually simple and computationally efficient; (b) NICE guarantees symmetric positive semi‐definite covariance estimates; and (c) NICE is largely tuning‐free. We illustrate the use of NICE on a large set of Earth science–inspired numerical examples, including cycling data assimilation, inversion of geophysical field data, and training of feed‐forward neural networks with time‐averaged data from a chaotic dynamical system. Our theory, heuristics and numerical tests suggest that NICE may indeed be a viable option for high‐dimensional covariance estimation in many Earth science problems.
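The abstract above does not spell out the NICE algorithm itself, but the core difficulty it targets, a rank-deficient sample covariance when the dimension far exceeds the sample size, is easy to reproduce. The sketch below (Python with NumPy and scikit-learn) uses a standard Ledoit-Wolf shrinkage estimator purely as an illustrative baseline; it is not NICE.

```python
# Illustrative baseline only (Ledoit-Wolf shrinkage), not the NICE estimator.
import numpy as np
from sklearn.covariance import LedoitWolf

rng = np.random.default_rng(0)
d, n = 200, 25                                  # dimension >> sample size
true_cov = np.diag(np.linspace(0.5, 2.0, d))
X = rng.multivariate_normal(np.zeros(d), true_cov, size=n)

sample_cov = np.cov(X, rowvar=False)            # rank <= n - 1, not invertible
shrunk_cov = LedoitWolf().fit(X).covariance_    # well-conditioned and SPD

print("rank of sample covariance:", np.linalg.matrix_rank(sample_cov))
print("smallest eigenvalue after shrinkage:", np.linalg.eigvalsh(shrunk_cov).min())
```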
To construct an optimal estimating function by weighting a set of score functions, we must either know the covariance matrix of the individual scores or estimate it consistently. In problems with high-dimensional correlated data, the estimated covariance matrix can be unreliable: the smallest eigenvalues of the covariance matrix are the most important for weighting the estimating equations, yet in high dimensions they are the most poorly determined. Generalized estimating equations introduced the idea of a working correlation to minimize such problems; however, it can be difficult to specify the working correlation model correctly. We develop an adaptive estimating equation method which requires no working correlation assumptions. This methodology relies on finding a reliable approximation to the inverse of the variance matrix in the quasi-likelihood equations. We apply a multivariate generalization of the conjugate gradient method to find estimating equations that preserve the information well at fixed low dimensions. This approach is particularly useful when the estimator of the covariance matrix is singular or close to singular, or impossible to invert owing to its large size.
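As a rough illustration of approximating the action of the inverse variance matrix without ever forming it, the sketch below runs a few conjugate-gradient iterations on a system V w = s built from a near-singular estimated covariance. This is plain single-right-hand-side CG on simulated data, not the paper's multivariate generalization.

```python
# Plain conjugate gradient on V w = s: a few iterations approximate V^{-1} s
# within a low-dimensional Krylov subspace, without ever inverting V.
import numpy as np

def cg_solve(V, s, n_iter=5):
    """Approximate V^{-1} s with n_iter conjugate-gradient steps."""
    w = np.zeros_like(s)
    r = s - V @ w            # residual
    p = r.copy()             # search direction
    for _ in range(n_iter):
        Vp = V @ p
        alpha = (r @ r) / (p @ Vp)
        w = w + alpha * p
        r_new = r - alpha * Vp
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return w

# A near-singular covariance estimated from far too few samples.
rng = np.random.default_rng(1)
d, n = 100, 20
X = rng.standard_normal((n, d))
V = np.cov(X, rowvar=False) + 1e-8 * np.eye(d)   # tiny jitter for numerical safety
s = rng.standard_normal(d)                        # stand-in for a summed score vector
w = cg_solve(V, s, n_iter=5)
```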
- PAR ID: 10405790
- Publisher / Repository: Oxford University Press
- Journal Name: Journal of the Royal Statistical Society Series B: Statistical Methodology
- Volume: 65
- Issue: 1
- ISSN: 1369-7412
- Format(s): Medium: X
- Size(s): p. 127-142
- Sponsoring Org: National Science Foundation
More Like this
-
In statistics and machine learning, we are interested in the eigenvectors (or singular vectors) of certain matrices (e.g., covariance matrices and data matrices). However, those matrices are usually perturbed by noise or statistical errors, arising either from random sampling or from structural patterns. The Davis-Kahan $\sin \theta$ theorem is often used to bound the difference between the eigenvectors of a matrix $A$ and those of a perturbed matrix $\widetilde{A} = A + E$ in terms of the $\ell_2$ norm. In this paper, we prove that when $A$ is low-rank and incoherent, the $\ell_{\infty}$-norm perturbation bound for the singular vectors (or eigenvectors in the symmetric case) is smaller by a factor of $\sqrt{d_1}$ or $\sqrt{d_2}$ for the left and right vectors, respectively, where $d_1$ and $d_2$ are the matrix dimensions. The power of this new perturbation result is shown in robust covariance estimation, particularly when the random variables have heavy tails. There, we propose new robust covariance estimators and establish their asymptotic properties using the newly developed perturbation bound. Our theoretical results are verified through extensive numerical experiments.
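For readers unfamiliar with perturbation bounds of this type, the short sketch below numerically checks the classical $\ell_2$ sin-theta bound (in the $2\|E\|_{\mathrm{op}}/\mathrm{gap}$ form) on a low-rank incoherent matrix and reports the corresponding $\ell_{\infty}$ eigenvector error for comparison; the matrix and noise level are illustrative choices, and the paper's sharper $\ell_{\infty}$ bound is not reproduced here.

```python
# Numerical check of the classical l2 sin-theta bound (2 ||E||_op / gap form),
# with the corresponding l_inf eigenvector error reported for comparison.
import numpy as np

rng = np.random.default_rng(2)
d = 300
u = np.ones(d) / np.sqrt(d)                      # incoherent leading eigenvector
A = 5.0 * np.outer(u, u)                         # rank-1 "signal" matrix
E = rng.standard_normal((d, d))
E = 0.05 * (E + E.T) / 2                         # symmetric noise perturbation

evals, evecs = np.linalg.eigh(A + E)
u_tilde = evecs[:, -1]                           # leading eigenvector of A + E

sin_theta = np.sqrt(max(0.0, 1.0 - (u @ u_tilde) ** 2))
gap = 5.0                                        # eigen-gap of A (5 - 0)
bound = 2.0 * np.linalg.norm(E, 2) / gap         # 2 ||E||_op / gap
linf_err = np.abs(u_tilde - np.sign(u @ u_tilde) * u).max()

print(f"sin(theta) = {sin_theta:.3e} <= l2 bound = {bound:.3e}")
print(f"l_inf eigenvector error = {linf_err:.3e}")
```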
-
We study the problem of estimating a large, low-rank matrix corrupted by additive noise of unknown covariance, assuming one has access to additional side information in the form of noise-only measurements. We consider the Whiten-Shrink-reColour (WSC) workflow, where a ‘noise covariance whitening’ transformation is applied to the observations, followed by appropriate singular value shrinkage and a ‘noise covariance re-colouring’ transformation. We show that under the mean square error loss, a unique, asymptotically optimal shrinkage nonlinearity exists for the WSC denoising workflow, and calculate it in closed form. To this end, we calculate the asymptotic eigenvector rotation of the random spiked F-matrix ensemble, a result which may be of independent interest. With sufficiently many pure-noise measurements, our optimally tuned WSC denoising workflow outperforms, in mean square error, matrix denoising algorithms based on optimal singular value shrinkage that do not make similar use of noise-only side information; numerical experiments show that our procedure’s relative performance is particularly strong in challenging statistical settings with high dimensionality and a large degree of heteroscedasticity.
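A schematic version of the three WSC steps is sketched below. The shrinkage step uses a plain hard threshold at the Marchenko-Pastur bulk edge as a stand-in for the optimal shrinkage nonlinearity derived in the paper, and the function name and zero-mean-noise assumption are ours.

```python
# Schematic Whiten-Shrink-reColour pipeline; the hard threshold below is a
# placeholder for the paper's optimal shrinkage nonlinearity.
import numpy as np

def wsc_denoise(Y, noise_only):
    """Y: p x n observations; noise_only: p x m pure-noise measurements (m > p),
    both assumed zero-mean in this sketch."""
    # 1. Whiten with the noise covariance estimated from the noise-only samples.
    Sigma_hat = noise_only @ noise_only.T / noise_only.shape[1]
    L = np.linalg.cholesky(Sigma_hat)
    Yw = np.linalg.solve(L, Y)                   # whitened data: noise ~ unit covariance
    # 2. Shrink the singular values of the whitened matrix (here: hard threshold
    #    at the Marchenko-Pastur bulk edge).
    U, s, Vt = np.linalg.svd(Yw, full_matrices=False)
    p, n = Yw.shape
    bulk_edge = 1.0 + np.sqrt(p / n)
    s_shrunk = np.where(s / np.sqrt(n) > bulk_edge, s, 0.0)
    # 3. Re-colour back to the original coordinates.
    return L @ (U * s_shrunk) @ Vt
```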
-
We introduce a flexible marginal modelling approach for statistical inference for clustered and longitudinal data under minimal assumptions. This estimated estimating equations approach is semiparametric and the proposed models are fitted by quasi-likelihood regression, where the unknown marginal means are a function of the fixed-effects linear predictor with unknown smooth link, and the variance–covariance is an unknown smooth function of the marginal means. We propose to estimate the nonparametric link and variance–covariance functions via smoothing methods, whereas the regression parameters are obtained via the estimated estimating equations. These are score equations that contain nonparametric function estimates. The proposed estimated estimating equations approach is motivated by its flexibility and easy implementation. Moreover, if data follow a generalized linear mixed model, with either a specified or an unspecified distribution of random effects and link function, the model proposed emerges as the corresponding marginal (population-average) version and can be used to obtain inference for the fixed effects in the underlying generalized linear mixed model, without the need to specify any other components of this generalized linear mixed model. Among marginal models, the estimated estimating equations approach provides a flexible alternative to modelling with generalized estimating equations. Applications of estimated estimating equations include diagnostics and link selection. The asymptotic distribution of the proposed estimators for the model parameters is derived, enabling statistical inference. Practical illustrations include Poisson modelling of repeated epileptic seizure counts and simulations for clustered binomial responses.
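A heavily simplified sketch of the "smooth, then solve the estimating equation" alternation is given below, assuming a known log link, independent observations, and a Nadaraya-Watson smoother for the variance function; the paper additionally estimates the link nonparametrically and accommodates within-cluster correlation, which this sketch omits.

```python
# Simplified alternation: (i) solve a quasi-likelihood estimating equation for
# beta given the current variance-function estimate, (ii) re-estimate the
# variance as a smooth function of the fitted means.  Log link assumed,
# observations treated as independent, convergence checks omitted.
import numpy as np

def nw_smooth(x0, x, y, h=0.5):
    """Nadaraya-Watson kernel smoother (bandwidth selection ignored here)."""
    w = np.exp(-0.5 * ((x0[:, None] - x[None, :]) / h) ** 2)
    return (w @ y) / w.sum(axis=1)

def fit_eee(X, y, n_outer=10, n_inner=10):
    n, p = X.shape
    beta = np.zeros(p)
    v_hat = np.ones(n)                            # start from constant variance
    for _ in range(n_outer):
        for _ in range(n_inner):                  # Fisher-scoring-type updates
            mu = np.exp(X @ beta)
            D = X * mu[:, None]                   # d mu / d beta
            W = 1.0 / v_hat
            score = D.T @ (W * (y - mu))          # estimating (score) equation
            info = D.T @ (W[:, None] * D)
            beta = beta + np.linalg.solve(info, score)
        mu = np.exp(X @ beta)
        v_hat = np.maximum(nw_smooth(mu, mu, (y - mu) ** 2), 1e-6)
    return beta
```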
-
In this paper, we address the problem of estimating the transport surplus (also known as the matching affinity) in high-dimensional optimal transport problems. In classical optimal transport theory, the matching affinity is specified and the optimal joint distribution is determined from it. In contrast, we study the inverse problem of estimating the matching affinity from the observed joint distribution, using an entropic regularization of the problem. To accommodate the high dimensionality of the data, we propose a novel method that incorporates nuclear-norm regularization, which effectively enforces a rank constraint on the affinity matrix. The low-rank matrix estimated in this way reveals the main factors that are relevant for matching.
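Two standard building blocks for estimators of this kind are sketched below: Sinkhorn iterations for the forward entropic optimal-transport problem given an affinity matrix, and singular-value soft-thresholding, the proximal operator of the nuclear norm, which is the usual device for favouring a low-rank affinity. These are generic components under our own naming, not the paper's complete estimation procedure.

```python
# Generic building blocks: Sinkhorn iterations for the forward entropic OT
# problem given an affinity matrix A, and the nuclear-norm proximal operator
# (singular-value soft-thresholding) used to favour low-rank affinities.
import numpy as np

def sinkhorn(A, p, q, eps=0.1, n_iter=500):
    """Entropic optimal-transport plan with marginals p, q and surplus matrix A."""
    K = np.exp(A / eps)                   # may need log-domain stabilisation for small eps
    u, v = np.ones_like(p), np.ones_like(q)
    for _ in range(n_iter):
        u = p / (K @ v)
        v = q / (K.T @ u)
    return u[:, None] * K * v[None, :]

def svd_soft_threshold(A, tau):
    """Prox of tau * nuclear norm: shrink the singular values of A by tau."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt
```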