skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Estimation of the covariance structure of heavy-tailed distributions
We propose and analyze a new estimator of the covariance matrix that admits strong theoretical guarantees under weak assumptions on the underlying distribution, such as existence of moments of only low order. While estimation of covariance matrices corresponding to sub-Gaussian distributions is well-understood, much less in known in the case of heavy-tailed data. As K. Balasubramanian and M. Yuan write, "data from real-world experiments oftentimes tend to be corrupted with outliers and/or exhibit heavy tails. In such cases, it is not clear that those covariance matrix estimators .. remain optimal" and "what are the other possible strategies to deal with heavy tailed distributions warrant further studies." We make a step towards answering this question and prove tight deviation inequalities for the proposed estimator that depend only on the parameters controlling the intrinsic dimension'' associated to the covariance matrix (as opposed to the dimension of the ambient space); in particular, our results are applicable in the case of high-dimensional observations.  more » « less
Award ID(s):
1712956
PAR ID:
10059483
Author(s) / Creator(s):
;
Date Published:
Journal Name:
NIPS
ISSN:
1365-8875
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Functional data have received significant attention as they frequently appear in modern applications, such as functional magnetic resonance imaging (fMRI) and natural language processing. The infinite-dimensional nature of functional data makes it necessary to use dimension reduction techniques. Most existing techniques, however, rely on the covariance operator, which can be affected by heavy-tailed data and unusual observations. Therefore, in this paper, we consider a robust sliced inverse regression for multivariate elliptical functional data. For that reason, we introduce a new statistical linear operator, called the conditional spatial sign Kendall’s tau covariance operator, which can be seen as an extension of the multivariate Kendall’s tau to both the conditional and functional settings. The new operator is robust to heavy-tailed data and outliers, and hence can provide a robust estimate of the sufficient predictors. We also derive the convergence rates of the proposed estimators for both completely and partially observed data. Finally, we demonstrate the finite sample performance of our estimator using simulation examples and a real dataset based on fMRI. 
    more » « less
  2. We offer a survey of recent results on covariance estimation for heavy- tailed distributions. By unifying ideas scattered in the literature, we propose user-friendly methods that facilitate practical implementation. Specifically, we introduce element-wise and spectrum-wise truncation operators, as well as their M-estimator counterparts, to robustify the sample covariance matrix. Different from the classical notion of robustness that is characterized by the breakdown property, we focus on the tail robustness which is evidenced by the connection between nonasymptotic deviation and confidence level. The key observation is that the estimators needs to adapt to the sample size, dimensional- ity of the data and the noise level to achieve optimal tradeoff between bias and robustness. Furthermore, to facilitate their practical use, we propose data-driven procedures that automatically calibrate the tuning parameters. We demonstrate their applications to a series of structured models in high dimensions, including the bandable and low-rank covariance matrices and sparse precision matrices. Numerical studies lend strong support to the proposed methods. 
    more » « less
  3. We prove Fuk-Nagaev and Rosenthal-type inequalities for the sums of indepen- dent random matrices, focusing on the situation when the norms of the matrices possess finite moments of only low orders. Our bounds depend on the “intrinsic” dimensional char- acteristics such as the effective rank, as opposed to the dimension of the ambient space. We illustrate the advantages of such results in several applications, including new moment inequalities for the sample covariance operators of heavy-tailed distributions. Moreover, we demonstrate that our techniques yield sharpened versions of the moment inequalities for empirical processes. 
    more » « less
  4. This paper is devoted to the statistical properties of the geometric median, a robust measure of centrality for multivariate data, as well as its applications to the problem of mean estimation via the median of means principle. Our main theoretical results include (a) the upper bound for the distance between the mean and the median for general absolutely continuous distributions in $$\mathbb R^d$$, and examples of specific classes of distributions for which these bounds do not depend on the ambient dimension $$d$$; (b) exponential deviation inequalities for the distance between the sample and the population versions of the geometric median, which again depend only on the trace-type quantities and not on the ambient dimension. As a corollary, we deduce the improved bounds for the multivariate median of means estimator that hold for large classes of heavy-tailed distributions. 
    more » « less
  5. We study high-dimensional signal recovery from non-linear measurements with design vectors having elliptically symmetric distribution. Special attention is devoted to the situation when the unknown signal belongs to a set of low statistical complexity, while both the measurements and the design vectors are heavy-tailed. We propose and analyze a new estimator that adapts to the structure of the problem, while being robust both to the possible model misspecification characterized by arbitrary non-linearity of the measurements as well as to data corruption modeled by the heavy-tailed distributions. Moreover, this estimator has low computational complexity. Our results are expressed in the form of exponential concentration inequalities for the error of the proposed estimator. On the technical side, our proofs rely on the generic chaining methods, and illustrate the power of this approach for statistical applications. Theory is supported by numerical experiments demonstrating that our estimator outperforms existing alternatives when data is heavy-tailed. 
    more » « less