Robust Modifications of U-statistics and Applications to Covariance Estimation Problems
Let Y be a d-dimensional random vector with unknown mean μ and covariance matrix Σ. This paper is motivated by the problem of designing an estimator of Σ that admits exponential deviation bounds in the operator norm under minimal assumptions on the underlying distribution, such as existence of only 4th moments of the coordinates of Y. To address this problem, we propose robust modifications of the operator-valued U-statistics, obtain non-asymptotic guarantees for their performance, and demonstrate the implications of these results to the covariance estimation problem under various structural assumptions.
Authors:
;
Award ID(s):
Publication Date:
NSF-PAR ID:
10096967
Journal Name:
Preprint
ISSN:
1651-1794
1. Abstract: We consider the problem of estimating the covariance structure of a random vector $Y\in \mathbb R^d$ from a sample $Y_1,\ldots,Y_n$. We are interested in the situation when d is large compared to n but the covariance matrix $\Sigma$ of interest has (exactly or approximately) low rank. We assume that the given sample is (a) $\epsilon$-adversarially corrupted, meaning that $\epsilon$ fraction of the observations could have been replaced by arbitrary vectors, or that (b) the sample is i.i.d. but the underlying distribution is heavy-tailed, meaning that the norm of Y possesses only 4 finite moments. We propose an estimator that is adaptive to the potential low-rank structure of the covariance matrix as well as to the proportion of contaminated data, and admits tight deviation guarantees despite rather weak assumptions on the underlying distribution. Finally, we discuss the algorithms that allow to approximate the proposed estimator in a numerically efficient way.