skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Structured Signal Recovery from Non-linear and Heavy-tailed Measurements
We study high-dimensional signal recovery from non-linear measurements with design vectors having elliptically symmetric distribution. Special attention is devoted to the situation when the unknown signal belongs to a set of low statistical complexity, while both the measurements and the design vectors are heavy-tailed. We propose and analyze a new estimator that adapts to the structure of the problem, while being robust both to the possible model misspecification characterized by arbitrary non-linearity of the measurements as well as to data corruption modeled by the heavy-tailed distributions. Moreover, this estimator has low computational complexity. Our results are expressed in the form of exponential concentration inequalities for the error of the proposed estimator. On the technical side, our proofs rely on the generic chaining methods, and illustrate the power of this approach for statistical applications. Theory is supported by numerical experiments demonstrating that our estimator outperforms existing alternatives when data is heavy-tailed.  more » « less
Award ID(s):
1712956
PAR ID:
10059482
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
IEEE Transactions on Information Theory
ISSN:
0018-9448
Page Range / eLocation ID:
1 to 1
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    The goal of compressed sensing is to estimate a high dimensional vector from an underdetermined system of noisy linear equations. In analogy to classical compressed sensing, here we assume a generative model as a prior, that is, we assume the vector is represented by a deep generative model G:Rk→Rn. Classical recovery approaches such as empirical risk minimization (ERM) are guaranteed to succeed when the measurement matrix is sub-Gaussian. However, when the measurement matrix and measurements are heavy-tailed or have outliers, recovery may fail dramatically. In this paper we propose an algorithm inspired by the Median-of-Means (MOM). Our algorithm guarantees recovery for heavy-tailed data, even in the presence of outliers. Theoretically, our results show our novel MOM-based algorithm enjoys the same sample complexity guarantees as ERM under sub-Gaussian assumptions. Our experiments validate both aspects of our claims: other algorithms are indeed fragile and fail under heavy-tailed and/or corrupted data, while our approach exhibits the predicted robustness. 
    more » « less
  2. We consider the high-dimensional linear regression model and assume that a fraction of the measurements are altered by an adversary with complete knowledge of the data and the underlying distribution. We are interested in a scenario where dense additive noise is heavy-tailed while the measurement vectors follow a sub-Gaussian distribution. Within this framework, we establish minimax lower bounds for the performance of an arbitrary estimator that depend on the the fraction of corrupted observations as well as the tail behavior of the additive noise. Moreover, we design a modification of the so-called Square-Root Slope estimator with several desirable features: (a) it is provably robust to adversarial contamination, and satisfies performance guarantees in the form of sub-Gaussian deviation inequalities that match the lower error bounds, up to logarithmic factors; (b) it is fully adaptive with respect to the unknown sparsity level and the variance of the additive noise, and (c) it is computationally tractable as a solution of a convex optimization problem. To analyze performance of the proposed estimator, we prove several properties of matrices with sub-Gaussian rows that may be of independent interest. 
    more » « less
  3. We propose and analyze a new estimator of the covariance matrix that admits strong theoretical guarantees under weak assumptions on the underlying distribution, such as existence of moments of only low order. While estimation of covariance matrices corresponding to sub-Gaussian distributions is well-understood, much less in known in the case of heavy-tailed data. As K. Balasubramanian and M. Yuan write, "data from real-world experiments oftentimes tend to be corrupted with outliers and/or exhibit heavy tails. In such cases, it is not clear that those covariance matrix estimators .. remain optimal" and "what are the other possible strategies to deal with heavy tailed distributions warrant further studies." We make a step towards answering this question and prove tight deviation inequalities for the proposed estimator that depend only on the parameters controlling the intrinsic dimension'' associated to the covariance matrix (as opposed to the dimension of the ambient space); in particular, our results are applicable in the case of high-dimensional observations. 
    more » « less
  4. In this paper, we propose differentially private algorithms for robust (multivariate) mean estimation and inference under heavy-tailed distributions, with a focus on Gaussian differential privacy. First, we provide a comprehensive analysis of the Huber mean estimator with increasing dimensions, including non-asymptotic deviation bound, Bahadur representation, and (uniform) Gaussian approximations. Secondly, we privatize the Huber mean estimator via noisy gradient descent, which is proven to achieve near-optimal statistical guarantees. The key is to characterize quantitatively the trade-off between statistical accuracy, degree of robustness and privacy level, governed by a carefully chosen robustification parameter. Finally, we construct private confidence intervals for the proposed estimator by incorporating a private and robust covariance estimator. Our findings are demonstrated by simulation studies. 
    more » « less
  5. We consider the task of heavy-tailed statistical estimation given streaming p-dimensional samples. This could also be viewed as stochastic optimization under heavy-tailed distributions, with an additional O(p) space complexity constraint. We design a clipped stochastic gradient descent algorithm and provide an improved analysis, under a more nuanced condition on the noise of the stochastic gradients, which we show is critical when analyzing stochastic optimization problems arising from general statistical estimation problems. Our results guarantee convergence not just in expectation but with exponential concentration, and moreover does so using O(1) batch size. We provide consequences of our results for mean estimation and linear regression. Finally, we provide empirical corroboration of our results and algorithms via synthetic experiments for mean estimation and linear regression. 
    more » « less