skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Robust and Tuning-Free Sparse Linear Regression via Square-Root Slope
We consider the high-dimensional linear regression model and assume that a fraction of the measurements are altered by an adversary with complete knowledge of the data and the underlying distribution. We are interested in a scenario where dense additive noise is heavy-tailed while the measurement vectors follow a sub-Gaussian distribution. Within this framework, we establish minimax lower bounds for the performance of an arbitrary estimator that depend on the the fraction of corrupted observations as well as the tail behavior of the additive noise. Moreover, we design a modification of the so-called Square-Root Slope estimator with several desirable features: (a) it is provably robust to adversarial contamination, and satisfies performance guarantees in the form of sub-Gaussian deviation inequalities that match the lower error bounds, up to logarithmic factors; (b) it is fully adaptive with respect to the unknown sparsity level and the variance of the additive noise, and (c) it is computationally tractable as a solution of a convex optimization problem. To analyze performance of the proposed estimator, we prove several properties of matrices with sub-Gaussian rows that may be of independent interest.  more » « less
Award ID(s):
2045068 1908905
PAR ID:
10376934
Author(s) / Creator(s):
; ;
Publisher / Repository:
SIAM
Date Published:
Journal Name:
SIAM Journal on Mathematics of Data Science
Volume:
6
Issue:
2
ISSN:
2577-0187
Subject(s) / Keyword(s):
robust inference sub-Gaussian deviations Slope sparse linear regression
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Abstract Estimating the mean of a probability distribution using i.i.d. samples is a classical problem in statistics, wherein finite-sample optimal estimators are sought under various distributional assumptions. In this paper, we consider the problem of mean estimation when independent samples are drawn from $$d$$-dimensional non-identical distributions possessing a common mean. When the distributions are radially symmetric and unimodal, we propose a novel estimator, which is a hybrid of the modal interval, shorth and median estimators and whose performance adapts to the level of heterogeneity in the data. We show that our estimator is near optimal when data are i.i.d. and when the fraction of ‘low-noise’ distributions is as small as $$\varOmega \left (\frac{d \log n}{n}\right )$$, where $$n$$ is the number of samples. We also derive minimax lower bounds on the expected error of any estimator that is agnostic to the scales of individual data points. Finally, we extend our theory to linear regression. In both the mean estimation and regression settings, we present computationally feasible versions of our estimators that run in time polynomial in the number of data points. 
    more » « less
  2. We consider the statistical connection between the quantized representation of a high dimensional signal X using a random spherical code and the observation of X under an additive white Gaussian noise (AWGN). We show that given X, the conditional Wasserstein distance between its bitrate-R quantized version and its observation under AWGN of signal-to-noise ratio 2^{2R - 1} is sub-linear in the problem dimension. We then utilize this fact to connect the mean squared error (MSE) attained by an estimator based on an AWGN-corrupted version of X to the MSE attained by the same estimator when fed with its bitrate-R quantized version. 
    more » « less
  3. Convolution and matched filtering are often used to detect a known signal in the presence of noise. The probability of detection and probability of missed detection are well known and widely used statistics. Oftentimes we are not only interested in the probability of detecting a signal but also accurately estimating when the signal occurred and the error statistics associated with that time measurement. Accurately representing the timing error is important for geolocation schemes, such as Time of Arrival (TOA) and Time Difference of Arrival (TDOA), as well as other applications. The Cramér Rao Lower Bound (CRLB) and other, tighter, bounds have been calculated for the error variance on Time of Arrival estimators. However, achieving these bounds requires an amount of interpolation be performed on the signal of interest that may be greater than computational constraints allow. Furthermore, at low Signal to Noise Ratios (SNRs), the probability distribution for the error is non-Gaussian and depends on the shape of the signal of interest. In this paper we introduce a method of characterizing the localization accuracy of the matched filtering operation when used to detect a discrete signal in Additive White Gaussian Noise (AWGN) without additional interpolation. The actual localization accuracy depends on the shape of the signal that is being detected. We develop a statistical method for analyzing the localization error probability mass function for arbitrary signal shapes at any SNR. Finally, we use our proposed analysis method to calculate the error probability mass functions for a few signals commonly used in detection scenarios. These illustrative results serve as examples of the kinds of statistical results that can be generated using our proposed method. The illustrative results demonstrate our method’s unique ability to calculate the non-Gaussian, and signal shape dependent, error distribution at low Signal to Noise Ratios. The error variance calculated using the proposed method is shown to closely track simulation results, deviating from the Cramér Rao Lower Bound at low Signal to Noise Ratios. 
    more » « less
  4. The goal of this note is to present a modification of the popular median of means estimator that achieves sub-Gaussian deviation bounds with nearly optimal constants under minimal assumptions on the underlying distribution. We build on the recent work on the topic and prove that desired guarantees can be attained under weaker requirements. 
    more » « less
  5. null (Ed.)
    Finite-sample bounds on the accuracy of Bhattacharya’s plug-in estimator for Fisher information are derived. These bounds are further improved by introducing a clipping step that allows for better control over the score function. This leads to superior upper bounds on the rates of convergence, albeit under slightly different regularity conditions. The performance bounds on both estimators are evaluated for the practically relevant case of a random variable contaminated by Gaussian noise. Moreover, using Brown’s identity, two corresponding estimators of the minimum mean-square error are proposed. 
    more » « less