We study the problem of estimating the covariance matrix of a highdimensional distribution
when a small constant fraction of the samples can be arbitrarily corrupted. Recent work gave the first polynomial time algorithms for this problem with nearoptimal error guarantees for several natural structured distributions. Our main contribution is to develop faster algorithms for this problem whose running time nearly matches that of computing the empirical covariance. Given N = Ω(d^2/\eps^2) samples from a ddimensional Gaussian distribution, an \epsfraction of which may be arbitrarily corrupted, our algorithm runs in time O(d^{3.26}/ poly(\eps)) and approximates the unknown covariance matrix to optimal error up to a logarithmic factor. Previous robust algorithms with comparable error guarantees all have runtimes Ω(d^{2ω}) when \eps = Ω(1), where ω is the exponent of matrix multiplication. We also provide evidence that improving the running time of our algorithm may require new algorithmic techniques.
more »
« less
HighDimensional Robust Mean Estimation in NearlyLinear Time
We study the fundamental problem of highdimensional mean estimation in a robust model where a constant fraction of the samples are adversarially corrupted. Recent work gave the first polynomial time algorithms for this problem with dimensionindependent error guarantees for several families of structured distributions.
In this work, we give the first nearlylinear time algorithms for highdimensional robust mean estimation. Specifically, we focus on distributions with
(i) known covariance and subgaussian tails, and
(ii) unknown bounded covariance.
Given N samples on R^d, an \epsfraction of which may be arbitrarily corrupted, our algorithms run in time eO(Nd)/poly(\eps) and approximate the true mean within the informationtheoretically optimal error, up to constant factors. Previous robust algorithms with comparable error guarantees have running times \Omega(Nd^2), for \eps= O(1)
Our algorithms rely on a natural family of SDPs parameterized by our current guess ν for the unknown mean μ. We give a winwin analysis establishing the following: either a nearoptimal solution to the primal SDP yields a good candidate for μ — independent of our current guess ν — or a nearoptimal solution to the dual SDP yields a new guess ν0
whose distance from μ is smaller by a constant factor. We exploit the special structure of the corresponding SDPs to show that they are approximately solvable in nearlylinear
time. Our approach is quite general, and we believe it can also be applied to obtain nearlylinear time algorithms for other highdimensional robust learning problems.
more »
« less
 Award ID(s):
 1704656
 NSFPAR ID:
 10090512
 Date Published:
 Journal Name:
 Proceedings of the annual ACMSIAM Symposium on Discrete Algorithms
 ISSN:
 15579468
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this


We study the relationship between adversarial robustness and differential privacy in highdimensional algorithmic statistics. We give the first blackbox reduction from privacy to robustness which can produce private estimators with optimal tradeoffs among sample complexity, accuracy, and privacy for a wide range of fundamental highdimensional parameter estimation problems, including mean and covariance estimation. We show that this reduction can be implemented in polynomial time in some important special cases. In particular, using nearlyoptimal polynomialtime robust estimators for the mean and covariance of highdimensional Gaussians which are based on the SumofSquares method, we design the first polynomialtime private estimators for these problems with nearlyoptimal samplesaccuracyprivacy tradeoffs. Our algorithms are also robust to a constant fraction of adversariallycorrupted samples.more » « less

We study highdimensional sparse estimation tasks in a robust setting where a constant fraction of the dataset is adversarially corrupted. Specifically, we focus on the fundamental problems of robust sparse mean estimation and robust sparse PCA. We give the first practically viable robust estimators for these problems. In more detail, our algorithms are sample and computationally efficient and achieve nearoptimal robustness guarantees. In contrast to prior provable algorithms which relied on the ellipsoid method, our algorithms use spectral techniques to iteratively remove outliers from the dataset. Our experimental evaluation on synthetic data shows that our algorithms are scalable and significantly outperform a range of previous approaches, nearly matching the best error rate without corruptions.more » « less

We present a fast, differentially private algorithm for highdimensional covarianceaware mean estimation with nearly optimal sample complexity. Only exponentialtime estimators were previously known to achieve this guarantee. Given n samples from a (sub)Gaussian distribution with unknown mean μ and covariance Σ, our (ϵ,δ)differentially private estimator produces μ~ such that ∥μ−μ~∥Σ≤α as long as n≳dα2+dlog1/δ√αϵ+dlog1/δϵ. The Mahalanobis error metric ∥μ−μ^∥Σ measures the distance between μ^ and μ relative to Σ; it characterizes the error of the sample mean. Our algorithm runs in time O~(ndω−1+nd/\eps), where ω<2.38 is the matrix multiplication exponent.We adapt an exponentialtime approach of Brown, Gaboardi, Smith, Ullman, and Zakynthinou (2021), giving efficient variants of stable mean and covariance estimation subroutines that also improve the sample complexity to the nearly optimal bound above.Our stable covariance estimator can be turned to private covariance estimation for unrestricted subgaussian distributions. With n≳d3/2 samples, our estimate is accurate in spectral norm. This is the first such algorithm using n=o(d2) samples, answering an open question posed by Alabi et al. (2022). With n≳d2 samples, our estimate is accurate in Frobenius norm. This leads to a fast, nearly optimal algorithm for private learning of unrestricted Gaussian distributions in TV distance.Duchi, Haque, and Kuditipudi (2023) obtained similar results independently and concurrently.more » « less

We study the fundamental problem of learning the parameters of a highdimensional Gaussian in the presence of noise — where an εfraction of our samples were chosen by an adversary. We give robust estimators that achieve estimation error O(ε) in the total variation distance, which is optimal up to a universal constant that is independent of the dimension. In the case where just the mean is unknown, our robustness guarantee is optimal up to a factor of and the running time is polynomial in d and 1/ε. When both the mean and covariance are unknown, the running time is polynomial in d and quasipolynomial in 1/ε. Moreover all of our algorithms require only a polynomial number of samples. Our work shows that the same sorts of error guarantees that were established over fifty years ago in the onedimensional setting can also be achieved by efficient algorithms in highdimensional settings.more » « less