We study the problem of estimating the covariance matrix of a high-dimensional distribution
when a small constant fraction of the samples can be arbitrarily corrupted. Recent work gave the first polynomial time algorithms for this problem with near-optimal error guarantees for several natural structured distributions. Our main contribution is to develop faster algorithms for this problem whose running time nearly matches that of computing the empirical covariance. Given N = Ω(d^2/\eps^2) samples from a d-dimensional Gaussian distribution, an \eps-fraction of which may be arbitrarily corrupted, our algorithm runs in time O(d^{3.26}/ poly(\eps)) and approximates the unknown covariance matrix to optimal error up to a logarithmic factor. Previous robust algorithms with comparable error guarantees all have runtimes Ω(d^{2ω}) when \eps = Ω(1), where ω is the exponent of matrix multiplication. We also provide evidence that improving the running time of our algorithm may require new algorithmic techniques.
more »
« less
High-Dimensional Robust Mean Estimation in Nearly-Linear Time
We study the fundamental problem of high-dimensional mean estimation in a robust model where a constant fraction of the samples are adversarially corrupted. Recent work gave the first polynomial time algorithms for this problem with dimension-independent error guarantees for several families of structured distributions.
In this work, we give the first nearly-linear time algorithms for high-dimensional robust mean estimation. Specifically, we focus on distributions with
(i) known covariance and sub-gaussian tails, and
(ii) unknown bounded covariance.
Given N samples on R^d, an \eps-fraction of which may be arbitrarily corrupted, our algorithms run in time eO(Nd)/poly(\eps) and approximate the true mean within the information-theoretically optimal error, up to constant factors. Previous robust algorithms with comparable error guarantees have running times \Omega(Nd^2), for \eps= O(1)
Our algorithms rely on a natural family of SDPs parameterized by our current guess ν for the unknown mean μ. We give a win-win analysis establishing the following: either a near-optimal solution to the primal SDP yields a good candidate for μ — independent of our current guess ν — or a near-optimal solution to the dual SDP yields a new guess ν0
whose distance from μ is smaller by a constant factor. We exploit the special structure of the corresponding SDPs to show that they are approximately solvable in nearly-linear
time. Our approach is quite general, and we believe it can also be applied to obtain nearly-linear time algorithms for other high-dimensional robust learning problems.
more »
« less
- Award ID(s):
- 1704656
- PAR ID:
- 10090512
- Date Published:
- Journal Name:
- Proceedings of the annual ACM-SIAM Symposium on Discrete Algorithms
- ISSN:
- 1557-9468
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
We present a fast, differentially private algorithm for high-dimensional covariance-aware mean estimation with nearly optimal sample complexity. Only exponential-time estimators were previously known to achieve this guarantee. Given n samples from a (sub-)Gaussian distribution with unknown mean μ and covariance Σ, our (ϵ,δ)-differentially private estimator produces μ~ such that ∥μ−μ~∥Σ≤α as long as n≳dα2+dlog1/δ√αϵ+dlog1/δϵ. The Mahalanobis error metric ∥μ−μ^∥Σ measures the distance between μ^ and μ relative to Σ; it characterizes the error of the sample mean. Our algorithm runs in time O~(ndω−1+nd/\eps), where ω<2.38 is the matrix multiplication exponent.We adapt an exponential-time approach of Brown, Gaboardi, Smith, Ullman, and Zakynthinou (2021), giving efficient variants of stable mean and covariance estimation subroutines that also improve the sample complexity to the nearly optimal bound above.Our stable covariance estimator can be turned to private covariance estimation for unrestricted subgaussian distributions. With n≳d3/2 samples, our estimate is accurate in spectral norm. This is the first such algorithm using n=o(d2) samples, answering an open question posed by Alabi et al. (2022). With n≳d2 samples, our estimate is accurate in Frobenius norm. This leads to a fast, nearly optimal algorithm for private learning of unrestricted Gaussian distributions in TV distance.Duchi, Haque, and Kuditipudi (2023) obtained similar results independently and concurrently.more » « less
-
We study the relationship between adversarial robustness and differential privacy in high-dimensional algorithmic statistics. We give the first black-box reduction from privacy to robustness which can produce private estimators with optimal tradeoffs among sample complexity, accuracy, and privacy for a wide range of fundamental high-dimensional parameter estimation problems, including mean and covariance estimation. We show that this reduction can be implemented in polynomial time in some important special cases. In particular, using nearly-optimal polynomial-time robust estimators for the mean and covariance of high-dimensional Gaussians which are based on the Sum-of-Squares method, we design the first polynomial-time private estimators for these problems with nearly-optimal samples-accuracy-privacy tradeoffs. Our algorithms are also robust to a constant fraction of adversarially-corrupted samples.more » « less
-
We study high-dimensional sparse estimation tasks in a robust setting where a constant fraction of the dataset is adversarially corrupted. Specifically, we focus on the fundamental problems of robust sparse mean estimation and robust sparse PCA. We give the first practically viable robust estimators for these problems. In more detail, our algorithms are sample and computationally efficient and achieve near-optimal robustness guarantees. In contrast to prior provable algorithms which relied on the ellipsoid method, our algorithms use spectral techniques to iteratively remove outliers from the dataset. Our experimental evaluation on synthetic data shows that our algorithms are scalable and significantly outperform a range of previous approaches, nearly matching the best error rate without corruptions.more » « less
-
We study the relationship between adversarial robustness and differential privacy in high-dimensional algorithmic statistics. We give the first black-box reduction from privacy to robustness which can produce private estimators with optimal tradeoffs among sample complexity, accuracy, and privacy for a wide range of fundamental high-dimensional parameter estimation problems, including mean and covariance estimation. We show that this reduction can be implemented in polynomial time in some important special cases. In particular, using nearly-optimal polynomial-time robust estimators for the mean and covariance of high-dimensional Gaussians which are based on the Sum-of-Squares method, we design the first polynomial-time private estimators for these problems with nearly-optimal samples-accuracy-privacy tradeoffs. Our algorithms are also robust to a nearly optimal fraction of adversarially-corrupted samples.more » « less