Robustness Implies Privacy in Statistical Estimation
We study the relationship between adversarial robustness and differential privacy in high-dimensional algorithmic statistics. We give the first black-box reduction from privacy to robustness that can produce private estimators with optimal tradeoffs among sample complexity, accuracy, and privacy for a wide range of fundamental high-dimensional parameter estimation problems, including mean and covariance estimation. We show that this reduction can be implemented in polynomial time in some important special cases. In particular, using nearly-optimal polynomial-time robust estimators for the mean and covariance of high-dimensional Gaussians based on the Sum-of-Squares method, we design the first polynomial-time private estimators for these problems with nearly-optimal samples-accuracy-privacy tradeoffs. Our algorithms are also robust to a nearly optimal fraction of adversarially-corrupted samples.
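As a loose illustration of why robustness can buy privacy: a robust estimator is stable when a small number of samples change, and stability (bounded sensitivity) is exactly what standard noise-addition mechanisms need. The one-dimensional sketch below, using a trimmed mean and a deliberately crude sensitivity bound, is our own toy example and not the paper's black-box reduction:

```python
import random

def trimmed_mean(xs, eps):
    """Robust 1-D mean: discard the eps-fraction smallest and largest points."""
    xs = sorted(xs)
    k = int(eps * len(xs))
    core = xs[k:len(xs) - k] if k > 0 else xs
    return sum(core) / len(core)

def private_trimmed_mean(xs, eps_corrupt, eps_dp, data_range, rng):
    """Toy 'robustness implies privacy' reduction: because the trimmed mean
    moves little when one sample is replaced, Laplace noise calibrated to
    that stability yields (roughly) eps_dp-differential privacy.  The
    sensitivity constant below is an illustrative stand-in, not the paper's
    analysis."""
    n = len(xs)
    # Once the extremes are trimmed away, replacing one of n samples lying
    # in [-R, R] shifts the estimate by O(R / n); we use 4R/n as a crude bound.
    sensitivity = 4.0 * data_range / n
    scale = sensitivity / eps_dp
    # Difference of two Exp(1) variables is a standard Laplace sample.
    laplace_noise = scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
    return trimmed_mean(xs, eps_corrupt) + laplace_noise
```

The trimming supplies robustness to corrupted samples, and the same trimming is what caps the sensitivity used for the privacy noise, mirroring the reduction's high-level logic.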
Award ID(s): 2238080
NSF-PAR ID: 10487323
Publisher / Repository: ACM
Journal Name: 55th Annual ACM Symposium on Theory of Computing (STOC)
Location: Orlando, Florida
Sponsoring Org: National Science Foundation
More Like this


We present a fast, differentially private algorithm for high-dimensional covariance-aware mean estimation with nearly optimal sample complexity. Only exponential-time estimators were previously known to achieve this guarantee. Given n samples from a (sub-)Gaussian distribution with unknown mean μ and covariance Σ, our (ε,δ)-differentially private estimator produces μ̃ such that ∥μ − μ̃∥_Σ ≤ α as long as n ≳ d/α² + d√(log(1/δ))/(αε) + d·log(1/δ)/ε. The Mahalanobis error metric ∥μ − μ̂∥_Σ measures the distance between μ̂ and μ relative to Σ; it characterizes the error of the sample mean. Our algorithm runs in time Õ(nd^(ω−1) + nd/ε), where ω < 2.38 is the matrix multiplication exponent. We adapt an exponential-time approach of Brown, Gaboardi, Smith, Ullman, and Zakynthinou (2021), giving efficient variants of stable mean and covariance estimation subroutines that also improve the sample complexity to the nearly optimal bound above. Our stable covariance estimator can be turned into a private covariance estimator for unrestricted subgaussian distributions. With n ≳ d^(3/2) samples, our estimate is accurate in spectral norm. This is the first such algorithm using n = o(d²) samples, answering an open question posed by Alabi et al. (2022). With n ≳ d² samples, our estimate is accurate in Frobenius norm. This leads to a fast, nearly optimal algorithm for private learning of unrestricted Gaussian distributions in TV distance. Duchi, Haque, and Kuditipudi (2023) obtained similar results independently and concurrently.
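For concreteness, the Mahalanobis error metric defined above can be computed directly once Σ is known. A dependency-free sketch for the 2×2 case, with Σ inverted by hand (the helper name is ours):

```python
import math

def mahalanobis_error(mu_hat, mu, sigma):
    """Compute ||mu - mu_hat||_Sigma = sqrt((mu - mu_hat)^T Sigma^{-1} (mu - mu_hat))
    for a 2x2 covariance matrix sigma, inverted explicitly to stay
    dependency-free (for general d, use a linear solve instead)."""
    d0, d1 = mu[0] - mu_hat[0], mu[1] - mu_hat[1]
    a, b = sigma[0][0], sigma[0][1]
    c, e = sigma[1][0], sigma[1][1]
    det = a * e - b * c
    # Sigma^{-1} = (1/det) * [[e, -b], [-c, a]], so the quadratic form is:
    q = (e * d0 * d0 - (b + c) * d0 * d1 + a * d1 * d1) / det
    return math.sqrt(q)
```

With Σ = I this reduces to the ordinary Euclidean error; directions of high variance are penalized less, which is why the metric characterizes the error of the sample mean.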


We study the fundamental problem of learning the parameters of a high-dimensional Gaussian in the presence of noise, where an ε-fraction of our samples were chosen by an adversary. We give robust estimators that achieve estimation error O(ε) in total variation distance, which is optimal up to a universal constant that is independent of the dimension. In the case where just the mean is unknown, our robustness guarantee is optimal up to a constant factor and the running time is polynomial in d and 1/ε. When both the mean and covariance are unknown, the running time is polynomial in d and quasi-polynomial in 1/ε. Moreover, all of our algorithms require only a polynomial number of samples. Our work shows that the same sorts of error guarantees that were established over fifty years ago in the one-dimensional setting can also be achieved by efficient algorithms in high-dimensional settings.
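The gap between the naive sample mean and a robust estimate is easy to see even in one dimension. The numbers below are ours, purely for illustration; in high dimensions the coordinate-wise median is no longer adequate (its error grows with the dimension), which is the gap estimators like those above close:

```python
import statistics

# 100 clean samples clustered near zero, then an adversary replaces a
# 0.1-fraction of them with a large outlier value.
clean = [float(i % 7 - 3) for i in range(100)]
corrupted = clean[:90] + [1000.0] * 10

mean_err = abs(statistics.mean(corrupted))      # the sample mean is dragged far from 0
median_err = abs(statistics.median(corrupted))  # the median barely moves
```

Here `mean_err` is on the order of the outlier magnitude times the corruption fraction, while `median_err` stays bounded by the spread of the clean data.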

We study the fundamental problem of high-dimensional mean estimation in a robust model where a constant fraction of the samples are adversarially corrupted. Recent work gave the first polynomial-time algorithms for this problem with dimension-independent error guarantees for several families of structured distributions. In this work, we give the first nearly-linear time algorithms for high-dimensional robust mean estimation. Specifically, we focus on distributions with (i) known covariance and subgaussian tails, and (ii) unknown bounded covariance. Given N samples on R^d, an ε-fraction of which may be arbitrarily corrupted, our algorithms run in time Õ(Nd)/poly(ε) and approximate the true mean within the information-theoretically optimal error, up to constant factors. Previous robust algorithms with comparable error guarantees have running times Ω(Nd²) for ε = Ω(1). Our algorithms rely on a natural family of SDPs parameterized by our current guess ν for the unknown mean μ. We give a win-win analysis establishing the following: either a near-optimal solution to the primal SDP yields a good candidate for μ, independent of our current guess ν, or a near-optimal solution to the dual SDP yields a new guess ν′ whose distance from μ is smaller by a constant factor. We exploit the special structure of the corresponding SDPs to show that they are approximately solvable in nearly-linear time. Our approach is quite general, and we believe it can also be applied to obtain nearly-linear time algorithms for other high-dimensional robust learning problems.
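A much simpler relative of this guess-and-refine loop is iterative trimming around the current guess ν: re-estimate, discard the points farthest from the guess, and repeat until the guess stabilizes. The 1-D sketch below is illustrative only; the nearly-linear time guarantee rests on the SDP machinery described above, not on this heuristic:

```python
def robust_mean_1d(xs, eps, iters=20):
    """Crude guess-and-refine robust mean, not the SDP method: repeatedly
    re-estimate after discarding the eps-fraction of points farthest from
    the current guess nu.  Each round either nu is already close to the
    true mean or discarding outliers pulls it closer."""
    nu = sum(xs) / len(xs)          # initial guess: the (non-robust) sample mean
    k = int(eps * len(xs))
    for _ in range(iters):
        # Keep the len(xs) - k points closest to the current guess.
        kept = sorted(xs, key=lambda x: abs(x - nu))[:len(xs) - k]
        nu = sum(kept) / len(kept)
    return nu
```

On a sample where a 0.1-fraction of points are shifted to a large value, the first iteration already moves the guess back onto the clean data, mirroring the "either the candidate is good or the next guess is closer" structure of the win-win analysis.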