We study the fundamental problem of high-dimensional mean estimation in a robust model where a constant fraction of the samples are adversarially corrupted. Recent work gave the first polynomial-time algorithms for this problem with dimension-independent error guarantees for several families of structured distributions. In this work, we give the first nearly-linear-time algorithms for high-dimensional robust mean estimation. Specifically, we focus on distributions with (i) known covariance and sub-gaussian tails, and (ii) unknown bounded covariance. Given N samples in R^d, an ε-fraction of which may be arbitrarily corrupted, our algorithms run in time Õ(Nd)/poly(ε) and approximate the true mean within the information-theoretically optimal error, up to constant factors. Previous robust algorithms with comparable error guarantees have running times Ω(Nd^2) for ε = O(1). Our algorithms rely on a natural family of SDPs parameterized by our current guess ν for the unknown mean μ. We give a win-win analysis establishing the following: either a near-optimal solution to the primal SDP yields a good candidate for μ (independent of our current guess ν), or a near-optimal solution to the dual SDP yields a new guess ν′ whose distance from μ is smaller by a constant factor. We exploit the special structure of the corresponding SDPs to show that they are approximately solvable in nearly-linear time. Our approach is quite general, and we believe it can also be applied to obtain nearly-linear-time algorithms for other high-dimensional robust learning problems.
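For context, the following is a minimal sketch of the standard spectral-filtering baseline for robust mean estimation (iteratively removing points with large projections onto the top principal direction). It is not the nearly-linear-time SDP-based algorithm described above; the score threshold and removal rule are illustrative simplifications.

```python
import numpy as np

def filter_mean(X, eps, threshold=10.0, max_rounds=20):
    """Simplified spectral-filtering baseline for robust mean estimation.

    Not the SDP-based nearly-linear-time algorithm of the abstract;
    `threshold` and the quantile-based removal rule are illustrative.
    """
    X = np.asarray(X, dtype=float)
    for _ in range(max_rounds):
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)
        if eigvals[-1] <= threshold:              # covariance already well-behaved
            return mu
        v = eigvecs[:, -1]                        # top principal direction
        scores = ((X - mu) @ v) ** 2              # outlyingness along that direction
        cutoff = np.quantile(scores, 1.0 - eps)   # drop the most outlying eps-fraction
        X = X[scores <= cutoff]
    return X.mean(axis=0)
```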
Towards optimal running times for optimal transport
We provide faster algorithms for approximating the optimal transport distance (e.g., earth mover's distance) between two discrete probability distributions on n elements. We present two algorithms that compute couplings between marginal distributions with an expected transportation cost that is within an additive ε of optimal in time O(n^2/ε); one algorithm is straightforward to parallelize and implementable in depth O(1/ε). Further, we show that additional improvements on our results must be coupled with breakthroughs in algorithmic graph theory.
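For comparison, here is a minimal sketch of entropic regularization (Sinkhorn iteration), one standard way to compute an approximately optimal coupling between two discrete distributions. It is not the algorithm of this paper; the regularization strength and iteration count below are illustrative assumptions.

```python
import numpy as np

def sinkhorn_coupling(p, q, C, eps, n_iters=1000):
    """Entropic-regularization (Sinkhorn) sketch for approximate optimal transport.

    Standard baseline, not this paper's algorithm; eta = eps / (4 log n)
    is a common textbook choice and is assumed here, not taken from the paper.
    Each iteration is an O(n^2) matrix-vector product.
    """
    n = len(p)
    eta = eps / (4.0 * np.log(n))        # regularization strength (illustrative)
    K = np.exp(-C / eta)                 # Gibbs kernel of the cost matrix
    u = np.ones(n)
    v = np.ones(n)
    for _ in range(n_iters):             # alternating row/column scaling
        u = p / (K @ v)
        v = q / (K.T @ u)
    P = np.diag(u) @ K @ np.diag(v)      # approximate coupling with marginals ~ p, q
    return P, float((P * C).sum())       # coupling and its transportation cost
```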
- Award ID(s):
- 1915967
- PAR ID:
- 10483194
- Publisher / Repository:
- Operations Research Letters
- Date Published:
- Journal Name:
- Operations Research Letters
- Volume:
- 52
- Issue:
- C
- ISSN:
- 0167-6377
- Page Range / eLocation ID:
- 107054
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Sequential Monte Carlo algorithms are widely accepted as powerful computational tools for making inference with dynamical systems. A key step in sequential Monte Carlo is resampling, which plays the role of steering the algorithm towards the future dynamics. Several strategies have been used in practice, including multinomial resampling, residual resampling, optimal resampling, stratified resampling and optimal transport resampling. In one-dimensional cases, we show that optimal transport resampling is equivalent to stratified resampling on the sorted particles, and both strategies minimize the resampling variance as well as the expected squared energy distance between the original and resampled empirical distributions. For general $d$-dimensional cases, we show that if the particles are first sorted using the Hilbert curve, the variance of stratified resampling is $O(m^{-(1+2/d)})$, an improvement over the best previously known rate of $O(m^{-(1+1/d)})$, where $m$ is the number of resampled particles. We show that this improved rate is optimal for ordered stratified resampling schemes, as conjectured in Gerber et al. (2019). We also present an almost-sure bound on the Wasserstein distance between the original and Hilbert-curve-resampled empirical distributions. In light of these results, we show that for dimension $d>1$ the mean square error of sequential quasi-Monte Carlo with $n$ particles can be $O(n^{-1-4/\{d(d+4)\}})$ if Hilbert curve resampling is used and a specific low-discrepancy set is chosen. To our knowledge, this is the first known convergence rate lower than $o(n^{-1})$.
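A minimal sketch of stratified resampling applied to sorted one-dimensional particles, the scheme the equivalence result above refers to; variable names and the random-number handling are illustrative.

```python
import numpy as np

def stratified_resample_sorted(particles, weights, m, rng=None):
    """Stratified resampling on sorted 1-D particles (illustrative sketch).

    In one dimension this coincides with optimal transport resampling;
    sorting the particles first is the essential step.
    """
    rng = np.random.default_rng() if rng is None else rng
    order = np.argsort(particles)                  # sort particles by value
    sorted_particles = np.asarray(particles)[order]
    w = np.asarray(weights, dtype=float)[order]
    cdf = np.cumsum(w / w.sum())
    u = (np.arange(m) + rng.random(m)) / m         # one uniform draw per stratum [i/m, (i+1)/m)
    idx = np.searchsorted(cdf, u)                  # invert the weight CDF
    return sorted_particles[idx]
```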
-
We study the problem of estimating the covariance matrix of a high-dimensional distribution when a small constant fraction of the samples can be arbitrarily corrupted. Recent work gave the first polynomial-time algorithms for this problem with near-optimal error guarantees for several natural structured distributions. Our main contribution is to develop faster algorithms for this problem whose running time nearly matches that of computing the empirical covariance. Given N = Ω(d^2/ε^2) samples from a d-dimensional Gaussian distribution, an ε-fraction of which may be arbitrarily corrupted, our algorithm runs in time O(d^{3.26}/poly(ε)) and approximates the unknown covariance matrix to optimal error up to a logarithmic factor. Previous robust algorithms with comparable error guarantees all have runtimes Ω(d^{2ω}) when ε = Ω(1), where ω is the exponent of matrix multiplication. We also provide evidence that improving the running time of our algorithm may require new algorithmic techniques.
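For reference, the runtime benchmark mentioned above is the plain (non-robust) empirical covariance, a Θ(Nd^2) computation; a minimal sketch:

```python
import numpy as np

def empirical_covariance(X):
    """Plain empirical covariance of N samples in d dimensions (no robustness).

    This Theta(N d^2) computation is the baseline whose running time the
    robust estimator above nearly matches.
    """
    Xc = X - X.mean(axis=0)              # center the samples
    return (Xc.T @ Xc) / (len(X) - 1)    # d x d sample covariance matrix
```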
-
We consider the problem of preprocessing a weighted directed planar graph in order to quickly answer exact distance queries. The main tension in this problem is between space S and query time Q, and since the mid-1990s all results had polynomial time-space tradeoffs, e.g., Q = Θ̃(n/√S) or Q = Θ̃(n^{5/2}/S^{3/2}). In this article we show that there is no polynomial tradeoff between time and space and that it is possible to simultaneously achieve almost optimal space n^{1+o(1)} and almost optimal query time n^{o(1)}. More precisely, we achieve the following space-time tradeoffs: n^{1+o(1)} space and log^{2+o(1)} n query time; n log^{2+o(1)} n space and n^{o(1)} query time; n^{4/3+o(1)} space and log^{1+o(1)} n query time. We reduce a distance query to a variety of point location problems in additively weighted Voronoi diagrams and develop new algorithms for the point location problem itself using several partially persistent dynamic tree data structures.
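To make the space/query-time tension concrete, the two textbook extremes for general graphs (not the planar-graph oracle of this work) are: precompute all-pairs distances for Θ(n^2) space and O(1) queries, or run Dijkstra at query time for linear space and near-linear query time. A minimal sketch, assuming an adjacency-list `graph` mapping each vertex to (neighbor, weight) pairs:

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest path distances with a binary heap."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                           # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def preprocess_all_pairs(graph):
    """Extreme #1: Theta(n^2) space, O(1) queries (look up the stored table)."""
    return {s: dijkstra(graph, s) for s in graph}

def query_dijkstra(graph, s, t):
    """Extreme #2: O(n) extra space, near-linear time per query."""
    return dijkstra(graph, s).get(t, float("inf"))
```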
-
We study the problem of efficiently estimating the effect of an intervention on a single variable using observational samples. Our goal is to give algorithms with polynomial time and sample complexity in a non-parametric setting. Tian and Pearl (AAAI '02) exactly characterized the class of causal graphs for which causal effects of atomic interventions can be identified from observational data. We make their result quantitative. Suppose P is a causal model on a set V of n observable variables with respect to a given causal graph G, and let do(x) be an identifiable intervention on a variable X. We show that, assuming G has bounded in-degree and bounded c-components (of size at most k) and the observational distribution satisfies a strong positivity condition: (i) [Evaluation] There is an algorithm that, with probability 2/3, outputs an evaluator for a distribution P̂ that satisfies TV(P(V | do(x)), P̂(V)) < ε using m = O(n/ε^2) samples from P and O(mn) time. The evaluator can return in O(n) time the probability P̂(v) for any assignment v to V. (ii) [Sampling] There is an algorithm that, with probability 2/3, outputs a sampler for a distribution P̂ that satisfies TV(P(V | do(x)), P̂(V)) < ε using m = O(n/ε^2) samples from P and O(mn) time. The sampler returns an i.i.d. sample from P̂ with probability 1 in O(n) time. We extend our techniques to estimate P(Y | do(x)) for a subset Y of variables of interest. We also show lower bounds for the sample complexity, demonstrating that our sample complexity has optimal dependence on the parameters n and ε, as well as, when k = 1, on the strong positivity parameter.
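As a toy illustration of the identification-to-estimation step that this work makes quantitative, the sketch below plug-in-estimates P(y | do(x)) in the simplest back-door graph Z → X, Z → Y, X → Y, where P(y | do(x)) = Σ_z P(z) P(y | x, z). It is not the paper's general algorithm, and the sample format is an assumption.

```python
from collections import Counter

def backdoor_plugin(samples, x):
    """Plug-in estimate of P(Y | do(X=x)) via back-door adjustment.

    Toy setting only: a single observed confounder Z with
    P(y | do(x)) = sum_z P(z) P(y | x, z).  `samples` is a list of
    discrete (z, x, y) tuples; this is not the paper's general algorithm.
    """
    n = len(samples)
    pz = Counter(z for z, _, _ in samples)         # empirical marginal of Z
    estimate = Counter()
    for z_val, cz in pz.items():
        matching = [y for z, xx, y in samples if z == z_val and xx == x]
        if not matching:
            continue                               # strong positivity would rule this out
        cy = Counter(matching)                     # empirical P(y | x, z_val)
        for y_val, cnt in cy.items():
            estimate[y_val] += (cz / n) * (cnt / len(matching))
    return dict(estimate)
```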