skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: List-Decodable Linear Regression
We give the first polynomial-time algorithm for robust regression in the list-decodable setting where an adversary can corrupt a greater than 1/2 fraction of examples. For any our algorithm takes as input a sample of n linear equations where (1- \alpha) n of the equations are {\em arbitrarily} chosen. It outputs a list that contains a linear function that is close to the truth. Our algorithm succeeds whenever the inliers are chosen from a \emph{certifiably} anti-concentrated distribution D. In particular, this gives an efficient algorithm to find an optimal size list when the inlier distribution is standard Gaussian. For discrete product distributions that are anti-concentrated only in \emph{regular} directions, we give an algorithm that achieves similar guarantee under the promise that the true linear function has all coordinates of the same magnitude. To complement our result, we prove that the anti-concentration assumption on the inliers is information-theoretically necessary. Our algorithm is based on a new framework for list-decodable learning that strengthens the `identifiability to algorithms' paradigm based on the sum-of-squares method.  more » « less
Award ID(s):
1909204
PAR ID:
10201525
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Advances in neural information processing systems
ISSN:
1049-5258
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. A set of high dimensional points X ={x1,x2,…,xn}⊆Rd in isotropic position is said to be δ -anti concentrated if for every direction v, the fraction of points in X satisfying |⟨xi,v⟩|⩽δ is at most O(δ). Motivated by applications to list-decodable learning and clustering, three recent works [7], [44], [71] considered the problem of constructing efficient certificates of anti-concentration in the average case, when the set of points X corresponds to samples from a Gaussian distribution. Their certificates played a crucial role in several subsequent works in algorithmic robust statistics on list-decodable learning and settling the robust learnability of arbitrary Gaussian mixtures. Unlike related efficient certificates of concentration properties that are known for wide class of distri-butions [52], the aforementioned approach has been limited only to rotationally invariant distributions (and their affine transformations) with the only prominent example being Gaussian distributions. This work presents a new (and arguably the most natural) formulation for anti- concentration. Using this formulation, we give quasi-polynomial time verifiable sum-of-squares certificates of anti-concentration that hold for a wide class of non-Gaussian distributions including anti-concentrated bounded product distributions and uniform distributions over Lp balls (and their affine transformations). Consequently, our method upgrades and extends results in algorithmic robust statistics e.g., list-decodable learning and clustering, to such distributions. As in the case of previous works, our certificates are also obtained via relaxations in the sum-of-squares hierarchy. However, the nature of our argument differs significantly from prior works that formulate anti-concentration as the non-negativity of an explicit polynomial. Our argument constructs a canonical integer program for anti-concentration and analysis a SoS relaxation of it, independent of the intended application. The explicit polynomials appearing in prior works can be seen as specific dual certificates to this program. From a technical standpoint, unlike existing works that explicitly construct sum-of-squares certificates, our argument relies on duality and analyzes a pseudo-expectation on large subsets of the input points that take a small value in some direction. Our analysis uses the method of polynomial reweightings to reduce the problem to analyzing only analytically dense or sparse directions. 
    more » « less
  2. We introduce fast-decodable indexing schemes for edit distance which can be used to speed up edit distance computations to near-linear time if one of the strings is indexed by an indexing string I. In particular, for every length n and every ε >0, one can in near linear time construct a string I ∈ Σ′n with |Σ′| = Oε(1), such that, indexing any string S ∈ Σn, symbol-by-symbol, with I results in a string S′ ∈ Σ″n where Σ″ = Σ × Σ′ for which edit distance computations are easy, i.e., one can compute a (1+ε)-approximation of the edit distance between S′ and any other string in O(n (log n)) time. Our indexing schemes can be used to improve the decoding complexity of state-of-the-art error correcting codes for insertions and deletions. In particular, they lead to near-linear time decoding algorithms for the insertion-deletion codes of [Haeupler, Shahrasbi; STOC ‘17] and faster decoding algorithms for list-decodable insertion-deletion codes of [Haeupler, Shahrasbi, Sudan; ICALP ‘18]. Interestingly, the latter codes are a crucial ingredient in the construction of fast-decodable indexing schemes. 
    more » « less
  3. Inspired by recent work on learning with distribution shift, we give a general outlier removal algorithm called iterative polynomial filtering and show a number of striking applications for supervised learning with contamination: (1) We show that any function class that can be approximated by low-degree polynomials with respect to a hypercontractive distribution can be efficiently learned under bounded contamination (also known as nasty noise). This is a surprising resolution to a longstanding gap between the complexity of agnostic learning and learning with contamination, as it was widely believed that low-degree approximators only implied tolerance to label noise. (2) For any function class that admits the (stronger) notion of sandwiching approximators, we obtain near-optimal learning guarantees even with respect to heavy additive contamination, where far more than 1/2 of the training set may be added adversarially. Prior related work held only for regression and in a list-decodable setting. (3) We obtain the first efficient algorithms for tolerant testable learning of functions of halfspaces with respect to any fixed log-concave distribution. Even the non-tolerant case for a single halfspace in this setting had remained open. These results significantly advance our understanding of efficient supervised learning under contamination, a setting that has been much less studied than its unsupervised counterpart. 
    more » « less
  4. null (Ed.)
    Nonlinear differential equations model diverse phenomena but are notoriously difficult to solve. While there has been extensive previous work on efficient quantum algorithms for linear differential equations, the linearity of quantum mechanics has limited analogous progress for the nonlinear case. Despite this obstacle, we develop a quantum algorithm for dissipative quadratic n-dimensional ordinary differential equations. Assuming R < 1 , where R is a parameter characterizing the ratio of the nonlinearity and forcing to the linear dissipation, this algorithm has complexity T 2 q   poly ( log ⁡ T , log ⁡ n , log ⁡ 1 / ϵ ) / ϵ , where T is the evolution time, ϵ is the allowed error, and q measures decay of the solution. This is an exponential improvement over the best previous quantum algorithms, whose complexity is exponential in T. While exponential decay precludes efficiency, driven equations can avoid this issue despite the presence of dissipation. Our algorithm uses the method of Carleman linearization, for which we give a convergence theorem. This method maps a system of nonlinear differential equations to an infinite-dimensional system of linear differential equations, which we discretize, truncate, and solve using the forward Euler method and the quantum linear system algorithm. We also provide a lower bound on the worst-case complexity of quantum algorithms for general quadratic differential equations, showing that the problem is intractable for R ≥ 2 . Finally, we discuss potential applications, showing that the R < 1 condition can be satisfied in realistic epidemiological models and giving numerical evidence that the method may describe a model of fluid dynamics even for larger values of R. 
    more » « less
  5. Given a set of points $$P = (P^+ \sqcup P^-) \subset \mathbb{R}^d$$ for some constant $$d$$ and a supply function $$\mu:P\to \mathbb{R}$$ such that $$\mu(p) > 0~\forall p \in P^+$$, $$\mu(p) < 0~\forall p \in P^-$$, and $$\sum_{p\in P}{\mu(p)} = 0$$, the geometric transportation problem asks one to find a transportation map $$\tau: P^+\times P^-\to \mathbb{R}_{\ge 0}$$ such that $$\sum_{q\in P^-}{\tau(p, q)} = \mu(p)~\forall p \in P^+$$, $$\sum_{p\in P^+}{\tau(p, q)} = -\mu(q) \forall q \in P^-$$, and the weighted sum of Euclidean distances for the pairs $$\sum_{(p,q)\in P^+\times P^-}\tau(p, q)\cdot ||q-p||_2$$ is minimized. We present the first deterministic algorithm that computes, in near-linear time, a transportation map whose cost is within a $$(1 + \varepsilon)$$ factor of optimal. More precisely, our algorithm runs in $$O(n\varepsilon^{-(d+2)}\log^5{n}\log{\log{n}})$$ time for any constant $$\varepsilon > 0$$. While a randomized $$n\varepsilon^{-O(d)}\log^{O(d)}{n}$$ time algorithm for this problem was discovered in the last few years, all previously known deterministic $$(1 + \varepsilon)$$-approximation algorithms run in~$$\Omega(n^{3/2})$$ time. A similar situation existed for geometric bipartite matching, the special case of geometric transportation where all supplies are unit, until a deterministic $$n\varepsilon^{-O(d)}\log^{O(d)}{n}$$ time $$(1 + \varepsilon)$$-approximation algorithm was presented at STOC 2022. Surprisingly, our result is not only a generalization of the bipartite matching one to arbitrary instances of geometric transportation, but it also reduces the running time for all previously known $$(1 + \varepsilon)$$-approximation algorithms, randomized or deterministic, even for geometric bipartite matching. In particular, we give the first $$(1 + \varepsilon)$$-approximate deterministic algorithm for geometric bipartite matching and the first $$(1 + \varepsilon)$$-approximate deterministic or randomized algorithm for geometric transportation with no dependence on $$d$$ in the exponent of the running time's polylog. As an additional application of our main ideas, we also give the first randomized near-linear $$O(\varepsilon^{-2} m \log^{O(1)} n)$$ time $$(1 + \varepsilon)$$-approximation algorithm for the uncapacitated minimum cost flow (transshipment) problem in undirected graphs with arbitrary \emph{real} edge costs. 
    more » « less