Title: Leverage Score Sampling for Faster Accelerated Regression and ERM
Given a matrix A ∈ ℝ^{n×d} and a vector b ∈ ℝ^n, we show how to compute an ϵ-approximate solution to the regression problem min_{x ∈ ℝ^d} (1/2)‖Ax − b‖₂² in time Õ((n + √(d·κ_sum))·s·log(1/ϵ)), where κ_sum = tr(A^⊤A)/λ_min(A^⊤A) and s is the maximum number of non-zero entries in a row of A. Our algorithm improves upon the previous best running time of Õ((n + √(n·κ_sum))·s·log(1/ϵ)). We achieve our result through a careful combination of leverage score sampling techniques, proximal point methods, and accelerated coordinate descent. Our method not only matches the performance of previous methods, but further improves whenever the leverage scores of rows are small (up to polylogarithmic factors). We also provide a non-linear generalization of these results that improves the running time for solving a broader class of ERM problems.
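Since the abstract is terse, here is a minimal dense-linear-algebra sketch of the core primitive, leverage score row sampling, applied to least squares. The helper names, the QR-based exact score computation, and the synthetic data are illustrative assumptions; this is plain sketch-and-solve importance sampling, not the paper's accelerated Õ((n + √(d·κ_sum))·s·log(1/ϵ)) solver, which combines sampling with proximal point methods and accelerated coordinate descent.

```python
import numpy as np

def leverage_scores(A):
    # Exact leverage scores tau_i = a_i^T (A^T A)^{-1} a_i via a thin QR:
    # tau_i equals the squared norm of the i-th row of Q.
    Q, _ = np.linalg.qr(A)
    return np.sum(Q * Q, axis=1)

def sample_rows(A, b, m, rng):
    # Sample m rows with probability proportional to leverage scores and
    # reweight so that E[A_s^T A_s] = A^T A (standard importance sampling).
    tau = leverage_scores(A)
    p = tau / tau.sum()
    idx = rng.choice(A.shape[0], size=m, replace=True, p=p)
    w = 1.0 / np.sqrt(m * p[idx])
    return A[idx] * w[:, None], b[idx] * w

rng = np.random.default_rng(0)
n, d = 5000, 50
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

# Solve the regression on O(d log d) sampled rows (constant chosen loosely)
# instead of all n rows, then compare residuals against the exact solution.
As, bs = sample_rows(A, b, m=10 * d, rng=rng)
x_sample = np.linalg.lstsq(As, bs, rcond=None)[0]
x_exact = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.linalg.norm(A @ x_sample - b) / np.linalg.norm(A @ x_exact - b))
```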
Award ID(s):
1740822
NSF-PAR ID:
10196969
Journal Name:
Algorithmic Learning Theory
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Megow, Nicole; Smith, Adam (Eds.)
    The celebrated IP = PSPACE Theorem gives an efficient interactive proof for any bounded-space algorithm. In this work we study interactive proofs for non-deterministic bounded space computations. While Savitch’s Theorem shows that nondeterministic bounded-space algorithms can be simulated by deterministic bounded-space algorithms, this simulation has a quadratic overhead. We give interactive protocols for nondeterministic algorithms directly to get faster verifiers. More specifically, for any non-deterministic space S algorithm, we construct an interactive proof in which the verifier runs in time Õ(n+S²). This improves on the best previous bound of Õ(n+S³) and matches the result for deterministic space bounded algorithms, up to polylog(S) factors. We further generalize to alternating bounded space algorithms. For any language L decided by a time T, space S algorithm that uses d alternations, we construct an interactive proof in which the verifier runs in time Õ(n + S log(T) + S d) and the prover runs in time 2^O(S). For d = O(log(T)), this matches the best known interactive proofs for deterministic algorithms, up to polylog(S) factors, and improves on the previous best verifier time for nondeterministic algorithms by a factor of log(T). We also improve the best prior verifier time for unbounded alternations by a factor of S. Using known connections of bounded alternation algorithms to bounded depth circuits, we also obtain faster verifiers for bounded depth circuits with unbounded fan-in. 
  2. We study the bit complexity of two related fundamental computational problems in linear algebra and control theory. Our results are: (1) An Õ(n^{ω+3}a+n⁴a²+n^ωlog(1/ε)) time algorithm for finding an ε-approximation to the Jordan Normal form of an integer matrix with a-bit entries, where ω is the exponent of matrix multiplication. (2) An Õ(n⁶d⁶a+n⁴d⁴a²+n³d³log(1/ε)) time algorithm for ε-approximately computing the spectral factorization P(x) = Q^*(x)Q(x) of a given monic n × n rational matrix polynomial of degree 2d with rational a-bit coefficients having a-bit common denominators, which satisfies P(x)⪰0 for all real x. The first algorithm is used as a subroutine in the second one. Despite its central importance, polynomial complexity bounds were not previously known for spectral factorization, and for Jordan form the best previously known running time was an unspecified polynomial in n of degree at least twelve [Cai, 1994]. Our algorithms are simple and judiciously combine techniques from numerical and symbolic computation, yielding significant advantages over either approach by itself.
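    As a toy point of reference, the Jordan form of a small integer matrix can be computed exactly with symbolic tools; the SymPy call below and the example matrix are assumptions chosen only to show what the object being approximated looks like. The result above concerns something exact symbolic computation does not address: an ε-approximate Jordan form with the explicit Õ(n^{ω+3}a + n⁴a² + n^ω log(1/ε)) bit-complexity bound stated there.

```python
from sympy import Matrix

# A small integer matrix; jordan_form() returns P and J with M = P * J * P**-1,
# where J is block diagonal with Jordan blocks on the diagonal.
M = Matrix([[5, 4, 2, 1],
            [0, 1, -1, -1],
            [-1, -1, 3, 0],
            [1, 1, -1, 2]])
P, J = M.jordan_form()
print(J)                  # the Jordan form
print(P * J * P.inv())    # reconstructs M exactly (rational arithmetic)
```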
  3. An \ell_p oblivious subspace embedding is a distribution over r \times n matrices \Pi such that for any fixed n \times d matrix A, \[ \Pr_{\Pi}[\textrm{for all } x, \ \Vert Ax\Vert_p \le \Vert \Pi Ax\Vert_p \le \kappa \Vert Ax\Vert_p] \ge 9/10,\] where r is the dimension of the embedding, \kappa is the distortion of the embedding, and for an n-dimensional vector y, \Vert y\Vert_p = (\sum_{i=1}^n |y_i|^p)^{1/p} is the \ell_p-norm. Another important property is the sparsity of \Pi, that is, the maximum number of non-zero entries per column, as this determines the running time of computing \Pi A. While for p = 2 there are nearly optimal tradeoffs in terms of the dimension, distortion, and sparsity, for the important case of 1 \le p < 2, much less was known. In this article, we obtain nearly optimal tradeoffs for \ell_1 oblivious subspace embeddings, as well as new tradeoffs for 1 < p < 2. Our main results are as follows: (1) We show for every 1 \le p < 2, any oblivious subspace embedding with dimension r has distortion \[ \kappa = \Omega\left(\frac{1}{\left(\frac{1}{d}\right)^{1/p} \log^{2/p} r + \left(\frac{r}{n}\right)^{1/p - 1/2}}\right).\] When r = \operatorname{poly}(d) \ll n in applications, this gives a \kappa = \Omega(d^{1/p}\log^{-2/p} d) lower bound, and shows the oblivious subspace embedding of Sohler and Woodruff (STOC, 2011) for p = 1 is optimal up to \operatorname{poly}(\log d) factors. (2) We give sparse oblivious subspace embeddings for every 1 \le p < 2. Importantly, for p = 1, we achieve r = O(d \log d), \kappa = O(d \log d), and s = O(\log d) non-zero entries per column. The best previous construction with s \le \operatorname{poly}(\log d) is due to Woodruff and Zhang (COLT, 2013), giving \kappa = \Omega(d^2 \operatorname{poly}(\log d)) or \kappa = \Omega(d^{3/2} \sqrt{\log n} \cdot \operatorname{poly}(\log d)) and r \ge d \cdot \operatorname{poly}(\log d); in contrast our r = O(d \log d) and \kappa = O(d \log d) are optimal up to \operatorname{poly}(\log d) factors even for dense matrices. We also give (1) \ell_p oblivious subspace embeddings with an expected 1+\varepsilon number of non-zero entries per column for arbitrarily small \varepsilon > 0, and (2) the first oblivious subspace embeddings for 1 \le p < 2 with O(1)-distortion and dimension independent of n. Oblivious subspace embeddings are crucial for distributed and streaming environments, as well as entrywise \ell_p low-rank approximation. Our results give improved algorithms for these applications.
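    To make the definition concrete, below is a minimal empirical check of the subspace-embedding property for the classic p = 2 construction, a CountSketch matrix with a single ±1 entry per column (sparsity s = 1). The dimensions and the random test vectors are illustrative assumptions, and none of this reflects the 1 ≤ p < 2 constructions described above.

```python
import numpy as np

def countsketch(n, r, rng):
    # One nonzero (+-1) per column, so Pi has sparsity s = 1 and Pi @ A can
    # be applied in O(nnz(A)) time.
    rows = rng.integers(0, r, size=n)
    signs = rng.choice([-1.0, 1.0], size=n)
    Pi = np.zeros((r, n))
    Pi[rows, np.arange(n)] = signs
    return Pi

rng = np.random.default_rng(1)
n, d, r = 20000, 20, 2000
A = rng.standard_normal((n, d))
PiA = countsketch(n, r, rng) @ A

# Empirically estimate the distortion ||Pi A x|| / ||A x|| over random x;
# a true subspace embedding bounds this ratio for all x simultaneously.
ratios = [np.linalg.norm(PiA @ x) / np.linalg.norm(A @ x)
          for x in rng.standard_normal((1000, d))]
print(min(ratios), max(ratios))
```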
  4.
    We study the communication cost (or message complexity) of fundamental distributed symmetry breaking problems, namely, coloring and MIS. While significant progress has been made in understanding and improving the running time of such problems, much less is known about their message complexity. In fact, all known algorithms need at least Ω(m) communication for these problems, where m is the number of edges in the graph. We address the following question in this paper: can we solve problems such as coloring and MIS using sublinear, i.e., o(m), communication, and if so, under what conditions? In a classical result, Awerbuch, Goldreich, Peleg, and Vainish [JACM 1990] showed that fundamental global problems such as broadcast and spanning tree construction require Ω(m) messages in the KT-1 Congest model (i.e., the Congest model in which nodes have initial knowledge of their neighbors' IDs) when algorithms are restricted to be comparison-based (i.e., algorithms in which node IDs can only be compared). Twenty-five years after this result, King, Kutten, and Thorup [PODC 2015] showed that one can solve the above problems using Õ(n) messages (n is the number of nodes in the graph) in Õ(n) rounds in the KT-1 Congest model if non-comparison-based algorithms are permitted. An important implication of this result is that one can use the synchronous nature of the KT-1 Congest model, using silence to convey information, and solve any graph problem using non-comparison-based algorithms with Õ(n) messages, but this takes an exponential number of rounds. In the asynchronous model, even this is not possible. In contrast, much less is known about the message complexity of local symmetry breaking problems such as coloring and MIS. Our paper fills this gap by presenting the following results. Lower bounds: In the KT-1 Congest model, we show that any comparison-based algorithm, even a randomized Monte Carlo algorithm with constant success probability, requires Ω(n²) messages in the worst case to solve either (Δ + 1)-coloring or MIS, regardless of the number of rounds. We also show that Ω(n) is a lower bound on the number of messages for any (Δ + 1)-coloring or MIS algorithm, even non-comparison-based, and even with nodes having initial knowledge of up to a constant radius. Upper bounds: In the KT-1 Congest model, we present the following randomized non-comparison-based algorithms for coloring that, with high probability, use o(m) messages and run in polynomially many rounds. (a) A (Δ + 1)-coloring algorithm that uses Õ(n^{1.5}) messages, while running in Õ(D + √n) rounds, where D is the graph diameter. Our result also implies an asynchronous algorithm for (Δ + 1)-coloring with the same message bound but running in Õ(n) rounds. (b) For any constant ε > 0, a (1+ε)Δ-coloring algorithm that uses Õ(n/ε²) messages, while running in Õ(n) rounds. If we increase our input knowledge slightly to radius 2, i.e., in the KT-2 Congest model, we obtain: (c) A randomized comparison-based MIS algorithm that uses Õ(n^{1.5}) messages, while running in Õ(√n) rounds. Our lower bound results can be viewed as counterparts, for local problems, to the classical result of Awerbuch, Goldreich, Peleg, and Vainish [JACM 1990], while our algorithms are the first known algorithms for coloring and MIS that take o(m) messages and run in polynomially many rounds.
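    For readers less familiar with the problem itself, a minimal sequential sketch of (Δ + 1)-coloring is shown below; the greedy routine, the networkx random graph, and the parameter choices are assumptions for illustration only. It says nothing about the message-efficient distributed algorithms above, whose difficulty lies entirely in the Congest model constraints.

```python
import networkx as nx

def greedy_coloring(G):
    # Each vertex takes the smallest color unused by its already-colored
    # neighbors; since a vertex has at most Delta neighbors, at most
    # Delta + 1 colors (0..Delta) are ever needed.
    color = {}
    for v in G.nodes():
        used = {color[u] for u in G.neighbors(v) if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

G = nx.erdos_renyi_graph(50, 0.2, seed=0)
coloring = greedy_coloring(G)
delta = max(deg for _, deg in G.degree())
assert max(coloring.values()) <= delta                        # at most Delta + 1 colors
assert all(coloring[u] != coloring[v] for u, v in G.edges())  # proper coloring
```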
  5. We present an algorithm that, with high probability, generates a random spanning tree from an edge-weighted undirected graph in Õ(n^{5/3} m^{1/3}) time (the Õ(·) notation hides poly(log n) factors). The tree is sampled from a distribution where the probability of each tree is proportional to the product of its edge weights. This improves upon the previous best algorithm due to Colbourn et al. that runs in matrix multiplication time, O(n^ω). For the special case of unweighted graphs, this improves upon the best previously known running time of Õ(min{n^ω, m√n, m^{4/3}}) for m ≫ n^{7/4} (Colbourn et al. '96, Kelner-Madry '09, Madry et al. '15). The effective resistance metric is essential to our algorithm, as in the work of Madry et al., but we eschew the determinant-based and random-walk-based techniques used by previous algorithms. Instead, our algorithm is based on Gaussian elimination and the fact that effective resistance is preserved in the graph resulting from eliminating a subset of vertices (called a Schur complement). As part of our algorithm, we show how to compute ε-approximate effective resistances for a set S of vertex pairs via approximate Schur complements in Õ(m + (n + |S|)ε^{-2}) time, without using the Johnson-Lindenstrauss lemma, which requires Õ(min{(m + |S|)ε^{-2}, m + nε^{-4} + |S|ε^{-2}}) time. We combine this approximation procedure with an error correction procedure for handling edges where our estimate isn't sufficiently accurate.
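    Because the algorithm revolves around effective resistances and Schur complements, a minimal dense sketch of the effective resistance formula may help orient the reader: R(u, v) = (e_u − e_v)^⊤ L^+ (e_u − e_v), with L the graph Laplacian. The helper below and the toy 5-cycle are illustrative assumptions; it is a cubic-time computation and does not reflect the approximate-Schur-complement procedure with Õ(m + (n + |S|)ε^{-2}) running time described above.

```python
import numpy as np
import networkx as nx

def effective_resistance(G, u, v):
    # R(u, v) = (e_u - e_v)^T L^+ (e_u - e_v), with L the (weighted) graph
    # Laplacian and L^+ its Moore-Penrose pseudoinverse (dense, O(n^3)).
    nodes = list(G.nodes())
    L = nx.laplacian_matrix(G, nodelist=nodes).toarray().astype(float)
    chi = np.zeros(len(nodes))
    chi[nodes.index(u)], chi[nodes.index(v)] = 1.0, -1.0
    return chi @ np.linalg.pinv(L) @ chi

G = nx.cycle_graph(5)                 # unweighted 5-cycle
print(effective_resistance(G, 0, 2))  # two parallel paths: 2*3/(2+3) = 1.2
```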