skip to main content

Title: Fast Regression for Structured Inputs
We study the $\ell_p$ regression problem, which requires finding $\mathbf{x}\in\mathbb R^{d}$ that minimizes $\|\mathbf{A}\mathbf{x}-\mathbf{b}\|_p$ for a matrix $\mathbf{A}\in\mathbb R^{n \times d}$ and response vector $\mathbf{b}\in\mathbb R^{n}$. There has been recent interest in developing subsampling methods for this problem that can outperform standard techniques when $n$ is very large. However, all known subsampling approaches have run time that depends exponentially on $p$, typically, $d^{\mathcal{O}(p)}$, which can be prohibitively expensive. We improve on this work by showing that for a large class of common \emph{structured matrices}, such as combinations of low-rank matrices, sparse matrices, and Vandermonde matrices, there are subsampling based methods for $\ell_p$ regression that depend polynomially on $p$. For example, we give an algorithm for $\ell_p$ regression on Vandermonde matrices that runs in time $\mathcal{O}(n\log^3 n+(dp^2)^{0.5+\omega}\cdot\text{polylog}\,n)$, where $\omega$ is the exponent of matrix multiplication. The polynomial dependence on $p$ crucially allows our algorithms to extend naturally to efficient algorithms for $\ell_\infty$ regression, via approximation of $\ell_\infty$ by $\ell_{\mathcal{O}(\log n)}$. Of practical interest, we also develop a new subsampling algorithm for $\ell_p$ regression for arbitrary matrices, which is simpler than previous approaches for $p \ge 4$.
Authors:
; ; ; ;
Award ID(s):
2046235
Publication Date:
NSF-PAR ID:
10326695
Journal Name:
International Conference on Learning Representations (ICLR)
Sponsoring Org:
National Science Foundation
More Like this
  1. An \ell _p oblivious subspace embedding is a distribution over r \times n matrices \Pi such that for any fixed n \times d matrix A , \[ \Pr _{\Pi }[\textrm {for all }x, \ \Vert Ax\Vert _p \le \Vert \Pi Ax\Vert _p \le \kappa \Vert Ax\Vert _p] \ge 9/10,\] where r is the dimension of the embedding, \kappa is the distortion of the embedding, and for an n -dimensional vector y , \Vert y\Vert _p = (\sum _{i=1}^n |y_i|^p)^{1/p} is the \ell _p -norm. Another important property is the sparsity of \Pi , that is, the maximum number of non-zero entries per column, as this determines the running time of computing \Pi A . While for p = 2 there are nearly optimal tradeoffs in terms of the dimension, distortion, and sparsity, for the important case of 1 \le p \lt 2 , much less was known. In this article, we obtain nearly optimal tradeoffs for \ell _1 oblivious subspace embeddings, as well as new tradeoffs for 1 \lt p \lt 2 . Our main results are as follows: (1) We show for every 1 \le p \lt 2 , any oblivious subspace embedding with dimension r has distortion \[more »\kappa = \Omega \left(\frac{1}{\left(\frac{1}{d}\right)^{1 / p} \log ^{2 / p}r + \left(\frac{r}{n}\right)^{1 / p - 1 / 2}}\right).\] When r = {\operatorname{poly}}(d) \ll n in applications, this gives a \kappa = \Omega (d^{1/p}\log ^{-2/p} d) lower bound, and shows the oblivious subspace embedding of Sohler and Woodruff (STOC, 2011) for p = 1 is optimal up to {\operatorname{poly}}(\log (d)) factors. (2) We give sparse oblivious subspace embeddings for every 1 \le p \lt 2 . Importantly, for p = 1 , we achieve r = O(d \log d) , \kappa = O(d \log d) and s = O(\log d) non-zero entries per column. The best previous construction with s \le {\operatorname{poly}}(\log d) is due to Woodruff and Zhang (COLT, 2013), giving \kappa = \Omega (d^2 {\operatorname{poly}}(\log d)) or \kappa = \Omega (d^{3/2} \sqrt {\log n} \cdot {\operatorname{poly}}(\log d)) and r \ge d \cdot {\operatorname{poly}}(\log d) ; in contrast our r = O(d \log d) and \kappa = O(d \log d) are optimal up to {\operatorname{poly}}(\log (d)) factors even for dense matrices. We also give (1) \ell _p oblivious subspace embeddings with an expected 1+\varepsilon number of non-zero entries per column for arbitrarily small \varepsilon \gt 0 , and (2) the first oblivious subspace embeddings for 1 \le p \lt 2 with O(1) -distortion and dimension independent of n . Oblivious subspace embeddings are crucial for distributed and streaming environments, as well as entrywise \ell _p low-rank approximation. Our results give improved algorithms for these applications.« less
  2. We give two new quantum algorithms for solving semidefinite programs (SDPs) providing quantum speed-ups. We consider SDP instances with m constraint matrices, each of dimension n, rank at most r, and sparsity s. The first algorithm assumes an input model where one is given access to an oracle to the entries of the matrices at unit cost. We show that it has run time O~(s^2 (sqrt{m} epsilon^{-10} + sqrt{n} epsilon^{-12})), with epsilon the error of the solution. This gives an optimal dependence in terms of m, n and quadratic improvement over previous quantum algorithms (when m ~~ n). The second algorithm assumes a fully quantum input model in which the input matrices are given as quantum states. We show that its run time is O~(sqrt{m}+poly(r))*poly(log m,log n,B,epsilon^{-1}), with B an upper bound on the trace-norm of all input matrices. In particular the complexity depends only polylogarithmically in n and polynomially in r. We apply the second SDP solver to learn a good description of a quantum state with respect to a set of measurements: Given m measurements and a supply of copies of an unknown state rho with rank at most r, we show we can find in time sqrt{m}*poly(logmore »m,log n,r,epsilon^{-1}) a description of the state as a quantum circuit preparing a density matrix which has the same expectation values as rho on the m measurements, up to error epsilon. The density matrix obtained is an approximation to the maximum entropy state consistent with the measurement data considered in Jaynes' principle from statistical mechanics. As in previous work, we obtain our algorithm by "quantizing" classical SDP solvers based on the matrix multiplicative weight update method. One of our main technical contributions is a quantum Gibbs state sampler for low-rank Hamiltonians, given quantum states encoding these Hamiltonians, with a poly-logarithmic dependence on its dimension, which is based on ideas developed in quantum principal component analysis. We also develop a "fast" quantum OR lemma with a quadratic improvement in gate complexity over the construction of Harrow et al. [Harrow et al., 2017]. We believe both techniques might be of independent interest.« less
  3. Abstract We present efficient algorithms for counting points on a smooth plane quartic curve X modulo a prime p . We address both the case where X is defined over  $${\mathbb {F}}_p$$ F p and the case where X is defined over $${\mathbb {Q}}$$ Q and p is a prime of good reduction. We consider two approaches for computing $$\#X({\mathbb {F}}_p)$$ # X ( F p ) , one which runs in $$O(p\log p\log \log p)$$ O ( p log p log log p ) time using $$O(\log p)$$ O ( log p ) space and one which runs in $$O(p^{1/2}\log ^2p)$$ O ( p 1 / 2 log 2 p ) time using $$O(p^{1/2}\log p)$$ O ( p 1 / 2 log p ) space. Both approaches yield algorithms that are faster in practice than existing methods. We also present average polynomial-time algorithms for $$X/{\mathbb {Q}}$$ X / Q that compute $$\#X({\mathbb {F}}_p)$$ # X ( F p ) for good primes $$p\leqslant N$$ p ⩽ N in $$O(N\log ^3 N)$$ O ( N log 3 N ) time using O ( N ) space. These are the first practical implementations of average polynomial-time algorithms for curves that aremore »not cyclic covers of $${\mathbb {P}}^1$$ P 1 , which in combination with previous results addresses all curves of genus $$g\leqslant 3$$ g ⩽ 3 . Our algorithms also compute Cartier–Manin/Hasse–Witt matrices that may be of independent interest.« less
  4. Abstract We show how to construct a $$(1+\varepsilon )$$ ( 1 + ε ) -spanner over a set $${P}$$ P of n points in $${\mathbb {R}}^d$$ R d that is resilient to a catastrophic failure of nodes. Specifically, for prescribed parameters $${\vartheta },\varepsilon \in (0,1)$$ ϑ , ε ∈ ( 0 , 1 ) , the computed spanner $${G}$$ G has $$\begin{aligned} {{\mathcal {O}}}\bigl (\varepsilon ^{-O(d)} {\vartheta }^{-6} n(\log \log n)^6 \log n \bigr ) \end{aligned}$$ O ( ε - O ( d ) ϑ - 6 n ( log log n ) 6 log n ) edges. Furthermore, for any k , and any deleted set $${{B}}\subseteq {P}$$ B ⊆ P of k points, the residual graph $${G}\setminus {{B}}$$ G \ B is a $$(1+\varepsilon )$$ ( 1 + ε ) -spanner for all the points of $${P}$$ P except for $$(1+{\vartheta })k$$ ( 1 + ϑ ) k of them. No previous constructions, beyond the trivial clique with $${{\mathcal {O}}}(n^2)$$ O ( n 2 ) edges, were known with this resilience property (i.e., only a tiny additional fraction of vertices, $$\vartheta |B|$$ ϑ | B | , lose their distance preserving connectivity). Our construction works by first solvingmore »the exact problem in one dimension, and then showing a surprisingly simple and elegant construction in higher dimensions, that uses the one-dimensional construction in a black-box fashion.« less
  5. Abstract We obtain new quantitative estimates on Weyl Law remainders under dynamical assumptions on the geodesic flow. On a smooth compact Riemannian manifold ( M ,  g ) of dimension n , let $$\Pi _\lambda $$ Π λ denote the kernel of the spectral projector for the Laplacian, $$\mathbb {1}_{[0,\lambda ^2]}(-\Delta _g)$$ 1 [ 0 , λ 2 ] ( - Δ g ) . Assuming only that the set of near periodic geodesics over $${W}\subset M$$ W ⊂ M has small measure, we prove that as $$\lambda \rightarrow \infty $$ λ → ∞ $$\begin{aligned} \int _{{W}} \Pi _\lambda (x,x)dx=(2\pi )^{-n}{{\,\textrm{vol}\,}}_{_{{\mathbb {R}}^n}}\!(B){{\,\textrm{vol}\,}}_g({W})\,\lambda ^n+O\Big (\frac{\lambda ^{n-1}}{\log \lambda }\Big ), \end{aligned}$$ ∫ W Π λ ( x , x ) d x = ( 2 π ) - n vol R n ( B ) vol g ( W ) λ n + O ( λ n - 1 log λ ) , where B is the unit ball. One consequence of this result is that the improved remainder holds on all product manifolds, in particular giving improved estimates for the eigenvalue counting function in the product setup. Our results also include logarithmic gains on asymptotics for the off-diagonal spectral projector $$\Pimore »_\lambda (x,y)$$ Π λ ( x , y ) under the assumption that the set of geodesics that pass near both x and y has small measure, and quantitative improvements for Kuznecov sums under non-looping type assumptions. The key technique used in our study of the spectral projector is that of geodesic beams.« less