Fast Regression for Structured Inputs

Meyer, Raphael; Musco, Cameron; Musco, Christopher; Woodruff, David P.; Zhou, Samson

We study the $$\ell_p$$ regression problem, which requires finding $$\mathbf{x}\in\mathbb R^{d}$$ that minimizes $$\|\mathbf{A}\mathbf{x}-\mathbf{b}\|_p$$ for a matrix $$\mathbf{A}\in\mathbb R^{n \times d}$$ and response vector $$\mathbf{b}\in\mathbb R^{n}$$. There has been recent interest in developing subsampling methods for this problem that can outperform standard techniques when $$n$$ is very large. However, all known subsampling approaches have run time that depends exponentially on $$p$$, typically, $$d^{\mathcal{O}(p)}$$, which can be prohibitively expensive. We improve on this work by showing that for a large class of common \emph{structured matrices}, such as combinations of low-rank matrices, sparse matrices, and Vandermonde matrices, there are subsampling based methods for $$\ell_p$$ regression that depend polynomially on $$p$$. For example, we give an algorithm for $$\ell_p$$ regression on Vandermonde matrices that runs in time $$\mathcal{O}(n\log^3 n+(dp^2)^{0.5+\omega}\cdot\text{polylog}\,n)$$, where $$\omega$$ is the exponent of matrix multiplication. The polynomial dependence on $$p$$ crucially allows our algorithms to extend naturally to efficient algorithms for $$\ell_\infty$$ regression, via approximation of $$\ell_\infty$$ by $$\ell_{\mathcal{O}(\log n)}$$. Of practical interest, we also develop a new subsampling algorithm for $$\ell_p$$ regression for arbitrary matrices, which is simpler than previous approaches for $$p \ge 4$$.

More Like this