We study the problem of robust multivariate polynomial regression: let $$p\colon\mathbb{R}^n\to\mathbb{R}$$ be an unknown $$n$$-variate polynomial of degree at most $$d$$ in each variable. We are given as input a set of random samples $$(\mathbf{x}_i,y_i) \in [-1,1]^n \times \mathbb{R}$$ that are noisy versions of $$(\mathbf{x}_i,p(\mathbf{x}_i))$$. More precisely, each $$\mathbf{x}_i$$ is sampled independently from some distribution $$\chi$$ on $$[-1,1]^n$$, and for each $$i$$ independently, $$y_i$$ is arbitrary (i.e., an outlier) with probability at most $$\rho < 1/2$$, and otherwise satisfies $$|y_i-p(\mathbf{x}_i)|\leq\sigma$$. The goal is to output a polynomial $$\hat{p}$$, of degree at most $$d$$ in each variable, within an $$\ell_\infty$$-distance of at most $$O(\sigma)$$ from $$p$$. Kane, Karmalkar, and Price [FOCS'17] solved this problem for $$n=1$$. We generalize their results to the $$n$$-variate setting, showing an algorithm that achieves a sample complexity of $$O_n(d^n\log d)$$, where the hidden constant depends on $$n$$, if $$\chi$$ is the $$n$$-dimensional Chebyshev distribution. The sample complexity is $$O_n(d^{2n}\log d)$$ if the samples are drawn from the uniform distribution instead. The approximation error is guaranteed to be at most $$O(\sigma)$$, and the run-time depends on $$\log(1/\sigma)$$. In the setting where each $$\mathbf{x}_i$$ and $$y_i$$ are known up to $$N$$ bits of precision, the run-time's dependence on $$N$$ is linear. We also show that our sample complexities are optimal in terms of $$d^n$$. Furthermore, we show that it is possible to have the run-time be independent of $$1/\sigma$$, at the cost of a higher sample complexity.
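To make the sampling model concrete, here is a minimal numpy/scipy sketch of the univariate case ($$n=1$$): samples are drawn from the Chebyshev (arcsine) distribution via $$x=\cos(\pi u)$$, corrupted with inlier noise bounded by $$\sigma$$ and a $$\rho$$ fraction of arbitrary outliers, then fit by least absolute deviations posed as a linear program. The LAD estimator and all parameter values are arbitrary illustrative choices; this is not the Kane-Karmalkar-Price algorithm or the paper's $$n$$-variate generalization.

```python
import numpy as np
from numpy.polynomial import chebyshev as C
from scipy.optimize import linprog

rng = np.random.default_rng(0)

# --- the sampling model (n = 1 for readability) ---
d, m, rho, sigma = 5, 400, 0.2, 0.05
coeffs_true = rng.normal(size=d + 1)             # unknown p, degree <= d

x = np.cos(np.pi * rng.random(m))                # Chebyshev (arcsine) distribution on [-1, 1]
y = C.chebval(x, coeffs_true) + sigma * rng.uniform(-1, 1, m)  # |y - p(x)| <= sigma
outliers = rng.random(m) < rho
y[outliers] = rng.uniform(-10, 10, outliers.sum())             # arbitrary outlier values

# --- illustrative robust fit: least absolute deviations as an LP ---
# minimize sum_i t_i  subject to  -t_i <= (A c - y)_i <= t_i
A = C.chebvander(x, d)                           # design matrix in the Chebyshev basis
k = d + 1
c_obj = np.concatenate([np.zeros(k), np.ones(m)])
A_ub = np.block([[A, -np.eye(m)], [-A, -np.eye(m)]])
b_ub = np.concatenate([y, -y])
res = linprog(c_obj, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * k + [(0, None)] * m)
coeffs_hat = res.x[:k]

grid = np.linspace(-1, 1, 1000)
err = np.max(np.abs(C.chebval(grid, coeffs_hat) - C.chebval(grid, coeffs_true)))
print(f"ell_infty error on a grid: {err:.4f}  (sigma = {sigma})")
```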
Fast Regression for Structured Inputs
We study the $$\ell_p$$ regression problem, which requires finding $$\mathbf{x}\in\mathbb R^{d}$$ that minimizes $$\|\mathbf{A}\mathbf{x}-\mathbf{b}\|_p$$ for a matrix $$\mathbf{A}\in\mathbb R^{n \times d}$$ and response vector $$\mathbf{b}\in\mathbb R^{n}$$. There has been recent interest in developing subsampling methods for this problem that can outperform standard techniques when $$n$$ is very large. However, all known subsampling approaches have run times that depend exponentially on $$p$$, typically $$d^{\mathcal{O}(p)}$$, which can be prohibitively expensive. We improve on this work by showing that for a large class of common structured matrices, such as combinations of low-rank matrices, sparse matrices, and Vandermonde matrices, there are subsampling-based methods for $$\ell_p$$ regression that depend polynomially on $$p$$. For example, we give an algorithm for $$\ell_p$$ regression on Vandermonde matrices that runs in time $$\mathcal{O}(n\log^3 n+(dp^2)^{0.5+\omega}\cdot\text{polylog}\,n)$$, where $$\omega$$ is the exponent of matrix multiplication. The polynomial dependence on $$p$$ crucially allows our algorithms to extend naturally to efficient algorithms for $$\ell_\infty$$ regression, via approximation of $$\ell_\infty$$ by $$\ell_{\mathcal{O}(\log n)}$$. Of practical interest, we also develop a new subsampling algorithm for $$\ell_p$$ regression for arbitrary matrices, which is simpler than previous approaches for $$p \ge 4$$.
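As a concrete (if naive) baseline for the objective, the sketch below minimizes $$\|\mathbf{A}\mathbf{x}-\mathbf{b}\|_p$$ on a Vandermonde matrix by iteratively reweighted least squares, then uses the same $$\ell_\infty \approx \ell_{\mathcal{O}(\log n)}$$ reduction mentioned above. IRLS is a classical heuristic, not the paper's subsampling method; the damping factor and iteration count are ad hoc choices.

```python
import numpy as np

def irls_lp(A, b, p, iters=100, eps=1e-9):
    # Iteratively reweighted least squares for min_x ||Ax - b||_p with p >= 2:
    # a standard heuristic, shown only to make the objective concrete.
    x = np.linalg.lstsq(A, b, rcond=None)[0]                # start at the l2 solution
    for _ in range(iters):
        r = A @ x - b
        w = np.maximum(np.abs(r), eps) ** ((p - 2) / 2)     # sqrt of IRLS weights |r|^(p-2)
        x_new = np.linalg.lstsq(A * w[:, None], b * w, rcond=None)[0]
        x = 0.5 * (x + x_new)                               # damping keeps large p stable
    return x

rng = np.random.default_rng(1)
n, d = 2000, 6
t = rng.uniform(-1, 1, n)
A = np.vander(t, d, increasing=True)   # a Vandermonde matrix, as in the structured case
b = np.sin(3 * t)

p = 2 * int(np.ceil(np.log(n)))        # even p = O(log n) approximates l_infty
x = irls_lp(A, b, p)
print("max residual:", np.max(np.abs(A @ x - b)))
```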
- Award ID(s): 2046235
- PAR ID: 10326695
- Date Published:
- Journal Name: International Conference on Learning Representations (ICLR)
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
We develop a general framework for finding approximately-optimal preconditioners for solving linear systems. Leveraging this framework we obtain improved runtimes for fundamental preconditioning and linear system solving problems, including the following. (1) Diagonal preconditioning: we give an algorithm which, given positive definite $$\mathbf{K} \in \mathbb{R}^{d \times d}$$ with $$\mathrm{nnz}(\mathbf{K})$$ nonzero entries, computes an $$\epsilon$$-optimal diagonal preconditioner in time $$\widetilde{O}(\mathrm{nnz}(\mathbf{K}) \cdot \mathrm{poly}(\kappa^\star,\epsilon^{-1}))$$, where $$\kappa^\star$$ is the optimal condition number of the rescaled matrix. (2) Structured linear systems: we give an algorithm which, given $$\mathbf{M} \in \mathbb{R}^{d \times d}$$ that is either the pseudoinverse of a graph Laplacian matrix or a constant spectral approximation of one, solves linear systems in $$\mathbf{M}$$ in $$\widetilde{O}(d^2)$$ time. Our diagonal preconditioning results improve state-of-the-art runtimes of $$\Omega(d^{3.5})$$ attained by general-purpose semidefinite programming, and our solvers improve state-of-the-art runtimes of $$\Omega(d^{\omega})$$, where $$\omega > 2.3$$ is the current matrix multiplication constant. We attain our results via new algorithms for a class of semidefinite programs (SDPs) we call matrix-dictionary approximation SDPs, which we leverage to solve an associated problem we call matrix-dictionary recovery.
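For intuition about the diagonal-preconditioning objective, here is a small numpy sketch comparing $$\kappa(\mathbf{K})$$ before and after Jacobi (unit-diagonal) rescaling. For positive definite matrices, Jacobi scaling is classically within a factor of $$d$$ of the best diagonal preconditioner (van der Sluis), whereas the algorithm above computes an $$\epsilon$$-optimal one; the matrix construction here is an arbitrary example.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 300
B = rng.normal(size=(d, d))
S = np.diag(10.0 ** rng.uniform(-3, 3, d))     # wildly varying row/column scales
K = S @ (B @ B.T + d * np.eye(d)) @ S          # positive definite, badly scaled

def cond(M):
    w = np.linalg.eigvalsh(M)                  # eigenvalues in ascending order
    return w[-1] / w[0]

D = np.diag(np.diag(K) ** -0.5)                # Jacobi scaling: unit diagonal
print(f"kappa(K)     = {cond(K):.3e}")
print(f"kappa(D K D) = {cond(D @ K @ D):.3e}")
```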
An $$\ell_p$$ oblivious subspace embedding is a distribution over $$r \times n$$ matrices $$\Pi$$ such that for any fixed $$n \times d$$ matrix $$A$$, $$\Pr_{\Pi}[\text{for all } x,\ \Vert Ax\Vert_p \le \Vert \Pi Ax\Vert_p \le \kappa \Vert Ax\Vert_p] \ge 9/10$$, where $$r$$ is the dimension of the embedding, $$\kappa$$ is the distortion of the embedding, and for an $$n$$-dimensional vector $$y$$, $$\Vert y\Vert_p = (\sum_{i=1}^n |y_i|^p)^{1/p}$$ is the $$\ell_p$$-norm. Another important property is the sparsity of $$\Pi$$, that is, the maximum number of non-zero entries per column, as this determines the running time of computing $$\Pi A$$. While for $$p = 2$$ there are nearly optimal tradeoffs in terms of the dimension, distortion, and sparsity, for the important case of $$1 \le p < 2$$, much less was known. In this article, we obtain nearly optimal tradeoffs for $$\ell_1$$ oblivious subspace embeddings, as well as new tradeoffs for $$1 < p < 2$$. Our main results are as follows: (1) We show for every $$1 \le p < 2$$, any oblivious subspace embedding with dimension $$r$$ has distortion $$\kappa = \Omega\left(\frac{1}{\left(\frac{1}{d}\right)^{1/p} \log^{2/p} r + \left(\frac{r}{n}\right)^{1/p - 1/2}}\right)$$. When $$r = \mathrm{poly}(d) \ll n$$ in applications, this gives a $$\kappa = \Omega(d^{1/p}\log^{-2/p} d)$$ lower bound, and shows the oblivious subspace embedding of Sohler and Woodruff (STOC, 2011) for $$p = 1$$ is optimal up to $$\mathrm{poly}(\log d)$$ factors. (2) We give sparse oblivious subspace embeddings for every $$1 \le p < 2$$. Importantly, for $$p = 1$$, we achieve $$r = O(d \log d)$$, $$\kappa = O(d \log d)$$, and $$s = O(\log d)$$ non-zero entries per column. The best previous construction with $$s \le \mathrm{poly}(\log d)$$ is due to Woodruff and Zhang (COLT, 2013), giving $$\kappa = \Omega(d^2 \,\mathrm{poly}(\log d))$$ or $$\kappa = \Omega(d^{3/2} \sqrt{\log n} \cdot \mathrm{poly}(\log d))$$ and $$r \ge d \cdot \mathrm{poly}(\log d)$$; in contrast, our $$r = O(d \log d)$$ and $$\kappa = O(d \log d)$$ are optimal up to $$\mathrm{poly}(\log d)$$ factors even for dense matrices. We also give (1) $$\ell_p$$ oblivious subspace embeddings with an expected $$1+\varepsilon$$ number of non-zero entries per column for arbitrarily small $$\varepsilon > 0$$, and (2) the first oblivious subspace embeddings for $$1 \le p < 2$$ with $$O(1)$$-distortion and dimension independent of $$n$$. Oblivious subspace embeddings are crucial for distributed and streaming environments, as well as entrywise $$\ell_p$$ low-rank approximation. Our results give improved algorithms for these applications.
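To see the definition in action, the sketch below applies a dense Cauchy embedding, the flavor of the Sohler-Woodruff $$p=1$$ construction cited above, and empirically measures $$\Vert \Pi y\Vert_1 / \Vert y\Vert_1$$ over directions in the column space of $$A$$. Caveats: the guarantee for such embeddings is one-sided (no contraction with high probability, dilation $$\mathrm{poly}(d)$$), the $$1/r$$ normalization is merely a convention, and a dense $$\Pi$$ deliberately sidesteps the sparsity question that is the article's focus.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, r = 5000, 10, 400

A = rng.normal(size=(n, d))
Pi = rng.standard_cauchy((r, n)) / r      # dense Cauchy sketch, r x n

# empirical l1 distortion over random directions in the column space of A
ratios = []
for _ in range(500):
    y = A @ rng.normal(size=d)            # a vector of the form Ax
    ratios.append(np.abs(Pi @ y).sum() / np.abs(y).sum())
print(f"||Pi y||_1 / ||y||_1 over 500 directions: "
      f"min {min(ratios):.2f}, max {max(ratios):.2f}")
```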
We show how to construct a $$(1+\varepsilon)$$-spanner over a set $$P$$ of $$n$$ points in $$\mathbb{R}^d$$ that is resilient to a catastrophic failure of nodes. Specifically, for prescribed parameters $$\vartheta,\varepsilon \in (0,1)$$, the computed spanner $$G$$ has $$\mathcal{O}\bigl(\varepsilon^{-O(d)} \vartheta^{-6} n (\log\log n)^6 \log n\bigr)$$ edges. Furthermore, for any $$k$$, and any deleted set $$B \subseteq P$$ of $$k$$ points, the residual graph $$G \setminus B$$ is a $$(1+\varepsilon)$$-spanner for all the points of $$P$$ except for $$(1+\vartheta)k$$ of them. No previous constructions, beyond the trivial clique with $$\mathcal{O}(n^2)$$ edges, were known with this resilience property (i.e., only a tiny additional fraction of vertices, $$\vartheta|B|$$, lose their distance-preserving connectivity). Our construction works by first solving the exact problem in one dimension, and then showing a surprisingly simple and elegant construction in higher dimensions that uses the one-dimensional construction in a black-box fashion.
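For a concrete picture of what a $$(1+\varepsilon)$$-spanner is, here is the textbook greedy construction in Python. It has no fault tolerance and much weaker size guarantees than the construction described above; it serves only to illustrate the stretch condition that every pairwise detour stays within a $$(1+\varepsilon)$$ factor of the straight-line distance.

```python
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.csgraph import dijkstra

def greedy_spanner(P, eps):
    # Classic greedy (1+eps)-spanner: scan pairs by increasing length and add
    # an edge only when the current graph detour exceeds (1+eps) times the
    # Euclidean distance. Illustration only; not the paper's construction.
    n = len(P)
    D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)
    pairs = sorted((D[i, j], i, j) for i in range(n) for j in range(i + 1, n))
    G = lil_matrix((n, n))
    for w, u, v in pairs:
        if dijkstra(G.tocsr(), indices=u)[v] > (1 + eps) * w:
            G[u, v] = G[v, u] = w
    return G

rng = np.random.default_rng(4)
P = rng.uniform(-1, 1, (60, 2))
G = greedy_spanner(P, eps=0.5)
print("spanner edges:", G.nnz // 2, "vs clique:", 60 * 59 // 2)
```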
We present efficient algorithms for counting points on a smooth plane quartic curve $$X$$ modulo a prime $$p$$. We address both the case where $$X$$ is defined over $$\mathbb{F}_p$$ and the case where $$X$$ is defined over $$\mathbb{Q}$$ and $$p$$ is a prime of good reduction. We consider two approaches for computing $$\#X(\mathbb{F}_p)$$: one which runs in $$O(p\log p\log\log p)$$ time using $$O(\log p)$$ space, and one which runs in $$O(p^{1/2}\log^2 p)$$ time using $$O(p^{1/2}\log p)$$ space. Both approaches yield algorithms that are faster in practice than existing methods. We also present average polynomial-time algorithms for $$X/\mathbb{Q}$$ that compute $$\#X(\mathbb{F}_p)$$ for good primes $$p \leqslant N$$ in $$O(N\log^3 N)$$ time using $$O(N)$$ space. These are the first practical implementations of average polynomial-time algorithms for curves that are not cyclic covers of $$\mathbb{P}^1$$, which in combination with previous results addresses all curves of genus $$g \leqslant 3$$. Our algorithms also compute Cartier-Manin/Hasse-Witt matrices that may be of independent interest.
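As a baseline for what is being computed, a brute-force enumeration of $$\#X(\mathbb{F}_p)$$ over the three standard charts of $$\mathbb{P}^2$$ looks as follows. The example curve (the Fermat quartic, smooth for odd $$p$$) is an arbitrary choice, and the quadratic cost of this loop is exactly the barrier that the $$O(p^{1/2}\log^2 p)$$ and average polynomial-time methods above overcome.

```python
# Brute-force count of #X(F_p) for a plane quartic f(x, y, z) = 0,
# enumerating P^2(F_p) as {(x, y, 1)} U {(x, 1, 0)} U {(1, 0, 0)}.
def quartic_point_count(f, p):
    count = 0
    for x in range(p):                     # affine chart z = 1
        for y in range(p):
            count += f(x, y, 1) % p == 0
    for x in range(p):                     # line at infinity: z = 0, y = 1
        count += f(x, 1, 0) % p == 0
    count += f(1, 0, 0) % p == 0           # the remaining point (1 : 0 : 0)
    return count

fermat = lambda x, y, z: x**4 + y**4 + z**4   # the Fermat quartic
print(quartic_point_count(fermat, 13))        # small prime for the naive method
```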