We study the bit complexity of two related fundamental computational problems in linear algebra and control theory. Our results are: (1) An Õ(n^{ω+3}a+n⁴a²+n^ωlog(1/ε)) time algorithm for finding an ε-approximation to the Jordan Normal form of an integer matrix with a-bit entries, where ω is the exponent of matrix multiplication. (2) An Õ(n⁶d⁶a+n⁴d⁴a²+n³d³log(1/ε)) time algorithm for ε-approximately computing the spectral factorization P(x) = Q^*(x)Q(x) of a given monic n× n rational matrix polynomial of degree 2d with rational a-bit coefficients having a-bit common denominators, which satisfies P(x)⪰0 for all real x. The first algorithm is used as a subroutine in the second one. Despite its being of central importance, polynomial complexity bounds were not previously known for spectral factorization, and for Jordan form the best previous best running time was an unspecified polynomial in n of degree at least twelve [Cai, 1994]. Our algorithms are simple and judiciously combine techniques from numerical and symbolic computation, yielding significant advantages over either approach by itself.
more »
« less
Fast and Sample Efficient Inductive Matrix Completion via Multi-Phase Procrustes Flow
We revisit the inductive matrix completion problem that aims to recover a rank-r matrix with ambient dimension d given n features as the side prior information. The goal is to make use of the known n features to reduce sample and computational complexities. We present and analyze a new gradient-based non-convex optimization algorithm that converges to the true underlying matrix at a linear rate with sample complexity only linearly depending on n and logarithmically depending on d. To the best of our knowledge, all previous algorithms either have a quadratic dependency on the number of features in sample complexity or a sub-linear computational convergence rate. In addition, we provide experiments on both synthetic and real world data to demonstrate the effectiveness of our proposed algorithm.
more »
« less
- PAR ID:
- 10063546
- Date Published:
- Journal Name:
- International Conference on Machine Learning
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
In lifelong learning, tasks (or classes) to be learned arrive sequentially over time in arbitrary order. During training, knowledge from previous tasks can be captured and transferred to subsequent ones to improve sample efficiency. We consider the setting where all target tasks can be represented in the span of a small number of unknown linear or nonlinear features of the input data. We propose a lifelong learning algorithm that maintains and refines the internal feature representation. We prove that for any desired accuracy on all tasks, the dimension of the representation remains close to that of the underlying representation. The resulting sample complexity improves significantly on existing bounds. In the setting of linear features, our algorithm is provably efficient and the sample complexity for input dimension d, m tasks with k features up to error ϵ is O~(dk1.5/ϵ+km/ϵ). We also prove a matching lower bound for any lifelong learning algorithm that uses a single task learner as a black box. We complement our analysis with an empirical study, including a heuristic lifelong learning algorithm for deep neural networks. Our method performs favorably on challenging realistic image datasets compared to state-of-the-art continual learning methods.more » « less
-
null (Ed.)Can linear systems be solved faster than matrix multiplication? While there has been remarkable progress for the special cases of graph structured linear systems, in the general setting, the bit complexity of solving an $$n \times n$$ linear system $Ax=b$ is $$\tilde{O}(n^\omega)$$, where $$\omega < 2.372864$$ is the matrix multiplication exponent. Improving on this has been an open problem even for sparse linear systems with poly$(n)$ condition number. In this paper, we present an algorithm that solves linear systems in sparse matrices asymptotically faster than matrix multiplication for any $$\omega > 2$$. This speedup holds for any input matrix $$A$$ with $$o(n^{\omega -1}/\log(\kappa(A)))$$ non-zeros, where $$\kappa(A)$$ is the condition number of $$A$$. For poly$(n)$-conditioned matrices with $$\tilde{O}(n)$$ nonzeros, and the current value of $$\omega$$, the bit complexity of our algorithm to solve to within any $$1/\text{poly}(n)$$ error is $$O(n^{2.331645})$$. Our algorithm can be viewed as an efficient, randomized implementation of the block Krylov method via recursive low displacement rank factorizations. It is inspired by the algorithm of [Eberly et al. ISSAC `06 `07] for inverting matrices over finite fields. In our analysis of numerical stability, we develop matrix anti-concentration techniques to bound the smallest eigenvalue and the smallest gap in eigenvalues of semi-random matrices.more » « less
-
We study the problem of robust multivariate polynomial regression: let p\colon\mathbb{R}^n\to\mathbb{R} be an unknown n-variate polynomial of degree at most d in each variable. We are given as input a set of random samples (\mathbf{x}_i,y_i) \in [-1,1]^n \times \mathbb{R} that are noisy versions of (\mathbf{x}_i,p(\mathbf{x}_i)). More precisely, each \mathbf{x}_i is sampled independently from some distribution \chi on [-1,1]^n, and for each i independently, y_i is arbitrary (i.e., an outlier) with probability at most \rho < 1/2, and otherwise satisfies |y_i-p(\mathbf{x}_i)|\leq\sigma. The goal is to output a polynomial \hat{p}, of degree at most d in each variable, within an \ell_\infty-distance of at most O(\sigma) from p. Kane, Karmalkar, and Price [FOCS'17] solved this problem for n=1. We generalize their results to the n-variate setting, showing an algorithm that achieves a sample complexity of O_n(d^n\log d), where the hidden constant depends on n, if \chi is the n-dimensional Chebyshev distribution. The sample complexity is O_n(d^{2n}\log d), if the samples are drawn from the uniform distribution instead. The approximation error is guaranteed to be at most O(\sigma), and the run-time depends on \log(1/\sigma). In the setting where each \mathbf{x}_i and y_i are known up to N bits of precision, the run-time's dependence on N is linear. We also show that our sample complexities are optimal in terms of d^n. Furthermore, we show that it is possible to have the run-time be independent of 1/\sigma, at the cost of a higher sample complexity.more » « less
-
Chan, Timothy; Fischer, Johannes; Iacono, John; Herman, Grzegorz (Ed.)We study the problem of robust multivariate polynomial regression: let p: ℝⁿ → ℝ be an unknown n-variate polynomial of degree at most d in each variable. We are given as input a set of random samples (𝐱_i,y_i) ∈ [-1,1]ⁿ × ℝ that are noisy versions of (𝐱_i,p(𝐱_i)). More precisely, each 𝐱_i is sampled independently from some distribution χ on [-1,1]ⁿ, and for each i independently, y_i is arbitrary (i.e., an outlier) with probability at most ρ < 1/2, and otherwise satisfies |y_i-p(𝐱_i)| ≤ σ. The goal is to output a polynomial p̂, of degree at most d in each variable, within an 𝓁_∞-distance of at most O(σ) from p. Kane, Karmalkar, and Price [FOCS'17] solved this problem for n = 1. We generalize their results to the n-variate setting, showing an algorithm that achieves a sample complexity of O_n(dⁿlog d), where the hidden constant depends on n, if χ is the n-dimensional Chebyshev distribution. The sample complexity is O_n(d^{2n}log d), if the samples are drawn from the uniform distribution instead. The approximation error is guaranteed to be at most O(σ), and the run-time depends on log(1/σ). In the setting where each 𝐱_i and y_i are known up to N bits of precision, the run-time’s dependence on N is linear. We also show that our sample complexities are optimal in terms of dⁿ. Furthermore, we show that it is possible to have the run-time be independent of 1/σ, at the cost of a higher sample complexity.more » « less
An official website of the United States government

