Title: Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations
Fast linear transforms are ubiquitous in machine learning, including the discrete Fourier transform, discrete cosine transform, and other structured transformations such as convolutions. All of these transforms can be represented by dense matrix-vector multiplication, yet each has a specialized and highly efficient (subquadratic) algorithm. We ask to what extent hand-crafting these algorithms and implementations is necessary, what structural priors they encode, and how much knowledge is required to automatically learn a fast algorithm for a provided structured transform. Motivated by a characterization of fast matrix-vector multiplication as products of sparse matrices, we introduce a parameterization of divide-and-conquer methods that is capable of representing a large class of transforms. This generic formulation can automatically learn an efficient algorithm for many important transforms; for example, it recovers the O(N log N) Cooley-Tukey FFT algorithm to machine precision, for dimensions N up to 1024. Furthermore, our method can be incorporated as a lightweight replacement for generic matrices in machine learning pipelines to learn efficient and compressible transformations. On a standard task of compressing a single hidden-layer network, our method exceeds the classification accuracy of unconstrained matrices on CIFAR-10 by 3.9 points (the first time a structured approach has done so), with 4× faster inference speed and 40× fewer parameters.
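To make the sparse-product idea concrete, here is a minimal NumPy sketch. It is not the paper's learned parameterization (which trains the factor entries by gradient descent); it hard-codes the radix-2 Cooley-Tukey butterfly factors and checks that a product of log2(N) sparse factors and a bit-reversal permutation reproduces the dense DFT matrix exactly.

```python
import numpy as np

def butterfly_factor(n, block):
    """n x n block-diagonal butterfly factor: n/block copies of the
    'block'-sized butterfly [[I, D], [I, -D]], D = diag(twiddle factors)."""
    half = block // 2
    w = np.exp(-2j * np.pi * np.arange(half) / block)  # twiddle factors
    B = np.zeros((block, block), dtype=complex)
    B[:half, :half] = np.eye(half)
    B[:half, half:] = np.diag(w)
    B[half:, :half] = np.eye(half)
    B[half:, half:] = -np.diag(w)
    return np.kron(np.eye(n // block), B)  # copies along the diagonal

def bit_reversal_permutation(n):
    """Permutation matrix that reorders inputs into bit-reversed order."""
    bits = n.bit_length() - 1
    rev = [int(format(i, f"0{bits}b")[::-1], 2) for i in range(n)]
    return np.eye(n)[rev]

n = 16  # any power of two >= 4
# DFT_n = B_n (I ⊗ B_{n/2}) ... (I ⊗ B_2) P_bitrev: log2(n) sparse factors
factors = [butterfly_factor(n, 2 ** k) for k in range(int(np.log2(n)), 0, -1)]
M = np.linalg.multi_dot(factors) @ bit_reversal_permutation(n)
F = np.fft.fft(np.eye(n))  # dense DFT matrix
print(np.allclose(M, F))   # True: the sparse product recovers the DFT
```

Each factor has only 2N nonzeros, so applying the product costs O(N log N) rather than the O(N²) of the dense matrix; the paper's method fixes this butterfly sparsity pattern and learns the entries.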
Award ID(s):
1763481
PAR ID:
10137134
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Proceedings of Machine Learning Research
Volume:
97
ISSN:
2640-3498
Page Range / eLocation ID:
1517-1527
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Can linear systems be solved faster than matrix multiplication? While there has been remarkable progress for the special cases of graph-structured linear systems, in the general setting the bit complexity of solving an $n \times n$ linear system $Ax = b$ is $\tilde{O}(n^\omega)$, where $\omega < 2.372864$ is the matrix multiplication exponent. Improving on this has been an open problem even for sparse linear systems with $\mathrm{poly}(n)$ condition number. In this paper, we present an algorithm that solves linear systems with sparse matrices asymptotically faster than matrix multiplication for any $\omega > 2$. This speedup holds for any input matrix $A$ with $o(n^{\omega-1}/\log(\kappa(A)))$ non-zeros, where $\kappa(A)$ is the condition number of $A$. For $\mathrm{poly}(n)$-conditioned matrices with $\tilde{O}(n)$ nonzeros, and the current value of $\omega$, the bit complexity of our algorithm to solve to within any $1/\mathrm{poly}(n)$ error is $O(n^{2.331645})$. Our algorithm can be viewed as an efficient, randomized implementation of the block Krylov method via recursive low displacement rank factorizations. It is inspired by the algorithm of [Eberly et al., ISSAC '06, '07] for inverting matrices over finite fields. In our analysis of numerical stability, we develop matrix anti-concentration techniques to bound the smallest eigenvalue and the smallest gap in eigenvalues of semi-random matrices.
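The paper's algorithm is considerably more involved, but the Krylov-method foundation it builds on is simple: the matrix is accessed only through matrix-vector products, so each iteration costs O(nnz(A)). A minimal sketch, using classical (non-block) conjugate gradients for a symmetric positive-definite system rather than the paper's randomized block Krylov method:

```python
import numpy as np

def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Krylov-subspace solve of Ax = b for symmetric positive-definite A,
    accessed only through matvec(v) = A @ v. One matvec per iteration,
    so each step costs O(nnz(A)) when A is sparse."""
    x = np.zeros_like(b)
    r = b - matvec(x)          # residual
    p = r.copy()               # search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# hypothetical usage: a well-conditioned sparse (tridiagonal) system
n = 1000
def matvec(v):                 # A = tridiag(-0.5, 2, -0.5), O(n) per product
    Av = 2.0 * v
    Av[:-1] -= 0.5 * v[1:]
    Av[1:] -= 0.5 * v[:-1]
    return Av
b = np.ones(n)
x = conjugate_gradient(matvec, b)
print(np.linalg.norm(matvec(x) - b))  # ~1e-10
```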
  2. We prove lower bounds on the time and space required for quantum computers to solve a wide variety of problems involving matrices, many of which have only been analyzed classically in prior work. Using a novel way of applying recording query methods we show that for many linear algebra problems—including matrix-vector product, matrix inversion, matrix multiplication and powering—existing classical time-space tradeoffs also apply to quantum algorithms with at most a constant factor loss. For example, for almost all fixed matrices A, including the discrete Fourier transform (DFT) matrix, we prove that quantum circuits with at most T input queries and S qubits of memory require T=Ω(n^2/S) to compute matrix-vector product Ax for x ∈ {0,1}^n. We similarly prove that matrix multiplication for n × n binary matrices requires T=Ω(n^3/√S). Because many of our lower bounds are matched by deterministic algorithms with the same time and space complexity, our results show that quantum computers cannot provide any asymptotic advantage for these problems at any space bound. We also improve the previous quantum time-space tradeoff lower bounds for n × n Boolean (i.e. AND-OR) matrix multiplication from T=Ω(n^2.5/S^0.5) to T=Ω(n^2.5/S^0.25), which has optimal exponents for the powerful query algorithms to which it applies. Our method also yields improved lower bounds for classical algorithms.
  3. A direct solver is introduced for solving overdetermined linear systems involving nonuniform discrete Fourier transform matrices. Such matrices can be transformed into a Cauchy-like form that has hierarchical low-rank structure. The rank structure of this matrix is explained, and it is shown that the ranks of the relevant submatrices grow only logarithmically with the number of columns of the matrix. A fast rank-structured hierarchical approximation method based on this analysis is developed, along with a hierarchical least-squares solver for these and related systems. The result is a direct method for inverting nonuniform discrete transforms with a complexity that is usually nearly linear in the degrees of freedom in the problem. This solver is benchmarked against various iterative and direct solvers in the setting of inverting the one-dimensional type-II (or forward) transform, for a range of condition numbers and problem sizes (up to 4 × 10 by 2 × 10). These experiments demonstrate that this method is especially useful for large problems with multiple right-hand sides.
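A small NumPy experiment (illustrative only, not the paper's solver) shows the structure such solvers exploit: in a Cauchy kernel 1/(z_j - w_k) built from nonuniform nodes z_j on the unit circle and the roots of unity w_k, a block pairing two separated arcs of the circle has numerical rank far below its dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1024
x = np.sort(rng.uniform(0.0, 1.0, n))
z = np.exp(-2j * np.pi * x)                 # nonuniform nodes on the circle
w = np.exp(-2j * np.pi * np.arange(n) / n)  # roots of unity
C = 1.0 / (z[:, None] - w[None, :])         # Cauchy kernel

# block pairing the first half of the nodes with the second half of the
# roots of unity: two (mostly) separated arcs of the circle
block = C[: n // 2, n // 2:]
s = np.linalg.svd(block, compute_uv=False)
rank = int((s > 1e-10 * s[0]).sum())
print(rank, "= numerical rank of a", block.shape, "block")  # rank << 512
```

Hierarchical (HSS-type) solvers compress many such blocks at once, which is what makes the nearly linear complexity possible.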
  4. We develop effective approximation methods for unitary matrices. In our formulation, a unitary matrix is represented as a product of rotations in two-dimensional subspaces, so-called Givens rotations. Instead of the quadratic dimension dependence when applying a dense matrix, applying such an approximation scales with the number of factors, each of which can be implemented efficiently. Consequently, in settings where an approximation is computed once and then applied many times, such an effective representation becomes advantageous. Although efficient Givens factorizations are not possible for generic unitary operators, we show that minimizing a sparsity-inducing objective with a coordinate descent algorithm on the unitary group yields good factorizations for structured matrices. Canonical applications of such a setup are orthogonal basis transforms. We demonstrate that our methods improve the approximate representation of the graph Fourier transform, the matrix obtained by diagonalizing a graph Laplacian.
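A minimal sketch of the representation being applied (the sparsity-inducing objective and coordinate descent on the unitary group are omitted; the indices and angles below are random placeholders, not an optimized factorization): applying a budget of T Givens rotations costs O(T) instead of the O(n²) of a dense matrix.

```python
import numpy as np

def apply_givens(x, rotations):
    """Apply the product G_T ... G_1 of Givens rotations to x. Each rotation
    (i, j, theta) touches only coordinates i and j, so the total cost is
    O(T) rather than the O(n^2) of a dense matrix-vector product."""
    y = x.copy()
    for i, j, theta in rotations:
        c, s = np.cos(theta), np.sin(theta)
        yi, yj = y[i], y[j]
        y[i] = c * yi - s * yj
        y[j] = s * yi + c * yj
    return y

# hypothetical usage: a random budget of T rotations on R^n
rng = np.random.default_rng(0)
n, T = 64, 200
rotations = [(int(i), int(j), rng.uniform(0.0, 2.0 * np.pi))
             for i, j in (rng.choice(n, size=2, replace=False) for _ in range(T))]
x = rng.standard_normal(n)
y = apply_givens(x, rotations)
print(np.allclose(np.linalg.norm(y), np.linalg.norm(x)))  # True: norm preserved
```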