Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to nonfederal websites. Their policies may differ from this site.

Krylov subspace methods are a ubiquitous tool for computing nearoptimal rank kk approximations of large matrices. While "large block" Krylov methods with block size at least kk give the best known theoretical guarantees, block size one (a single vector) or a small constant is often preferred in practice. Despite their popularity, we lack theoretical bounds on the performance of such "small block" Krylov methods for lowrank approximation. We address this gap between theory and practice by proving that small block Krylov methods essentially match all known lowrank approximation guarantees for large block methods. Via a blackbox reduction we show, for example, that the standard single vector Krylov method run for t iterations obtains the same spectral norm and Frobenius norm error bounds as a Krylov method with block size ℓ≥kℓ≥k run for O(t/ℓ)O(t/ℓ) iterations, up to a logarithmic dependence on the smallest gap between sequential singular values. That is, for a given number of matrixvector products, single vector methods are essentially as effective as any choice of large block size. By combining our result with tailbounds on eigenvalue gaps in random matrices, we prove that the dependence on the smallest singular value gap can be eliminated if the input matrix is perturbed by a small random matrix. Further, we show that single vector methods match the more complex algorithm of [Bakshi et al. `22], which combines the results of multiple block sizes to achieve an improved algorithm for Schatten ppnorm lowrank approximation.more » « lessFree, publiclyaccessible full text available January 7, 2025

We develop a general framework for finding approximatelyoptimal preconditioners for solving linear systems. Leveraging this framework we obtain improved runtimes for fundamental preconditioning and linear system solving problems including the following. \begin{itemize} \item \textbf{Diagonal preconditioning.} We give an algorithm which, given positive definite $\mathbf{K} \in \mathbb{R}^{d \times d}$ with $\mathrm{nnz}(\mathbf{K})$ nonzero entries, computes an $\epsilon$optimal diagonal preconditioner in time $\widetilde{O}(\mathrm{nnz}(\mathbf{K}) \cdot \mathrm{poly}(\kappa^\star,\epsilon^{1}))$, where $\kappa^\star$ is the optimal condition number of the rescaled matrix. \item \textbf{Structured linear systems.} We give an algorithm which, given $\mathbf{M} \in \mathbb{R}^{d \times d}$ that is either the pseudoinverse of a graph Laplacian matrix or a constant spectral approximation of one, solves linear systems in $\mathbf{M}$ in $\widetilde{O}(d^2)$ time. \end{itemize} Our diagonal preconditioning results improve stateoftheart runtimes of $\Omega(d^{3.5})$ attained by generalpurpose semidefinite programming, and our solvers improve stateoftheart runtimes of $\Omega(d^{\omega})$ where $\omega > 2.3$ is the current matrix multiplication constant. We attain our results via new algorithms for a class of semidefinite programs (SDPs) we call \emph{matrixdictionary approximation SDPs}, which we leverage to solve an associated problem we call \emph{matrixdictionary recovery}.more » « lessFree, publiclyaccessible full text available December 10, 2024

The kernel polynomial method (KPM) is a powerful numerical method for approximating spectral densities. Typical implementations of the KPM require an a prior estimate for an interval containing the support of the target spectral density, and while such estimates can be obtained by classical techniques, this incurs addition computational costs. We propose a spectrum adaptive KPM based on the Lanczos algorithm without reorthogonalization, which allows the selection of KPM parameters to be deferred to after the expensive computation is finished. Theoretical results from numerical analysis are given to justify the suitability of the Lanczos algorithm for our approach, even in finite precision arithmetic. While conceptually simple, the paradigm of decoupling computation from approximation has a number of practical and pedagogical benefits, which we highlight with numerical examples.

Randomized matrix algorithms have had significant recent impact on numerical linear algebra. One especially powerful class of methods are algorithms for approximate matrix multiplication based on sampling. Such methods typically sample individual matrix rows and columns using carefully chosen importance sampling probabilities. However, due to practical considerations like memory locality and the preservation of matrix structure, it is often preferable to sample contiguous blocks of rows and columns all together. Recently, (Wu, 2018) addressed this setting by developing an approximate matrix multiplication method based on block sampling. However, the method is inefficient, as it requires knowledge of optimal importance sampling probabilities that are expensive to compute. We address this issue by showing that the method of Wu can be accelerated through the use of a randomized implicit trace estimation method. Doing so allows us to provably reduce the cost of sampling to nearlinear in the size of the matrices being multiplied, without impacting the accuracy of the final approximate matrix multiplication. Overall, this yields a fast practical algorithm, which we test on a number of synthetic and realworld data sets. We complement our algorithmic contribution with the first extensive empirical comparison of block algorithms for randomized matrix multiplication. Our method offers a significant runtime advantage over the method of (Wu, 2018) and also outperforms basic uniform sampling of blocks. However, we find another recent method of (Charalambides, 2021) which uses suboptimal but efficiently computable sampling probabilities often (but not always) offers the best tradeoff between speed and accuracy.more » « less

We study lower bounds for the problem of approximating a one dimensional distribution given (noisy) measurements of its moments. We show that there are distributions on $[1,1]$ that cannot be approximated to accuracy $\epsilon$ in Wasserstein1 distance even if we know \emph{all} of their moments to multiplicative accuracy $(1\pm2^{\Omega(1/\epsilon)})$; this result matches an upper bound of Kong and Valiant [Annals of Statistics, 2017]. To obtain our result, we provide a hard instance involving distributions induced by the eigenvalue spectra of carefully constructed graph adjacency matrices. Efficiently approximating such spectra in Wasserstein1 distance is a wellstudied algorithmic problem, and a recent result of CohenSteiner et al. [KDD 2018] gives a method based on accurately approximating spectral moments using $2^{O(1/\epsilon)}$ random walks initiated at uniformly random nodes in the graph.As a strengthening of our main result, we show that improving the dependence on $1/\epsilon$ in this result would require a new algorithmic approach. Specifically, no algorithm can compute an $\epsilon$accurate approximation to the spectrum of a normalized graph adjacency matrix with constant probability, even when given the transcript of $2^{\Omega(1/\epsilon)}$ random walks of length $2^{\Omega(1/\epsilon)}$ started at random nodes.more » « less

Abstract. We describe a Lanczosbased algorithm for approximating the product of a rational matrix function with a vector. This algorithm, which we call the Lanczos method for optimal rational matrix function approximation (LanczosOR), returns the optimal approximation from a given Krylov subspace in a norm depending on the rational function’s denominator, and it can be computed using the information from a slightly larger Krylov subspace. We also provide a lowmemory implementation which only requires storing a number of vectors proportional to the denominator degree of the rational function. Finally, we show that LanczosOR can be used to derive algorithms for computing other matrix functions, including the matrix sign function and quadraturebased rational function approximations. In many cases, it improves on the approximation quality of prior approaches, including the standard Lanczos method, with little additional computational overhead.more » « less

Finding the mode of a high dimensional probability distribution $\mathcal{D}$ is a fundamental algorithmic problem in statistics and data analysis. There has been particular interest in efficient methods for solving the problem when $\mathcal{D}$ is represented as a mixture model or kernel density estimate, although few algorithmic results with worstcase approximation and runtime guarantees are known. In this work, we significantly generalize a result of (LeeLiMusco:2021) on mode approximation for Gaussian mixture models. We develop randomized dimensionality reduction methods for mixtures involving a broader class of kernels, including the popular logistic, sigmoid, and generalized Gaussian kernels. As in Lee et al.’s work, our dimensionality reduction results yield quasipolynomial algorithms for mode finding with multiplicative accuracy $(1\epsilon)$ for any $\epsilon > 0$. Moreover, when combined with gradient descent, they yield efficient practical heuristics for the problem. In addition to our positive results, we prove a hardness result for box kernels, showing that there is no polynomial time algorithm for finding the mode of a kernel density estimate, unless $\mathit{P} = \mathit{NP}$. Obtaining similar hardness results for kernels used in practice (like Gaussian or logistic kernels) is an interesting future direction.more » « less

We consider the problem of active learning for single neuron models, also sometimes called “ridge functions”, in the agnostic setting (under adversarial label noise). Such models have been shown to be broadly effective in modeling physical phenomena, and for constructing surrogate datadriven models for partial differential equations. Surprisingly, we show that for a single neuron model with any Lipschitz nonlinearity (such as the ReLU, sigmoid, absolute value, lowdegree polynomial, among others), strong provable approximation guarantees can be obtained using a wellknown active learning strategy for fitting linear functions in the agnostic setting. Namely, we can collect samples via statistical leverage score sampling, which has been shown to be nearoptimal in other active learning scenarios. We support our theoretical results with empirical simulations showing that our proposed active learning strategy based on leverage score sampling outperforms (ordinary) uniform sampling when fitting single neuron models.more » « less