Title: Efficient Block Approximate Matrix Multiplication
Randomized matrix algorithms have had significant recent impact on numerical linear algebra. One especially powerful class of methods is algorithms for approximate matrix multiplication based on sampling. Such methods typically sample individual matrix rows and columns using carefully chosen importance sampling probabilities. However, due to practical considerations like memory locality and the preservation of matrix structure, it is often preferable to sample contiguous blocks of rows and columns together. Recently, (Wu, 2018) addressed this setting by developing an approximate matrix multiplication method based on block sampling. However, the method is inefficient, as it requires knowledge of optimal importance sampling probabilities that are expensive to compute. We address this issue by showing that the method of Wu can be accelerated through the use of a randomized implicit trace estimation method. Doing so allows us to provably reduce the cost of sampling to near-linear in the size of the matrices being multiplied, without impacting the accuracy of the final approximate matrix multiplication. Overall, this yields a fast practical algorithm, which we test on a number of synthetic and real-world data sets. We complement our algorithmic contribution with the first extensive empirical comparison of block algorithms for randomized matrix multiplication. Our method offers a significant runtime advantage over the method of (Wu, 2018) and also outperforms basic uniform sampling of blocks. However, we find that another recent method of (Charalambides, 2021), which uses sub-optimal but efficiently computable sampling probabilities, often (but not always) offers the best trade-off between speed and accuracy.
Award ID(s):
2045590
PAR ID:
10495905
Author(s) / Creator(s):
Publisher / Repository:
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Date Published:
Journal Name:
31st Annual European Symposium on Algorithms
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
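To make the approach in the abstract above concrete, the following is a minimal Python sketch of block-sampling approximate matrix multiplication: block probabilities are taken proportional to Hutchinson-style implicit estimates of ||A_I B_I||_F obtained from a few Gaussian probe vectors. The function name, the number of probes, and this exact estimator are illustrative assumptions rather than the paper's precise algorithm.

```python
import numpy as np

def block_sample_matmul(A, B, block_size, n_samples, n_probe=4, seed=None):
    """Hedged sketch: approximate A @ B by importance-sampling column-blocks
    of A (and the matching row-blocks of B), with probabilities proportional
    to Hutchinson-style estimates of ||A_I B_I||_F (assumed estimator)."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    blocks = [np.arange(i, min(i + block_size, n)) for i in range(0, n, block_size)]

    # Estimate ||A_I B_I||_F^2 = E[||A_I B_I g||^2] over Gaussian probes g,
    # without ever forming the block products A_I B_I explicitly.
    est = np.empty(len(blocks))
    for k, I in enumerate(blocks):
        G = rng.standard_normal((B.shape[1], n_probe))
        est[k] = np.sum((A[:, I] @ (B[I, :] @ G)) ** 2) / n_probe

    p = np.sqrt(est)
    p /= p.sum()

    # Sample blocks with replacement and rescale for an unbiased estimate.
    C = np.zeros((A.shape[0], B.shape[1]))
    for k in rng.choice(len(blocks), size=n_samples, p=p):
        I = blocks[k]
        C += (A[:, I] @ B[I, :]) / (n_samples * p[k])
    return C
```

Because each probe only requires matrix-vector-style products with the blocks, the estimation pass stays near-linear in the size of A and B, in line with the claimed sampling cost.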
More Like this
  1. Matrix multiplication is a fundamental building block for large-scale computations arising in various applications, including machine learning. There has been significant recent interest in using coding to speed up distributed matrix multiplication with schemes that are robust to stragglers (i.e., machines that may perform slower computations). In many scenarios, instead of exact computation, approximate matrix multiplication, i.e., allowing for a tolerable error, is sufficient. Such approximate schemes make use of randomization techniques to speed up the computation process. In this paper, we initiate the study of approximate coded matrix multiplication and investigate the joint synergies offered by randomization and coding. Specifically, we propose two coded randomized sampling schemes that use (a) codes to achieve a desired recovery threshold and (b) random sampling to obtain an approximation of the matrix product. Tradeoffs between the recovery threshold and the approximation error obtained through random sampling are investigated for a class of coded matrix multiplication schemes.
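As a toy Python illustration of the joint idea (not either of the paper's two proposed schemes), the sketch below encodes row-blocks of A with a Vandermonde code, so that any k of n_workers responses suffice (recovery threshold k), while each worker approximates its product by norm-proportional column sampling. The divisibility assumption and the choice of which workers straggle are simplifications for illustration.

```python
import numpy as np

def coded_approx_matmul(A, B, n_workers, k, n_samples, seed=None):
    """Toy sketch: Vandermonde-coded row-blocks plus randomized column
    sampling. Assumes k divides A's row count; illustrative only."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    assert m % k == 0, "toy sketch assumes k divides the number of rows"
    blocks = np.split(A, k, axis=0)                  # k uncoded row-blocks
    V = np.vander(np.arange(1.0, n_workers + 1), k)  # any k rows are invertible
    encoded = [sum(V[w, j] * blocks[j] for j in range(k)) for w in range(n_workers)]

    # Column-sampling probabilities shared by all workers.
    p = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p /= p.sum()

    def worker(Aw):  # each worker returns an unbiased sampled product
        idx = rng.choice(n, size=n_samples, p=p)
        return (Aw[:, idx] / (n_samples * p[idx])) @ B[idx, :]

    # Pretend the last n_workers - k machines straggle; decode from the rest.
    survivors = np.arange(k)
    Y = [worker(encoded[w]) for w in survivors]
    W = np.linalg.inv(V[survivors, :])
    return np.vstack([sum(W[j, i] * Y[i] for i in range(k)) for j in range(k)])
```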
  2. Summary We establish a general theory of optimality for block bootstrap distribution estimation for sample quantiles under mild strong mixing conditions. In contrast to existing results, we study the block bootstrap for varying numbers of blocks. This corresponds to a hybrid between the subsampling bootstrap and the moving block bootstrap, in which the number of blocks is between 1 and the ratio of sample size to block length. The hybrid block bootstrap is shown to give theoretical benefits, and startling improvements in accuracy in distribution estimation in important practical settings. The conclusion that bootstrap samples should be of smaller size than the original sample has significant implications for computational efficiency and scalability of bootstrap methodologies with dependent data. Our main theorem determines the optimal number of blocks and block length to achieve the best possible convergence rate for the block bootstrap distribution estimator for sample quantiles. We propose an intuitive method for empirical selection of the optimal number and length of blocks, and demonstrate its value in a nontrivial example.
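The hybrid scheme is easy to sketch: draw a user-chosen number of overlapping moving blocks, so the bootstrap sample size (number of blocks times block length) can be smaller than the original sample. The Python sketch below is a minimal illustration under assumed defaults; it does not implement the paper's optimal choices of block number and length.

```python
import numpy as np

def hybrid_block_bootstrap(x, block_len, n_blocks, q, n_boot=1000, seed=None):
    """Sketch of a hybrid moving block bootstrap for the q-th sample
    quantile: resample n_blocks overlapping blocks of length block_len,
    so the bootstrap sample size n_blocks * block_len may be < len(x)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    n_starts = len(x) - block_len + 1  # number of available moving blocks
    stats = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, n_starts, size=n_blocks)
        sample = np.concatenate([x[s:s + block_len] for s in starts])
        stats[b] = np.quantile(sample, q)
    return stats  # bootstrap distribution of the sample quantile
```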
  3. The main contribution of this paper is a new improved variant of the laser method for designing matrix multiplication algorithms. Building upon the recent techniques of [Duan, Wu, Zhou, FOCS 2023], the new method introduces several new ingredients that not only yield an improved bound on the matrix multiplication exponent ω, but also improve the known bounds on rectangular matrix multiplication by [Le Gall and Urrutia, SODA 2018]. In particular, the new bound on ω is ω ≤ 2.371552 (improved from ω ≤ 2.371866). For the dual matrix multiplication exponent α defined as the largest α for which ω(1, α, 1) = 2, we obtain the improvement α ≥ 0.321334 (improved from α ≥ 0.31389). Similar improvements are obtained for various other exponents for multiplying rectangular matrices. 
  4. Gradient coding is a method for mitigating straggling servers in a centralized computing network that uses erasure-coding techniques to distributively carry out first-order optimization methods. Randomized numerical linear algebra uses randomization to develop improved algorithms for large-scale linear algebra computations. In this paper, we propose a method for distributed optimization that combines gradient coding and randomized numerical linear algebra. The proposed method uses a randomized ℓ2-subspace embedding and a gradient coding technique to distribute blocks of data to the computational nodes of a centralized network, and at each iteration the central server only requires a small number of computations to obtain the steepest descent update. The novelty of our approach is that the data is replicated according to importance scores, called block leverage scores, in contrast to most gradient coding approaches that uniformly replicate the data blocks. Furthermore, we do not require a decoding step at each iteration, avoiding a bottleneck in previous gradient coding schemes. We show that our approach results in a valid ℓ2-subspace embedding, and that our resulting approximation converges to the optimal solution.
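Below is a minimal Python sketch of the block leverage scores that drive the replication, assuming the standard definition (the squared Frobenius mass of each row-block of an orthonormal basis for the data matrix's column space); the rounding rule for replication counts is an illustrative assumption, not the paper's scheme.

```python
import numpy as np

def block_leverage_scores(A, n_blocks):
    """Block leverage score of a row-block: the squared Frobenius norm of
    that block of an orthonormal basis U for col(A) (assumed standard
    definition). Returned scores are normalized to sum to one."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    scores = np.array([np.linalg.norm(Ub) ** 2
                       for Ub in np.array_split(U, n_blocks, axis=0)])
    return scores / scores.sum()

def replication_counts(scores, total_copies):
    """Replicate blocks roughly in proportion to their scores, keeping at
    least one copy each (an illustrative rounding rule, not the paper's)."""
    return np.maximum(1, np.rint(scores * total_copies)).astype(int)
```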
  5. This paper expands the analysis of randomized low-rank approximation beyond the Gaussian distribution to four classes of random matrices: (1) independent sub-Gaussian entries, (2) independent sub-Gaussian columns, (3) independent bounded columns, and (4) independent columns with bounded second moment. Using a novel interpretation of the low-rank approximation error involving sample covariance matrices, we provide insight into the requirements of a good random matrix for randomized low-rank approximations. Although our bounds involve unspecified absolute constants (a consequence of underlying nonasymptotic theory of random matrices), they allow for qualitative comparisons across distributions. The analysis offers some details on the minimal number of samples (the number of columns of the random matrix) and the error in the resulting low-rank approximation. We illustrate our analysis in the context of the randomized subspace iteration method as a representative algorithm for low-rank approximation; however, all the results are broadly applicable to other low-rank approximation techniques. We conclude our discussion with numerical examples using both synthetic and real-world test matrices.
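For reference, the representative algorithm admits a short Python sketch in its standard form; per the analysis above, the Gaussian test matrix here could be swapped for any of the four random-matrix classes considered.

```python
import numpy as np

def randomized_subspace_iteration(A, rank, n_oversample=10, n_iter=2, seed=None):
    """Minimal sketch of randomized subspace iteration for low-rank
    approximation (standard form; defaults are illustrative assumptions)."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    ell = rank + n_oversample
    Omega = rng.standard_normal((n, ell))  # Gaussian test matrix
    Q, _ = np.linalg.qr(A @ Omega)
    for _ in range(n_iter):                # power iterations sharpen decay
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    B = Q.T @ A                            # small projected matrix
    U_hat, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U_hat)[:, :rank], s[:rank], Vt[:rank, :]
```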