In this paper, we focus on the parallel communication cost of multiplying a matrix with its transpose, known as a symmetric rank-k update (SYRK). SYRK requires half the computation of general matrix multiplication because of the symmetry of the output matrix. Recent work (Beaumont et al., SPAA '22) has demonstrated that the sequential I/O complexity of SYRK is also a constant factor smaller than that of general matrix multiplication. Inspired by this progress, we establish memory-independent parallel communication lower bounds for SYRK with smaller constants than general matrix multiplication, and we show that these constants are tight by presenting communication-optimal algorithms. The crux of the lower bound proof relies on extending a key geometric inequality to symmetric computations and analytically solving a constrained nonlinear optimization problem. The optimal algorithms use a triangular blocking scheme for parallel distribution of the symmetric output matrix and corresponding computation.
more »
« less
Stochastic dynamics of mechanical systems with impacts via the Step Matrix multiplication based Path Integration method
Abstract In this work we propose the Step Matrix Multiplication based Path Integration method (SMM-PI) for nonlinear vibro-impact oscillator systems. This method allows the efficient and accurate deterministic computation of the time-dependent response probability density function by transforming the corresponding Chapman–Kolmogorov equation to a matrix–vector multiplication using high-order numerical time-stepping and interpolation methods. Additionally, the SMM-PI approach yields the computation of the joint probability distribution for response and impact velocity, as well as the time between impacts and other important characteristics. The method is applied to a nonlinear oscillator with a pair of impact barriers, and to a linear oscillator with a single barrier, providing relevant densities and analysing energy accumulation and absorption properties. We validate the results with the help of stochastic Monte-Carlo simulations and show the superior ability of the introduced formulation to compute accurate response statistics.
more »
« less
- Award ID(s):
- 2009270
- PAR ID:
- 10500881
- Publisher / Repository:
- Springer Science + Business Media
- Date Published:
- Journal Name:
- Nonlinear Dynamics
- Volume:
- 112
- Issue:
- 11
- ISSN:
- 0924-090X
- Format(s):
- Medium: X Size: p. 9095-9116
- Size(s):
- p. 9095-9116
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Abstract SummaryMultiple sequence alignment is an important problem in computational biology with applications that include phylogeny and the detection of remote homology between protein sequences. UPP is a popular software package that constructs accurate multiple sequence alignments for large datasets based on ensembles of hidden Markov models (HMMs). A computational bottleneck for this method is a sequence-to-HMM assignment step, which relies on the precise computation of probability scores on the HMMs. In this work, we show that we can speed up this assignment step significantly by replacing these HMM probability scores with alternative scores that can be efficiently estimated. Our proposed approach utilizes a multi-armed bandit algorithm to adaptively and efficiently compute estimates of these scores. This allows us to achieve similar alignment accuracy as UPP with a significant reduction in computation time, particularly for datasets with long sequences. Availability and implementationThe code used to produce the results in this paper is available on GitHub at: https://github.com/ilanshom/adaptiveMSA.more » « less
-
Matrix multiplication is a fundamental building block for large scale computations arising in various applications, including machine learning. There has been significant recent interest in using coding to speed up distributed matrix multiplication, that are robust to stragglers (i.e., machines that may perform slower computations). In many scenarios, instead of exact computation, approximate matrix multiplication, i.e., allowing for a tolerable error is also sufficient. Such approximate schemes make use of randomization techniques to speed up the computation process. In this paper, we initiate the study of approximate coded matrix multiplication, and investigate the joint synergies offered by randomization and coding. Specifically, we propose two coded randomized sampling schemes that use (a) codes to achieve a desired recovery threshold and (b) random sampling to obtain approximation of the matrix multiplication. Tradeoffs between the recovery threshold and approximation error obtained through random sampling are investigated for a class of coded matrix multiplication schemes.more » « less
-
We consider the time and space required for quantum computers to solve a wide variety of problems involving matrices, many of which have only been analyzed classically in prior work. Our main results show that for a range of linear algebra problems—including matrix-vector product, matrix inversion, matrix multiplication and powering—existing classical time-space tradeoffs, several of which are tight for every space bound, also apply to quantum algorithms with at most a constant factor loss. For example, for almost all fixed matrices A, including the discrete Fourier transform (DFT) matrix, we prove that quantum circuits with at most T input queries and S qubits of memory require T = Ω(n^2/S) to compute matrix-vector product Ax for x ∈ {0, 1}^n. We similarly prove that matrix multiplication for n × n binary matrices requires T = Ω(n^3/√S). Because many of our lower bounds are matched by deterministic algorithms with the same time and space complexity, our results show that quantum computers cannot provide any asymptotic advantage for these problems with any space bound. We obtain matching lower bounds for the stronger notion of quantum cumulative memory complexity—the sum of the space per layer of a circuit. We also consider Boolean (i.e. AND-OR) matrix multiplication and matrix-vector products, improving the previous quantum time-space tradeoff lower bounds for n × n Boolean matrix multiplication to T = Ω(n^{2.5}/S^{1/4}) from T = Ω(n^{2.5}/S^{1/2}). Our improved lower bound for Boolean matrix multiplication is based on a new coloring argument that extracts more from the strong direct product theorem that was the basis for prior work. To obtain our tight lower bounds for linear algebra problems, we require much stronger bounds than strong direct product theorems. We obtain these bounds by adding a new bucketing method to the quantum recording-query technique of Zhandry that lets us apply classical arguments to upper bound the success probability of quantum circuits.more » « less
-
Randomized matrix algorithms have had significant recent impact on numerical linear algebra. One especially powerful class of methods are algorithms for approximate matrix multiplication based on sampling. Such methods typically sample individual matrix rows and columns using carefully chosen importance sampling probabilities. However, due to practical considerations like memory locality and the preservation of matrix structure, it is often preferable to sample contiguous blocks of rows and columns all together. Recently, (Wu, 2018) addressed this setting by developing an approximate matrix multiplication method based on block sampling. However, the method is inefficient, as it requires knowledge of optimal importance sampling probabilities that are expensive to compute. We address this issue by showing that the method of Wu can be accelerated through the use of a randomized implicit trace estimation method. Doing so allows us to provably reduce the cost of sampling to near-linear in the size of the matrices being multiplied, without impacting the accuracy of the final approximate matrix multiplication. Overall, this yields a fast practical algorithm, which we test on a number of synthetic and real-world data sets. We complement our algorithmic contribution with the first extensive empirical comparison of block algorithms for randomized matrix multiplication. Our method offers a significant runtime advantage over the method of (Wu, 2018) and also outperforms basic uniform sampling of blocks. However, we find another recent method of (Charalambides, 2021) which uses sub-optimal but efficiently computable sampling probabilities often (but not always) offers the best trade-off between speed and accuracy.more » « less
An official website of the United States government
