In this paper, we focus on the parallel communication cost of multiplying a matrix with its transpose, known as a symmetric rankk update (SYRK). SYRK requires half the computation of general matrix multiplication because of the symmetry of the output matrix. Recent work (Beaumont et al., SPAA '22) has demonstrated that the sequential I/O complexity of SYRK is also a constant factor smaller than that of general matrix multiplication. Inspired by this progress, we establish memoryindependent parallel communication lower bounds for SYRK with smaller constants than general matrix multiplication, and we show that these constants are tight by presenting communicationoptimal algorithms. The crux of the lower bound proof relies on extending a key geometric inequality to symmetric computations and analytically solving a constrained nonlinear optimization problem. The optimal algorithms use a triangular blocking scheme for parallel distribution of the symmetric output matrix and corresponding computation.
more »
« less
Brief Announcement: Tight MemoryIndependent Parallel Matrix Multiplication Communication Lower Bounds
Communication lower bounds have long been established for matrix multiplication algorithms. However, most methods of asymptotic analysis have either ignored the constant factors or not obtained the tightest possible values. Recent work has demonstrated that more careful analysis improves the best known constants for some classical matrix multiplication lower bounds and helps to identify more efficient algorithms that match the leadingorder terms in the lower bounds exactly and improve practical performance. The main result of this work is the establishment of memoryindependent communication lower bounds with tight constants for parallel matrix multiplication. Our constants improve on previous work in each of three cases that depend on the relative sizes of the aspect ratios of the matrices.
more »
« less
 NSFPAR ID:
 10343107
 Date Published:
 Journal Name:
 Proceedings of the 34th Annual ACM Symposium on Parallelism in Algorithms and Architectures
 Page Range / eLocation ID:
 445 to 448
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this


We prove lower bounds on the time and space required for quantum computers to solve a wide variety of problems involving matrices, many of which have only been analyzed classically in prior work. Using a novel way of applying recording query methods we show that for many linear algebra problems—including matrixvector product, matrix inversion, matrix multiplication and powering—existing classical timespace tradeoffs also apply to quantum algorithms with at most a constant factor loss. For example, for almost all fixed matrices A, including the discrete Fourier transform (DFT) matrix, we prove that quantum circuits with at most T input queries and S qubits of memory require T=Ω(n^2/S) to compute matrixvector product Ax for x ∈ {0,1}^n. We similarly prove that matrix multiplication for nxn binary matrices requires T=Ω(n^3/√S). Because many of our lower bounds are matched by deterministic algorithms with the same time and space complexity, our results show that quantum computers cannot provide any asymptotic advantage for these problems at any space bound. We also improve the previous quantum timespace tradeoff lower bounds for n× n Boolean (i.e. ANDOR) matrix multiplication from T=Ω(n^2.5/S^0.5) to T=Ω(n^2.5/S^0.25) which has optimal exponents for the powerful query algorithms to which it applies. Our method also yields improved lower bounds for classical algorithms.more » « less

TaShma, Amnon (Ed.)Threeplayer Number On the Forehead communication may be thought of as a threeplayer Number In the Hand promise model, in which each player is given the inputs that are supposedly on the other two players' heads, and promised that they are consistent with the inputs of the other players. The set of all allowed inputs under this promise may be thought of as an order3 tensor. We surprisingly observe that this tensor is exactly the matrix multiplication tensor, which is widely studied in the design of fast matrix multiplication algorithms. Using this connection, we prove a number of results about both Number On the Forehead communication and matrix multiplication, each by using known results or techniques about the other. For example, we show how the Laser method, a key technique used to design the best matrix multiplication algorithms, can also be used to design communication protocols for a variety of problems. We also show how known lower bounds for Number On the Forehead communication can be used to bound properties of the matrix multiplication tensor such as its zeroing out subrank. Finally, we substantially generalize known methods based on slicerank for studying communication, and show how they directly relate to the matrix multiplication exponent ω.more » « less

The matricizedtensor times KhatriRao product (MTTKRP) computation is the typical bottleneck in algorithms for computing a CP decomposition of a tensor. In order to develop high performance sequential and parallel algorithms, we establish communication lower bounds that identify how much data movement is required for this computation in the case of dense tensors. We also present sequential and parallel algorithms that attain the lower bounds and are therefore communication optimal. In particular, we show that the structure of the computation allows for less communication than the straightforward approach of casting the computation as a matrix multiplication operation.more » « less

Determining the asymptotic algebraic complexity of matrix multiplication, succinctly represented by the matrix multiplication exponent omega, is a central problem in algebraic complexity theory. The best upper bounds on omega, leading to the stateoftheart omega <= 2.37.., have been obtained via the laser method of Strassen and its generalization by Coppersmith and Winograd. Recent barrier results show limitations for these and related approaches to improve the upper bound on omega. We introduce a new and more general barrier, providing stronger limitations than in previous work. Concretely, we introduce the notion of "irreversibility" of a tensor and we prove (in some precise sense) that any approach that uses an irreversible tensor in an intermediate step (e.g., as a starting tensor in the laser method) cannot give omega = 2. In quantitative terms, we prove that the best upper bound achievable is lower bounded by two times the irreversibility of the intermediate tensor. The quantum functionals and Strassen support functionals give (so far, the best) lower bounds on irreversibility. We provide lower bounds on the irreversibility of key intermediate tensors, including the small and big Coppersmith  Winograd tensors, that improve limitations shown in previous work. Finally, we discuss barriers on the grouptheoretic approach in terms of "monomial" irreversibility.more » « less