In this paper, we focus on the parallel communication cost of multiplying a matrix with its transpose, known as a symmetric rankk update (SYRK). SYRK requires half the computation of general matrix multiplication because of the symmetry of the output matrix. Recent work (Beaumont et al., SPAA '22) has demonstrated that the sequential I/O complexity of SYRK is also a constant factor smaller than that of general matrix multiplication. Inspired by this progress, we establish memoryindependent parallel communication lower bounds for SYRK with smaller constants than general matrix multiplication, and we show that these constants are tight by presenting communicationoptimal algorithms. The crux of the lower bound proof relies on extending a key geometric inequality to symmetric computations and analytically solving a constrained nonlinear optimization problem. The optimal algorithms use a triangular blocking scheme for parallel distribution of the symmetric output matrix and corresponding computation.
more »
« less
Brief Announcement: Tight MemoryIndependent Parallel Matrix Multiplication Communication Lower Bounds
Communication lower bounds have long been established for matrix multiplication algorithms. However, most methods of asymptotic analysis have either ignored the constant factors or not obtained the tightest possible values. Recent work has demonstrated that more careful analysis improves the best known constants for some classical matrix multiplication lower bounds and helps to identify more efficient algorithms that match the leadingorder terms in the lower bounds exactly and improve practical performance. The main result of this work is the establishment of memoryindependent communication lower bounds with tight constants for parallel matrix multiplication. Our constants improve on previous work in each of three cases that depend on the relative sizes of the aspect ratios of the matrices.
more »
« less
 NSFPAR ID:
 10343107
 Date Published:
 Journal Name:
 Proceedings of the 34th Annual ACM Symposium on Parallelism in Algorithms and Architectures
 Page Range / eLocation ID:
 445 to 448
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this


TaShma, Amnon (Ed.)Threeplayer Number On the Forehead communication may be thought of as a threeplayer Number In the Hand promise model, in which each player is given the inputs that are supposedly on the other two players' heads, and promised that they are consistent with the inputs of the other players. The set of all allowed inputs under this promise may be thought of as an order3 tensor. We surprisingly observe that this tensor is exactly the matrix multiplication tensor, which is widely studied in the design of fast matrix multiplication algorithms. Using this connection, we prove a number of results about both Number On the Forehead communication and matrix multiplication, each by using known results or techniques about the other. For example, we show how the Laser method, a key technique used to design the best matrix multiplication algorithms, can also be used to design communication protocols for a variety of problems. We also show how known lower bounds for Number On the Forehead communication can be used to bound properties of the matrix multiplication tensor such as its zeroing out subrank. Finally, we substantially generalize known methods based on slicerank for studying communication, and show how they directly relate to the matrix multiplication exponent ω.more » « less

The matricizedtensor times KhatriRao product (MTTKRP) computation is the typical bottleneck in algorithms for computing a CP decomposition of a tensor. In order to develop high performance sequential and parallel algorithms, we establish communication lower bounds that identify how much data movement is required for this computation in the case of dense tensors. We also present sequential and parallel algorithms that attain the lower bounds and are therefore communication optimal. In particular, we show that the structure of the computation allows for less communication than the straightforward approach of casting the computation as a matrix multiplication operation.more » « less

Determining the asymptotic algebraic complexity of matrix multiplication, succinctly represented by the matrix multiplication exponent omega, is a central problem in algebraic complexity theory. The best upper bounds on omega, leading to the stateoftheart omega <= 2.37.., have been obtained via the laser method of Strassen and its generalization by Coppersmith and Winograd. Recent barrier results show limitations for these and related approaches to improve the upper bound on omega. We introduce a new and more general barrier, providing stronger limitations than in previous work. Concretely, we introduce the notion of "irreversibility" of a tensor and we prove (in some precise sense) that any approach that uses an irreversible tensor in an intermediate step (e.g., as a starting tensor in the laser method) cannot give omega = 2. In quantitative terms, we prove that the best upper bound achievable is lower bounded by two times the irreversibility of the intermediate tensor. The quantum functionals and Strassen support functionals give (so far, the best) lower bounds on irreversibility. We provide lower bounds on the irreversibility of key intermediate tensors, including the small and big Coppersmith  Winograd tensors, that improve limitations shown in previous work. Finally, we discuss barriers on the grouptheoretic approach in terms of "monomial" irreversibility.more » « less

The main contribution of this paper is a new improved variant of the laser method for designing matrix multiplication algorithms. Building upon the recent techniques of [Duan, Wu, Zhou, FOCS 2023], the new method introduces several new ingredients that not only yield an improved bound on the matrix multiplication exponent ω, but also improve the known bounds on rectangular matrix multiplication by [Le Gall and Urrutia, SODA 2018]. In particular, the new bound on ω is ω ≤ 2.371552 (improved from ω ≤ 2.371866). For the dual matrix multiplication exponent α defined as the largest α for which ω(1, α, 1) = 2, we obtain the improvement α ≥ 0.321334 (improved from α ≥ 0.31389). Similar improvements are obtained for various other exponents for multiplying rectangular matrices.more » « less