The matricizedtensor times KhatriRao product (MTTKRP) computation is the typical bottleneck in algorithms for computing a CP decomposition of a tensor. In order to develop high performance sequential and parallel algorithms, we establish communication lower bounds that identify how much data movement is required for this computation in the case of dense tensors. We also present sequential and parallel algorithms that attain the lower bounds and are therefore communication optimal. In particular, we show that the structure of the computation allows for less communication than the straightforward approach of casting the computation as a matrix multiplication operation.
more »
« less
Matrix Multiplication and Number On the Forehead Communication
Threeplayer Number On the Forehead communication may be thought of as a threeplayer Number In the Hand promise model, in which each player is given the inputs that are supposedly on the other two players' heads, and promised that they are consistent with the inputs of the other players. The set of all allowed inputs under this promise may be thought of as an order3 tensor. We surprisingly observe that this tensor is exactly the matrix multiplication tensor, which is widely studied in the design of fast matrix multiplication algorithms.
Using this connection, we prove a number of results about both Number On the Forehead communication and matrix multiplication, each by using known results or techniques about the other. For example, we show how the Laser method, a key technique used to design the best matrix multiplication algorithms, can also be used to design communication protocols for a variety of problems. We also show how known lower bounds for Number On the Forehead communication can be used to bound properties of the matrix multiplication tensor such as its zeroing out subrank. Finally, we substantially generalize known methods based on slicerank for studying communication, and show how they directly relate to the matrix multiplication exponent ω.
more »
« less
 Award ID(s):
 2238221
 NSFPAR ID:
 10488388
 Editor(s):
 TaShma, Amnon
 Publisher / Repository:
 Schloss Dagstuhl – LeibnizZentrum für Informatik
 Date Published:
 Journal Name:
 38th Computational Complexity Conference (CCC 2023)
 Subject(s) / Keyword(s):
 Number on the forehead communication complexity matrix multiplication Theory of computation → Communication complexity
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this


null (Ed.)A twoplayer finite game is represented by two payoff matrices (A, B), one for each player. Imitation games are a subclass of twoplayer games in which B is the identity matrix, implying that the second player gets a positive payoff only if she "imitates" the first. Given that the problem of computing a Nash equilibrium (NE) is known to be provably hard, even to approximate, we ask if it is any easier for imitation games. We show that much like the general case, for any c > 0, computing a 1 over n^c approximate NE of imitation games remains PPADhard, where n is the number of moves available to the players. On the other hand, we design a polynomialtime algorithm to find εapproximate NE for any given constant ε > 0 (PTAS). The former result also rules out the smooth complexity being in P, unless PPAD ⊂ RP.more » « less

In the light bulb problem, one is given uniformly random vectors x1,…,xn,y1,…,yn∈{−1,1}d. They are all chosen independently except a planted pair (xi∗,yj∗) is chosen with correlation ρ>0. The goal is to find the planted pair. This problem was introduced over 30 years ago by L.~Valiant, and is known to have many applications in data analysis, statistics, and learning theory. The naive algorithm runs in Ω(n2) time, and algorithms based on LocalitySensitive Hashing approach quadratic time as ρ→0. In 2012, G.~Valiant gave a breakthrough algorithm using fast matrix multiplication that runs in time O(n(5−ω)/(4−ω))more » « less
0 is. This was subsequently refined by Karppa, Kaski, and Kohonen in 2016 to O(n2ω/3) 0. We also introduce a new tensor T2112, which has the same size of 2×2 matrix multiplication tensor, but runs faster than the Strassen's algorithm for light bulb problem. 
In this paper, we focus on the parallel communication cost of multiplying a matrix with its transpose, known as a symmetric rankk update (SYRK). SYRK requires half the computation of general matrix multiplication because of the symmetry of the output matrix. Recent work (Beaumont et al., SPAA '22) has demonstrated that the sequential I/O complexity of SYRK is also a constant factor smaller than that of general matrix multiplication. Inspired by this progress, we establish memoryindependent parallel communication lower bounds for SYRK with smaller constants than general matrix multiplication, and we show that these constants are tight by presenting communicationoptimal algorithms. The crux of the lower bound proof relies on extending a key geometric inequality to symmetric computations and analytically solving a constrained nonlinear optimization problem. The optimal algorithms use a triangular blocking scheme for parallel distribution of the symmetric output matrix and corresponding computation.more » « less

Communication lower bounds have long been established for matrix multiplication algorithms. However, most methods of asymptotic analysis have either ignored the constant factors or not obtained the tightest possible values. Recent work has demonstrated that more careful analysis improves the best known constants for some classical matrix multiplication lower bounds and helps to identify more efficient algorithms that match the leadingorder terms in the lower bounds exactly and improve practical performance. The main result of this work is the establishment of memoryindependent communication lower bounds with tight constants for parallel matrix multiplication. Our constants improve on previous work in each of three cases that depend on the relative sizes of the aspect ratios of the matrices.more » « less