skip to main content


Title: Scalability of Hybrid SpMV on Intel Xeon Phi Knights Landing
SpMV, the product of a sparse matrix and a dense vector, is emblematic of a new class of applications that are memory bandwidth and communication, not flop, driven. Sparsity and randomness in such computations play havoc with performance, especially when strong, instead of weak, scaling is attempted. In this study we develop and evaluate a hybrid implementation for strong scaling of the Compressed Vectorization-oriented sparse Row (CVR) approach to SpMV on a cluster of Intel Xeon Phi Knights Landing (KNL) processors. We show how our hybrid SpMV implementation achieves increased computational performance, yet does not address the dominant communication overhead factor at extreme scale. Issues with workload distribution, data placement, and remote reductions are assessed over a range of matrix characteristics. Our results indicate that as P 􀀀! 1 communication overhead is by far the dominant factor despite improved computational performance.  more » « less
Award ID(s):
1822939
NSF-PAR ID:
10199743
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Int. Conf. on High Performance Computing and Simulation
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. SpMV, the product of a sparse matrix and a dense vector, is emblematic of a new class of applications that are memory bandwidth and communication, not flop, driven. Sparsity and randomness in such computations play havoc with performance, especially when strong, instead of weak, scaling is attempted. In this study we develop and evaluate a hybrid implementation for strong scaling of the Compressed Vectorization-oriented sparse Row (CVR) approach to SpMV on a cluster of Intel Xeon Phi Knights Landing (KNL) processors. We show how our hybrid SpMV implementation achieves increased computational performance, yet does not address the dominant communication overhead factor at extreme scale. Issues with workload distribution, data placement, and remote reductions are assessed over a range of matrix characteristics. Our results indicate that as P 􀀀! 1 communication overhead is by far the dominant factor despite improved computational performance. 
    more » « less
  2. null (Ed.)
    Communication overhead has been identified as the primary factor in overall performance degradation for sparse and irregular problems such as SpMV. Many works have shown significant communication reductions, but only for matrices with specific characteristics and by dramatically reworking the computations. This study develops and evaluates a communication avoiding distributed heterogeneous implementation for strong scaling of SpMV on the Sierra supercomputer architecture. To address the far bigger matrices characteristic of real problems, we utilize a hypergraph partitioning package HYPE to determine workload distribution and reduce inter-node communication. Additionally we investigated the performance impact of performing hypergraph partitioning on scale free graphs which had undergone a vertex delegation pre-processing step. We achieved up to 97% reduction in average message size per process at scale when using the HYPE partitioner. Despite this we show how optimizing SpMV on existing GPU architectures does provide increased computational performance, yet does not address the dominant communication overhead factor at scale despite attempts to avoid communication where possible. 
    more » « less
  3. SpMV, the product of a sparse matrix and a dense vector, is emblematic of a new class of applications that are memory bandwidth and communication, not flop, driven. Sparsity and randomness in such computations play havoc with conventional implementations, especially when strong, instead of weak, scaling is attempted. This paper studies improved hybrid SpMV codes that have better performance, especially for the sparsest of such problems. Issues with both data placement and remote reductions are modeled over a range of matrix characteristics. Those factors that limit strong scalability are quantified. 
    more » « less
  4. null (Ed.)
    Sparse matrix dense vector multiplication (SpMV), exhibits the memory bandwidth and communication driven nature of many sparse linear algebra operations. Irregular memory accesses from the non-zero structure within a sparse matrix wreak havoc on performance. This paper presents strong scaling for communication avoiding SpMV implementations on a migrating thread system intended to address the lack of locality in sparse problems. We developed communication avoiding SpMV code to attempt to reduce off-node thread migration by using the hypergraph partitioning package HYPE to determine workload distribution. Additionally, we investigate the performance impact of overlapping communication and computation through the use of remote memory operations supported by the architecture. Incorporating remote memory operations with hypergraph partitioning we achieved 6.18X speedup for overall performance. 
    more » « less
  5. Irregular data structures, as exemplified with sparse matrices, have proved to be essential in modern computing. Numerous sparse formats have been investigated to improve the overall performance of Sparse Matrix-Vector multiply (SpMV). But in this work we propose instead to take a fundamentally different approach: to automatically build sets of regular sub-computations by mining for regular sub-regions in the irregular data structure. Our approach leads to code that is specialized to the sparsity structure of the input matrix, but which does not need anymore any indirection array, thereby improving SIMD vectorizability. We particularly focus on small sparse structures (below 10M nonzeros), and demonstrate substantial performance improvements and compaction capabilities compared to a classical CSR implementation and Intel MKL IE's SpMV implementation, evaluating on 200+ different matrices from the SuiteSparse repository. 
    more » « less