Design Principles for Sparse Matrix Multiplication on the GPU

Yang, Carl; Buluc, Aydin; Owens, John D.

doi:10.1007/978-3-319-96983-1_48

We implement two novel algorithms for sparse-matrix dense-matrix multiplication (SpMM) on the GPU. Our algorithms expect the sparse input in the popular compressed-sparse-row (CSR) format and thus do not require expensive format conversion. While previous SpMM work concentrates on thread-level parallelism, we additionally focus on latency hiding with instruction-level parallelism and load-balancing. We show, both theoretically and experimentally, that the proposed SpMM is a better fit for the GPU than previous approaches. We identify a key memory access pattern that allows efficient access into both input and output matrices that is crucial to getting excellent performance on SpMM. By combining these two ingredients, (i) merge-based load-balancing and (ii) row-major coalesced memory access, we demonstrate a 4.1x peak speedup and a 31.7% geomean speedup over state-of-the-art SpMM implementations on real-world datasets.
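As an illustration of the terms used in the abstract, the following is a minimal sequential sketch in Python, not the authors' GPU kernels. It assumes the sparse matrix is given as the three standard CSR arrays and that both dense matrices are stored row-major in flat lists. The function names `spmm_csr` and `split_nonzeros` are hypothetical and chosen for this example only. The second function uses a plain equal-nonzeros split, which is a simpler stand-in for, and not the same as, the merge-based load-balancing the abstract refers to.

```python
from bisect import bisect_right


def spmm_csr(row_ptr, col_idx, vals, B, n_cols):
    """Compute C = A @ B, with A in CSR and B, C dense, row-major, flat lists.

    row_ptr, col_idx, vals: the usual CSR arrays of A.
    B: flat list of length (columns of A) * n_cols.
    n_cols: number of columns of B and of C.
    """
    n_rows = len(row_ptr) - 1
    C = [0.0] * (n_rows * n_cols)
    for i in range(n_rows):
        c_row = i * n_cols
        for k in range(row_ptr[i], row_ptr[i + 1]):
            a = vals[k]
            b_row = col_idx[k] * n_cols
            # The inner loop reads one contiguous row of B and updates one
            # contiguous row of C. On a GPU, adjacent threads could take
            # adjacent j, which is what makes the accesses coalesced.
            for j in range(n_cols):
                C[c_row + j] += a * B[b_row + j]
    return C


def split_nonzeros(row_ptr, n_parts):
    """Give each part an equal share of nonzeros (assumed simplification).

    Returns one (start_row, start_nonzero) pair per part. The start row is
    found by binary search over row_ptr, so empty rows are skipped.
    """
    nnz = row_ptr[-1]
    last_row = max(len(row_ptr) - 2, 0)
    starts = []
    for p in range(n_parts):
        nz = (p * nnz) // n_parts
        row = min(bisect_right(row_ptr, nz) - 1, last_row)
        starts.append((row, nz))
    return starts
```

For the 3x3 matrix with rows `[1, 0, 2]`, `[0, 0, 0]`, `[0, 3, 0]`, the CSR arrays are `row_ptr = [0, 2, 2, 3]`, `col_idx = [0, 2, 1]`, and `vals = [1, 2, 3]`. Splitting by nonzeros rather than by rows keeps the work per part even when row lengths vary widely, which is the imbalance that load-balancing schemes for CSR are meant to address.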

Award ID(s): 1629657

PAR ID: 10066978

Author(s) / Creator(s): Yang, Carl; Buluc, Aydin; Owens, John D.

Date Published: 2018-08-01

Journal Name: Euro-Par 2018: Proceedings of the 24th International European Conference on Parallel and Distributed Computing

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript
Conference Paper:
https://doi.org/10.1007/978-3-319-96983-1_48
