Runtime Composition of Iterations for Fusing Loop-carried Sparse Dependence

Cheshmi, Kazem; Strout, Michelle; Mehri Dehnavi, Maryam

doi:10.1145/3581784.3607097

Citation Details

Runtime Composition of Iterations for Fusing Loop-carried Sparse Dependence

Dependence between iterations in sparse computations causes inefficient use of memory and computation resources. This paper proposes sparse fusion, a technique that generates efficient parallel code for the combination of two sparse matrix kernels, where at least one of the kernels has loop-carried dependencies. Existing implementations optimize individual sparse kernels separately. However, this approach leads to synchronization overheads and load imbalance due to the irregular dependence patterns of sparse kernels, as well as inefficient cache usage due to their irregular memory access patterns. Sparse fusion uses a novel inspection strategy and code transformation to generate parallel fused code optimized for data locality and load balance. Sparse fusion outperforms the best of unfused implementations using ParSy and MKL by an average of 4.2× and is faster than the best of fused implementations using existing scheduling algorithms, such as LBC, DAGP, and wavefront by an average of 4× for various kernel combinations. more »

Award ID(s):: 2107556 2106621

NSF-PAR ID:: 10482910

Author(s) / Creator(s):: Cheshmi, Kazem; Strout, Michelle; Mehri Dehnavi, Maryam

Publisher / Repository:: ACM

Date Published:: 2023-11-11

Page Range / eLocation ID:: 1 to 15

Format(s):: Medium: X

Location:: Denver CO USA

Sponsoring Org:: National Science Foundation

Conference Paper:
https://doi.org/10.1145/3581784.3607097

More Like this