NSF PAR Search | NSF Public Access Repository


Automatic Tracing in Task-Based Runtime Systems

https://doi.org/10.1145/3669940.3707237

Yadav, Rohan; Bauer, Michael; Broman, David; Garland, Michael; Aiken, Alex; Kjolstad, Fredrik (March 2025, ACM)

Free, publicly-accessible full text available March 30, 2026
Composing Distributed Computations Through Task and Kernel Fusion

https://doi.org/10.1145/3669940.3707216

Yadav, Rohan; Sundram, Shiv; Lee, Wonchan; Garland, Michael; Bauer, Michael; Aiken, Alex; Kjolstad, Fredrik (March 2025, ACM)

Free, publicly-accessible full text available March 30, 2026
Exploring Data Layout for Sparse Tensor Times Dense Matrix on GPUs

https://doi.org/10.1145/3633462

Ahmad, Khalid; Cecka, Cris; Garland, Michael; Hall, Mary (March 2024, ACM Transactions on Architecture and Code Optimization)

An important sparse tensor computation is sparse-tensor-dense-matrix multiplication (SpTM), which is used in tensor decomposition and applications. SpTM is a multi-dimensional analog to sparse-matrix-dense-matrix multiplication (SpMM). In this article, we employ a hierarchical tensor data layout that can unfold a multidimensional tensor to derive a 2D matrix, making it possible to compute SpTM using SpMM kernel implementations for GPUs. We compare two SpMM-based approaches against the state-of-the-art PASTA sparse tensor contraction implementation: (1) SpMM with the hierarchical tensor data layout; and (2) unfolding followed by an invocation of cuSPARSE’s SpMM. Results show that SpMM can outperform PASTA 70.9% of the time, but none of the three approaches is best overall. Therefore, we use a decision tree classifier to identify the best performing sparse tensor contraction kernel based on precomputed properties of the sparse tensor.
Full Text Available
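
The unfolding idea in the SpTM abstract above can be illustrated with a minimal CPU sketch. It assumes a 3-way tensor stored in plain coordinate (COO) form and a contraction along the last mode; it is not the article's hierarchical layout, its GPU kernels, or cuSPARSE. The function names (`unfold_last_mode`, `spmm_coo`, `sptm_via_unfold`) are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def unfold_last_mode(coords, values, shape):
    """Flatten a 3-way COO tensor (I x J x K) into a COO matrix ((I*J) x K).

    Assumption: the first two modes are merged row-major, so (i, j) -> i*J + j.
    """
    _, J, K = shape
    rows = [i * J + j for (i, j, _) in coords]
    cols = [k for (_, _, k) in coords]
    return rows, cols, list(values), (shape[0] * J, K)


def spmm_coo(rows, cols, vals, dense):
    """Multiply a COO sparse matrix by a dense matrix given as a list of rows.

    Only output rows that receive at least one nonzero are stored.
    """
    rank = len(dense[0])
    out = defaultdict(lambda: [0.0] * rank)
    for r, c, v in zip(rows, cols, vals):
        acc = out[r]
        for t in range(rank):
            acc[t] += v * dense[c][t]
    return dict(out)


def sptm_via_unfold(coords, values, shape, dense):
    """Compute Y[i, j, :] = sum_k X[i, j, k] * U[k, :] via unfold + SpMM."""
    _, J, _ = shape
    rows, cols, vals, _ = unfold_last_mode(coords, values, shape)
    flat = spmm_coo(rows, cols, vals, dense)
    # Fold the merged row index back into the original (i, j) pair.
    return {divmod(r, J): row for r, row in flat.items()}


# A 2 x 2 x 3 tensor with three nonzeros, times a 3 x 2 dense matrix.
coords = [(0, 0, 0), (0, 0, 2), (1, 1, 1)]
values = [1.0, 2.0, 3.0]
U = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = sptm_via_unfold(coords, values, (2, 2, 3), U)
```

Merging the uncontracted modes into one row index is what lets an ordinary SpMM routine do the work; the result is folded back into tensor coordinates afterwards.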
Legate Sparse: Distributed Sparse Computing in Python

https://doi.org/10.1145/3581784.3607033

Yadav, Rohan; Lee, Wonchan; Elibol, Melih; Papadakis, Manolis; Lee-Patti, Taylor; Garland, Michael; Aiken, Alex; Kjolstad, Fredrik; Bauer, Michael (November 2023, ACM)

Full Text Available
A Programmable Approach to Neural Network Compression

https://doi.org/10.1109/MM.2020.3012391

Joseph, Vinu; Gopalakrishnan, Ganesh L.; Muralidharan, Saurav; Garland, Michael; Garg, Animesh (September 2020, IEEE Micro)

Full Text Available
