NSF PAR Search | NSF Public Access Repository

Communication Optimization for Distributed Execution of Graph Neural Networks

https://doi.org/10.1109/IPDPS54959.2023.00058

Kurt, Süreyya Emre; Yan, Jinghua; Sukumaran-Rajam, Aravind; Pandey, Prashant; Sadayappan, P. (May 2023, IEEE)
A Performance Portability Study Using Tensor Contraction Benchmarks

https://doi.org/10.1109/IPDPSW59300.2023.00102

Ozturk, M. Emin; Asudeh, Omid; Sabin, Gerald; Sadayappan, P.; Sukumaran-Rajam, Aravind (May 2023, IEEE)

Full Text Available
High-Performance Architecture Aware Sparse Convolutional Neural Networks for GPUs

https://doi.org/10.1145/3559009.3569667

Xiang, Lizhi; Sadayappan, P.; Sukumaran-Rajam, Aravind (October 2022, ACM)

Full Text Available
TDC: Towards Extremely Efficient CNNs on GPUs via Hardware-Aware Tucker Decomposition

https://doi.org/10.1145/3572848.3577478

Xiang, Lizhi; Yin, Miao; Zhang, Chengming; Sukumaran-Rajam, Aravind; Sadayappan, P.; Yuan, Bo; Tao, Dingwen (February 2023, The 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP 2023))

Tucker decomposition is one of the state-of-the-art (SOTA) CNN model compression techniques. However, despite the reduction in FLOPs, we observe very limited reduction in inference time with Tucker-compressed models using existing GPU software such as cuDNN. To this end, we propose an efficient end-to-end framework that can generate highly accurate and compact CNN models via Tucker decomposition, together with optimized inference code for GPUs. Specifically, we propose an ADMM-based training algorithm that can achieve highly accurate Tucker-format models. We also develop a high-performance kernel for Tucker-format convolutions and analytical performance models to guide the selection of execution parameters. We further propose a co-design framework to determine the proper Tucker ranks, driven by practical inference time rather than FLOPs. Our evaluation on five modern CNNs on an A100 GPU demonstrates that our compressed models with our optimized code achieve up to 2.21× speedup over cuDNN, 1.12× speedup over TVM, and 3.27× speedup over the original models using cuDNN, with at most 0.05% accuracy loss.
Full Text Available
Effective Performance Modeling and Domain-Specific Compiler Optimization of CNNs for GPUs

https://doi.org/10.1145/3559009.3569674

Xu, Yufan; Yuan, Qiwei; Barton, Erik Curtis; Li, Rui; Sadayappan, P.; Sukumaran-Rajam, Aravind (October 2022, PACT'22)

Full Text Available
Sparsity-Aware Tensor Decomposition

https://doi.org/10.1109/IPDPS53621.2022.00097

Kurt, Sureyya Emre; Raje, Saurabh; Sukumaran-Rajam, Aravind; Sadayappan, P. (May 2022, 2022 IEEE International Parallel and Distributed Processing Symposium)

Full Text Available
Comprehensive Accelerator-Dataflow Co-design Optimization for Convolutional Neural Networks

https://doi.org/10.1109/CGO53902.2022.9741281

Vaidya, Miheer; Sukumaran-Rajam, Aravind; Rountev, Atanas; Sadayappan, P. (April 2022, International Symposium on Code Generation and Optimization (CGO))

Full Text Available
Training of deep learning pipelines on memory-constrained GPUs via segmented fused-tiled execution

https://doi.org/10.1145/3497776.3517766

Xu, Yufan; Raje, Saurabh; Rountev, Atanas; Sabin, Gerald; Sukumaran-Rajam, Aravind; Sadayappan, P. (March 2022, 31st ACM SIGPLAN International Conference on Compiler Construction)

Full Text Available
Efficient Distributed Algorithms for Convolutional Neural Networks

https://doi.org/10.1145/3409964.3461828

Li, Rui; Xu, Yufan; Sukumaran-Rajam, Aravind; Rountev, Atanas; Sadayappan, P. (July 2021, Proceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA ’21))

Full Text Available
Analytical characterization and design space exploration for optimization of CNNs

https://doi.org/10.1145/3445814.3446759

Li, Rui; Xu, Yufan; Sukumaran-Rajam, Aravind; Rountev, Atanas; Sadayappan, P. (April 2021, Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’21))

Full Text Available
