Search for: All records

Creators/Authors contains: "Sukumaran-Rajam, Aravind"

TDC: Towards Extremely Efficient CNNs on GPUs via Hardware-Aware Tucker Decomposition

https://doi.org/10.1145/3572848.3577478

Xiang, Lizhi ; Yin, Miao ; Zhang, Chengming ; Sukumaran-Rajam, Aravind ; Sadayappan, P. ; Yuan, Bo ; Tao, Dingwen ( February 2023 , The 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP 2023))

Tucker decomposition is one of the SOTA CNN model compression techniques. However, unlike the FLOPs reduction, we observe very limited inference time reduction with Tucker-compressed models using existing GPU software such as cuDNN. To this end, we propose an efficient end-to-end framework that can generate highly accurate and compact CNN models via Tucker decomposition and optimized inference code on GPUs. Specifically, we propose an ADMM-based training algorithm that can achieve highly accurate Tucker-format models. We also develop a high-performance kernel for Tucker-format convolutions and analytical performance models to guide the selection of execution parameters. We further propose a co-design framework to determine the proper Tucker ranks driven by practical inference time (rather than FLOPs). Our evaluation on five modern CNNs with A100 demonstrates that our compressed models with our optimized code achieve up to 2.21× speedup over cuDNN, 1.12× speedup over TVM, and 3.27× over the original models using cuDNN with at most 0.05% accuracy loss.
Full Text Available

Effective Performance Modeling and Domain-Specific Compiler Optimization of CNNs for GPUs

https://doi.org/10.1145/3559009.3569674

Xu, Yufan ; Yuan, Qiwei ; Barton, Erik Curtis ; Li, Rui ; Sadayappan, P. ; Sukumaran-Rajam, Aravind ( October 2022 , PACT'22)

Full Text Available

Sparsity-Aware Tensor Decomposition

https://doi.org/10.1109/IPDPS53621.2022.00097

Kurt, Sureyya Emre ; Raje, Saurabh ; Sukumaran-Rajam, Aravind ; Sadayappan, P. ( May 2022 , 2022 IEEE International Parallel and Distributed Processing Symposium)

Full Text Available

Comprehensive Accelerator-Dataflow Co-design Optimization for Convolutional Neural Networks

https://doi.org/10.1109/CGO53902.2022.9741281

Vaidya, Miheer ; Sukumaran-Rajam, Aravind ; Rountev, Atanas ; Sadayappan, P. ( April 2022 , International Symposium on Code Generation and Optimization (CGO))

Full Text Available

Training of deep learning pipelines on memory-constrained GPUs via segmented fused-tiled execution

https://doi.org/10.1145/3497776.3517766

Xu, Yufan ; Raje, Saurabh ; Rountev, Atanas ; Sabin, Gerald ; Sukumaran-Rajam, Aravind ; Sadayappan, P. ( March 2022 , 31st ACM SIGPLAN International Conference on Compiler Construction)

Full Text Available

Training of Deep Learning Pipelines on Memory-Constrained GPUs via Segmented Fused-Tiled Execution

Xu, Yufan ; Raje, Saurabh ; Rountev, Atanas ; Sabin, Gerald ; Sukumaran-Rajam, Aravind ; Sadayappan, P. ( February 2022 , ACM SIGPLAN International Conference on Compiler Construction (CC))

Full Text Available

Efficient Distributed Algorithms for Convolutional Neural Networks

https://doi.org/10.1145/3409964.3461828

Li, Rui ; Xu, Yufan ; Sukumaran-Rajam, Aravind ; Rountev, Atanas ; Sadayappan, P. ( July 2021 , Proceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA ’21))

Full Text Available

Analytical characterization and design space exploration for optimization of CNNs

https://doi.org/10.1145/3445814.3446759

Li, Rui ; Xu, Yufan ; Sukumaran-Rajam, Aravind ; Rountev, Atanas ; Sadayappan, P. ( April 2021 , Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’21))

Full Text Available

Efficient Tiled Sparse Matrix Multiplication through Matrix Signatures

https://doi.org/10.1109/SC41405.2020.00091

Kurt, Sureyya Emre ; Sukumaran-Rajam, Aravind ; Rastello, Fabrice ; Sadayappan, P. ( November 2020 , SC20: International Conference for High Performance Computing, Networking, Storage and Analysis)
Full Text Available

ALO-NMF: Accelerated Locality-Optimized Non-negative Matrix Factorization

https://doi.org/10.1145/3394486.3403227

Moon, Gordon E. ; Ellis, J. Austin ; Sukumaran-Rajam, Aravind ; Parthasarathy, Srinivasan ; Sadayappan, P. ( July 2020 , Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’20))
Full Text Available