NSF PAR Search | NSF Public Access Repository


CodeRosetta: Pushing the boundaries of unsupervised code translation for parallel programming

Tehrani, Ali; Bhattacharjee, Arijit; Chen, Le; Ahmed, Nesreen K; Yazdanbakhsh, Amir; Jannesari, Ali (December 2024, Advances in Neural Information Processing Systems)

Full Text Available
When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models

You, Haoran; Fu, Yichao; Wang, Zheng; Yazdanbakhsh, Amir; Lin, Yingyan Celine (July 2024, Cambridge MA: JMLR)
Lawrence, Neil (Ed.)
Autoregressive Large Language Models (LLMs) have achieved impressive performance in language tasks but face two significant bottlenecks: (1) quadratic complexity in the attention module as the number of tokens increases, and (2) limited efficiency due to the sequential processing nature of autoregressive LLMs during generation. While linear attention and speculative decoding offer potential solutions, their applicability and synergistic potential for enhancing autoregressive LLMs remain uncertain. We conduct the first comprehensive study on the efficacy of existing linear attention methods for autoregressive LLMs, integrating them with speculative decoding. We introduce an augmentation technique for linear attention that ensures compatibility with speculative decoding, enabling more efficient training and serving of LLMs. Extensive experiments and ablation studies involving seven existing linear attention models and five encoder/decoder-based LLMs consistently validate the effectiveness of our augmented linearized LLMs. Notably, our approach achieves up to a 6.67 reduction in perplexity on the LLaMA model and up to a 2× speedup during generation compared to prior linear attention methods. Code and models are available at https://github.com/GATECH-EIC/Linearized-LLM.
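To make the first bottleneck in the abstract concrete, here is a minimal sketch of generic causal linear attention in plain Python. It is not the augmentation technique from the paper: the elu(x) + 1 feature map and the function names (phi, causal_attention_quadratic, causal_attention_linear) are assumptions chosen for illustration. The quadratic version rescans all earlier tokens at every step, while the linear version carries a fixed-size running state and produces the same outputs.

```python
import math

def phi(x):
    # Assumed feature map elu(x) + 1: keeps every entry strictly positive,
    # so the normalizing denominators below are never zero.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def causal_attention_quadratic(Q, K, V):
    # Reference form: token t rescans tokens 0..t, so cost grows as O(n^2).
    d_v = len(V[0])
    out = []
    for t, q in enumerate(Q):
        fq = phi(q)
        w = [dot(fq, phi(K[i])) for i in range(t + 1)]
        total = sum(w)
        out.append([sum(w[i] * V[i][j] for i in range(t + 1)) / total
                    for j in range(d_v)])
    return out

def causal_attention_linear(Q, K, V):
    # Recurrent form: keep S = sum_i phi(k_i) v_i^T and z = sum_i phi(k_i),
    # so each new token costs O(d_k * d_v) regardless of sequence length.
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]
    z = [0.0] * d_k
    out = []
    for q, k, v in zip(Q, K, V):
        fk = phi(k)
        for a in range(d_k):
            z[a] += fk[a]
            for b in range(d_v):
                S[a][b] += fk[a] * v[b]
        fq = phi(q)
        denom = dot(fq, z)
        out.append([sum(fq[a] * S[a][b] for a in range(d_k)) / denom
                    for b in range(d_v)])
    return out
```

Because the running state (S, z) summarizes the whole prefix in fixed size, per-token generation cost stays constant instead of growing with context length, which is the property that linearized LLMs exploit during autoregressive decoding.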
Full Text Available
When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models

You, Haoran; Fu, Yichao; Wang, Zheng; Yazdanbakhsh, Amir; Lin, Yingyan Celine (July 2024, Proceedings of Machine Learning Research)
Lawrence, Neil (Ed.)
Full Text Available
In-Storage Domain-Specific Acceleration for Serverless Computing

https://doi.org/10.1145/3620665.3640413

Mahapatra, Rohan; Ghodrati, Soroush; Ahn, Byung Hoon; Kinzer, Sean; Wang, Shu-Ting; Xu, Hanyang; Karthikeyan, Lavanya; Sharma, Hardik; Yazdanbakhsh, Amir; Alian, Mohammad; et al (April 2024, ACM)

Full Text Available
Accelerating attention through gradient-based learned runtime pruning

https://doi.org/10.1145/3470496.3527423

Li, Zheng; Ghodrati, Soroush; Yazdanbakhsh, Amir; Esmaeilzadeh, Hadi; Kang, Mingu (June 2022, Accelerating attention through gradient-based learned runtime pruning)

Full Text Available
ReLeQ: An Automatic Reinforcement Learning Approach for Deep Quantization of Neural Networks

Elthakeb, Ahmed; Pilligundla, Prannoy; Mireshghallah, Fatemehsadat; Yazdanbakhsh, Amir; Gao, Sicun; Esmaeilzadeh, Hadi (May 2019, NeurIPS ML for Systems workshop, 2018)

Full Text Available
GANAX: A Unified MIMD-SIMD Acceleration for Generative Adversarial Networks

https://doi.org/10.1109/ISCA.2018.00060

Yazdanbakhsh, Amir; Samadi, Kambiz; Kim, Nam Sung; Esmaeilzadeh, Hadi (June 2018, ISCA)

Full Text Available
SiMul: An Algorithm-Driven Approximate Multiplier Design for Machine Learning

https://doi.org/10.1109/MM.2018.043191125

Liu, Zhenhong; Yazdanbakhsh, Amir; Park, Taejoon; Esmaeilzadeh, Hadi; Kim, Nam Sung (July 2018, IEEE Micro)

Full Text Available
SnaPEA: Predictive Early Activation for Reducing Computation in Deep Convolutional Neural Networks

https://doi.org/10.1109/ISCA.2018.00061

Akhlaghi, Vahideh; Yazdanbakhsh, Amir; Samadi, Kambiz; Gupta, Rajesh K.; Esmaeilzadeh, Hadi (June 2018, ISCA)

Full Text Available
FlexiGAN: An End-to-End Solution for FPGA Acceleration of Generative Adversarial Networks

https://doi.org/10.1109/FCCM.2018.00019

Yazdanbakhsh, Amir; Brzozowski, Michael; Khaleghi, Behnam; Ghodrati, Soroush; Samadi, Kambiz; Kim, Nam Sung; Esmaeilzadeh, Hadi (April 2018, FCCM)

Full Text Available

