NSF PAR Search | NSF Public Access Repository

GauRast: Enhancing GPU Triangle Rasterizers to Accelerate 3D Gaussian Splatting

Li, Sixu; Keller, Ben; Lin, Yingyan Celine; Khailany, Brucek (June 2025, IEEE)

Free, publicly-accessible full text available June 22, 2026
Uni-Render: A Unified Accelerator for Real-Time Rendering Across Diverse Neural Renderers

https://doi.org/10.1109/HPCA61900.2025.00029

Li, Chaojian; Li, Sixu; Jiang, Linrui; Zhang, Jingqun; Lin, Yingyan Celine (March 2025, IEEE)

Free, publicly-accessible full text available March 1, 2026
LAMB: A Training-Free Method to Enhance the Long-Context Understanding of SSMs via Attention-Guided Token Filtering

https://doi.org/10.18653/v1/2025.acl-short.96

Ye, Zhifan; Wang, Zheng; Xia, Kejing; Hong, Jihoon; Li, Leshu; Whalen, Lexington; Wan, Cheng; Fu, Yonggan; Lin, Yingyan Celine; Kundu, Souvik (January 2025, Association for Computational Linguistics)

Full Text Available
AmoebaLLM: Constructing Any-Shape Large Language Models for Efficient and Instant Deployment

Fu, Yonggan; Yu, Zhongzhi; Li, Junwei; Qian, Jiayi; Zhang, Yongan; Yuan, Xiangchi; Shi, Dachuan; Yakunin, Roman; Lin, Yingyan Celine (December 2024, Advances in Neural Information Processing Systems 37 (NeurIPS 2024))

Motivated by the transformative capabilities of large language models (LLMs) across various natural language tasks, there has been a growing demand to deploy these models effectively across diverse real-world applications and platforms. However, the challenge of efficiently deploying LLMs has become increasingly pronounced due to the varying application-specific performance requirements and the rapid evolution of computational platforms, which feature diverse resource constraints and deployment flows. These varying requirements necessitate LLMs that can adapt their structures (depth and width) for optimal efficiency across different platforms and application specifications. To address this critical gap, we propose AmoebaLLM, a novel framework designed to enable the instant derivation of LLM subnets of arbitrary shapes, which achieve the accuracy-efficiency frontier and can be extracted immediately after a one-time fine-tuning. In this way, AmoebaLLM significantly facilitates rapid deployment tailored to various platforms and applications. Specifically, AmoebaLLM integrates three innovative components: (1) a knowledge-preserving subnet selection strategy that features a dynamic-programming approach for depth shrinking and an importance-driven method for width shrinking; (2) a shape-aware mixture of LoRAs to mitigate gradient conflicts among subnets during fine-tuning; and (3) an in-place distillation scheme with loss-magnitude balancing as the fine-tuning objective. Extensive experiments validate that AmoebaLLM not only sets new standards in LLM adaptability but also successfully delivers subnets that achieve state-of-the-art trade-offs between accuracy and efficiency. Our code is available at https://github.com/GATECH-EIC/AmoebaLLM.
Full Text Available
AutoAI2C: An Automated Hardware Generator for DNN Acceleration on Both FPGA and ASIC

https://doi.org/10.1109/TCAD.2024.3393428

Zhang, Yongan; Zhang, Xiaofan; Xu, Pengfei; Zhao, Yang; Hao, Cong; Chen, Deming; Lin, Yingyan Celine (October 2024, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems)

Full Text Available
Fusion-3D: Integrated Acceleration for Instant 3D Reconstruction and Real-Time Rendering

https://doi.org/10.1109/MICRO61859.2024.00016

Li, Sixu; Zhao, Yang; Li, Chaojian; Guo, Bowei; Zhang, Jingqun; Zhu, Wenbo; Ye, Zhifan; Wan, Cheng; Lin, Yingyan Celine (November 2024, IEEE)

Full Text Available
When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models

You, Haoran; Fu, Yichao; Wang, Zheng; Yazdanbakhsh, Amir; Lin, Yingyan Celine (July 2024, Cambridge MA: JMLR)
Lawrence, Neil (Ed.)
Autoregressive Large Language Models (LLMs) have achieved impressive performance in language tasks but face two significant bottlenecks: (1) quadratic complexity in the attention module as the number of tokens increases, and (2) limited efficiency due to the sequential processing nature of autoregressive LLMs during generation. While linear attention and speculative decoding offer potential solutions, their applicability and synergistic potential for enhancing autoregressive LLMs remain uncertain. We conduct the first comprehensive study on the efficacy of existing linear attention methods for autoregressive LLMs, integrating them with speculative decoding. We introduce an augmentation technique for linear attention that ensures compatibility with speculative decoding, enabling more efficient training and serving of LLMs. Extensive experiments and ablation studies involving seven existing linear attention models and five encoder/decoder-based LLMs consistently validate the effectiveness of our augmented linearized LLMs. Notably, our approach achieves up to a 6.67 reduction in perplexity on the LLaMA model and up to a 2× speedup during generation compared to prior linear attention methods. Codes and models are available at https://github.com/GATECH-EIC/Linearized-LLM.
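The first bottleneck named in this abstract, quadratic attention cost versus linear attention, can be illustrated with a minimal sketch. It is not the paper's augmentation technique or its released code. It assumes a generic kernelized linear attention with an `elu(x) + 1` feature map, and all function names here are hypothetical. The first function scores every earlier token for each position (quadratic in sequence length). The second keeps a running state and touches each token once (linear in sequence length), producing the same output.

```python
import math

def feature_map(x):
    # elu(x) + 1, a positive feature map (an assumption for this sketch)
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def attention_quadratic(q, k, v):
    # Reference form: position t scores all positions s <= t, so O(n^2).
    fq = [feature_map(row) for row in q]
    fk = [feature_map(row) for row in k]
    out = []
    for t in range(len(q)):
        weights = [dot(fq[t], fk[s]) for s in range(t + 1)]
        total = sum(weights)
        out.append([
            sum(w * v[s][j] for s, w in enumerate(weights)) / total
            for j in range(len(v[0]))
        ])
    return out

def attention_recurrent(q, k, v):
    # Same result with a running state, so O(n) in sequence length.
    d, dv = len(q[0]), len(v[0])
    state = [[0.0] * dv for _ in range(d)]  # running sum of outer(fk_t, v_t)
    norm = [0.0] * d                        # running sum of fk_t
    out = []
    for qt, kt, vt in zip(q, k, v):
        fq, fk = feature_map(qt), feature_map(kt)
        for i in range(d):
            norm[i] += fk[i]
            for j in range(dv):
                state[i][j] += fk[i] * vt[j]
        denom = dot(fq, norm)
        out.append([
            sum(fq[i] * state[i][j] for i in range(d)) / denom
            for j in range(dv)
        ])
    return out
```

The recurrent form is what makes token-by-token autoregressive generation cheap: each new token updates `state` and `norm` in constant time with respect to the number of earlier tokens.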
Full Text Available
When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models

You, Haoran; Fu, Yichao; Wang, Zheng; Yazdanbakhsh, Amir; Lin, Yingyan Celine (July 2024, Proceedings of Machine Learning Research)
Lawrence, Neil (Ed.)
Full Text Available
MG-Verilog: Multi-grained Dataset Towards Enhanced LLM-assisted Verilog Generation

Zhang, Yongan; Yu, Zhongzhi; Fu, Yonggan; Wan, Cheng; Lin, Yingyan Celine (June 2024, IEEE International Workshop on LLM-Aided Design)

Full Text Available
Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration

Yu, Zhongzhi; Wang, Zheng; Fu, Yonggan; Shi, Huihong; Shaikh, Khalid; Lin, Yingyan Celine (July 2024, Proceedings of Machine Learning Research)

Full Text Available
