NSF PAR Search | NSF Public Access Repository


AmoebaLLM: Constructing Any-Shape Large Language Models for Efficient and Instant Deployment

Fu, Yonggan; Yu, Zhongzhi; Li, Junwei; Qian, Jiayi; Zhang, Yongan; Yuan, Xiangchi; Shi, Dachuan; Yakunin, Roman; Lin, Yingyan Celine (December 2024, Advances in Neural Information Processing Systems 37 (NeurIPS 2024))

Motivated by the transformative capabilities of large language models (LLMs) across various natural language tasks, there has been a growing demand to deploy these models effectively across diverse real-world applications and platforms. However, the challenge of efficiently deploying LLMs has become increasingly pronounced due to the varying application-specific performance requirements and the rapid evolution of computational platforms, which feature diverse resource constraints and deployment flows. These varying requirements necessitate LLMs that can adapt their structures (depth and width) for optimal efficiency across different platforms and application specifications. To address this critical gap, we propose AmoebaLLM, a novel framework designed to enable the instant derivation of LLM subnets of arbitrary shapes, which achieve the accuracy-efficiency frontier and can be extracted immediately after a one-time fine-tuning. In this way, AmoebaLLM significantly facilitates rapid deployment tailored to various platforms and applications. Specifically, AmoebaLLM integrates three innovative components: (1) a knowledge-preserving subnet selection strategy that features a dynamic-programming approach for depth shrinking and an importance-driven method for width shrinking; (2) a shape-aware mixture of LoRAs to mitigate gradient conflicts among subnets during fine-tuning; and (3) an in-place distillation scheme with loss-magnitude balancing as the fine-tuning objective. Extensive experiments validate that AmoebaLLM not only sets new standards in LLM adaptability but also successfully delivers subnets that achieve state-of-the-art trade-offs between accuracy and efficiency. Our code is available at https://github.com/GATECH-EIC/AmoebaLLM.
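The abstract names a dynamic-programming approach for depth shrinking but does not state its objective. As a minimal sketch of the general technique only, assuming a hypothetical `transition_score(i, j)` that rates keeping layer `j` directly after kept layer `i` (node `-1` stands for the input embedding and node `num_layers` for the output head), the routine below picks `keep` of `num_layers` layers so that the summed score is maximal. It is not AmoebaLLM's implementation; the scoring function, the toy score, and the function names are illustrative assumptions.

```python
from typing import Callable, List, Tuple


def select_layers_dp(
    num_layers: int, keep: int, transition_score: Callable[[int, int], float]
) -> Tuple[float, List[int]]:
    """Choose `keep` layers (0-based, in order) maximizing the summed transition score
    along the path: input (-1) -> kept layers -> output head (num_layers)."""
    if not 1 <= keep <= num_layers:
        raise ValueError("keep must be in [1, num_layers]")
    neg_inf = float("-inf")
    # best[c][j]: best score of a path from the input that has kept c layers, the last one being j
    best = [[neg_inf] * num_layers for _ in range(keep + 1)]
    parent = [[-1] * num_layers for _ in range(keep + 1)]
    for j in range(num_layers):
        best[1][j] = transition_score(-1, j)
    for c in range(2, keep + 1):
        for j in range(c - 1, num_layers):  # layer j needs c - 1 kept layers before it
            for i in range(c - 2, j):
                cand = best[c - 1][i] + transition_score(i, j)
                if cand > best[c][j]:
                    best[c][j] = cand
                    parent[c][j] = i
    # Close every candidate path into the output head and take the best one.
    total, last = max(
        (best[keep][j] + transition_score(j, num_layers), j)
        for j in range(keep - 1, num_layers)
    )
    kept = [last]
    for c in range(keep, 1, -1):  # walk the parent pointers back to the first kept layer
        kept.append(parent[c][kept[-1]])
    return total, kept[::-1]


def toy_score(i: int, j: int) -> float:
    """Hypothetical score: penalize removing many consecutive layers at once."""
    return -float((j - i - 1) ** 2)


if __name__ == "__main__":
    total, kept = select_layers_dp(num_layers=6, keep=4, transition_score=toy_score)
    print(total, kept)
```

With this toy score, dropping two of six layers costs least when the removed layers are not adjacent, so the optimum spreads them out. A real criterion would instead be measured on calibration data.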
Full Text Available
MG-Verilog: Multi-grained Dataset Towards Enhanced LLM-assisted Verilog Generation

Zhang, Yongan; Yu, Zhongzhi; Fu, Yonggan; Wan, Cheng; Lin, Yingyan Celine (June 2024, IEEE International Workshop on LLM-Aided Design)

Full Text Available
Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration

Yu, Zhongzhi; Wang, Zheng; Fu, Yonggan; Shi, Huihong; Shaikh, Khalid; Lin, Yingyan Celine (July 2024, Proceedings of Machine Learning Research)

Full Text Available
INVITED: Data4AIGChip: An Automated Data Generation and Validation Flow for LLM-assisted Hardware Design

Zhang, Yongan; Fu, Yonggan; Yu, Zhongzhi; Zhao, Kevin; Wan, Cheng; Li, Chaojian; Lin, Yingyan Celine (June 2024, ACM)

Full Text Available
EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Unified Compression and Adaptive Layer Voting

Yu, Zhongzhi; Wang, Zheng; Li, Yuhan; Gao, Ruijie; Zhou, Xiaoya; Bommu, Sreenidhi Reddy; Zhao, Yang Katie; Lin, Yingyan Celine (June 2024, ACM)
GPT4AIGChip: Towards Next-Generation AI Accelerator Design Automation via Large Language Models

https://doi.org/10.1109/ICCAD57390.2023.10323953

Fu, Yonggan; Zhang, Yongan; Yu, Zhongzhi; Li, Sixu; Ye, Zhifan; Li, Chaojian; Wan, Cheng; Lin, Yingyan Celine (October 2023, IEEE)
Hint-Aug: Drawing Hints from Foundation Vision Transformers towards Boosted Few-shot Parameter-Efficient Tuning

https://doi.org/10.1109/CVPR52729.2023.01068

Yu, Zhongzhi; Wu, Shang; Fu, Yonggan; Zhang, Shunyao; Lin, Yingyan Celine (June 2023, The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2023 (CVPR 2023))

Full Text Available
ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design

https://doi.org/10.1109/HPCA56546.2023.10071027

You, Haoran; Sun, Zhanyi; Shi, Huihong; Yu, Zhongzhi; Zhao, Yang; Zhang, Yongan; Li, Chaojian; Li, Baopu; Lin, Yingyan (February 2023, 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA))

Vision Transformers (ViTs) have achieved state-of-the-art performance on various vision tasks. However, ViTs’ self-attention module is still arguably a major bottleneck, limiting their achievable hardware efficiency and broader application on resource-constrained platforms. Meanwhile, existing accelerators dedicated to NLP Transformers are not optimal for ViTs. This is because there is a large difference between ViTs and Transformers for natural language processing (NLP) tasks: ViTs have a relatively fixed number of input tokens, whose attention maps can be pruned by up to 90% even with fixed sparse patterns without severely hurting the model accuracy (e.g., an accuracy drop of ≤1.5% under a 90% pruning ratio), while NLP Transformers need to handle input sequences of varying numbers of tokens and rely on on-the-fly predictions of dynamic sparse attention patterns for each input to achieve a decent sparsity (e.g., ≥50%). To this end, we propose a dedicated algorithm and accelerator co-design framework dubbed ViTCoD for accelerating ViTs. Specifically, on the algorithm level, ViTCoD prunes and polarizes the attention maps to have either denser or sparser fixed patterns for regularizing two levels of workloads without hurting the accuracy, largely reducing the attention computations while leaving room for alleviating the remaining dominant data movements; on top of that, we further integrate a lightweight and learnable auto-encoder module to enable trading the dominant high-cost data movements for lower-cost computations. On the hardware level, we develop a dedicated accelerator to simultaneously coordinate the aforementioned enforced denser and sparser workloads for boosted hardware utilization, while integrating on-chip encoder and decoder engines to leverage ViTCoD’s algorithm pipeline for much reduced data movements.
Extensive experiments and ablation studies validate that ViTCoD largely reduces the dominant data movement costs, achieving speedups of up to 235.3×, 142.9×, 86.0×, 10.1×, and 6.8× over general computing platforms (CPUs, EdgeGPUs, and GPUs) and prior-art Transformer accelerators (SpAtten and Sanger), respectively, under an attention sparsity of 90%. Our code implementation is available at https://github.com/GATECH-EIC/ViTCoD.
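The prune-and-polarize idea can be illustrated with a small sketch. It assumes a precomputed, averaged N x N attention map (rows are queries, columns are keys), keeps the largest entries globally to form one fixed sparse mask, and then splits key columns into a denser group and a sparser group using a hypothetical `dense_fraction` threshold on kept entries per column. The function names and the column-count criterion are assumptions for illustration, not ViTCoD's released code.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Mask = List[List[bool]]


def prune_attention(attn: Matrix, prune_ratio: float) -> Mask:
    """Fixed sparse mask: keep the largest (1 - prune_ratio) share of all entries."""
    n = len(attn)
    # Sort every entry by value, largest first (ties broken by position).
    entries = sorted(
        ((attn[r][c], r, c) for r in range(n) for c in range(n)), reverse=True
    )
    keep = max(1, round(n * n * (1.0 - prune_ratio)))
    mask = [[False] * n for _ in range(n)]
    for _, r, c in entries[:keep]:
        mask[r][c] = True
    return mask


def polarize_columns(mask: Mask, dense_fraction: float) -> Tuple[List[int], List[int]]:
    """Split key columns into a denser and a sparser group by their kept-entry count."""
    n = len(mask)
    dense: List[int] = []
    sparse: List[int] = []
    for c in range(n):
        kept = sum(mask[r][c] for r in range(n))
        (dense if kept >= dense_fraction * n else sparse).append(c)
    return dense, sparse


if __name__ == "__main__":
    attn = [
        [0.70, 0.10, 0.10, 0.10],
        [0.60, 0.20, 0.10, 0.10],
        [0.80, 0.05, 0.10, 0.05],
        [0.50, 0.30, 0.10, 0.10],
    ]
    mask = prune_attention(attn, prune_ratio=0.75)  # keep 4 of 16 entries
    print(polarize_columns(mask, dense_fraction=0.5))  # ([0], [1, 2, 3])
```

In this toy map every query attends mostly to key 0, so that column survives pruning in every row and lands in the dense group, while the remaining columns form the sparse workload.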
Full Text Available
LDP: Learnable Dynamic Precision for Efficient Deep Neural Network Training and Inference

Yu, Zhongzhi; Fu, Yonggan; Wu, Shang; Li, Mengquan; You, Haoran; Lin, Yingyan (March 2022, tinyML Research Symposium'22)

Low-precision deep neural network (DNN) training is one of the most effective techniques for boosting DNNs’ training efficiency, as it trims down the training cost at the finest bit level. While existing works mostly fix the model precision during the whole training process, a few pioneering works have shown that dynamic precision schedules help DNNs converge to a better accuracy while leading to a lower training cost than their static precision training counterparts. However, existing dynamic low-precision training methods rely on manually designed precision schedules to achieve advantageous efficiency and accuracy trade-offs, limiting their broader practical application and achievable performance. To this end, we propose LDP, a Learnable Dynamic Precision DNN training framework that can automatically learn a temporally and spatially dynamic precision schedule during training towards optimal accuracy and efficiency trade-offs. It is worth noting that LDP-trained DNNs are by nature efficient during inference. Furthermore, we visualize the resulting temporal and spatial precision schedule and distribution of LDP-trained DNNs on different tasks to better understand the corresponding DNNs’ characteristics at different training stages and DNN layers both during and after training, drawing insights for promoting further innovations. Extensive experiments and ablation studies (seven networks, five datasets, and three tasks) show that the proposed LDP consistently outperforms state-of-the-art (SOTA) low-precision DNN training techniques in terms of the trade-off between training efficiency and achieved accuracy. For example, in addition to having the advantage of being automated, our LDP achieves a 0.31% higher accuracy with a 39.1% lower computational cost when training ResNet-20 on CIFAR-10 as compared with the best SOTA method.
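The abstract does not describe how LDP learns its schedule, so the sketch below only shows the primitive that any dynamic-precision scheme varies: symmetric uniform fake quantization of a tensor to `bits` bits, with the bit width chosen per layer (spatially) and per training step (temporally). `toy_schedule` is a hypothetical hand-designed schedule of the kind the abstract says existing methods rely on; LDP is described as learning this schedule instead. All names here are assumptions for illustration.

```python
from typing import List


def quantize_symmetric(values: List[float], bits: int) -> List[float]:
    """Fake-quantize to a signed, symmetric `bits`-bit uniform grid (one scale per tensor)."""
    if bits < 2:
        raise ValueError("a signed symmetric grid needs at least 2 bits")
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return [0.0] * len(values)
    levels = 2 ** (bits - 1) - 1  # e.g. 127 positive levels for 8 bits
    scale = max_abs / levels
    # Round to the nearest grid point, then map back to real values.
    return [round(v / scale) * scale for v in values]


def toy_schedule(layer: int, step: int, total_steps: int) -> int:
    """Hypothetical hand-designed schedule: temporal (by step) and spatial (by layer)."""
    if layer == 0:
        return 8  # keep the first layer at higher precision throughout
    return 4 if step < total_steps // 2 else 8  # low precision early, higher later


if __name__ == "__main__":
    weights = [0.3, -0.9, 0.45]
    for step in (0, 600):
        bits = toy_schedule(layer=2, step=step, total_steps=1000)
        print(step, bits, quantize_symmetric(weights, bits))
```

Each quantized value lies within half a grid step (`scale / 2`) of the original, so fewer bits mean a coarser grid and cheaper arithmetic at the cost of larger rounding error; a learned schedule decides where and when that trade is worthwhile.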
Full Text Available
A3C-S: Automated Agent Accelerator Co-Search towards Efficient Deep Reinforcement Learning

https://doi.org/10.1109/DAC18074.2021.9586305

Fu, Yonggan; Zhang, Yongan; Li, Chaojian; Yu, Zhongzhi; Lin, Yingyan (December 2021, 2021 58th ACM/IEEE Design Automation Conference (DAC))

Full Text Available

