NSF PAR Search | NSF Public Access Repository

EQ-ViT: Algorithm-Hardware Co-Design for End-to-End Acceleration of Real-Time Vision Transformer Inference on Versal ACAP Architecture

https://doi.org/10.1109/TCAD.2024.3443692

Dong, Peiyan; Zhuang, Jinming; Yang, Zhuoping; Ji, Shixin; Li, Yanyu; Xu, Dongkuan; Huang, Heng; Hu, Jingtong; Jones, Alex K; Shi, Yiyu; et al (November 2024, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems)

Full Text Available
EQ-ViT: Algorithm-Hardware Co-Design for End-to-End Acceleration of Real-Time Vision Transformer Inference on Versal ACAP Architecture

Dong, Peiyan; Zhuang, Jinming; Yang, Zhuoping; Ji, Shixin; Li, Yanyu; Xu, Dongkuan; Huang, Heng; Hu, Jingtong; Jones, Alex K; Shi, Yiyu; et al (October 2024, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems)

While Vision Transformers (ViTs) have shown consistent progress in computer vision, deploying them for real-time decision-making scenarios (< 1 ms) is challenging. Current computing platforms like CPUs, GPUs, or FPGA-based solutions struggle to meet this deterministic low-latency real-time requirement, even with quantized ViT models. Some approaches use pruning or sparsity to reduce model size and latency, but this often results in accuracy loss. To address these constraints, we propose EQ-ViT, an end-to-end acceleration framework with novel algorithm and architecture co-design features to enable real-time ViT acceleration on the AMD Versal Adaptive Compute Acceleration Platform (ACAP). The contributions are four-fold. First, we perform in-depth kernel-level performance profiling and analysis and explain the bottlenecks of existing acceleration solutions on GPU, FPGA, and ACAP. Second, on the hardware level, we introduce a new spatial and heterogeneous accelerator architecture, the EQ-ViT architecture. This architecture leverages the heterogeneous features of ACAP, where both FPGA and artificial intelligence engines (AIEs) coexist on the same system-on-chip (SoC). Third, on the algorithm level, we create a comprehensive quantization-aware training strategy, the EQ-ViT algorithm. This strategy concurrently quantizes both weights and activations into 8-bit integers, aiming to improve accuracy rather than compromise it during quantization. Notably, the method also quantizes nonlinear functions for efficient hardware implementation. Fourth, we design the EQ-ViT automation framework to implement the EQ-ViT architecture for four different ViT applications on the AMD Versal ACAP VCK190 board, achieving an accuracy improvement of 2.4% and average speedups of 315.0x, 3.39x, 3.38x, 14.92x, 59.5x, and 13.1x over computing solutions based on the Intel Xeon 8375C vCPU; Nvidia A10G, A100, and Jetson AGX Orin GPUs; and AMD ZCU102 and U250 FPGAs, respectively. The corresponding energy efficiency gains are 62.2x, 15.33x, 12.82x, 13.31x, 13.5x, and 21.9x.
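The abstract mentions quantizing both weights and activations into 8-bit integers. As a rough illustration of that general idea only, the sketch below applies symmetric per-tensor int8 quantization to two small vectors and computes their dot product with integer accumulation. The scheme, the function names (`quantize_int8`, `int8_dot`), and the sample values are assumptions made for illustration; this is not the EQ-ViT quantization-aware training algorithm, which the abstract does not specify.

```python
# Illustrative only: symmetric per-tensor int8 quantization (an assumed scheme,
# not the EQ-ViT algorithm). Pure Python, no external dependencies.

def quantize_int8(values):
    """Map floats to integer codes in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def int8_dot(w_codes, w_scale, x_codes, x_scale):
    """Accumulate in integers, then rescale to a float once at the end."""
    acc = sum(w * x for w, x in zip(w_codes, x_codes))
    return acc * w_scale * x_scale


# Hypothetical sample data.
weights = [0.50, -1.27, 0.03, 0.80]
activations = [1.00, 0.25, -0.75, 2.00]

w_q, w_s = quantize_int8(weights)
x_q, x_s = quantize_int8(activations)

exact = sum(w * x for w, x in zip(weights, activations))
approx = int8_dot(w_q, w_s, x_q, x_s)
print(f"float result: {exact:.4f}, int8 result: {approx:.4f}")
```

The integer accumulation is the part that maps well to fixed-point hardware; the single rescale at the end is where the two scales are folded back in.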
Full Text Available
SDA: Low-Bit Stable Diffusion Acceleration on Edge FPGAs

https://doi.org/10.1109/FPL64840.2024.00044

Yang, Geng; Xie, Yanyue; Xue, Zhong Jia; Chang, Sung-En; Li, Yanyu; Dong, Peiyan; Lei, Jie; Xie, Weiying; Wang, Yanzhi; Lin, Xue; et al (September 2024, IEEE)

Full Text Available
Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers

https://doi.org/10.1145/3650200.3656622

Li, Zhengang; Lu, Alec; Xie, Yanyue; Kong, Zhenglun; Sun, Mengshu; Tang, Hao; Xue, Zhong Jia; Dong, Peiyan; Ding, Caiwen; Wang, Yanzhi; et al (May 2024, ACM)

Full Text Available
PackQViT: Faster Sub-8-bit Vision Transformers via Full and Packed Quantization on the Mobile

Dong, Peiyan; Lu, Lei; Wu, Chao; Lyu, Cheng; Yuan, Geng; Tang, Hao; Wang, Yanzhi (December 2023, Advances in neural information processing systems)

Full Text Available
SuperFlow: A Fully-Customized RTL-to-GDS Design Automation Flow for Adiabatic Quantum-Flux-Parametron Superconducting Circuits

Xie, Yanyue; Dong, Peiyan; Yuan, Geng; Li, Zhengang; Zabihi, Masoud; Wu, Chao; Chang, Sung-En; Zhang, Xufeng; Lin, Xue; Ding, Caiwen; et al (March 2024, 2024 Design, Automation & Test in Europe Conference)

Full Text Available
Fast and Fair Medical AI on the Edge Through Neural Architecture Search for Hybrid Vision Models

https://doi.org/10.1109/ICCAD57390.2023.10323652

Yang, Changdi; Sheng, Yi; Dong, Peiyan; Kong, Zhenglun; Li, Yanyu; Yu, Pinrui; Yang, Lei; Lin, Xue; Wang, Yanzhi (October 2023, IEEE)

Full Text Available
SupeRBNN: Randomized Binary Neural Network Using Adiabatic Superconductor Josephson Devices

https://doi.org/10.1145/3613424.3623771

Li, Zhengang; Yuan, Geng; Yamauchi, Tomoharu; Zabihi, Masoud; Xie, Yanyue; Dong, Peiyan; Tang, Xulong; Yoshikawa, Nobuyuki; Tiwari, Devesh; Wang, Yanzhi; et al (October 2023, ACM)

Full Text Available
SPViT: Enabling Faster Vision Transformers via Latency-Aware Soft Token Pruning

https://doi.org/10.1007/978-3-031-20083-0_37

Kong, Zhenglun; Dong, Peiyan; Ma, Xiaolong; Meng, Xin; et al (October 2022, European Conference on Computer Vision (ECCV))
Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training

https://doi.org/10.1609/aaai.v37i7.26008

Kong, Zhenglun; Ma, Haoyu; Yuan, Geng; Sun, Mengshu; Xie, Yanyue; Dong, Peiyan; Meng, Xin; Shen, Xuan; Tang, Hao; Qin, Minghai; et al (June 2023, Proceedings of the AAAI Conference on Artificial Intelligence)

Vision transformers (ViTs) have recently obtained success in many applications, but their intensive computation and heavy memory usage at both training and inference time limit their generalization. Previous compression algorithms usually start from pre-trained dense models and focus only on efficient inference, so time-consuming training remains unavoidable. In contrast, this paper points out that million-scale training data is redundant, which is the fundamental reason for the tedious training. To address the issue, this paper introduces sparsity into the data and proposes an end-to-end efficient training framework from three sparse perspectives, dubbed Tri-Level E-ViT. Specifically, we leverage a hierarchical data redundancy reduction scheme that explores sparsity at three levels: the number of training examples in the dataset, the number of patches (tokens) in each example, and the number of connections between tokens that lie in attention weights. With extensive experiments, we demonstrate that our proposed technique can noticeably accelerate training for various ViT architectures while maintaining accuracy. Remarkably, under certain ratios, we are able to improve ViT accuracy rather than compromise it. For example, we achieve a 15.2% speedup with 72.6% (+0.4) Top-1 accuracy on DeiT-T, and a 15.7% speedup with 79.9% (+0.1) Top-1 accuracy on DeiT-S. This proves the existence of data redundancy in ViT. Our code is released at https://github.com/ZLKong/Tri-Level-ViT
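One of the three sparsity levels named in the abstract is the number of patches (tokens) kept per example. A minimal sketch of that idea, assuming tokens are ranked by a per-token importance score and only the top fraction is retained, might look as follows. The function name `keep_top_tokens`, the `keep_ratio` parameter, and the sample scores are hypothetical; the abstract does not describe how the paper actually scores or selects tokens.

```python
# Illustrative only: score-based token (patch) selection. The scoring and
# selection rule are assumptions, not the Tri-Level E-ViT method.

def keep_top_tokens(tokens, scores, keep_ratio):
    """Keep the highest-scoring fraction of tokens, preserving original order."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    n_keep = max(1, int(len(tokens) * keep_ratio))
    # Rank token indices from most to least important.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    # Restore positional order among the survivors.
    kept_indices = sorted(ranked[:n_keep])
    return [tokens[i] for i in kept_indices]


# Hypothetical patches and importance scores (e.g. derived from attention).
patches = ["p0", "p1", "p2", "p3", "p4", "p5"]
attention_scores = [0.05, 0.30, 0.10, 0.25, 0.02, 0.28]

kept = keep_top_tokens(patches, attention_scores, keep_ratio=0.5)
print(kept)
```

Dropping half the tokens roughly halves the sequence length seen by later layers, which is where the training-time savings in this kind of scheme would come from.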
Full Text Available
