NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training

https://doi.org/10.1609/aaai.v37i7.26008

Kong, Zhenglun; Ma, Haoyu; Yuan, Geng; Sun, Mengshu; Xie, Yanyue; Dong, Peiyan; Meng, Xin; Shen, Xuan; Tang, Hao; Qin, Minghai; et al (June 2023, Proceedings of the AAAI Conference on Artificial Intelligence)

Vision transformers (ViTs) have recently obtained success in many applications, but their intensive computation and heavy memory usage at both training and inference time limit their generalization. Previous compression algorithms usually start from the pre-trained dense models and only focus on efficient inference, while time-consuming training is still unavoidable. In contrast, this paper points out that the million-scale training data is redundant, which is the fundamental reason for the tedious training. To address the issue, this paper aims to introduce sparsity into data and proposes an end-to-end efficient training framework from three sparse perspectives, dubbed Tri-Level E-ViT. Specifically, we leverage a hierarchical data redundancy reduction scheme, by exploring the sparsity under three levels: number of training examples in the dataset, number of patches (tokens) in each example, and number of connections between tokens that lie in attention weights. With extensive experiments, we demonstrate that our proposed technique can noticeably accelerate training for various ViT architectures while maintaining accuracy. Remarkably, under certain ratios, we are able to improve the ViT accuracy rather than compromising it. For example, we can achieve 15.2% speedup with 72.6% (+0.4) Top-1 accuracy on Deit-T, and 15.7% speedup with 79.9% (+0.1) Top-1 accuracy on Deit-S. This proves the existence of data redundancy in ViT. Our code is released at https://github.com/ZLKong/Tri-Level-ViT
more » « less
Full Text Available
DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network

https://doi.org/10.1109/CVPR52729.2023.00597

Shen, Xuan; Wang, Yaohua; Lin, Ming; Huang, Yilun; Tang, Hao; Sun, Xiuyu; Wang, Yanzhi (June 2023, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
ESRU: Extremely Low-Bit and Hardware-Efficient Stochastic Rounding Unit Design for Low-Bit DNN Training

https://doi.org/10.23919/DATE56975.2023.10137222

Chang, Sung-En; Yuan, Geng; Lu, Alec; Sun, Mengshu; Li, Yanyu; Ma, Xiaolong; Li, Zhengang; Xie, Yanyue; Qin, Minghai; Lin, Xue; et al (April 2023, 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE))
Advancing Model Pruning via Bi-level Optimization

Zhang, Yihua; Yao, Yuguang; Ram, Parikshit; Zhao, Pu; Chen, Tianlong; Hong, Mingyi; Wang, Yanzhi; Liu, Sijia (December 2022, 36th Conference on Neural Information Processing Systems (NeurIPS 2022))
SPViT: Enabling Faster Vision Transformers via Latency-Aware Soft Token Pruning

https://doi.org/10.1007/978-3-031-20083-0_37

Kong, Zhenglun; Dong, Peiyan; Ma, Xiaolong; Meng, Xin; et al. (October 2022, European Conference on Computer Vision (ECCV), 2022.)
You Already Have It: A Generator-Free Low-Precision DNN Training Framework using Stochastic Rounding

https://doi.org/10.1007/978-3-031-19775-8_3

Yuan, Geng; Chang, Sung-en; Jin, Qing; Lu, Alec; Li, Yanyu; Wu, Yushu; et al. (October 2022, European Conference on Computer Vision (ECCV), 2022.)
Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration

https://doi.org/10.1145/3495532

Gong, Yifan; Yuan, Geng; Zhan, Zheng; Niu, Wei; Li, Zhengang; Zhao, Pu; Cai, Yuxuan; Liu, Sijia; Ren, Bin; Lin, Xue; et al (September 2022, ACM Transactions on Design Automation of Electronic Systems)

Weight pruning is an effective model compression technique to tackle the challenges of achieving real-time deep neural network (DNN) inference on mobile devices. However, prior pruning schemes have limited application scenarios due to accuracy degradation, difficulty in leveraging hardware acceleration, and/or restriction on certain types of DNN layers. In this article, we propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations that are applicable to any type of DNN layer while achieving high accuracy and hardware inference performance. With the flexibility of applying different pruning schemes to different layers enabled by our compiler optimizations, we further probe into the new problem of determining the best-suited pruning scheme considering the different acceleration and accuracy performance of various pruning schemes. Two pruning scheme mapping methods—one -search based and the other is rule based—are proposed to automatically derive the best-suited pruning regularity and block size for each layer of any given DNN. Experimental results demonstrate that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework with up to 2.48 \( \times \) and 1.73 \( \times \) DNN inference acceleration on CIFAR-10 and ImageNet datasets without accuracy loss.
more » « less
Full Text Available
DNNFusion: accelerating deep neural networks execution with advanced operator fusion

https://doi.org/10.1145/3453483.3454083

Niu, Wei; Guan, Jiexiong; Wang, Yanzhi; Agrawal, Gagan; Ren, Bin (June 2021, Proceedings of PLDI 2021)
null (Ed.)
Full Text Available
Non-Structured DNN Weight Pruning--Is It Beneficial in Any Platform?

https://doi.org/10.1109/TNNLS.2021.3063265

Ma, Xiaolong; Lin, Sheng; Ye, Shaokai; He, Zhezhi; Zhang, Linfeng; Yuan, Geng; Tan, Sia Huat; Li, Zhengang; Fan, Deliang; Qian, Xuehai; et al (March 2021, IEEE Transactions on Neural Networks and Learning Systems)
null (Ed.)
Full Text Available
Mix and Match: A Novel FPGA-Centric Deep Neural Network Quantization Framework

https://doi.org/10.1109/HPCA51647.2021.00027

Chang, Sung-En; Li, Yanyu; Sun, Mengshu; Shi, Runbin; So, Hayden K.-H.; Qian, Xuehai; Wang, Yanzhi; Lin, Xue (February 2021, Proc. of High Performance Computing Architecture (HPCA))
null (Ed.)
Full Text Available

« Prev Next »

Search for: All records