Search for: All records

Creators/Authors contains: "Yuan, Geng"

Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training

https://doi.org/10.1609/aaai.v37i7.26008

Kong, Zhenglun ; Ma, Haoyu ; Yuan, Geng ; Sun, Mengshu ; Xie, Yanyue ; Dong, Peiyan ; Meng, Xin ; Shen, Xuan ; Tang, Hao ; Qin, Minghai ; et al ( June 2023 , Proceedings of the AAAI Conference on Artificial Intelligence)

Vision transformers (ViTs) have recently obtained success in many applications, but their intensive computation and heavy memory usage at both training and inference time limit their generalization. Previous compression algorithms usually start from pre-trained dense models and focus only on efficient inference, so time-consuming training remains unavoidable. In contrast, this paper points out that million-scale training data is redundant, which is the fundamental reason for the tedious training. To address the issue, this paper introduces sparsity into data and proposes an end-to-end efficient training framework from three sparse perspectives, dubbed Tri-Level E-ViT. Specifically, we leverage a hierarchical data redundancy reduction scheme that explores sparsity at three levels: the number of training examples in the dataset, the number of patches (tokens) in each example, and the number of connections between tokens that lie in attention weights. With extensive experiments, we demonstrate that our proposed technique can noticeably accelerate training for various ViT architectures while maintaining accuracy. Remarkably, under certain ratios, we are able to improve ViT accuracy rather than compromise it. For example, we achieve a 15.2% speedup with 72.6% (+0.4) Top-1 accuracy on DeiT-T, and a 15.7% speedup with 79.9% (+0.1) Top-1 accuracy on DeiT-S. This proves the existence of data redundancy in ViT. Our code is released at https://github.com/ZLKong/Tri-Level-ViT
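
As a rough illustration of two of the three sparsity levels named in the abstract (fewer tokens per example, fewer attention connections), the following is a minimal sketch and not the authors' implementation. The function names, the per-token importance scores, and the thresholding rule are all assumptions made for this example; the abstract does not state how tokens or attention weights are selected.

```python
import math


def keep_top_tokens(scores, keep_ratio):
    """Return indices of the highest-scoring tokens, in original order.

    scores: one importance score per patch token (hypothetical; the
        abstract does not say how importance is measured).
    keep_ratio: fraction of tokens to keep, in (0, 1].
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    k = max(1, math.ceil(len(scores) * keep_ratio))
    # Rank token indices by score, highest first; sorted() is stable,
    # so tied tokens keep their original relative order.
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    return sorted(ranked[:k])


def sparsify_attention_row(weights, threshold):
    """Zero attention weights below threshold, then renormalize to sum to 1."""
    kept = [w if w >= threshold else 0.0 for w in weights]
    total = sum(kept)
    if total == 0.0:
        return list(weights)  # nothing survived; leave the row unchanged
    return [w / total for w in kept]


if __name__ == "__main__":
    print(keep_top_tokens([0.1, 0.9, 0.3, 0.7], keep_ratio=0.5))
    print(sparsify_attention_row([0.5, 0.3, 0.1, 0.1], threshold=0.2))
```

Dropping tokens shrinks every later layer's input, while sparsifying attention rows only removes connections between the tokens that remain, which is why the two are treated as separate levels here.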

Free, publicly-accessible full text available June 27, 2024
SmartFRZ: An Efficient Training Framework using Attention-Based Layer Freezing

Li, Sheng ; Yuan, Geng ; Dai, Yue ; Zhang, Youtao ; Wang, Yanzhi ; Tang, Xulong ( February 2023 , International Conference on Learning Representations)

Full Text Available
ESRU: Extremely Low-Bit and Hardware-Efficient Stochastic Rounding Unit Design for Low-Bit DNN Training

https://doi.org/10.23919/DATE56975.2023.10137222

Chang, Sung-En ; Yuan, Geng ; Lu, Alec ; Sun, Mengshu ; Li, Yanyu ; Ma, Xiaolong ; Li, Zhengang ; Xie, Yanyue ; Qin, Minghai ; Lin, Xue ; et al ( April 2023 , 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE))
Towards Real-Time Segmentation on the Edge

Li, Yanyu ; Yang, Changdi ; Zhao, Pu ; Yuan, Geng ; Niu, Wei ; Guan, Jiexiong ; Tang, Hao ; Qin, Minghai ; Ren, Bin ; Lin, Xue ; et al ( February 2023 , AAAI'23: The Thirty-Seventh AAAI Conference on Artificial Intelligence)

There have been many recent attempts to extend the successes of convolutional neural networks (CNNs) from 2-dimensional (2D) image classification to 3-dimensional (3D) video recognition by exploring 3D CNNs. Considering the growth of the mobile and Internet of Things (IoT) markets, it is essential to investigate the deployment of 3D CNNs on edge devices. Previous works have implemented standard 3D CNNs (C3D) on hardware platforms; however, they have not exploited model compression to accelerate inference. This work proposes a hardware-aware pruning approach that fully adapts to the loop tiling technique of FPGA design and is applied to a novel 3D network called R(2+1)D. Leveraging the alternating direction method of multipliers (ADMM), the proposed pruning method achieves both high accuracy and significant acceleration of computation on FPGA. With layer-wise pruning rates up to 10× and negligible accuracy loss, the pruned model is implemented on a Xilinx ZCU102 FPGA board, where it achieves a 2.6× speedup compared with the unpruned version, and a 2.3× speedup and 2.3× power efficiency improvement compared with a state-of-the-art FPGA implementation of C3D.
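
The idea of pruning in units that match a hardware tile can be sketched as follows. This is an illustrative assumption, not the method in the paper: the abstract names ADMM and loop tiling but gives no algorithmic detail, so the sketch swaps in a plain magnitude criterion (smallest L2 norm per block), and `prune_blocks`, `block_size`, and `prune_ratio` are hypothetical names.

```python
import math


def prune_blocks(weights, block_size, prune_ratio):
    """Zero whole blocks of weights, starting with the smallest L2 norm.

    weights: flat list of weights (a stand-in for one layer).
    block_size: number of consecutive weights per block (assumed to
        correspond to one hardware tile).
    prune_ratio: fraction of blocks to zero, in [0, 1]; rounded down.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if not 0.0 <= prune_ratio <= 1.0:
        raise ValueError("prune_ratio must be in [0, 1]")
    blocks = [weights[i:i + block_size]
              for i in range(0, len(weights), block_size)]
    norms = [math.sqrt(sum(w * w for w in b)) for b in blocks]
    n_prune = int(len(blocks) * prune_ratio)
    # Block indices ordered from smallest to largest norm.
    order = sorted(range(len(blocks)), key=lambda i: norms[i])
    pruned = set(order[:n_prune])
    out = []
    for i, block in enumerate(blocks):
        out.extend([0.0] * len(block) if i in pruned else block)
    return out


if __name__ == "__main__":
    layer = [1.0, 1.0, 0.1, 0.1, 2.0, 0.0, 0.2, 0.1]
    print(prune_blocks(layer, block_size=2, prune_ratio=0.5))
```

Removing entire blocks rather than individual weights means a tiled loop can skip a zeroed tile outright, which is the general motivation for structured pruning on hardware such as FPGAs.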
Full Text Available
Towards Real-Time Segmentation on the Edge

Li, Yanyu ; Yang, Changdi ; Zhao, Pu ; Yuan, Geng ; Niu, Wei ; Guan, Jiexiong ; Tang, Hao ; Qin, Minghai ; Jin, Qing ; Ren, Bin ; et al ( February 2023 , Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-23))

Full Text Available
Towards Real-Time Segmentation on the Edge

Li, Yanyu ; Yang, Changdi ; Zhao, Pu ; Yuan, Geng ; Niu, Wei ; Guan, Jiexiong ; Tang, Hao ; Qin, Minghai ; Jin, Qing ; Ren, Bin ; et al ( February 2023 , The Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-23))

Full Text Available
You Already Have It: A Generator-Free Low-Precision DNN Training Framework using Stochastic Rounding

Yuan, Geng ; Chang, Sung-En ; Jin, Qing ; Lu, Alec ; Li, Yanyu ; Wu, Yushu ; et al ( October 2022 , European Conference on Computer Vision (ECCV))
SparCL: Sparse Continual Learning on the Edge

Wang, Zifeng ; Zhan, Zheng ; Gong, Yifan ; Yuan, Geng ; Niu, Wei ; Jian, Tong ; Ren, Bin ; Ioannidis, Stratis ; Wang, Yanzhi ; Dy, Jennifer ( December 2022 , 2022 Conference on Neural Information Processing Systems)

Existing work in continual learning (CL) focuses on mitigating catastrophic forgetting, i.e., model performance deterioration on past tasks when learning a new task. However, the training efficiency of a CL system is under-investigated, which limits the real-world application of CL systems in resource-limited scenarios. In this work, we propose a novel framework called Sparse Continual Learning (SparCL), the first study to leverage sparsity to enable cost-effective continual learning on edge devices. SparCL achieves both training acceleration and accuracy preservation through the synergy of three aspects: weight sparsity, data efficiency, and gradient sparsity. Specifically, we propose task-aware dynamic masking (TDM) to learn a sparse network throughout the entire CL process, dynamic data removal (DDR) to remove less informative training data, and dynamic gradient masking (DGM) to sparsify the gradient updates. Each of them not only improves efficiency but also further mitigates catastrophic forgetting. SparCL consistently improves the training efficiency of existing state-of-the-art (SOTA) CL methods, requiring up to 23× fewer training FLOPs, and, surprisingly, further improves the SOTA accuracy by up to 1.7%. SparCL also outperforms competitive baselines obtained by adapting SOTA sparse training methods to the CL setting, in both efficiency and accuracy. We also evaluate the effectiveness of SparCL on a real mobile phone, further indicating the practical potential of our method.
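
To make the notion of sparsified gradient updates concrete, here is a minimal sketch, assuming a simple magnitude criterion: only the largest-magnitude gradient entries are applied in each step. This is not SparCL's DGM as published; the abstract does not specify how entries are chosen, and `mask_gradients`, `sgd_step`, and `keep_fraction` are hypothetical names introduced for this example.

```python
import math


def mask_gradients(grads, keep_fraction):
    """Keep the largest-magnitude gradient entries and zero the rest.

    keep_fraction: fraction of entries to keep, in [0, 1]; rounded up.
    The magnitude criterion is an assumption made for this sketch.
    """
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in [0, 1]")
    k = math.ceil(len(grads) * keep_fraction)
    # Indices ordered by absolute gradient value, largest first.
    ranked = sorted(range(len(grads)), key=lambda i: -abs(grads[i]))
    keep = set(ranked[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(grads)]


def sgd_step(weights, grads, lr, keep_fraction):
    """One plain SGD update that applies only the unmasked gradients."""
    masked = mask_gradients(grads, keep_fraction)
    return [w - lr * g for w, g in zip(weights, masked)]


if __name__ == "__main__":
    weights = [1.0, 1.0, 1.0, 1.0]
    grads = [0.1, -2.0, 0.5, 0.05]
    print(mask_gradients(grads, keep_fraction=0.5))
    print(sgd_step(weights, grads, lr=0.5, keep_fraction=0.5))
```

Weights whose gradients are masked stay fixed for that step, so fewer parameters need to be updated.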
Full Text Available
SparCL: Sparse Continual Learning on the Edge

Wang, Zifeng ; Zhan, Zheng ; Gong, Yifan ; Yuan, Geng ; Niu, Wei ; Jian, Tong ; Ren, Bin ; Ioannidis, Stratis ; Wang, Yanzhi ; Dy, Jennifer ( November 2022 , Neural Information Processing Systems (NeurIPS))

Full Text Available
ESRU: Extremely Low-Bit and Hardware-Efficient Stochastic Rounding Unit Design for Low-Bit DNN Training

Chang, Sung-En ; Yuan, Geng ; Lu, Alec ; Sun, Mengshu ; Li, Yanyu ; Ma, Xiaolong ; Li, Zhengang ; Xie, Yanyue ; Qin, Minghai ; Lin, Xue ; et al ( January 2023 , Design, Automation & Test in Europe Conference & Exhibition (DATE))

Full Text Available
