NSF PAR Search | NSF Public Access Repository

SDA: Low-Bit Stable Diffusion Acceleration on Edge FPGAs

https://doi.org/10.1109/FPL64840.2024.00044

Yang, Geng; Xie, Yanyue; Xue, Zhong Jia; Chang, Sung-En; Li, Yanyu; Dong, Peiyan; Lei, Jie; Xie, Weiying; Wang, Yanzhi; Lin, Xue; et al (September 2024, IEEE)

Full Text Available
SNED: Superposition Network Architecture Search for Efficient Video Diffusion Model

https://doi.org/10.1109/CVPR52733.2024.00827

Li, Zhengang; Kang, Yan; Liu, Yuchen; Liu, Difan; Hinz, Tobias; Liu, Feng; Wang, Yanzhi (June 2024, IEEE)

Full Text Available
Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers

https://doi.org/10.1145/3650200.3656622

Li, Zhengang; Lu, Alec; Xie, Yanyue; Kong, Zhenglun; Sun, Mengshu; Tang, Hao; Xue, Zhong Jia; Dong, Peiyan; Ding, Caiwen; Wang, Yanzhi; et al (May 2024, ACM)

Full Text Available
PackQViT: Faster Sub-8-bit Vision Transformers via Full and Packed Quantization on the Mobile

Dong, Peiyan; Lu, Lei; Wu, Chao; Lyu, Cheng; Yuan, Geng; Tang, Hao; Wang, Yanzhi (December 2023, Advances in neural information processing systems)

Full Text Available
MOC: Multi-Objective Mobile CPU-GPU Co-Optimization for Power-Efficient DNN Inference

https://doi.org/10.1109/ICCAD57390.2023.10323882

Wu, Yushu; Gong, Yifan; Zhan, Zheng; Yuan, Geng; Li, Yanyu; Wang, Qi; Wu, Chao; Wang, Yanzhi (October 2023, IEEE)

Full Text Available
Rethinking Vision Transformers for MobileNet Size and Speed

https://doi.org/10.1109/ICCV51070.2023.01549

Li, Yanyu; Hu, Ju; Wen, Yang; Evangelidis, Georgios; Salahi, Kamyar; Wang, Yanzhi; Tulyakov, Sergey; Ren, Jian (October 2023, IEEE)

Full Text Available
All-in-One: A Highly Representative DNN Pruning Framework for Edge Devices with Dynamic Power Management

https://doi.org/10.1145/3508352.3549379

Gong, Yifan; Zhan, Zheng; Zhao, Pu; Wu, Yushu; Wu, Chao; Ding, Caiwen; Jiang, Weiwen; Qin, Minghai; Wang, Yanzhi (October 2022, Design Automation Conference (DAC))

Full Text Available
Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration

https://doi.org/10.1145/3495532

Gong, Yifan; Yuan, Geng; Zhan, Zheng; Niu, Wei; Li, Zhengang; Zhao, Pu; Cai, Yuxuan; Liu, Sijia; Ren, Bin; Lin, Xue; et al (September 2022, ACM Transactions on Design Automation of Electronic Systems)

Weight pruning is an effective model compression technique to tackle the challenges of achieving real-time deep neural network (DNN) inference on mobile devices. However, prior pruning schemes have limited application scenarios due to accuracy degradation, difficulty in leveraging hardware acceleration, and/or restriction to certain types of DNN layers. In this article, we propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations that are applicable to any type of DNN layer while achieving high accuracy and hardware inference performance. With the flexibility of applying different pruning schemes to different layers enabled by our compiler optimizations, we further probe into the new problem of determining the best-suited pruning scheme, considering the different acceleration and accuracy performance of various pruning schemes. Two pruning scheme mapping methods, one search-based and the other rule-based, are proposed to automatically derive the best-suited pruning regularity and block size for each layer of any given DNN. Experimental results demonstrate that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework with up to 2.48× and 1.73× DNN inference acceleration on the CIFAR-10 and ImageNet datasets, respectively, without accuracy loss.
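To make the notions of "block size" and "pruning regularity" in the abstract above more concrete, here is a minimal sketch of block-wise column pruning on a 2-D weight matrix. It is an illustration only, not the authors' scheme: the function name `block_column_prune`, the `keep_ratio` parameter, and the squared-L2-norm ranking criterion are all assumptions made for this example.

```python
import math


def block_column_prune(weights, block_rows, block_cols, keep_ratio):
    """Zero out low-norm column segments independently inside each block.

    Hypothetical helper for illustration; the ranking rule (squared L2 norm)
    is an assumption, not taken from the paper.
    """
    rows, cols = len(weights), len(weights[0])
    pruned = [row[:] for row in weights]  # copy so the input stays untouched
    for r0 in range(0, rows, block_rows):
        for c0 in range(0, cols, block_cols):
            r1 = min(r0 + block_rows, rows)
            c1 = min(c0 + block_cols, cols)
            # Squared L2 norm of each column segment inside this block.
            norms = {
                c: sum(weights[r][c] ** 2 for r in range(r0, r1))
                for c in range(c0, c1)
            }
            # Keep at least one column per block.
            n_keep = max(1, math.ceil(keep_ratio * (c1 - c0)))
            keep = set(sorted(norms, key=norms.get, reverse=True)[:n_keep])
            for c in range(c0, c1):
                if c not in keep:
                    for r in range(r0, r1):
                        pruned[r][c] = 0.0
    return pruned


W = [
    [1.0, 0.1, 2.0, 0.2],
    [1.0, 0.1, 2.0, 0.2],
    [0.1, 3.0, 0.2, 4.0],
    [0.1, 3.0, 0.2, 4.0],
]
P = block_column_prune(W, block_rows=2, block_cols=4, keep_ratio=0.5)
```

In this toy input, each 2×4 block keeps a different pair of columns, which is the kind of per-block flexibility that a smaller block size buys; a larger block forces more weights to share one sparsity structure, which tends to be easier to accelerate but coarser.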
Full Text Available
Layer Freezing & Data Sieving: Missing Pieces of a Generic Framework for Sparse Training

Yuan, Geng; Li, Yanyu; Li, Sheng; Kong, Zhenglun; Tulyakov, Sergey; Tang, Xulong; Wang, Yanzhi; Ren, Jian (January 2022, NeurIPS Proceedings)

Full Text Available
ClickTrain: Efficient and Accurate End-to-End Deep Learning Training via Fine-Grained Architecture-Preserving Pruning

https://doi.org/10.1145/3447818.3459988

Zhang, Chengming; Yuan, Geng; Niu, Wei; Tian, Jiannan; Jin, Sian; Zhuang, Donglin; Jiang, Zhe; Wang, Yanzhi; Ren, Bin; Song, Shuaiwen Leon; et al (June 2021, The 35th ACM International Conference on Supercomputing (ICS 2021))

Convolutional neural networks (CNNs) are becoming increasingly deep, wide, and non-linear because of the growing demand for prediction accuracy and analysis quality. Wide and deep CNNs, however, require a large amount of computing resources and processing time. Many previous works have studied model pruning to improve inference performance, but little work has been done on effectively reducing training cost. In this paper, we propose ClickTrain: an efficient and accurate end-to-end training and pruning framework for CNNs. Unlike existing pruning-during-training work, ClickTrain provides higher model accuracy and compression ratio via fine-grained architecture-preserving pruning. By leveraging pattern-based pruning with our proposed novel accurate weight importance estimation, dynamic pattern generation and selection, and compiler-assisted computation optimizations, ClickTrain generates highly accurate and fast pruned CNN models for direct deployment without any extra time overhead compared with baseline training. ClickTrain also reduces the end-to-end time cost of the pruning-after-training method by up to 2.3× with comparable accuracy and compression ratio. Moreover, compared with the state-of-the-art pruning-during-training approach, ClickTrain provides significant improvements in both accuracy and compression ratio on the tested CNN models and datasets, under similar limited training time.
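As a rough illustration of what pattern-based pruning means for a single 3×3 convolution kernel, the sketch below picks, from a small fixed set of candidate patterns, the one that retains the most weight magnitude and zeroes everything else. This is not ClickTrain's algorithm: the abstract describes dynamic pattern generation and a dedicated importance estimate, whereas the `PATTERNS` set, the function name `apply_best_pattern`, and the L1-magnitude criterion here are assumptions chosen for brevity.

```python
# Hypothetical candidate patterns: each lists the 4 (row, col) positions
# that stay non-zero in a 3x3 kernel. A real system would derive these.
PATTERNS = [
    ((0, 0), (0, 1), (1, 0), (1, 1)),
    ((0, 1), (0, 2), (1, 1), (1, 2)),
    ((1, 0), (1, 1), (2, 0), (2, 1)),
    ((1, 1), (1, 2), (2, 1), (2, 2)),
]


def apply_best_pattern(kernel):
    """Return (pattern_index, pruned_kernel) for one 3x3 kernel.

    The chosen pattern is the one whose kept positions have the largest
    summed absolute weight (an assumed, simplified importance measure).
    """
    def retained(pattern):
        return sum(abs(kernel[r][c]) for r, c in pattern)

    best = max(range(len(PATTERNS)), key=lambda i: retained(PATTERNS[i]))
    keep = set(PATTERNS[best])
    pruned = [
        [kernel[r][c] if (r, c) in keep else 0.0 for c in range(3)]
        for r in range(3)
    ]
    return best, pruned


kernel = [
    [0.1, 0.2, 0.1],
    [0.3, 0.9, -0.8],
    [0.1, 0.7, -0.6],
]
index, pruned_kernel = apply_best_pattern(kernel)
```

Because every kernel ends up with one of a few known shapes, the tensor dimensions are unchanged (the "architecture-preserving" property), while a compiler can in principle specialize code for each pattern.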
Full Text Available
