NSF PAR Search | NSF Public Access Repository

Note: When you click a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full-text articles may not yet be available free of charge during the embargo (administrative interval).

Mobile-3DCNN: An Acceleration Framework for Ultra-Real-Time Execution of Large 3D CNNs on Mobile Devices

https://doi.org/10.1145/3747842

Niu, Wei; Sun, Mengshu; Li, Zhengang; Chen, Jou-An; Guan, Jiexiong; Shen, Xipeng; Liu, Jun; Zhang, Mei; Wang, Yanzhi; Lin, Xue; et al. (July 2025, ACM Transactions on Architecture and Code Optimization)

It is challenging to deploy 3D Convolutional Neural Networks (3D CNNs) on mobile devices, especially when both real-time execution and high inference accuracy are required, because the increasingly large model sizes and complex structures of 3D CNNs usually demand tremendous computation and memory resources. Weight pruning has been proposed to mitigate this challenge. However, existing pruning methods are either incompatible with modern parallel architectures, resulting in long inference latency, or subject to significant accuracy degradation. This paper proposes Mobile-3DCNN, an end-to-end 3D CNN acceleration framework based on pruning/compilation co-design. It consists of two parts: a novel fine-grained structured pruning scheme enhanced by prune/Winograd adaptive selection, which is mobile-hardware-friendly and achieves high pruning accuracy; and a set of compiler optimization and code generation techniques enabled by this pruning, which fully translate the pruning benefits into real performance gains. The evaluation demonstrates that Mobile-3DCNN outperforms the state-of-the-art end-to-end DNN acceleration frameworks that support 3D CNN execution on mobile devices, namely Alibaba Mobile Neural Network (MNN) and PyTorch Mobile, with speedups of up to 34× and only minor accuracy degradation. This shows that high-accuracy large 3D CNNs can be executed on mobile devices in real time (or even ultra-real time).
Free, publicly-accessible full text available July 22, 2026

Data Overfitting for On-device Super-Resolution with Dynamic Algorithm and Compiler Co-design

https://doi.org/10.1007/978-3-031-72855-6_21

Li, Gen; Shu, Zhihao; Ji, Jie; Qin, Minghai; Afghah, Fatemeh; Niu, Wei; Ma, Xiaolong (November 2024, Springer Nature Switzerland)

Free, publicly-accessible full text available November 9, 2025

Towards Recognizing Food Types for Unseen Subjects

https://doi.org/10.1145/3696424

Guan, Jiexiong; Wang, Junjie; Niu, Wei; Peng, Zhen; Wang, Shuangquan; Liu, Zhenming; Zhou, Gang; Ren, Bin (September 2024, ACM Transactions on Computing for Healthcare)

Recognizing food types from sensor signals for unseen users remains remarkably challenging, despite extensive recent study. The efficacy of prior machine learning techniques is undermined by large variations in the data collected from different participants, partly because users have varied chewing habits and wear sensor devices in different ways. This work treats the problem as an instance of domain adaptation, where each user represents a domain. We develop the first multi-source domain adaptation (MSDA) method for food-type recognition, which consists of three major components: stratified normalization, a multi-source domain adaptor, and adaptive ensemble learning, with new techniques developed for each component. Using a real-world dataset of 15 participants, we demonstrate that our method achieves a 1.33× to 2.13× improvement in accuracy over nine state-of-the-art MSDA baselines. We also perform an in-depth ablation study that examines the behavior of each component and confirms its efficacy.
Full Text Available

SoD2: Statically Optimizing Dynamic Neural Network Execution

Niu, Wei; Agrawal, Gagan; Ren, Bin (May 2024, ACM)

Although many compilation and runtime systems have been developed for DNNs in recent years, they have focused largely on static DNNs. Dynamic DNNs, in which tensor shapes and sizes, and even the set of operators used, depend on the input and/or execution, are becoming common. This paper presents SoD2, a comprehensive framework for optimizing dynamic DNNs. The basis of our approach is a classification of the common operators that form DNNs, and the use of this classification in a Rank and Dimension Propagation (RDP) method. The framework statically determines the shapes of operators as known constants, symbolic constants, or operations on these. Using RDP, we then enable a series of optimizations, such as fused code generation, execution (order) planning, and even runtime memory allocation plan generation. Evaluating the framework on 10 emerging dynamic DNNs and comparing it against several existing systems, we demonstrate reductions in both execution latency and memory requirements, with the RDP-enabled key optimizations responsible for much of the gains.
Full Text Available

SoD²: Statically Optimizing Dynamic Deep Neural Network Execution

https://doi.org/10.1145/3617232.3624869

Niu, Wei; Agrawal, Gagan; Ren, Bin (April 2024, ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems)

Full Text Available

SmartMem: Layout Transformation Elimination and Adaptation for Efficient DNN Execution on Mobile

https://doi.org/10.1145/3620666.3651384

Niu, Wei; Sanim, Md Musfiqur Rahman; Shu, Zhihao; Guan, Jiexiong; Shen, Xipeng; Yin, Miao; Agrawal, Gagan; Ren, Bin (April 2024, ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems)

Full Text Available

NeurRev: Train Better Sparse Neural Network Practically via Neuron Revitalization

Li, Gen; Yin, Lu; Ji, Jie; Niu, Wei; Qin, Minghai; Ren, Bin; Guo, Linke; Liu, Shiwei; Ma, Xiaolong (May 2024, ICLR)

Full Text Available

Survey: Exploiting Data Redundancy for Optimization of Deep Learning

https://doi.org/10.1145/3564663

Chen, Jou-An; Niu, Wei; Ren, Bin; Wang, Yanzhi; Shen, Xipeng (October 2023, ACM Computing Surveys)

Data redundancy is ubiquitous in the inputs and intermediate results of Deep Neural Networks (DNNs). It offers many significant opportunities for improving DNN performance and efficiency and has been explored in a large body of work. These studies are scattered across many venues and several years. The targets they focus on range from images to videos and texts, and the techniques they use to detect and exploit data redundancy also vary in many respects. There is not yet a systematic examination and summary of these efforts, which makes it difficult for researchers to get a comprehensive view of the prior work, the state of the art, the differences and shared principles, and the areas and directions yet to be explored. This article aims to fill this gap. It surveys hundreds of recent papers on the topic, introduces a novel taxonomy that places the various techniques into a single categorization framework, offers a comprehensive description of the main methods for exploiting data redundancy to improve multiple kinds of DNNs across different data types, and points out a set of research opportunities for future exploration.
Full Text Available

Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting

https://doi.org/10.1109/CVPR52729.2023.00989

Li, Gen; Ji, Jie; Qin, Minghai; Niu, Wei; Ren, Bin; Afghah, Fatemeh; Guo, Linke; Ma, Xiaolong (June 2023, IEEE)

Full Text Available

Pruning Parameterization with Bi-level Optimization for Efficient Semantic Segmentation on the Edge

https://doi.org/10.1109/CVPR52729.2023.01478

Yang, Changdi; Zhao, Pu; Li, Yanyu; Niu, Wei; Guan, Jiexiong; Tang, Hao; Qin, Minghai; Ren, Bin; Lin, Xue; Wang, Yanzhi (June 2023, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023)

Full Text Available