NSF PAR Search | NSF Public Access Repository

The Autonomous Vehicle Assistant (AVA): Emerging technology design supporting blind and visually impaired travelers in autonomous transportation

https://doi.org/10.1016/j.ijhcs.2023.103125

Fink, Paul D.S.; Doore, Stacy A.; Lin, Xue; Maring, Matthew; Zhao, Pu; Nygaard, Aubree; Beals, Grant; Corey, Richard R.; Perry, Raymond J.; Freund, Katherine; et al (November 2023, International Journal of Human-Computer Studies)

Full Text Available
Pruning Parameterization with Bi-level Optimization for Efficient Semantic Segmentation on the Edge

https://doi.org/10.1109/CVPR52729.2023.01478

Yang, Changdi; Zhao, Pu; Li, Yanyu; Niu, Wei; Guan, Jiexiong; Tang, Hao; Qin, Minghai; Ren, Bin; Lin, Xue; Wang, Yanzhi (June 2023, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023)

Full Text Available
Pruning Parameterization with Bi-level Optimization for Efficient Semantic Segmentation on the Edge

Yang, Changdi; Zhao, Pu; Li, Yanyu; Niu, Wei; Guan, Jiexiong; Tang, Hao; Qin, Minghai; Ren, Bin; Lin, Xue; Wang, Yanzhi (June 2023, The IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR))

With the ever-increasing popularity of edge devices, it is necessary to implement real-time segmentation on the edge for autonomous driving and many other applications. Vision Transformers (ViTs) have shown considerably stronger results for many vision tasks. However, ViTs with the full-attention mechanism usually consume a large number of computational resources, leading to difficulties for real-time inference on edge devices. In this paper, we aim to derive ViTs with fewer computations and fast inference speed to facilitate the dense prediction of semantic segmentation on edge devices. To achieve this, we propose a pruning parameterization method to formulate the pruning problem of semantic segmentation. Then we adopt a bi-level optimization method to solve this problem with the help of implicit gradients. Our experimental results demonstrate that we can achieve 38.9 mIoU on ADE20K val with a speed of 56.5 FPS on Samsung S21, which is the highest mIoU under the same computation constraint with real-time inference.
Full Text Available
Advancing Model Pruning via Bi-level Optimization

Zhang, Yihua; Yao, Yuguang; Ram, Parikshit; Zhao, Pu; Chen, Tianlong; Hong, Mingyi; Wang, Yanzhi; Liu, Sijia (December 2022, 36th Conference on Neural Information Processing Systems (NeurIPS 2022))
Towards Real-Time Segmentation on the Edge

Li, Yanyu; Yang, Changdi; Zhao, Pu; Yuan, Geng; Niu, Wei; Guan, Jiexiong; Tang, Hao; Qin, Minghai; Ren, Bin; Lin, Xue; et al (February 2023, AAAI'23: The Thirty-Seventh AAAI Conference on Artificial Intelligence)

There have been many recent attempts to extend the successes of convolutional neural networks (CNNs) from 2-dimensional (2D) image classification to 3-dimensional (3D) video recognition by exploring 3D CNNs. Considering the growth of the mobile and Internet of Things (IoT) markets, it is essential to investigate the deployment of 3D CNNs on edge devices. Previous works have implemented standard 3D CNNs (C3D) on hardware platforms; however, they have not exploited model compression to accelerate inference. This work proposes a hardware-aware pruning approach that can fully adapt to the loop tiling technique of FPGA design and is applied to a novel 3D network called R(2+1)D. Leveraging the powerful ADMM, the proposed pruning method achieves simultaneous high accuracy and significant acceleration of computation on FPGA. With layer-wise pruning rates up to 10× and negligible accuracy loss, the pruned model is implemented on a Xilinx ZCU102 FPGA board, where it achieves a 2.6× speedup compared with the unpruned version, and a 2.3× speedup and 2.3× power efficiency improvement compared with a state-of-the-art FPGA implementation of C3D.
Full Text Available
Towards Real-Time Segmentation on the Edge

https://doi.org/10.1609/aaai.v37i2.25232

Li, Yanyu; Yang, Changdi; Zhao, Pu; Yuan, Geng; Niu, Wei; Guan, Jiexiong; Tang, Hao; Qin, Minghai; Jin, Qing; Ren, Bin; et al (February 2023, Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-23))

Full Text Available
All-in-One: A Highly Representative DNN Pruning Framework for Edge Devices with Dynamic Power Management

https://doi.org/10.1145/3508352.3549379

Gong, Yifan; Zhan, Zheng; Zhao, Pu; Wu, Yushu; Wu, Chao; Ding, Caiwen; Jiang, Weiwen; Qin, Minghai; Wang, Yanzhi (October 2022, Design Automation Conference (DAC))

Full Text Available
Pruning-as-Search: Efficient Neural Architecture Search via Channel Pruning and Structural Reparameterization

https://doi.org/10.24963/ijcai.2022/449

Li, Yanyu; Zhao, Pu; Yuan, Geng; Lin, Xue; Wang, Yanzhi; Chen, Xin (July 2022, Thirty-First International Joint Conference on Artificial Intelligence)

Neural architecture search (NAS) and network pruning are widely studied efficient AI techniques, but not yet perfect. NAS performs exhaustive candidate architecture search, incurring tremendous search cost. Though (structured) pruning can simply shrink model dimension, it remains unclear how to decide the per-layer sparsity automatically and optimally. In this work, we revisit the problem of layer-width optimization and propose Pruning-as-Search (PaS), an end-to-end channel pruning method to search out the desired sub-network automatically and efficiently. Specifically, we add a depth-wise binary convolution to learn pruning policies directly through gradient descent. By combining structural reparameterization and PaS, we successfully searched out a new family of VGG-like and lightweight networks, which enable the flexibility of arbitrary width with respect to each layer instead of each stage. Experimental results show that our proposed architecture outperforms prior art by around 1.0% top-1 accuracy under similar inference speed on the ImageNet-1000 classification task. Furthermore, we demonstrate the effectiveness of our width search on complex tasks including instance segmentation and image translation. Code and models are released.
Full Text Available
Compiler-aware neural architecture search for on-mobile real-time super-resolution

Wu, Yushu; Gong, Yifan; Zhao, Pu; Li, Yanyu; Zhan, Zheng; Niu, Wei; Tang, Hao; Qin, Minghai; Ren, Bin; Wang, Yanzhi (November 2022, European Conference on Computer Vision (ECCV))

Full Text Available
Learning to Generate Image Source-Agnostic Universal Adversarial Perturbations

https://doi.org/10.24963/ijcai.2022/239

Zhao, Pu; Ram, Parikshit; Lu, Songtao; Yao, Yuguang; Bouneffouf, Djallel; Lin, Xue; Liu, Sijia (July 2022, Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-22))

Adversarial perturbations are critical for certifying the robustness of deep learning models. A "universal adversarial perturbation" (UAP) can simultaneously attack multiple images, and thus offers a more unified threat model, obviating an image-wise attack algorithm. However, the existing UAP generator is underdeveloped when images are drawn from different image sources (e.g., with different image resolutions). Towards an authentic universality across image sources, we take a novel view of UAP generation as a customized instance of "few-shot learning", which leverages bilevel optimization and learning-to-optimize (L2O) techniques for UAP generation with improved attack success rate (ASR). We begin by considering the popular model-agnostic meta-learning (MAML) framework to meta-learn a UAP generator. However, we see that the MAML framework does not directly offer the universal attack across image sources, requiring us to integrate it with another meta-learning framework of L2O. The resulting scheme for meta-learning a UAP generator (i) has better performance (50% higher ASR) than baselines such as Projected Gradient Descent, (ii) has better performance (37% faster) than the vanilla L2O and MAML frameworks (when applicable), and (iii) is able to simultaneously handle UAP generation for different victim models and data sources.
Full Text Available
