In recent years, Convolutional Neural Networks (CNNs) have shown superior capability in visual learning tasks. While accuracy-wise CNNs provide unprecedented performance, they are also known to be computationally intensive and energy demanding for modern computer systems. In this paper, we propose Virtual Pooling (ViP), a model-level approach to improve speed and energy consumption of CNN-based image classification and object detection tasks, with a provable error bound. We show the efficacy of ViP through experiments on four CNN models, three representative datasets, both desktop and mobile platforms, and two visual learning tasks, i.e., image classification and object detection. For example, ViP delivers 2.1x speedup with less than 1.5% accuracy degradation in ImageNet classification on VGG16, and 1.8x speedup with 0.025 mAP degradation in PASCAL VOC object detection with Faster-RCNN. ViP also reduces mobile GPU and CPU energy consumption by up to 55% and 70%, respectively. As a complementary method to existing acceleration approaches, ViP achieves 1.9x speedup on ThiNet leading to a combined speedup of 5.23x on VGG16. Furthermore, ViP provides a knob for machine learning practitioners to generate a set of CNN models with varying trade-offs between system speed/energy consumption and accuracy to better accommodate the requirements of their tasks. Code is available at https://github.com/cmu-enyac/VirtualPooling.
more »
« less
Adaptive Deep Learning for Soft Real-Time Image Classification
CNNs (Convolutional Neural Networks) are becoming increasingly important for real-time applications, such as image classification in traffic control, visual surveillance, and smart manufacturing. It is challenging, however, to meet timing constraints of image processing tasks using CNNs due to their complexity. Performing dynamic trade-offs between the inference accuracy and time for image data analysis in CNNs is challenging too, since we observe that more complex CNNs that take longer to run even lead to lower accuracy in many cases by evaluating hundreds of CNN models in terms of time and accuracy using two popular data sets, MNIST and CIFAR-10. To address these challenges, we propose a new approach that (1) generates CNN models and analyzes their average inference time and accuracy for image classification, (2) stores a small subset of the CNNs with monotonic time and accuracy relationships offline, and (3) efficiently selects an effective CNN expected to support the highest possible accuracy among the stored CNNs subject to the remaining time to the deadline at run time. In our extensive evaluation, we verify that the CNNs derived by our approach are more flexible and cost-efficient than two baseline approaches. We verify that our approach can effectively build a compact set of CNNs and efficiently support systematic time vs. accuracy trade-offs, if necessary, to meet the user-specified timing and accuracy requirements. Moreover, the overhead of our approach is little/acceptable in terms of latency and memory consumption.
more »
« less
- Award ID(s):
- 2007854
- PAR ID:
- 10281413
- Date Published:
- Journal Name:
- Technologies
- Volume:
- 9
- Issue:
- 1
- ISSN:
- 2227-7080
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Resistive random access memory (RRAM) based memristive crossbar arrays enable low power and low latency inference for convolutional neural networks (CNNs), making them suitable for deployment in IoT and edge devices. However, RRAM cells within a crossbar suffer from conductance variations, making RRAM-based CNNs vulnerable to degradation of their classification accuracy. To address this, the classification accuracy of RRAM based CNN chips can be estimated using predictive tests, where a trained regressor predicts the accuracy of a CNN chip from the CNN’s response to a compact test dataset. In this research, we present a framework for co-optimizing the pixels of the compact test dataset and the regressor. The novelty of the proposed approach lies in the ability to co-optimize individual image pixels, overcoming barriers posed by the computational complexity of optimizing the large numbers of pixels in an image using state-of-the-art techniques. The co-optimization problem is solved using a three step process: a greedy image downselection followed by backpropagation driven image optimization and regressor fine-tuning. Experiments show that the proposed test approach reduces the CNN classification accuracy prediction error by 31% compared to the state of the art. It is seen that a compact test dataset with only 2-4 images is needed for testing, making the scheme suitable for built-in test applications.more » « less
-
Resistive random access memory (RRAM) based memristive crossbar arrays enable low power and low latency inference for convolutional neural networks (CNNs), making them suitable for deployment in IoT and edge devices. However, RRAM cells within a crossbar suffer from conductance variations, making RRAM-based CNNs vulnerable to degradation of their classification accuracy. To address this, the classification accuracy of RRAM based CNN chips can be estimated using predictive tests, where a trained regressor predicts the accuracy of a CNN chip from the CNN’s response to a compact test dataset. In this research, we present a framework for co-optimizing the pixels of the compact test dataset and the regressor. The novelty of the proposed approach lies in the ability to co-optimize individual image pixels, overcoming barriers posed by the computational complexity of optimizing the large numbers of pixels in an image using state-of-the-art techniques. The co-optimization problem is solved using a three step process: a greedy image downselection followed by backpropagation driven image optimization and regressor fine-tuning. Experiments show that the proposed test approach reduces the CNN classification accuracy prediction error by 31% compared to the state of the art. It is seen that a compact test dataset with only 2-4 images is needed for testing, making the scheme suitable for built-in test applications.more » « less
-
The high demand for computational and storage resources severely impedes the deployment of deep convolutional neural networks (CNNs) in limited resource devices. Recent CNN architectures have proposed reduced complexity versions (e.g,. SuffleNet and MobileNet) but at the cost of modest decreases in accuracy. This paper proposes pSConv, a pre-defined sparse 2D kernel based convolution, which promises significant improvements in the trade-off between complexity and accuracy for both CNN training and inference. To explore the potential of this approach, we have experimented with two widely accepted datasets, CIFAR-10 and Tiny ImageNet, in sparse variants of both the ResNet18 and VGG16 architectures. Our approach shows a parameter count reduction of up to 4.24× with modest degradation in classification accuracy relative to that of standard CNNs. Our approach outperforms a popular variant of ShuffleNet using a variant of ResNet18 with pSConv having 3 × 3 kernels with only four of nine elements not fixed at zero. In particular, the parameter count is reduced by 1.7× for CIFAR-10 and 2.29× for Tiny ImageNet with an increased accuracy of ~ 4%.more » « less
-
It is challenging to deploy 3D Convolutional Neural Networks (3D CNNs) on mobile devices, specifically if both real-time execution and high inference accuracy are in demand, because the increasingly large model size and complex model structure of 3D CNNs usually require tremendous computation and memory resources. Weight pruning is proposed to mitigate this challenge. However, existing pruning is either not compatible with modern parallel architectures, resulting in long inference latency or subject to significant accuracy degradation. This paper proposes an end-to-end 3D CNN acceleration framework based on pruning/compilation co-design called Mobile-3DCNN that consists of two parts: a novel, fine-grained structured pruning enhanced by a prune/Winograd adaptive selection (that is mobile-hardware-friendly and can achieve high pruning accuracy), and a set of compiler optimization and code generation techniques enabled by our pruning (to fully transform the pruning benefit to real performance gains). The evaluation demonstrates that Mobile-3DCNN outperforms state-of-the-art end-to-end DNN acceleration frameworks that support 3D CNN execution on mobile devices, Alibaba Mobile Neural Networks and Pytorch-Mobile with speedup up to 34 × with minor accuracy degradation, proving it is possible to execute high-accuracy large 3D CNNs on mobile devices in real-time (or even ultra-real-time).more » « less
An official website of the United States government

