Deep neural networks (DNNs) have emerged as the most powerful machine learning technique in numerous artificial intelligent applications. However, the large sizes of DNNs make themselves both computation and memory intensive, thereby limiting the hardware performance of dedicated DNN accelerators. In this paper, we propose a holistic framework for energy-efficient high-performance highly-compressed DNN hardware design. First, we propose block-circulant matrix-based DNN training and inference schemes, which theoretically guarantee Big-O complexity reduction in both computational cost (from O(n2) to O(n log n)) and storage requirement (from O(n2) to O(n)) of DNNs. Second, we dedicatedly optimize the hardware architecture, especially on the key fast Fourier transform (FFT) module, to improve the overall performance in terms of energy efficiency, computation performance and resource cost. Third, we propose a design flow to perform hardware-software co-optimization with the purpose of achieving good balance between test accuracy and hardware performance of DNNs. Based on the proposed design flow, two block-circulant matrix-based DNNs on two different datasets are implemented and evaluated on FPGA. The fixed-point quantization and the proposed block-circulant matrix-based inference scheme enables the network to achieve as high as 3.5 TOPS computation performance and 3.69 TOPS/W energy efficiency while the memory is saved by 108X ~ 116X with negligible accuracy degradation.
more »
« less
Toward Efficient and Adaptive Design of Video Detection System with Deep Neural Networks
In the past decade, Deep Neural Networks (DNNs), e.g., Convolutional Neural Networks, achieved human-level performance in vision tasks such as object classification and detection. However, DNNs are known to be computationally expensive and thus hard to be deployed in real-time and edge applications. Many previous works have focused on DNN model compression to obtain smaller parameter sizes and consequently, less computational cost. Such methods, however, often introduce noticeable accuracy degradation. In this work, we optimize a state-of-the-art DNN-based video detection framework—Deep Feature Flow (DFF) from the cloud end using three proposed ideas. First, we propose Asynchronous DFF (ADFF) to asynchronously execute the neural networks. Second, we propose a Video-based Dynamic Scheduling (VDS) method that decides the detection frequency based on the magnitude of movement between video frames. Last, we propose Spatial Sparsity Inference, which only performs the inference on part of the video frame and thus reduces the computation cost. According to our experimental results, ADFF can reduce the bottleneck latency from 89 to 19 ms. VDS increases the detection accuracy by 0.6% mAP without increasing computation cost. And SSI further saves 0.2 ms with a 0.6% mAP degradation of detection accuracy.
more »
« less
- PAR ID:
- 10350326
- Date Published:
- Journal Name:
- ACM Transactions on Embedded Computing Systems
- Volume:
- 21
- Issue:
- 3
- ISSN:
- 1539-9087
- Page Range / eLocation ID:
- 1 to 21
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
We propose AccHashtag, the first framework for high-accuracy detection of fault-injection attacks on Deep Neural Networks (DNNs) with provable bounds on detection performance. Recent literature in fault-injection attacks shows the severe DNN accuracy degradation caused by bit flips. In this scenario, the attacker changes a few DNN weight bits during execution by injecting faults to the dynamic random-access memory (DRAM). To detect bit flips, AccHashtag extracts a unique signature from the benign DNN prior to deployment. The signature is used to validate the model’s integrity and verify the inference output on the fly. We propose a novel sensitivity analysis that identifies the most vulnerable DNN layers to the fault-injection attack. The DNN signature is constructed by encoding the weights in vulnerable layers using a low-collision hash function. During DNN inference, new hashes are extracted from the target layers and compared against the ground-truth signatures. AccHashtag incorporates a lightweight methodology that allows for real-time fault detection on embedded platforms. We devise a specialized compute core for AccHashtag on field-programmable gate arrays (FPGAs) to facilitate online hash generation in parallel to DNN execution. Extensive evaluations with the state-of-the-art bit-flip attack on various DNNs demonstrate the competitive advantage of AccHashtag in terms of both attack detection and execution overhead.more » « less
-
Deep learning that utilizes large-scale deep neural networks (DNNs) is effective in automatic high-level feature extraction but also computation and memory intensive. Constructing DNNs using block-circulant matrices can simultaneously achieve hardware acceleration and model compression while maintaining high accuracy. This paper proposes HSIM-DNN, an accurate hardware simulator on the C++ platform, to simulate the exact behavior of DNN hardware implementations and thereby facilitate the block-circulant matrix-based design of DNN training and inference procedures in hardware. Real FPGA implementations validate the simulator with various circulant block sizes and data bit lengths taking into account accuracy, compression ratio and power consumption, which provides excellent insights for hardware design.more » « less
-
Inference accuracy of deep neural networks (DNNs) is a crucial performance metric, but can vary greatly in practice subject to actual test datasets and is typically unknown due to the lack of ground truth labels. This has raised significant concerns with trustworthiness of DNNs, especially in safety-critical applications. In this paper, we address trustworthiness of DNNs by using post-hoc processing to monitor the true inference accuracy on a user’s dataset. Concretely, we propose a neural network-based accuracy monitor model, which only takes the deployed DNN’s softmax probability output as its input and directly predicts if the DNN’s prediction result is correct or not, thus leading to an estimate of the true inference accuracy. The accuracy monitor model can be pre-trained on a dataset relevant to the target application of interest, and only needs to actively label a small portion (1% in our experiments) of the user’s dataset for model transfer. For estimation robustness, we further employ an ensemble of monitor models based on the Monte-Carlo dropout method. We evaluate our approach on different deployed DNN models for image classification and traffic sign detection over multiple datasets (including adversarial samples). The result shows that our accuracy monitor model provides a close-to-true accuracy estimation and outperforms the existing baseline methods.more » « less
-
Mobile or FPGA? A Comprehensive Evaluation on Energy Efficiency and a Unified Optimization FrameworkEfficient deployment of Deep Neural Networks (DNNs) on edge devices (i.e., FPGAs and mobile platforms) is very challenging, especially under a recent witness of the increasing DNN model size and complexity. Model compression strategies, including weight quantization and pruning, are widely recognized as effective approaches to significantly reduce computation and memory intensities, and have been implemented in many DNNs on edge devices. However, most state-of-the-art works focus on ad-hoc optimizations, and there lacks a thorough study to comprehensively reveal the potentials and constraints of different edge devices when considering different compression strategies. In this paper, we qualitatively and quantitatively compare the energy efficiency of FPGA-based and mobile-based DNN executions using mobile GPU and provide a detailed analysis. Based on the observations obtained from the analysis, we propose a unified optimization framework using block-based pruning to reduce the weight storage and accelerate the inference speed on mobile devices and FPGAs, achieving high hardware performance and energy-efficiency gain while maintaining accuracy.more » « less