

Title: Runtime Long-Term Reliability Management Using Stochastic Computing in Deep Neural Networks
In this paper, we propose a new dynamic reliability technique using an accuracy-reconfigurable stochastic computing (ARSC) framework for deep learning computing. Unlike conventional stochastic computing, which fixes the accuracy versus power/energy trade-off at design time, the new ARSC design can adjust the bit-width of the data at run time. Hence, ARSC can mitigate long-term aging effects by slowing the system clock frequency, while maintaining inference throughput by reducing the data bit-width at a small cost in accuracy. We show how to implement the recently proposed counter-based SC multiplication and bit-width reduction on a layer-wise quantization scheme for CNNs with dynamic fixed-point data. We validate an ARSC-based five-layer convolutional neural network design for the MNIST dataset using Vivado HLS with constraints from the Xilinx Zynq-7000 family xc7z045 platform. Experimental results show that the new ARSC DNN can sufficiently compensate for NBTI-induced aging effects over 10 years with marginal classification accuracy loss, while maintaining or even exceeding the pre-aging computing throughput. At the same time, the proposed ARSC computing framework also reduces active power consumption due to frequency scaling, which can further improve system reliability by lowering the temperature.
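The run-time trade-off above rests on a basic property of stochastic computing: an n-bit value is processed as a bitstream whose length grows as 2^n, so removing one bit halves the cycles per operation and lets the clock run at half the frequency for the same throughput. The sketch below illustrates this under stated assumptions. It uses unipolar encoding with 2^n-cycle bitstreams and a generic counter-driven multiplier (a unary stream for x, a bit-reversed counter stream for y, an AND gate, and an output counter). This is an illustrative stand-in, not the counter-based multiplier used in the paper, and the names `bit_reverse`, `sc_multiply`, and `required_clock_hz` are hypothetical.

```python
"""Sketch: data bit-width vs. clock frequency in stochastic computing (SC).

Assumptions (illustrative, not taken from the paper):
  * unipolar encoding: an n-bit unsigned value v is a 2**n-cycle bitstream
    with exactly v ones;
  * a generic counter-driven multiplier: x as a unary (thermometer) stream,
    y as a bit-reversed-counter stream, an AND gate, and an output counter.
"""


def bit_reverse(i: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of i (e.g. 0b001 -> 0b100 for bits=3)."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def sc_multiply(x: int, y: int, bits: int) -> int:
    """Approximate x * y / 2**bits using one 2**bits-cycle SC pass.

    x and y are unsigned integers in [0, 2**bits]. The return value is the
    output counter: the number of cycles in which both streams are 1.
    """
    length = 1 << bits
    count = 0
    for cycle in range(length):                  # one stream bit per clock
        bit_x = cycle < x                        # unary stream: x ones
        bit_y = bit_reverse(cycle, bits) < y     # low-discrepancy stream: y ones
        if bit_x and bit_y:                      # AND gate
            count += 1                           # output up-counter
    return count


def required_clock_hz(ops_per_second: float, bits: int) -> float:
    """Clock needed to sustain a throughput when each SC operation takes
    2**bits cycles; removing one bit halves the required frequency."""
    return ops_per_second * (1 << bits)


if __name__ == "__main__":
    print(sc_multiply(128, 100, 8))        # 128 * 100 / 256 = 50
    print(required_clock_hz(1e6, 8))       # 256000000.0
    print(required_clock_hz(1e6, 7))       # 128000000.0
```

With a hypothetical target of 1e6 multiplications per second, the 8-bit configuration needs a 256 MHz clock and the 7-bit one only 128 MHz; that frequency headroom is what an aging-slowed circuit can absorb while keeping the same throughput, at the cost of one bit of precision.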
Award ID(s):
1816361 2007135
NSF-PAR ID:
10279545
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Proc. Int. Symposium on Quality Electronic Design (ISQED’21)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In this paper, we propose a novel accuracy-reconfigurable stochastic computing (ARSC) framework for dynamic reliability and power management. Unlike existing stochastic computing works, where the accuracy versus power/energy trade-off is made at design time, the new ARSC design can change the accuracy, or bit-width, of the data at run time, so that it can accommodate long-term aging effects by slowing the system clock frequency at the cost of accuracy while maintaining computing throughput. We validate the ARSC concept on discrete cosine transformation (DCT) and inverse DCT designs for image compression/decompression applications, implemented on the Xilinx Spartan-6 family XC6SLX45 platform. Experimental results show that the new design can easily mitigate long-term aging-induced effects through accuracy trade-off while maintaining the throughput of the whole computing process using simple frequency scaling. We further show that with a one-bit precision loss in the input data, which translates to 3.44 dB of accuracy loss in terms of Peak Signal-to-Noise Ratio (PSNR) for images, we can sufficiently compensate for NBTI-induced aging effects over 10 years while maintaining the pre-aging computing throughput of 7.19 frames per second. At the same time, we can save 74% of power consumption at the cost of 10.67 dB of accuracy loss. The proposed ARSC computing framework also allows much more aggressive frequency scaling, which can lead to order-of-magnitude power savings compared to traditional dynamic voltage and frequency scaling (DVFS) techniques. 
  2. Wireless networks are being applied in various industrial sectors, and they are poised to support mission-critical industrial IoT applications that require ultra-reliable, low-latency communications (URLLC). Ensuring predictable per-packet communication reliability is the basis of predictable URLLC, and scheduling and power control are two basic enablers. Scheduling and power control, however, face challenges such as harsh environments, dynamic channels, and distributed network settings in industrial IoT. Existing solutions are mostly based on heuristic algorithms or asymptotic analysis of network performance, and field-deployable algorithms for ensuring predictable per-packet reliability are lacking. To address this gap, we examine the cross-layer design of joint scheduling and power control and analyze the associated challenges. We use the Perron–Frobenius theorem to demonstrate that scheduling is necessary for ensuring predictable communication reliability, and by investigating the characteristics of interference matrices, we show that scheduling that keeps close-by links silent effectively constructs a set of links whose required reliability is feasible with proper transmission power control. Given that scheduling alone cannot ensure predictable communication reliability while also achieving high throughput and addressing fast-varying channel dynamics, we demonstrate how power control can help improve both the reliability at each time instant and the long-term throughput. Based on this analysis, we propose a candidate framework for joint scheduling and power control, and we demonstrate how this framework behaves in guaranteeing per-packet communication reliability in the presence of wireless channel dynamics at different time scales. 
Collectively, these findings provide insight into the cross-layer design of joint scheduling and power control for ensuring predictable per-packet reliability in the presence of wireless network dynamics and uncertainties. 
  3. Deep Convolutional Neural Networks (CNNs) have achieved outstanding performance in image recognition on large-scale datasets. However, the pursuit of higher inference accuracy leads to CNN architectures with deeper layers and denser connections, which inevitably makes their hardware implementations demand more and more memory and computational resources. This can be described as the "CNN power and memory wall". Recent research efforts have significantly reduced both model size and computational complexity by using low bit-width weights, activations and gradients, while keeping reasonably good accuracy. In this work, we present different emerging nonvolatile Magnetic Random Access Memory (MRAM) designs that could be leveraged to implement a "bit-wise in-memory convolution engine", which could simultaneously store network parameters and compute low bit-width convolution. This new computing model leverages the "in-memory computing" concept to accelerate CNN inference and reduce convolution energy consumption, owing to the intrinsic logic-in-memory design and the reduction of data communication. 
  4. With the success of deep neural networks (DNNs), many recent works have focused on developing hardware accelerators for power- and resource-limited embedded systems via model compression techniques such as quantization, pruning, and low-rank approximation. However, almost all existing DNN structures are fixed after deployment; they lack a runtime-adaptive structure that can adapt to dynamic hardware resources, power budget, throughput requirements, and workload. Correspondingly, there is no runtime-adaptive hardware platform to support a dynamic DNN structure. To address this problem, we first propose a dynamic channel-adaptive deep neural network (CA-DNN) that can adjust the involved convolution channels (i.e., model size and computing load) at run time (i.e., at the inference stage without retraining) to dynamically trade off power, speed, computing load, and accuracy. Further, we use knowledge distillation to optimize the model and quantize it to 8 bits and 16 bits, respectively, for hardware-friendly mapping. We test the proposed model on the CIFAR-10 and ImageNet datasets using ResNet. Compared with an individual model of the same size, our CA-DNN achieves better accuracy. Moreover, to the best of our knowledge, we are the first to propose a Processing-in-Memory accelerator for such an adaptive neural network structure, based on Spin Orbit Torque Magnetic Random Access Memory (SOT-MRAM) computational adaptive sub-arrays. Then, we comprehensively analyze the trade-off between accuracy and hardware parameters (e.g., energy, memory, and area overhead) for models with different channel widths. 
  5. Stochastic computing (SC) is a digital design paradigm that forgoes the conventional binary encoding in favor of pseudo-random bitstreams. Stochastic circuits operate on the probability values of bitstreams, and often achieve low-power, low-area, and fault-tolerant computation. Most SC designs rely on the input bitstreams being independent or uncorrelated to obtain the best results. However, circuits have also been proposed that exploit deliberately correlated bitstreams to improve area or accuracy. In such cases, different sub-circuits may have different correlation requirements. A major barrier to multi-layer or hierarchical stochastic circuit design has been understanding how correlation propagates through a circuit so that the correlation requirements of all its sub-circuits are met. In this paper, we introduce correlation matrices and extensions to probability transfer matrix (PTM) algebra to analyze complex correlation behavior, thereby alleviating the need for computationally intensive bit-wise simulation. We apply our new correlation analysis to two multi-layer SC image processing and neural network circuits and show that it helps designers to systematically reduce correlation error. 
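The correlation issue raised in item 5 can be made concrete with a small example. The sketch below is an illustration under stated assumptions: it uses the commonly used stochastic computing correlation (SCC) metric rather than the correlation matrices and PTM extensions that abstract describes, and the helper names `value`, `and_gate`, and `scc` are hypothetical. With unipolar streams, an AND gate outputs p_x·p_y only when its inputs are uncorrelated (SCC = 0); with SCC = +1 it outputs min(p_x, p_y), and with SCC = -1 it outputs max(p_x + p_y - 1, 0).

```python
"""Sketch: how input correlation changes what an SC AND gate computes.

Uses the SCC metric as an illustrative assumption; this is not the
correlation-matrix / PTM method described in the abstract.
"""


def value(stream: list[int]) -> float:
    """Unipolar value of a bitstream: the fraction of 1s."""
    return sum(stream) / len(stream)


def and_gate(x: list[int], y: list[int]) -> list[int]:
    """Bitwise AND of two equal-length bitstreams."""
    return [a & b for a, b in zip(x, y)]


def scc(x: list[int], y: list[int]) -> float:
    """SCC in [-1, 1]: 0 = uncorrelated, +1 = maximal overlap of 1s,
    -1 = minimal overlap of 1s."""
    px, py = value(x), value(y)
    pxy = value(and_gate(x, y))
    delta = pxy - px * py
    if delta > 0:
        return delta / (min(px, py) - px * py)
    if delta < 0:
        return delta / (px * py - max(px + py - 1.0, 0.0))
    return 0.0


x = [1, 1, 1, 1, 0, 0, 0, 0]          # p_x = 0.5
y_pos = [1, 1, 0, 0, 0, 0, 0, 0]      # p_y = 0.25, 1s overlap x's 1s
y_zero = [1, 0, 0, 0, 1, 0, 0, 0]     # p_y = 0.25, overlap matches independence
y_neg = [0, 0, 0, 0, 1, 1, 0, 0]      # p_y = 0.25, 1s avoid x's 1s

if __name__ == "__main__":
    print(scc(x, y_pos), value(and_gate(x, y_pos)))    # 1.0 0.25  (min)
    print(scc(x, y_zero), value(and_gate(x, y_zero)))  # 0.0 0.125 (product)
    print(scc(x, y_neg), value(and_gate(x, y_neg)))    # -1.0 0.0  (floor)
```

The same AND gate multiplies, takes a minimum, or computes a clipped sum depending only on input correlation, which is why sub-circuits with different correlation requirements are hard to compose without tracking how correlation propagates.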