

Title: Universal Approximation Property and Equivalence of Stochastic Computing-Based Neural Networks and Binary Neural Networks
Large-scale deep neural networks are both memory- and computation-intensive, thereby posing stringent requirements on computing platforms. Hardware acceleration of deep neural networks has been extensively investigated. Specific forms of binary neural networks (BNNs) and stochastic computing-based neural networks (SCNNs) are particularly appealing for hardware implementation since they can be implemented almost entirely with binary operations. Despite the obvious advantages in hardware implementation, these approximate computing techniques have been questioned by researchers in terms of accuracy and universal applicability. It is also important to understand the relative pros and cons of SCNNs and BNNs in theory and in actual hardware implementations. To address these concerns, in this paper we prove that "ideal" SCNNs and BNNs satisfy the universal approximation property with probability 1 (due to their stochastic behavior), which is a new angle on the original approximation property. The proof first establishes the property for SCNNs via the strong law of large numbers, and then uses SCNNs as a "bridge" to prove it for BNNs. Besides the universal approximation property, we also derive an appropriate bound for the bit length M in order to provide insights for actual neural network implementations. Based on the universal approximation property, we further prove that SCNNs and BNNs exhibit the same energy complexity; in other words, they have the same asymptotic energy consumption as network size grows. We also provide a detailed analysis of the pros and cons of SCNNs and BNNs for hardware implementations and conclude that SCNNs are more suitable.
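To make the role of the bit length M concrete, here is a minimal sketch of stochastic computing, assuming unipolar coding (a value in [0, 1] is represented by a Bernoulli bitstream, and multiplication reduces to a bitwise AND of two independent streams); the function names are illustrative, not from the paper. The decoded product converges to the exact product as M grows, which is precisely the strong-law-of-large-numbers behavior the probability-1 approximation proof builds on.

```python
import numpy as np

# Unipolar stochastic computing: x in [0, 1] becomes a length-M bitstream of
# i.i.d. Bernoulli(x) bits; the AND of two independent streams decodes to the
# product, and accuracy improves with the bit length M.
rng = np.random.default_rng(0)

def encode(x, M):
    """Encode x in [0, 1] as a length-M stochastic bitstream."""
    return rng.random(M) < x

def decode(stream):
    """Decode a bitstream as the fraction of 1s."""
    return stream.mean()

x, y = 0.6, 0.7
for M in (16, 256, 4096, 65536):
    product = decode(encode(x, M) & encode(y, M))  # one AND gate per bit
    print(f"M={M:6d}  SC product={product:.4f}  exact={x * y:.4f}")
```

Running the loop shows the error shrinking roughly as 1/sqrt(M), which is why a concrete bound on M matters for practical implementations.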
Award ID(s): 1704662
NSF-PAR ID: 10177392
Journal Name: Proceedings of the AAAI Conference on Artificial Intelligence
ISSN: 2159-5399
Page Range / eLocation ID: 5369-5376
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Large-scale deep neural networks (DNNs) are both compute- and memory-intensive. As the size of DNNs continues to grow, it is critical to improve energy efficiency and performance while maintaining accuracy. For DNNs, the model size is an important factor affecting performance, scalability and energy efficiency. Weight pruning achieves good compression ratios but suffers from three drawbacks: 1) the irregular network structure after pruning, which affects performance and throughput; 2) the increased training complexity; and 3) the lack of a rigorous guarantee of compression ratio and inference accuracy. To overcome these limitations, this paper proposes CirCNN, a principled approach to represent weights and process neural networks using block-circulant matrices. CirCNN utilizes Fast Fourier Transform (FFT)-based fast multiplication, simultaneously reducing the computational complexity (in both inference and training) from O(n^2) to O(n log n) and the storage complexity from O(n^2) to O(n), with negligible accuracy loss. Compared to other approaches, CirCNN is distinct due to its mathematical rigor: DNNs based on CirCNN can converge to the same "effectiveness" as DNNs without compression. We propose the CirCNN architecture, a universal DNN inference engine that can be implemented in various hardware/software platforms with configurable network architecture (e.g., layer type, size, scales, etc.). In the CirCNN architecture: 1) due to the recursive property, FFT can be used as the key computing kernel, which ensures universal and small-footprint implementations; 2) the compressed but regular network structure avoids the pitfalls of network pruning and facilitates high performance and throughput with a highly pipelined and parallel design. To demonstrate the performance and energy efficiency, we test CirCNN on FPGA, ASIC and embedded processors. Our results show that the CirCNN architecture achieves very high energy efficiency and performance with a small hardware footprint. Based on the FPGA implementation and ASIC synthesis results, CirCNN achieves 6-102X energy efficiency improvements compared with the best state-of-the-art results.
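As a rough illustration of the FFT-based multiplication CirCNN exploits (a generic sketch, not CirCNN's actual kernel): a circulant matrix is fully determined by its first column w, and its product with a vector x is a circular convolution, computable in O(n log n) via the convolution theorem rather than O(n^2) with a dense matrix.

```python
import numpy as np

def circulant_matvec_fft(w, x):
    """Multiply the circulant matrix whose first column is w by x in O(n log n)."""
    return np.real(np.fft.ifft(np.fft.fft(w) * np.fft.fft(x)))

rng = np.random.default_rng(1)
n = 8
w, x = rng.standard_normal(n), rng.standard_normal(n)

# Reference: the dense circulant matrix, which costs O(n^2) storage and time.
C = np.column_stack([np.roll(w, j) for j in range(n)])
assert np.allclose(C @ x, circulant_matvec_fft(w, x))
```

Only the length-n vector w needs to be stored instead of n^2 matrix entries, which is the source of the O(n^2)-to-O(n) storage reduction.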
  2. With the success of deep neural networks (DNNs), many recent works have focused on developing hardware accelerators for power- and resource-limited embedded systems via model compression techniques such as quantization, pruning, and low-rank approximation. However, almost all existing DNN structures are fixed after deployment; they lack a runtime-adaptive structure that can adapt to dynamic hardware resources, power budgets, throughput requirements, and workloads. Correspondingly, there is no runtime-adaptive hardware platform to support dynamic DNN structures. To address this problem, we first propose a dynamic channel-adaptive deep neural network (CA-DNN) which can adjust the number of active convolution channels (i.e., model size and computing load) at run time (i.e., at the inference stage, without retraining) to dynamically trade off among power, speed, computing load and accuracy. Further, we utilize knowledge distillation to optimize the model and quantize it to 8 bits and 16 bits, respectively, for hardware-friendly mapping. We test the proposed model on the CIFAR-10 and ImageNet datasets using ResNet. Compared with individual models of the same size, our CA-DNN achieves better accuracy. Moreover, to the best of our knowledge, we are the first to propose a Processing-in-Memory accelerator for such adaptive neural network structures, based on Spin-Orbit Torque Magnetic Random Access Memory (SOT-MRAM) computational adaptive sub-arrays. We then comprehensively analyze the trade-off between accuracy and hardware parameters (e.g., energy, memory, and area overhead) for models with different channel widths.
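A toy sketch of the channel-adaptive idea (hypothetical code, not the paper's CA-DNN implementation): one full weight tensor is trained, and at inference a width setting activates only a prefix of the output channels, trading accuracy for power and speed without retraining.

```python
import numpy as np

def conv1x1_adaptive(x, W, width=1.0):
    """1x1 convolution using only the first `width` fraction of output channels.
    x: (C_in, H, W) feature map; W: (C_out, C_in) full weight matrix."""
    k = max(1, int(W.shape[0] * width))  # number of active output channels
    return np.tensordot(W[:k], x, axes=([1], [0]))

rng = np.random.default_rng(2)
x = rng.standard_normal((16, 8, 8))     # input feature map
W = rng.standard_normal((32, 16))       # full-width weights
print(conv1x1_adaptive(x, W, width=1.0).shape)   # (32, 8, 8): full accuracy
print(conv1x1_adaptive(x, W, width=0.25).shape)  # (8, 8, 8): low-power mode
```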
  3. Deep Neural Networks (DNNs) have become a popular instrument for solving various real-world problems. DNNs' sophisticated structure allows them to learn complex representations and features, but their architecture specifics and use of floating-point numbers result in high computational complexity. For this reason, Binary Neural Networks (BNNs) are widely used on edge devices, such as microcomputers. Like other DNNs, BNNs are vulnerable to adversarial attacks; even a small perturbation to the input set may lead to an errant output. Unfortunately, only a few approaches have been proposed for verifying BNNs. This paper proposes an approach to verify BNNs on continuous input space using star reachability analysis. Our approach can compute both exact and over-approximate reachable sets of BNNs with Sign activation functions and use them for verification. The proposed approach is also efficient in constructing a complete set of counterexamples in case a network is unsafe. We implemented our approach in NNV, a neural network verification tool for DNNs and learning-enabled Cyber-Physical Systems. The experimental results show that our star-based approach is less conservative, more efficient, and more scalable than the recent SMT-based method implemented in Marabou. We also provide a comparison with the quantization-based tool EEVBNN.
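Star-set reachability itself is involved, but a much coarser stand-in, interval bound propagation, conveys the flavor of verifying a Sign-activation layer (a toy sketch, not the NNV star method): propagate an input box through z = sign(Wx + b), and wherever the pre-activation interval does not straddle zero, the binary output is provably fixed for every input in the box.

```python
import numpy as np

def sign_layer_bounds(W, b, lo, hi):
    """Bound W x + b over the box lo <= x <= hi, then apply Sign.
    Returns +1 / -1 where the output is provably fixed, 0 where undecided."""
    center, radius = (lo + hi) / 2, (hi - lo) / 2
    mid = W @ center + b
    slack = np.abs(W) @ radius          # worst-case deviation over the box
    pre_lo, pre_hi = mid - slack, mid + slack
    return np.where(pre_lo > 0, 1, np.where(pre_hi < 0, -1, 0))

W = np.array([[1.0, -2.0], [0.5, 0.5]])
b = np.array([3.0, 0.0])
print(sign_layer_bounds(W, b, np.array([-0.1, -0.1]), np.array([0.1, 0.1])))
# [1 0]: neuron 0 is provably +1 over the whole box; neuron 1 is undecided.
```

Exact star-based analysis resolves the undecided cases by splitting the input set at each neuron, which is why it is less conservative than interval-style over-approximation.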
  4. Recurrent Neural Networks (RNNs) are becoming increasingly important for time series-related applications which require efficient and real-time implementations. The two major types are Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks. Real-time, efficient, and accurate hardware RNN implementations are challenging because of the high sensitivity to imprecision accumulation and the requirement of special activation function implementations. Recently, two works have focused on FPGA implementation of the inference phase of LSTM RNNs with model compression. The first, ESE, uses a weight-pruning-based compressed RNN model but suffers from an irregular network structure after pruning. The second, C-LSTM, mitigates the irregular-network limitation by incorporating block-circulant matrices for weight matrix representation in RNNs, thereby achieving simultaneous model compression and acceleration. A key limitation of the prior works is the lack of a systematic design optimization framework covering both the RNN model and the hardware implementation, especially when the block size (or compression ratio) must be jointly optimized with RNN type, layer size, etc. In this paper, we adopt the block-circulant matrix-based framework and present the Efficient RNN (E-RNN) framework for FPGA implementations of the Automatic Speech Recognition (ASR) application. The overall goal is to improve performance/energy efficiency under accuracy requirements. We use the alternating direction method of multipliers (ADMM) technique for more accurate block-circulant training, and present two design explorations providing guidance on block size and reducing RNN training trials. Based on these two observations, we decompose E-RNN into two phases: Phase I determines the RNN model to reduce computation and storage subject to the accuracy requirement, and Phase II covers hardware implementation given the RNN model, including processing element design/optimization, quantization, activation implementation, etc. Experimental results on actual FPGA deployments show that E-RNN achieves a maximum energy efficiency improvement of 37.4× compared with ESE, and more than 2× compared with C-LSTM, under the same accuracy.
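For intuition about the block-circulant representation underlying C-LSTM and E-RNN, here is an illustrative sketch (not the papers' code): an n x n weight matrix is partitioned into circulant blocks of size k, so only n^2/k parameters are stored and each block is applied via FFT; a larger k gives a higher compression ratio, which is the block-size trade-off E-RNN's design exploration targets.

```python
import numpy as np

def block_circulant_matvec(blocks, x, k):
    """y = W x where W is partitioned into k x k circulant blocks;
    blocks[i][j] holds the first column (length k) of block (i, j)."""
    rows = len(blocks)
    y = np.zeros(rows * k)
    for i in range(rows):
        for j, w in enumerate(blocks[i]):
            xj = x[j * k:(j + 1) * k]
            y[i * k:(i + 1) * k] += np.real(
                np.fft.ifft(np.fft.fft(w) * np.fft.fft(xj)))
    return y

rng = np.random.default_rng(3)
k, n = 4, 8                              # 2 x 2 grid of 4 x 4 circulant blocks
blocks = [[rng.standard_normal(k) for _ in range(n // k)] for _ in range(n // k)]
print(block_circulant_matvec(blocks, rng.standard_normal(n), k))
```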