Title: Exploring sparsity of firing activities and clock gating for energy-efficient recurrent spiking neural processors
As a model of recurrent spiking neural networks, the Liquid State Machine (LSM) offers a powerful brain-inspired computing platform for pattern recognition and machine learning applications. Because it operates by processing neural spiking activities, the LSM naturally lends itself to efficient hardware implementation through exploitation of the typically sparse firing patterns that emerge from the recurrent neural network and smart processing of the computational tasks that are orchestrated by different firing events at runtime. We explore these opportunities by presenting an LSM processor architecture with integrated on-chip learning and its FPGA implementation. Our LSM processor leverages the sparsity of firing activities to allow for efficient event-driven processing and activity-dependent clock gating. Using the spoken English letters adopted from the TI46 [1] speech recognition corpus as a benchmark, we show that the proposed FPGA-based neural processor system is up to 29% more energy efficient than a baseline LSM processor with little extra hardware overhead.
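The combination of event-driven processing and activity-dependent clock gating described above can be illustrated in software. Below is a minimal sketch assuming simple leaky integrate-and-fire reservoir neurons; the function name, parameters, and update rule are illustrative assumptions, not the paper's actual RTL design, whose clock-gating logic lives in hardware rather than in code like this.

```python
import numpy as np

def lsm_step_event_driven(v, w, spikes_in, v_th=1.0, leak=0.9):
    """One reservoir step that touches only neurons driven by input spikes.

    A software analogue of activity-dependent clock gating: neurons with
    no presynaptic activity this step keep their (leaked) state and are
    otherwise skipped. All names and parameters here are illustrative.
    """
    v = v * leak                                   # passive leak for all neurons
    active = np.flatnonzero(spikes_in)             # indices of firing inputs
    if active.size:                                # update only gated-on units
        targets = np.flatnonzero(w[:, active].any(axis=1))
        v[targets] += w[np.ix_(targets, active)].sum(axis=1)
    spikes_out = v >= v_th
    v[spikes_out] = 0.0                            # reset membrane after firing
    return v, spikes_out
```

In the FPGA design, the analogous saving would come from gating the clock to neuron units that receive no spikes in a given cycle, so their registers do not toggle; the software skip above only mimics that effect.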
Award ID(s):
1639995
NSF-PAR ID:
10026438
Journal Name:
Proceedings - International Symposium on Low Power Electronics and Design
ISSN:
1533-4678
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The Liquid State Machine (LSM) is a promising model of recurrent spiking neural networks. It consists of a fixed recurrent network, the reservoir, which projects to a readout layer through plastic readout synapses. Classification performance depends heavily on the training of the readout synapses, which tend to be very dense and contribute significantly to the overall network complexity. We present a unifying, biologically inspired, calcium-modulated supervised spike-timing-dependent plasticity (STDP) approach to training and sparsifying readout synapses, in which supervised temporal learning is modulated by the post-synaptic firing level as characterized by the post-synaptic calcium concentration. The proposed approach prevents synaptic weight saturation, boosts learning performance, and sparsifies the connectivity between the reservoir and the readout layer. Using the recognition rate on spoken English letters adopted from the TI46 speech corpus as a measure of performance, we demonstrate that the proposed approach outperforms a baseline supervised STDP mechanism by up to 25%, and a competitive non-STDP spike-dependent training algorithm by up to 2.7%. Furthermore, it can prune out up to 30% of the readout synapses without causing significant performance degradation.
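As a rough illustration of the calcium-modulated idea in item 1, the sketch below gates a standard exponential STDP update by a post-synaptic calcium trace. The thresholds, the gating rule, and all names are assumptions for illustration and may differ from the paper's exact formulation.

```python
import numpy as np

def calcium_modulated_stdp(w, dt, ca_post, eta=0.01,
                           tau=20.0, ca_low=0.2, ca_high=0.8):
    """Hedged sketch of a calcium-gated supervised STDP update.

    dt      : t_post - t_pre spike timing difference (ms)
    ca_post : running post-synaptic calcium trace (proxy for firing level)
    The modulation suppresses potentiation when the neuron already fires
    strongly (high calcium) and suppresses depression when it is quiet,
    which keeps weights away from saturation. Illustrative parameters.
    """
    stdp = np.sign(dt) * np.exp(-abs(dt) / tau)    # classic exponential window
    if ca_post > ca_high:                          # too active: block LTP
        gate = 0.0 if stdp > 0 else 1.0
    elif ca_post < ca_low:                         # too quiet: block LTD
        gate = 0.0 if stdp < 0 else 1.0
    else:
        gate = 1.0
    return w + eta * gate * stdp
```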
  2. In this paper, we introduce a deep spiking delayed feedback reservoir (DFR) model that combines DFRs with spiking neurons: DFRs are a new type of recurrent neural network (RNN) able to capture temporal correlations in time series, while spiking neurons are energy-efficient and biologically plausible neuron models. The introduced deep spiking DFR model is energy-efficient and capable of analyzing time series signals. A corresponding field-programmable gate array (FPGA)-based hardware implementation of the deep spiking DFR model is introduced, and its energy efficiency and resource utilization are evaluated. Various spike encoding schemes are explored, and the optimal spike encoding scheme for analyzing the time series is identified. Specifically, we evaluate the performance of the introduced model using spectrum occupancy time series data from MIMO-OFDM-based cognitive radio (CR) in dynamic spectrum sharing (DSS) networks. In a MIMO-OFDM DSS system, available spectrum is very scarce, and efficient utilization of the spectrum is essential. To improve spectrum efficiency, the first step is to identify the frequency bands that are not utilized by existing users so that a secondary user (SU) can use them for transmission. Due to channel correlation as well as users' activities, there is significant temporal correlation in the spectrum occupancy behavior of the frequency bands across time slots. The introduced deep spiking DFR model is used to capture the temporal correlation of the spectrum occupancy time series and predict the idle/busy subcarriers in future time slots for potential spectrum access. Evaluation results suggest that our introduced model achieves a higher area under the curve (AUC) of the receiver operating characteristic (ROC) than traditional energy-detection-based strategies and learning-based support vector machines (SVMs).
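A delayed feedback reservoir realizes many "virtual" nodes with one physical non-linear node plus a delay line. A minimal non-spiking sketch of one update cycle follows, using tanh where the paper's model would use spiking neurons; the constants and names are illustrative.

```python
import numpy as np

def dfr_step(delay_line, u, gamma=0.5, eta=0.4):
    """One pass of a delayed feedback reservoir over N virtual nodes.

    delay_line : states of the N virtual nodes from the previous cycle
    u          : masked input sample assigned to each virtual node
    The single physical nonlinearity (tanh here; a spiking neuron in the
    paper's model) is reused N times, mixing fresh input with the state
    fed back from one full delay earlier. Purely illustrative constants.
    """
    n = delay_line.size
    new_line = np.empty(n)
    for i in range(n):
        feedback = delay_line[i]                  # state one delay period ago
        new_line[i] = np.tanh(gamma * u[i] + eta * feedback)
    return new_line
```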
  3. This paper presents an energy-efficient classification framework that performs human activity recognition (HAR). Typically, HAR classification tasks require a computational platform that includes a processor and memory along with sensors and their interfaces, all of which consume significant power. The presented framework employs a microelectromechanical systems (MEMS)-based Continuous Time Recurrent Neural Network (CTRNN) to perform HAR tasks very efficiently. In a real physical implementation, we show that the MEMS-CTRNN nodes can perform computing while consuming power on the nanowatt scale, compared to the microwatt scale of state-of-the-art hardware. We also confirm that this large power reduction does not come at the expense of reduced performance by evaluating its accuracy on the highly cited human activity recognition dataset (HAPT). Our simulation results show that the HAR framework, consisting of a training module and a network of MEMS-based CTRNN nodes, provides HAPT classification accuracy comparable to traditional CTRNN and other Recurrent Neural Network (RNN) implementations. For example, we show that the MEMS-based CTRNN model's average accuracy for the worst-case scenario of not using pre-processing techniques, such as quantization, to classify 5 different activities is 77.94%, compared to 78.48% for the traditional CTRNN.
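The reference CTRNN dynamics that the MEMS nodes approximate physically are the standard tau * dy/dt = -y + W @ sigmoid(y + theta) + I. A forward-Euler sketch of this textbook digital update is shown below; it is the common reference model, not the paper's MEMS circuit, and all names are illustrative.

```python
import numpy as np

def ctrnn_step(y, w, theta, I, tau, dt=0.01):
    """Forward-Euler step of standard CTRNN dynamics:

        tau * dy/dt = -y + W @ sigmoid(y + theta) + I

    In the MEMS realization each node's dynamics come from the physics
    of a resonator rather than from this digital update; this equation
    is the common reference model, not the paper's circuit.
    """
    sigma = 1.0 / (1.0 + np.exp(-(y + theta)))    # sigmoid activation
    dy = (-y + w @ sigma + I) / tau               # continuous-time derivative
    return y + dt * dy                            # Euler integration step
```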
  4. Recurrent Neural Networks (RNNs) are becoming increasingly important for time series-related applications that require efficient and real-time implementations. The two major types are Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks. Achieving real-time, efficient, and accurate hardware RNN implementations is challenging because of the high sensitivity to imprecision accumulation and the need for special activation function implementations. Recently, two works have focused on FPGA implementation of the inference phase of LSTM RNNs with model compression. The first, ESE, uses a weight-pruning-based compressed RNN model but suffers from an irregular network structure after pruning. The second, C-LSTM, mitigates the irregular-network limitation by incorporating block-circulant matrices for weight matrix representation in RNNs, thereby achieving simultaneous model compression and acceleration. A key limitation of the prior works is the lack of a systematic design optimization framework spanning the RNN model and hardware implementations, especially when the block size (or compression ratio) should be jointly optimized with RNN type, layer size, etc. In this paper, we adopt the block-circulant matrix-based framework and present the Efficient RNN (E-RNN) framework for FPGA implementations of the Automatic Speech Recognition (ASR) application. The overall goal is to improve performance and energy efficiency under an accuracy requirement. We use the alternating direction method of multipliers (ADMM) technique for more accurate block-circulant training, and present two design explorations providing guidance on block size and reducing RNN training trials. Based on the two observations, we decompose E-RNN into two phases: Phase I determines the RNN model to reduce computation and storage subject to the accuracy requirement, and Phase II covers hardware implementations given the RNN model, including processing element design/optimization, quantization, activation implementation, etc. Experimental results on actual FPGA deployments show that E-RNN achieves a maximum energy efficiency improvement of 37.4× compared with ESE, and more than 2× compared with C-LSTM, under the same accuracy.
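The core arithmetic benefit of block-circulant weight matrices is that each k-by-k block is defined by a single k-vector and can be multiplied via FFTs, cutting both storage and multiply count. A generic sketch follows; this is the standard FFT trick that the E-RNN framework builds on, not the authors' FPGA kernel, and the data layout is assumed for illustration.

```python
import numpy as np

def block_circulant_matvec(blocks, x, k):
    """y = W @ x where W is block-circulant, computed with FFTs.

    blocks[i][j] holds the defining k-vector (first column) of the
    (i, j) circulant block, so an (m*k) x (n*k) weight matrix needs only
    m*n*k stored values instead of m*n*k*k. Layout here is illustrative.
    """
    m, n = len(blocks), len(blocks[0])
    # FFT of each length-k input segment, reused across all block rows
    xf = [np.fft.fft(x[j * k:(j + 1) * k]) for j in range(n)]
    y = np.empty(m * k)
    for i in range(m):
        acc = np.zeros(k, dtype=complex)
        for j in range(n):
            acc += np.fft.fft(blocks[i][j]) * xf[j]   # circulant product
        y[i * k:(i + 1) * k] = np.fft.ifft(acc).real  # back to time domain
    return y
```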
  5. A variety of advanced machine learning and deep learning algorithms achieve state-of-the-art performance on various temporal processing tasks. However, these methods are highly energy inefficient: they run mainly on power-hungry CPUs and GPUs. Computing with spiking networks, on the other hand, has been shown to be energy efficient on specialized neuromorphic hardware, e.g., Loihi, TrueNorth, SpiNNaker, etc. In this work, we present two architectures of spiking models, inspired by the theory of Reservoir Computing and Legendre Memory Units, for the Time Series Classification (TSC) task. Our first spiking architecture is closer to the general Reservoir Computing architecture, and we successfully deploy it on Loihi; the second spiking architecture differs from the first by the inclusion of non-linearity in the readout layer. Our second model (trained with the Surrogate Gradient Descent method) shows that non-linear decoding of the linearly extracted temporal features through spiking neurons not only achieves promising results but also offers low computational overhead by significantly reducing the number of neurons compared to popular LSM-based models: more than a 40x reduction with respect to the recent spiking model we compare with. We experiment on five TSC datasets and achieve new state-of-the-art spiking results (as much as a 28.607% accuracy improvement on one of the datasets), thereby showing the potential of our models to address TSC tasks in a green, energy-efficient manner. In addition, we perform energy profiling and comparison on Loihi and a CPU to support our claims.
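The Legendre Memory Unit underlying the second architecture maintains a fixed linear state-space memory, m' = A m + B u, whose d-dimensional state holds a Legendre expansion of the input over a sliding window of length theta. A sketch of the published LMU matrices with a simple Euler discretization follows; the spiking variants in this work wrap this memory differently, and the discretization choice here is an assumption.

```python
import numpy as np

def lmu_memory(d=8, theta=100.0, dt=1.0):
    """Euler-discretized Legendre Memory Unit state-space matrices.

    Continuous LMU dynamics: m' = A m + B u, with A and B fixed so that
    m holds a d-term Legendre expansion of the input u over the last
    theta time units. Sketch of the published LMU math; the paper's
    spiking models use this memory with spiking (not linear) readouts.
    """
    Q = np.arange(d)
    R = (2 * Q + 1) / theta
    i, j = np.meshgrid(Q, Q, indexing="ij")
    A = np.where(i < j, -1.0, (-1.0) ** (i - j + 1)) * R[:, None]
    B = ((-1.0) ** Q) * R
    Ad = np.eye(d) + dt * A       # simple Euler discretization (assumed)
    Bd = dt * B
    return Ad, Bd

# One memory update for a scalar input u_t:  m = Ad @ m + Bd * u_t
```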