Neuromorphic hardware, designed to mimic the neural structure of the human brain, offers an energy-efficient platform for implementing machine-learning models in the form of Spiking Neural Networks (SNNs). Achieving efficient SNN execution on this hardware requires careful consideration of various objectives, such as optimizing utilization of individual neuromorphic cores and minimizing inter-core communication. Unlike previous approaches that overlooked the architecture of the neuromorphic core when clustering the SNN into smaller networks, our approach uses architecture-aware algorithms to ensure that the resulting clusters can be effectively mapped to the core. We base our approach on a crossbar architecture for each neuromorphic core. We start with a basic architecture where neurons can only be mapped to the columns of the crossbar. Our technique partitions the SNN into clusters of neurons and synapses, ensuring that each cluster fits within the crossbar's confines, and when multiple clusters are allocated to a single crossbar, we maximize resource utilization by efficiently reusing crossbar resources. We then expand this technique to accommodate an enhanced architecture that allows neurons to be mapped not only to the crossbar's columns but also to its rows, with the aim of further optimizing utilization. To evaluate the performance of these techniques, assuming a multi-core neuromorphic architecture, we assess factors such as the number of crossbars used and the average crossbar utilization. Our evaluation includes both synthetically generated SNNs and spiking versions of well-known machine-learning models: LeNet, AlexNet, DenseNet, and ResNet. We also investigate how the structure of the SNN impacts solution quality and discuss approaches to improve it.
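As a rough illustration of the constraint that each cluster must fit within a crossbar, the following is a minimal sketch using a first-fit greedy heuristic. It is not the architecture-aware algorithm from the abstract above, and it covers only the basic architecture in which neurons occupy columns. The names `greedy_crossbar_clusters`, `utilization`, and `fan_in` are hypothetical, and it assumes each distinct input occupies one row and each neuron one column.

```python
from typing import Dict, List, Set, Tuple

Cluster = Tuple[List[str], Set[str]]  # (neurons on columns, inputs on rows)


def greedy_crossbar_clusters(
    fan_in: Dict[str, Set[str]], rows: int, cols: int
) -> List[Cluster]:
    """First-fit packing of neurons into clusters that fit a rows x cols crossbar.

    Assumption: every neuron takes one column and every distinct input
    takes one row, so inputs shared by several neurons reuse a row.
    """
    clusters: List[Cluster] = []
    for neuron, inputs in fan_in.items():
        if len(inputs) > rows:
            raise ValueError(f"{neuron} has more inputs than crossbar rows")
        for members, used_rows in clusters:
            # Fits if a column is free and the merged input set fits the rows.
            if len(members) < cols and len(used_rows | inputs) <= rows:
                members.append(neuron)
                used_rows |= inputs  # in-place update of the cluster's row set
                break
        else:
            # No existing cluster could take the neuron: open a new one.
            clusters.append(([neuron], set(inputs)))
    return clusters


def utilization(
    cluster: Cluster, fan_in: Dict[str, Set[str]], rows: int, cols: int
) -> float:
    """Fraction of crossbar cells that hold a synapse of this cluster."""
    members, _ = cluster
    synapses = sum(len(fan_in[n]) for n in members)
    return synapses / (rows * cols)
```

Counting the returned clusters and averaging `utilization` over them gives toy versions of the two metrics the abstract mentions (number of crossbars used and average crossbar utilization), under the stated assumptions.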
SpikingBERT: Distilling BERT to Train Spiking Language Models Using Implicit Differentiation
Large language models (LLMs), though growing exceedingly powerful, comprise orders of magnitude fewer neurons and synapses than the human brain. However, they require significantly more power/energy to operate. In this work, we propose a novel bio-inspired spiking language model (LM) which aims to reduce the computational cost of conventional LMs by drawing motivation from the synaptic information flow in the brain. In this paper, we demonstrate a framework that leverages the average spiking rate of neurons at equilibrium to train a neuromorphic spiking LM using an implicit differentiation technique, thereby overcoming the non-differentiability problem of spiking neural network (SNN) based algorithms without using any type of surrogate gradient. The steady-state convergence of the spiking neurons also allows us to design a spiking attention mechanism, which is critical in developing a scalable spiking LM. Moreover, the convergence of the average spiking rate of neurons at equilibrium is utilized to develop a novel ANN-SNN knowledge-distillation-based technique wherein we use a pre-trained BERT model as “teacher” to train our “student” spiking architecture. While the primary architecture proposed in this paper is motivated by BERT, the technique can potentially be extended to different kinds of LLMs. Our work is the first one to demonstrate the performance of an operational spiking LM architecture on multiple different tasks in the GLUE benchmark. Our implementation source code is available at https://github.com/NeuroCompLab-psu/SpikingBERT.
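To make the idea of an average spiking rate settling to a steady value concrete, here is a minimal sketch. It is not the SpikingBERT neuron model or its training procedure, and implicit differentiation is not shown. It assumes a single integrate-and-fire neuron with a constant input `x`, a firing threshold `theta`, and a subtractive (soft) reset. The function name and parameters are hypothetical.

```python
def average_spiking_rate(x: float, theta: float = 1.0, steps: int = 1000) -> float:
    """Simulate an integrate-and-fire neuron and return its mean firing rate.

    Assumptions: constant input x per time step, at most one spike per step,
    and a soft reset that subtracts theta from the membrane potential.
    """
    u = 0.0      # membrane potential
    spikes = 0   # number of spikes emitted so far
    for _ in range(steps):
        u += x               # integrate the input
        if u >= theta:       # threshold crossing: emit a spike
            spikes += 1
            u -= theta       # soft reset keeps the residual charge
    return spikes / steps
```

Under these assumptions, for inputs between 0 and `theta` the rate over `steps` time steps stays within `1 / steps` of `x / theta`; negative inputs give a rate of 0 and inputs above `theta` saturate at 1. A rate that converges to a smooth function of the input in this way is the kind of quantity an equilibrium-based training scheme can work with in place of the non-differentiable individual spikes.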
- Award ID(s): 2337646
- PAR ID: 10496908
- Publisher / Repository: The Association for the Advancement of Artificial Intelligence
- Date Published:
- Journal Name: Proceedings of the AAAI Conference on Artificial Intelligence
- Volume: 38
- Issue: 10
- ISSN: 2159-5399
- Page Range / eLocation ID: 10998 to 11006
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Brain-inspired cognitive computing has so far followed two major approaches: one uses multi-layered artificial neural networks (ANNs) to perform pattern-recognition-related tasks, whereas the other uses spiking neural networks (SNNs) to emulate biological neurons in an attempt to be as efficient and fault-tolerant as the brain. While there has been considerable progress in the former area due to a combination of effective training algorithms and acceleration platforms, the latter is still in its infancy due to the lack of both. SNNs have a distinct advantage over their ANN counterparts in that they are capable of operating in an event-driven manner, thus consuming very low power. Several recent efforts have proposed various SNN hardware design alternatives; however, these designs still incur considerable energy overheads. In this context, this paper proposes a comprehensive design spanning the device, circuit, architecture and algorithm levels to build an ultra-low-power architecture for SNN and ANN inference. For this, we use spintronics-based magnetic tunnel junction (MTJ) devices that have been shown to function as both neuro-synaptic crossbars and thresholding neurons and can operate at ultra-low voltage and current levels. Using this MTJ-based neuron model and synaptic connections, we design a low-power chip that has the flexibility to be deployed for inference of SNNs, ANNs, and SNN-ANN hybrid networks, a distinct advantage compared to prior works. We demonstrate the competitive performance and energy efficiency of the SNNs as well as hybrid models on a suite of workloads. Our evaluations show that the proposed design, NEBULA, is up to 7.9× more energy-efficient than a state-of-the-art design, ISAAC, in the ANN mode. In the SNN mode, our design is about 45× more energy-efficient than a contemporary SNN architecture, INXS. Power comparison between NEBULA ANN and SNN modes indicates that the latter is at least 6.25× more power-efficient for the observed benchmarks.
-
This paper explores the synergistic potential of neuromorphic and edge computing to create a versatile machine learning (ML) system tailored for processing data captured by dynamic vision sensors. We construct and train hybrid models, blending spiking neural networks (SNNs) and artificial neural networks (ANNs) using PyTorch and Lava frameworks. Our hybrid architecture integrates an SNN for temporal feature extraction and an ANN for classification. We delve into the challenges of deploying such hybrid structures on hardware. Specifically, we deploy individual components on Intel's Neuromorphic Processor Loihi (for SNN) and Jetson Nano (for ANN). We also propose an accumulator circuit to transfer data from the spiking to the non-spiking domain. Furthermore, we conduct comprehensive performance analyses of hybrid SNN-ANN models on a heterogeneous system of neuromorphic and edge AI hardware, evaluating accuracy, latency, power, and energy consumption. Our findings demonstrate that the hybrid spiking networks surpass the baseline ANN model across all metrics and outperform the baseline SNN model in accuracy and latency.
-
Spike train classification is an important problem in many areas such as healthcare and mobile sensing, where each spike train is a high-dimensional time series of binary values. Conventional research on spike train classification mainly focuses on developing Spiking Neural Networks (SNNs) under resource-sufficient settings (e.g., on GPU servers). The neurons of the SNNs are usually densely connected in each layer. However, in many real-world applications, we often need to deploy the SNN models on resource-constrained platforms (e.g., mobile devices) to analyze high-dimensional spike train data. The high resource requirement of the densely connected SNNs can make them hard to deploy on mobile devices. In this paper, we study the problem of energy-efficient SNNs with sparsely connected neurons. We propose an SNN model with sparse spatiotemporal coding. Our solution is based on the re-parameterization of weights in an SNN and the application of sparsity regularization during optimization. We compare our work with the state-of-the-art SNNs and demonstrate that our sparse SNNs achieve significantly better computational efficiency on both neuromorphic and standard datasets with comparable classification accuracy. Furthermore, compared with densely connected SNNs, we show through extensive experiments that our method generalizes better on small-size datasets.

