Title: Energy-Efficient Deep Neural Networks with Mixed-Signal Neurons and Dense-Local and Sparse-Global Connectivity
Neuromorphic computing has become popular because it can solve certain classes of learning tasks better than traditional von Neumann computers. Data-intensive classification and pattern recognition problems have been of special interest to neuromorphic engineers, as these problems present complex use cases for Deep Neural Networks (DNNs), which are inspired by the architecture of the human brain and employ densely connected neurons and synapses organized in a hierarchical manner. However, as these systems grow larger to handle increasing amounts of data and higher feature dimensionality, the designs often become connectivity constrained. To solve this, the computation is divided into multiple cores/islands, called processing engines (PEs). Today, communication among these PEs is carried out through a power-hungry network-on-chip (NoC). The optimal distribution of these islands, along with energy-efficient compute and communication strategies, therefore becomes extremely important in reducing the overall energy of the neuromorphic computer, which is currently orders of magnitude higher than that of the biological human brain. In this paper, we extensively analyze the choice of island size based on mixed-signal neurons/synapses with 3- to 8-bit resolution, within allowable ranges of system-level classification error determined by the analog non-idealities (noise and mismatch) in the neurons, and we propose strategies involving local and global communication to reduce the system-level energy consumption. AC-coupled mixed-signal neurons are shown to have 10× lower non-idealities than DC-coupled ones, while the choice of the number of islands is shown to be a function of the network, constrained by the analog-to-digital (or digital-to-analog) conversion power at the interfaces between islands. The maximum number of layers in an island is analyzed, and a global bus-based sparse connectivity is proposed, which consumes orders of magnitude lower power than the competing powerline communication techniques.
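As a rough illustration of why converter power at island boundaries constrains the number of islands, the sketch below uses a toy energy model. It is not the paper's model: the Walden-style scaling (energy per conversion proportional to 2^bits), the figure-of-merit value, the linear chain of islands, and every function and parameter name are assumptions made for this example.

```python
"""Toy model of data-converter energy at island boundaries.

Assumptions (not from the paper): islands form a linear chain, every
signal crossing a boundary costs one conversion, and the energy per
conversion follows a Walden-style scaling, FOM * 2**bits.
"""


def conversion_energy(bits: int, fom_joules: float) -> float:
    """Energy of one conversion at the given resolution (hypothetical FOM)."""
    return fom_joules * (2 ** bits)


def interface_energy(num_islands: int, signals_per_boundary: int,
                     bits: int, fom_joules: float) -> float:
    """Total conversion energy per inference across all island boundaries."""
    boundaries = max(num_islands - 1, 0)  # a chain of N islands has N-1 boundaries
    return boundaries * signals_per_boundary * conversion_energy(bits, fom_joules)


if __name__ == "__main__":
    FOM = 1e-15  # placeholder: 1 fJ per conversion step, illustrative only
    for bits in range(3, 9):  # the 3- to 8-bit range named in the abstract
        energy = interface_energy(num_islands=4, signals_per_boundary=64,
                                  bits=bits, fom_joules=FOM)
        print(f"{bits}-bit: {energy:.3e} J per inference")
```

In this toy model a single island has no interface cost, and each extra bit of resolution doubles the cost of every boundary crossing, which suggests why resolution and island count would need to be chosen together.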
Award ID(s):
1944602 1657455
NSF-PAR ID:
10222326
Journal Name:
Asia and South Pacific Design Automation Conference
Page Range / eLocation ID:
297 to 304
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Large language models (LLMs), though growing exceedingly powerful, comprise orders of magnitude fewer neurons and synapses than the human brain, yet they require significantly more power/energy to operate. In this work, we propose a novel bio-inspired spiking language model (LM) that aims to reduce the computational cost of conventional LMs by drawing motivation from the synaptic information flow in the brain. We demonstrate a framework that leverages the average spiking rate of neurons at equilibrium to train a neuromorphic spiking LM using an implicit differentiation technique, thereby overcoming the non-differentiability problem of spiking neural network (SNN) based algorithms without using any type of surrogate gradient. The steady-state convergence of the spiking neurons also allows us to design a spiking attention mechanism, which is critical in developing a scalable spiking LM. Moreover, the convergence of the average spiking rate of neurons at equilibrium is used to develop a novel ANN-SNN knowledge distillation technique in which a pre-trained BERT model serves as the “teacher” to train our “student” spiking architecture. While the primary architecture proposed in this paper is motivated by BERT, the technique can potentially be extended to other kinds of LLMs. Our work is the first to demonstrate the performance of an operational spiking LM architecture on multiple tasks in the GLUE benchmark. Our implementation source code is available at https://github.com/NeuroCompLab-psu/SpikingBERT.
  2. Neuromorphic computing, commonly understood as a computing approach built upon neurons, synapses, and their dynamics, as opposed to Boolean gates, is gaining considerable mindshare due to its direct application to current and future computing technology problems, such as smart sensing, smart devices, self-hosted and self-contained devices, and artificial intelligence (AI) applications. In a largely software-defined implementation of neuromorphic computing, it is possible to throw enormous computational power at a task or to optimize models and networks depending on the specific nature of the computational task. However, a hardware-based approach requires identifying well-suited neuronal and synaptic models to obtain high functional and energy efficiency, which is a prime concern in size, weight, and power (SWaP) constrained environments. In this work, we study the characteristics of hardware neuron models (namely, inference errors, generalizability and robustness, practical implementability, and memory capacity) that have been proposed and demonstrated using a plethora of physical devices based on emerging nano-materials technology, in order to quantify the performance of such neurons on certain classes of problems that are of great importance in real-time signal-processing-like tasks in the context of reservoir computing. We find that the answer to which neuron to use for which application depends on the particulars of the application requirements and constraints themselves; that is, we need not only a hammer but all sorts of tools in our tool chest for high-efficiency, high-quality neuromorphic computing.
  3. CMOS-based computing systems that employ the von Neumann architecture are relatively limited when it comes to parallel data storage and processing. In contrast, the human brain is a living computational signal processing unit that operates with extreme parallelism and energy efficiency. Although numerous neuromorphic electronic devices have emerged in the last decade, most of them are rigid or contain materials that are toxic to biological systems. In this work, we report on biocompatible bilayer graphene-based artificial synaptic transistors (BLAST) capable of mimicking synaptic behavior. The BLAST devices leverage a dry ion-selective membrane, enabling long-term potentiation, with ~50 aJ/µm² switching energy efficiency, at least an order of magnitude lower than previous reports on two-dimensional material-based artificial synapses. The devices show unique metaplasticity, a useful feature for generalizable deep neural networks, and we demonstrate that metaplastic BLASTs outperform ideal linear synapses in classic image classification tasks. With switching energy well below the 1 fJ energy estimated per biological synapse, the proposed devices are powerful candidates for bio-interfaced online learning, bridging the gap between artificial and biological neural networks.
  4. INTRODUCTION: A brainwide, synaptic-resolution connectivity map (a connectome) is essential for understanding how the brain generates behavior. However, because of technological constraints, imaging entire brains with electron microscopy (EM) and reconstructing circuits from such datasets has been challenging. To date, complete connectomes have been mapped for only three organisms, each with several hundred brain neurons: the nematode C. elegans and the larvae of the sea squirt Ciona intestinalis and of the marine annelid Platynereis dumerilii. Synapse-resolution circuit diagrams of larger brains, such as those of insects, fish, and mammals, have been approached by considering select subregions in isolation. However, neural computations span spatially dispersed but interconnected brain regions, and understanding any one computation requires the complete brain connectome with all its inputs and outputs.
     RATIONALE: We therefore generated a connectome of an entire brain of a small insect, the larva of the fruit fly, Drosophila melanogaster. This animal displays a rich behavioral repertoire, including learning, value computation, and action selection, and shares homologous brain structures with adult Drosophila and larger insects. Powerful genetic tools are available for selective manipulation or recording of individual neuron types. In this tractable model system, hypotheses about the functional roles of specific neurons and circuit motifs revealed by the connectome can therefore be readily tested.
     RESULTS: The complete synaptic-resolution connectome of the Drosophila larval brain comprises 3016 neurons and 548,000 synapses. We performed a detailed analysis of the brain circuit architecture, including connection and neuron types, network hubs, and circuit motifs. Most of the brain’s in-out hubs (73%) were postsynaptic to the learning center or presynaptic to the dopaminergic neurons that drive learning. We used graph spectral embedding to hierarchically cluster neurons based on synaptic connectivity into 93 neuron types, which were internally consistent based on other features, such as morphology and function. We developed an algorithm to track brainwide signal propagation across polysynaptic pathways and analyzed feedforward (from sensory to output) and feedback pathways, multisensory integration, and cross-hemisphere interactions. We found extensive multisensory integration throughout the brain and multiple interconnected pathways of varying depths from sensory neurons to output neurons, forming a distributed processing network. The brain had a highly recurrent architecture, with 41% of neurons receiving long-range recurrent input. However, recurrence was not evenly distributed and was especially high in areas implicated in learning and action selection. Dopaminergic neurons that drive learning are amongst the most recurrent neurons in the brain. Many contralateral neurons, which projected across brain hemispheres, were in-out hubs and synapsed onto each other, facilitating extensive interhemispheric communication. We also analyzed interactions between the brain and nerve cord. We found that descending neurons targeted a small fraction of premotor elements that could play important roles in switching between locomotor states. A subset of descending neurons targeted low-order post-sensory interneurons, likely modulating sensory processing.
     CONCLUSION: The complete brain connectome of the Drosophila larva will be a lasting reference study, providing a basis for a multitude of theoretical and experimental studies of brain function. The approach and computational tools generated in this study will facilitate the analysis of future connectomes. Although the details of brain organization differ across the animal kingdom, many circuit architectures are conserved. As more brain connectomes of other organisms are mapped in the future, comparisons between them will reveal both common and therefore potentially optimal circuit architectures, as well as the idiosyncratic ones that underlie behavioral differences between organisms. Some of the architectural features observed in the Drosophila larval brain, including multilayer shortcuts and prominent nested recurrent loops, are found in state-of-the-art artificial neural networks, where they can compensate for a lack of network depth and support arbitrary, task-dependent computations. Such features could therefore increase the brain’s computational capacity, overcoming physiological constraints on the number of neurons. Future analysis of similarities and differences between brains and artificial neural networks may help in understanding brain computational principles and perhaps inspire new machine learning architectures.
     Figure caption: The connectome of the Drosophila larval brain. The morphologies of all brain neurons, reconstructed from a synapse-resolution EM volume, and the synaptic connectivity matrix of an entire brain. This connectivity information was used to hierarchically cluster all brain neurons into 93 cell types, which were internally consistent based on morphology and known function.
  5. An emerging use-case of machine learning (ML) is to train a model on a high-performance system and deploy the trained model on energy-constrained embedded systems. Neuromorphic hardware platforms, which operate on principles of the biological brain, can significantly lower the energy overhead of a machine learning inference task, making these platforms an attractive solution for embedded ML systems. We present a design-technology tradeoff analysis to implement such inference tasks on the processing elements (PEs) of a Non-Volatile Memory (NVM)-based neuromorphic hardware. Through detailed circuit-level simulations at scaled process technology nodes, we show the negative impact of technology scaling on the information-processing latency, which impacts the quality-of-service (QoS) of an embedded ML system. At a finer granularity, the latency inside a PE depends on 1) the delay introduced by parasitic components on its current paths, and 2) the varying delay to sense different resistance states of its NVM cells. Based on these two observations, we make the following three contributions. First, on the technology front, we propose an optimization scheme where the NVM resistance state that takes the longest time to sense is set on current paths having the least delay, and vice versa, reducing the average PE latency, which improves the QoS. Second, on the architecture front, we introduce isolation transistors within each PE to partition it into regions that can be individually power-gated, reducing both latency and energy. Finally, on the system-software front, we propose a mechanism to leverage the proposed technological and architectural enhancements when implementing a machine-learning inference task on neuromorphic PEs of the hardware. 
Evaluations with a recent neuromorphic hardware architecture show that our proposed design-technology co-optimization approach improves both performance and energy efficiency of machine-learning inference tasks without incurring high cost-per-bit. 
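The first related abstract in this list builds its training method on the average spiking rate of neurons at equilibrium. The sketch below only illustrates the underlying idea that a time-averaged spiking rate can settle to a fixed value. It is not that paper's framework: it simulates a single non-leaky integrate-and-fire neuron with constant input and reset by subtraction, and all names and parameter values are hypothetical.

```python
"""Average firing rate of a single integrate-and-fire neuron.

Simplified illustration only: one non-leaky neuron, constant input,
reset by subtraction. Names and parameter values are hypothetical.
"""


def average_spiking_rate(input_current: float, threshold: float = 1.0,
                         steps: int = 1000) -> float:
    """Simulate the neuron and return the number of spikes per time step."""
    membrane = 0.0
    spikes = 0
    for _ in range(steps):
        membrane += input_current      # integrate the constant input
        if membrane >= threshold:      # fire once the threshold is reached
            spikes += 1
            membrane -= threshold      # reset by subtraction keeps the residue
    return spikes / steps


if __name__ == "__main__":
    # The running average approaches input_current / threshold as steps grow.
    for steps in (10, 100, 1000, 10000):
        print(steps, average_spiking_rate(0.3, steps=steps))
```

For an input between zero and the threshold, the averaged rate in this toy setting approaches the ratio of input to threshold, so it behaves like a smooth quantity even though each individual spike is a discrete event.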