Neuromorphic hardware, designed to mimic the neural structure of the human brain, offers an energy-efficient platform for implementing machine-learning models in the form of Spiking Neural Networks (SNNs). Achieving efficient SNN execution on this hardware requires careful consideration of various objectives, such as optimizing utilization of individual neuromorphic cores and minimizing inter-core communication. Unlike previous approaches that overlooked the architecture of the neuromorphic core when clustering the SNN into smaller networks, our approach uses architecture-aware algorithms to ensure that the resulting clusters can be effectively mapped to the core. We base our approach on a crossbar architecture for each neuromorphic core. We start with a basic architecture where neurons can only be mapped to the columns of the crossbar. Our technique partitions the SNN into clusters of neurons and synapses, ensuring that each cluster fits within the crossbar's confines, and when multiple clusters are allocated to a single crossbar, we maximize resource utilization by efficiently reusing crossbar resources. We then expand this technique to accommodate an enhanced architecture that allows neurons to be mapped not only to the crossbar's columns but also to its rows, with the aim of further optimizing utilization. To evaluate the performance of these techniques, assuming a multi-core neuromorphic architecture, we assess factors such as the number of crossbars used and the average crossbar utilization. Our evaluation includes both synthetically generated SNNs and spiking versions of well-known machine-learning models: LeNet, AlexNet, DenseNet, and ResNet. We also investigate how the structure of the SNN impacts solution quality and discuss approaches to improve it.
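The clustering constraint described above can be made concrete with a small sketch. The following Python fragment is a minimal, illustrative greedy partitioner, assuming the basic architecture in which presynaptic neurons occupy crossbar rows and postsynaptic neurons occupy columns; the function name, first-fit policy, and data layout are assumptions for illustration, not the paper's actual algorithm.

```python
# Illustrative greedy clustering: pack postsynaptic neurons into clusters so
# that each cluster's presynaptic fan-in fits the crossbar rows and its
# postsynaptic neurons fit the columns. Neurons whose fan-in exceeds the
# crossbar would need further splitting, which is out of scope here.

def greedy_cluster(synapses, rows, cols):
    """synapses: dict mapping postsynaptic neuron -> set of presynaptic neurons."""
    clusters, current_post, current_pre = [], set(), set()
    for post, pre in synapses.items():
        new_pre = current_pre | pre
        if len(new_pre) <= rows and len(current_post) + 1 <= cols:
            current_post.add(post)
            current_pre = new_pre
        else:
            clusters.append((current_post, current_pre))
            current_post, current_pre = {post}, set(pre)
    if current_post:
        clusters.append((current_post, current_pre))
    return clusters

# Example: a tiny network mapped onto 4x4 crossbars.
net = {"n4": {"n0", "n1"}, "n5": {"n1", "n2"}, "n6": {"n2", "n3"}}
print(greedy_cluster(net, rows=4, cols=4))
```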
DFSynthesizer: Dataflow-based Synthesis of Spiking Neural Networks to Neuromorphic Hardware
Spiking Neural Networks (SNNs) are an emerging computation model that uses event-driven activation and bio-inspired learning algorithms. SNN-based machine learning programs are typically executed on tile-based neuromorphic hardware platforms, where each tile consists of a computation unit called a crossbar, which maps the neurons and synapses of the program. However, synthesizing such programs onto off-the-shelf neuromorphic hardware is challenging because of the hardware's inherent resource and latency limitations, which impact both model performance, e.g., accuracy, and hardware performance, e.g., throughput. We propose DFSynthesizer, an end-to-end framework for synthesizing SNN-based machine learning programs to neuromorphic hardware. The proposed framework works in four steps. First, it analyzes a machine learning program and generates an SNN workload using representative data. Second, it partitions the SNN workload and generates clusters that fit on the crossbars of the target neuromorphic hardware. Third, it exploits the rich semantics of the Synchronous Dataflow Graph (SDFG) to represent a clustered SNN program, allowing for performance analysis in terms of key hardware constraints such as the number of crossbars, the dimensions of each crossbar, the buffer space on tiles, and the tile communication bandwidth. Finally, it uses a novel scheduling algorithm to execute clusters on the crossbars of the hardware, guaranteeing hardware performance. We evaluate DFSynthesizer with 10 commonly used machine learning programs. Our results demonstrate that DFSynthesizer provides a much tighter performance guarantee than current mapping approaches.
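As a rough illustration of the kind of performance bound such a dataflow-based analysis yields, the Python sketch below assigns clusters to a fixed number of crossbars and bounds throughput by the most heavily loaded crossbar. DFSynthesizer's actual analysis uses SDFG semantics and also models buffer space and communication bandwidth; the function name, round-robin assignment, and numbers here are illustrative assumptions only.

```python
# Simplified throughput estimate: in steady state, pipelined execution is
# limited by the bottleneck crossbar, i.e., the one with the largest total
# cluster execution time assigned to it.

def throughput_bound(cluster_exec_times, num_crossbars):
    """Assign clusters to crossbars round-robin and bound throughput by the
    most heavily loaded crossbar."""
    load = [0.0] * num_crossbars
    for i, t in enumerate(sorted(cluster_exec_times, reverse=True)):
        load[i % num_crossbars] += t
    return 1.0 / max(load)  # inferences per unit time

# Example: 6 clusters with execution times in ms, mapped onto 2 crossbars.
print(throughput_bound([3.0, 2.5, 2.0, 1.5, 1.0, 0.5], num_crossbars=2))
```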
- PAR ID: 10357878
- Journal Name: ACM Transactions on Embedded Computing Systems
- Volume: 21
- Issue: 3
- ISSN: 1539-9087
- Page Range / eLocation ID: 1 to 35
- Sponsoring Org: National Science Foundation
More Like this
-
This paper explores the synergistic potential of neuromorphic and edge computing to create a versatile machine learning (ML) system tailored for processing data captured by dynamic vision sensors. We construct and train hybrid models, blending spiking neural networks (SNNs) and artificial neural networks (ANNs) using the PyTorch and Lava frameworks. Our hybrid architecture integrates an SNN for temporal feature extraction and an ANN for classification. We delve into the challenges of deploying such hybrid structures on hardware. Specifically, we deploy the individual components on Intel's neuromorphic processor Loihi (for the SNN) and a Jetson Nano (for the ANN). We also propose an accumulator circuit to transfer data from the spiking to the non-spiking domain. Furthermore, we conduct comprehensive performance analyses of hybrid SNN-ANN models on a heterogeneous system of neuromorphic and edge AI hardware, evaluating accuracy, latency, power, and energy consumption. Our findings demonstrate that the hybrid spiking networks surpass the baseline ANN model across all metrics and outperform the baseline SNN model in accuracy and latency. (A minimal software sketch of such a spike accumulator appears after this list.)
-
Memristors have recently received significant attention as device-level components for building a novel generation of computing systems. These devices have many promising features, such as non-volatility, low power consumption, high density, and excellent scalability. The ability to control and modify biasing voltages at memristor terminals makes them promising candidates for efficiently performing matrix-vector multiplications and solving systems of linear equations. In this article, we discuss how networks of memristors arranged in crossbar arrays can be used for efficiently solving optimization and machine learning problems. We introduce a new memristor-based optimization framework that combines the computational merits of memristor crossbars with the advantages of an operator splitting method, the alternating direction method of multipliers (ADMM). Here, ADMM helps in splitting a complex optimization problem into subproblems that involve the solution of systems of linear equations. The strength of this framework is shown by applying it to linear programming, quadratic programming, and sparse optimization. In addition to ADMM, the implementation of a customized power iteration method for eigenvalue/eigenvector computation using memristor crossbars is discussed. The memristor-based power iteration method can further be applied to principal component analysis. The use of memristor crossbars yields a significant speed-up in computation and thus, we believe, has the potential to advance optimization and machine learning research in artificial intelligence. (A brief power-iteration sketch that treats the matrix-vector product as a crossbar read appears after this list.)
-
Recent studies of resistive switching devices with hexagonal boron nitride (h-BN) as the switching layer have shown the potential of two-dimensional (2D) materials for memory and neuromorphic computing applications. The use of 2D materials allows scaling the resistive switching layer thickness to sub-nanometer dimensions, enabling devices to operate with low switching voltages and high programming speeds and offering large improvements in efficiency and performance as well as ultra-dense integration. These characteristics are of interest for the implementation of neuromorphic computing and machine learning hardware based on memristor crossbars. However, existing demonstrations of h-BN memristors focus on the switching properties of single isolated devices and lack attention to fundamental machine learning functions. This paper demonstrates the hardware implementation of dot product operations, a basic analog function ubiquitous in machine learning, using h-BN memristor arrays. Moreover, we demonstrate the hardware implementation of a linear regression algorithm on h-BN memristor arrays. (A software sketch of the crossbar dot-product primitive appears after this list.)
-
Defect identification has been a significant task in various fields to prevent potential problems caused by imperfection. There is great interest in developing technology to accurately extract defect information from images using a computing system without human error. However, image analysis using conventional computing technology based on the Von Neumann architecture faces bottlenecks in efficiently processing the huge volume of input data at low power and high speed. Herein, efficient defect identification is demonstrated via a morphological image process with minimal power consumption using an oxide transistor and a memristor-based crossbar array that can be applied to neuromorphic computing. Using a hardware- and software-codesigned neuromorphic system combined with a dynamic Gaussian blur kernel operation, enhanced defect detection performance is successfully demonstrated with about 104 times more power-efficient computation compared to a conventional complementary metal-oxide-semiconductor (CMOS)-based digital implementation. It is believed that the back-end-of-line (BEOL)-compatible, all-oxide-based memristive crossbar array provides unique potential for universal artificial intelligence of things (AIoT) applications where conventional hardware can hardly be used. (A sketch of a Gaussian blur kernel evaluated as crossbar matrix-vector products appears after this list.)
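The spiking-to-non-spiking hand-off described in the first related entry can be sketched in software as follows: output spike trains from the SNN are accumulated over a time window into per-neuron rates that an ANN classifier can consume. The paper's accumulator is a hardware circuit; the shapes, normalization, and function name below are illustrative assumptions.

```python
import numpy as np

def accumulate_spikes(spike_train, dt=1e-3):
    """spike_train: binary array of shape (timesteps, neurons)."""
    counts = spike_train.sum(axis=0)              # spikes per neuron
    rates = counts / (spike_train.shape[0] * dt)  # spikes per second
    return rates.astype(np.float32)               # dense input for the ANN

# Example: 100 timesteps of 8 output neurons.
rng = np.random.default_rng(0)
spikes = (rng.random((100, 8)) < 0.1).astype(np.uint8)
print(accumulate_spikes(spikes))
```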
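For the second related entry, a minimal power-iteration sketch is shown below with the matrix-vector product delegated to a "crossbar": in hardware, reading the column currents of a programmed conductance matrix computes I = G·V in one analog step. The crossbar is emulated in software here and device non-idealities are ignored; all names are illustrative and this is not the paper's customized method.

```python
import numpy as np

def crossbar_matvec(G, v):
    return G @ v  # stands in for an analog read of column currents

def power_iteration(G, iters=100):
    """Estimate the dominant eigenvalue/eigenvector of G."""
    v = np.random.default_rng(1).standard_normal(G.shape[1])
    for _ in range(iters):
        v = crossbar_matvec(G, v)
        v /= np.linalg.norm(v)
    eigenvalue = v @ crossbar_matvec(G, v)
    return eigenvalue, v

A = np.array([[2.0, 1.0], [1.0, 3.0]])
lam, vec = power_iteration(A)
print(lam, vec)  # dominant eigenpair, usable for principal component analysis
```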
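The dot-product primitive highlighted in the third related entry follows from Ohm's law and Kirchhoff's current law: applying input voltages along the rows of a memristor column and summing the resulting currents computes i = Σ_j g_j·v_j. The software sketch below evaluates a linear regression model this way; the values and scaling are illustrative, and real devices additionally require weight-to-conductance mapping and a scheme (e.g., differential column pairs) for negative weights.

```python
import numpy as np

def column_dot(conductances, voltages):
    """One memristor column: output current = sum of conductance * voltage."""
    return float(np.sum(conductances * voltages))

w = np.array([0.8, -0.3, 0.5])   # trained regression weights (as conductances)
b = 0.1                          # bias added off-crossbar
x = np.array([1.0, 2.0, 3.0])    # input encoded as row voltages
y = column_dot(w, x) + b         # prediction read from the column
print(y)
```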
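Finally, for the fourth related entry, the sketch below shows how a Gaussian blur kernel can be evaluated as crossbar matrix-vector products: each image patch is applied as row voltages and the flattened kernel is stored as column conductances, so one analog read yields one blurred pixel. This is a software emulation of the idea; the 3x3 kernel, sigma, and image size are arbitrary assumptions.

```python
import numpy as np

def gaussian_kernel(size=3, sigma=1.0):
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return (k / k.sum()).ravel()          # flattened into one crossbar column

def blur(image, kernel):
    size = int(np.sqrt(kernel.size))
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.empty_like(image, dtype=float)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            patch = padded[r:r + size, c:c + size].ravel()  # row voltages
            out[r, c] = patch @ kernel                      # column current
    return out

img = np.random.default_rng(2).random((8, 8))
print(blur(img, gaussian_kernel()).shape)
```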