Brain-inspired cognitive computing has so far followed two major approaches - one uses multi-layered artificial neural networks (ANNs) to perform pattern-recognition-related tasks, whereas the other uses spiking neural networks (SNNs) to emulate biological neurons in an attempt to be as efficient and fault-tolerant as the brain. While there has been considerable progress in the former area due to a combination of effective training algorithms and acceleration platforms, the latter is still in its infancy due to the lack of both. SNNs have a distinct advantage over their ANN counterparts in that they are capable of operating in an event-driven manner, thus consuming very low power. Several recent efforts have proposed various SNN hardware design alternatives, however, these designs still incur considerable energy overheads.In this context, this paper proposes a comprehensive design spanning across the device, circuit, architecture and algorithm levels to build an ultra low-power architecture for SNN and ANN inference. For this, we use spintronics-based magnetic tunnel junction (MTJ) devices that have been shown to function as both neuro-synaptic crossbars as well as thresholding neurons and can operate at ultra low voltage and current levels. Using this MTJ-based neuron model and synaptic connections, we design a low power chipmore »
Semi-supervised learning and inference in domain-wall magnetic tunnel junction (DW-MTJ) neural networks
Advances in machine intelligence have sparked interest in hardware accelerators to implement these algorithms, yet embedded electronics have stringent power, area budgets, and speed requirements that may limit nonvolatile memory (NVM) integration. In this context, the development of fast nanomagnetic neural networks using minimal training data is attractive. Here, we extend an inference-only proposal using the intrinsic physics of domain-wall MTJ (DW-MTJ) neurons for online learning to implement fully unsupervised pattern recognition operation, using winner-take-all networks that contain either random or plastic synapses (weights). Meanwhile, a read-out layer trains in a supervised fashion. We find our proposed design can approach state-of-the-art success on the task relative to competing memristive neural network proposals, while eliminating much of the area and energy overhead that would typically be required to build the neuronal layers with CMOS devices.
- Award ID(s):
- Publication Date:
- NSF-PAR ID:
- Journal Name:
- SPIE Spintronics XII
- Page Range or eLocation-ID:
- Sponsoring Org:
- National Science Foundation
More Like this
Flooding is one of the leading threats of natural disasters to human life and property, especially in densely populated urban areas. Rapid and precise extraction of the flooded areas is key to supporting emergency-response planning and providing damage assessment in both spatial and temporal measurements. Unmanned Aerial Vehicles (UAV) technology has recently been recognized as an efficient photogrammetry data acquisition platform to quickly deliver high-resolution imagery because of its cost-effectiveness, ability to fly at lower altitudes, and ability to enter a hazardous area. Different image classification methods including SVM (Support Vector Machine) have been used for flood extent mapping. In recent years, there has been a significant improvement in remote sensing image classification using Convolutional Neural Networks (CNNs). CNNs have demonstrated excellent performance on various tasks including image classification, feature extraction, and segmentation. CNNs can learn features automatically from large datasets through the organization of multi-layers of neurons and have the ability to implement nonlinear decision functions. This study investigates the potential of CNN approaches to extract flooded areas from UAV imagery. A VGG-based fully convolutional network (FCN-16s) was used in this research. The model was fine-tuned and a k-fold cross-validation was applied to estimate the performance of the modelmore »
Short-Term Long-Term Compute-In-Memory Architecture: A Hybrid Spin/CMOS Approach Supporting Intrinsic ConsolidationBiological memory structures impart enormous retention capacity while automatically providing vital functions for chronological information management and update resolution of domain and episodic knowledge. A crucial requirement for hardware realization of such cortical operations found in biology is to first design both Short-Term Memory (STM) and Long-Term Memory (LTM). Herein, these memory features are realized via a beyond-CMOS based learning approach derived from the repeated input information and retrieval of the encoded data. We first propose a new binary STM-LTM architecture with composite synapse of Spin Hall Effect-driven Magnetic Tunnel Junction (SHE-MTJ) and capacitive memory bit-cell to mimic the behavior of biological synapses. This STM-LTM platform realizes the memory potentiation through a continual update process using STM-to-LTM transfer, which is applied to Neural Networks based on the established capacitive crossbar. We then propose a hardware-enabled and customized STM-LTM transition algorithm for the platform considering the real hardware parameters. We validate the functionality of the design using SPICE simulations that show the proposed synapse has the potential of reaching ~30.2pJ energy consumption for STM-to-LTM transfer and 65pJ during STM programming. We further analyze the correlation between energy, array size, and STM-to-LTM threshold utilizing the MNIST dataset.
With reduced data reuse and parallelism, recent convolutional neural networks (CNNs) create new challenges for FPGA acceleration. Systolic arrays (SAs) are efficient, scalable architectures for convolutional layers, but without proper optimizations, their efficiency drops dramatically for reasons: 1) the different dimensions within same-type layers, 2) the different convolution layers especially transposed and dilated convolutions, and 3) CNN’s complex dataflow graph. Furthermore, significant overheads arise when integrating FPGAs into machine learning frameworks. Therefore, we present a flexible, composable architecture called FlexCNN, which delivers high computation efficiency by employing dynamic tiling, layer fusion, and data layout optimizations. Additionally, we implement a novel versatile SA to process normal, transposed, and dilated convolutions efficiently. FlexCNN also uses a fully-pipelined software-hardware integration that alleviates the software overheads. Moreover, with an automated compilation flow, FlexCNN takes a CNN in the ONNX representation, performs a design space exploration, and generates an FPGA accelerator. The framework is tested using three complex CNNs: OpenPose, U-Net, and E-Net. The architecture optimizations achieve 2.3 × performance improvement. Compared to a standard SA, the versatile SA achieves close-to-ideal speedups, with up to 15.98 × and 13.42 × for transposed and dilated convolutions, with a 6% average area overhead. The pipelined integration leadsmore »
Voltage-Controlled Energy-Efficient Domain Wall Synapses With Stochastic Distribution of Quantized Weights in the Presence of Thermal Noise and Edge RoughnessWe propose energy-efficient voltage-induced strain control of a domain wall (DW) in a perpendicularly magnetized nanoscale racetrack on a piezoelectric substrate that can implement a multistate synapse to be utilized in neuromorphic computing platforms. Here, strain generated in the piezoelectric is mechanically transferred to the racetrack and modulates the perpendicular magnetic anisotropy (PMA) in a system that has significant interfacial Dzyaloshinskii-Moriya interaction (DMI). When different voltages are applied (i.e., different strains are generated) in conjunction with spin-orbit torque (SOT) due to a fixed current flowing in the heavy metal layer for a fixed time, DWs are translated to different distances and implement different synaptic weights. We have shown using micromagnetic simulations that five-state and three-state synapses can be implemented in a racetrack that is modeled with the inclusion of natural edge roughness and room temperature thermal noise. These simulations show interesting dynamics of DWs due to interaction with roughness-induced pinning sites. Thus, notches need not be fabricated to implement multistate nonvolatile synapses. Such a strain-controlled synapse has an energy consumption of ~1 fJ and could thus be very attractive to implement energy-efficient quantized neural networks, which has been shown recently to achieve near equivalent classification accuracy to the full-precision neuralmore »