
Title: Semi-supervised learning and inference in domain-wall magnetic tunnel junction (DW-MTJ) neural networks
Advances in machine intelligence have sparked interest in hardware accelerators to implement these algorithms, yet embedded electronics have stringent power, area, and speed requirements that may limit nonvolatile memory (NVM) integration. In this context, the development of fast nanomagnetic neural networks using minimal training data is attractive. Here, we extend an inference-only proposal that uses the intrinsic physics of domain-wall MTJ (DW-MTJ) neurons for online learning, implementing fully unsupervised pattern recognition with winner-take-all networks that contain either random or plastic synapses (weights). Meanwhile, a read-out layer trains in a supervised fashion. We find that the proposed design approaches state-of-the-art accuracy on the task relative to competing memristive neural network proposals, while eliminating much of the area and energy overhead that would typically be required to build the neuronal layers with CMOS devices.
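As a concrete illustration of the training scheme described in the abstract, the following is a minimal software sketch, assuming a one-hot winner-take-all rule, a competitive plastic-synapse update, and a delta-rule read-out. The layer sizes, learning rates, and update rules are illustrative assumptions; none of this models the DW-MTJ device physics itself.

```python
# Minimal sketch of the scheme above: an unsupervised winner-take-all (WTA)
# hidden layer with plastic synapses, followed by a supervised read-out layer.
# Sizes, learning rates, and the competitive update are assumptions, not the
# device-level rule from the paper.
import numpy as np

rng = np.random.default_rng(0)

N_IN, N_HID, N_OUT = 64, 16, 4      # input, WTA, and read-out sizes (assumed)
ETA_WTA, ETA_OUT = 0.05, 0.01       # learning rates (assumed)

W_hid = rng.random((N_HID, N_IN))                # plastic synapses into the WTA layer
W_out = rng.normal(0, 0.1, (N_OUT, N_HID))       # supervised read-out weights

def wta_step(x):
    """One unsupervised step: the most strongly driven neuron wins and
    its weights move toward the current input (competitive learning)."""
    winner = np.argmax(W_hid @ x)
    W_hid[winner] += ETA_WTA * (x - W_hid[winner])
    h = np.zeros(N_HID)
    h[winner] = 1.0                  # one-hot WTA activity
    return h

def readout_step(h, label):
    """Supervised delta-rule update of the read-out layer only."""
    y = W_out @ h
    target = np.eye(N_OUT)[label]
    W_out += ETA_OUT * np.outer(target - y, h)

# toy training loop on random data standing in for the real patterns
for _ in range(1000):
    x = rng.random(N_IN)
    h = wta_step(x)                       # unsupervised layer adapts
    readout_step(h, rng.integers(N_OUT))  # read-out trains supervised
```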
Award ID(s):
1910800
NSF-PAR ID:
10145308
Journal Name:
SPIE Spintronics XII
Volume:
11090
Page Range or eLocation-ID:
110903I
Sponsoring Org:
National Science Foundation
More Like this
  1. Brain-inspired cognitive computing has so far followed two major approaches - one uses multi-layered artificial neural networks (ANNs) to perform pattern-recognition-related tasks, whereas the other uses spiking neural networks (SNNs) to emulate biological neurons in an attempt to be as efficient and fault-tolerant as the brain. While there has been considerable progress in the former area due to a combination of effective training algorithms and acceleration platforms, the latter is still in its infancy due to the lack of both. SNNs have a distinct advantage over their ANN counterparts in that they are capable of operating in an event-driven manner, thus consuming very low power. Several recent efforts have proposed various SNN hardware design alternatives; however, these designs still incur considerable energy overheads. In this context, this paper proposes a comprehensive design spanning the device, circuit, architecture, and algorithm levels to build an ultra-low-power architecture for SNN and ANN inference. For this, we use spintronics-based magnetic tunnel junction (MTJ) devices that have been shown to function as both neuro-synaptic crossbars and thresholding neurons and can operate at ultra-low voltage and current levels. Using this MTJ-based neuron model and synaptic connections (a minimal sketch of this crossbar-plus-thresholding-neuron inference style appears after this list), we design a low-power chip that has the flexibility to be deployed for inference of SNNs, ANNs, and SNN-ANN hybrid networks - a distinct advantage compared to prior works. We demonstrate the competitive performance and energy efficiency of the SNNs as well as hybrid models on a suite of workloads. Our evaluations show that the proposed design, NEBULA, is up to 7.9× more energy efficient than a state-of-the-art design, ISAAC, in the ANN mode. In the SNN mode, our design is about 45× more energy-efficient than a contemporary SNN architecture, INXS. Power comparison between NEBULA ANN and SNN modes indicates that the latter is at least 6.25× more power-efficient for the observed benchmarks.
  2. Flooding is one of the leading threats of natural disasters to human life and property, especially in densely populated urban areas. Rapid and precise extraction of flooded areas is key to supporting emergency-response planning and providing damage assessment in both spatial and temporal measurements. Unmanned Aerial Vehicle (UAV) technology has recently been recognized as an efficient photogrammetry data-acquisition platform that can quickly deliver high-resolution imagery because of its cost-effectiveness, ability to fly at lower altitudes, and ability to enter hazardous areas. Different image classification methods, including Support Vector Machines (SVMs), have been used for flood extent mapping. In recent years, there has been significant improvement in remote sensing image classification using Convolutional Neural Networks (CNNs). CNNs have demonstrated excellent performance on various tasks including image classification, feature extraction, and segmentation. CNNs can learn features automatically from large datasets through the organization of multiple layers of neurons and can implement nonlinear decision functions. This study investigates the potential of CNN approaches to extract flooded areas from UAV imagery. A VGG-based fully convolutional network (FCN-16s) was used in this research. The model was fine-tuned, and k-fold cross-validation was applied to estimate the performance of the model on the new UAV imagery dataset (a minimal sketch of this cross-validation protocol appears after this list). This approach allowed FCN-16s to be trained on a dataset containing only one hundred training samples and resulted in highly accurate classification. A confusion matrix was calculated to estimate the accuracy of the proposed method. The image segmentation results obtained from FCN-16s were compared with those obtained from FCN-8s, FCN-32s, and SVMs. Experimental results showed that the FCNs could extract flooded areas precisely from UAV images compared to traditional classifiers such as SVMs. The classification accuracy achieved by FCN-16s, FCN-8s, FCN-32s, and SVM for the water class was 97.52%, 97.8%, 94.20%, and 89%, respectively.
  3. Biological memory structures impart enormous retention capacity while automatically providing vital functions for chronological information management and update resolution of domain and episodic knowledge. A crucial requirement for hardware realization of such cortical operations found in biology is to first design both Short-Term Memory (STM) and Long-Term Memory (LTM). Herein, these memory features are realized via a beyond-CMOS learning approach derived from repeated input information and retrieval of the encoded data. We first propose a new binary STM-LTM architecture with a composite synapse of a Spin Hall Effect-driven Magnetic Tunnel Junction (SHE-MTJ) and a capacitive memory bit-cell to mimic the behavior of biological synapses. This STM-LTM platform realizes memory potentiation through a continual update process using STM-to-LTM transfer, which is applied to neural networks based on the established capacitive crossbar. We then propose a hardware-enabled and customized STM-LTM transition algorithm for the platform that considers real hardware parameters (a simplified sketch of this promotion-on-repetition logic appears after this list). We validate the functionality of the design using SPICE simulations, which show that the proposed synapse can reach ~30.2 pJ energy consumption for STM-to-LTM transfer and 65 pJ during STM programming. We further analyze the correlation between energy, array size, and the STM-to-LTM threshold using the MNIST dataset.
  4. With reduced data reuse and parallelism, recent convolutional neural networks (CNNs) create new challenges for FPGA acceleration. Systolic arrays (SAs) are efficient, scalable architectures for convolutional layers, but without proper optimizations their efficiency drops dramatically for three reasons: 1) the different dimensions within same-type layers, 2) the different convolution layers, especially transposed and dilated convolutions, and 3) CNNs' complex dataflow graphs. Furthermore, significant overheads arise when integrating FPGAs into machine learning frameworks. Therefore, we present a flexible, composable architecture called FlexCNN, which delivers high computation efficiency by employing dynamic tiling, layer fusion, and data layout optimizations (a toy tile-size search in this spirit appears after this list). Additionally, we implement a novel versatile SA to process normal, transposed, and dilated convolutions efficiently. FlexCNN also uses a fully pipelined software-hardware integration that alleviates the software overheads. Moreover, with an automated compilation flow, FlexCNN takes a CNN in the ONNX representation, performs a design space exploration, and generates an FPGA accelerator. The framework is tested using three complex CNNs: OpenPose, U-Net, and E-Net. The architecture optimizations achieve a 2.3× performance improvement. Compared to a standard SA, the versatile SA achieves close-to-ideal speedups, up to 15.98× and 13.42× for transposed and dilated convolutions, with a 6% average area overhead. The pipelined integration leads to a 5× speedup for OpenPose.
  5. We propose energy-efficient voltage-induced strain control of a domain wall (DW) in a perpendicularly magnetized nanoscale racetrack on a piezoelectric substrate that can implement a multistate synapse for neuromorphic computing platforms. Here, strain generated in the piezoelectric is mechanically transferred to the racetrack and modulates the perpendicular magnetic anisotropy (PMA) in a system with significant interfacial Dzyaloshinskii-Moriya interaction (DMI). When different voltages are applied (i.e., different strains are generated) in conjunction with spin-orbit torque (SOT) from a fixed current flowing in the heavy-metal layer for a fixed time, DWs are translated different distances and implement different synaptic weights (a toy model of this voltage-selected, pinning-quantized weight update appears after this list). Using micromagnetic simulations, we have shown that five-state and three-state synapses can be implemented in a racetrack modeled with natural edge roughness and room-temperature thermal noise. These simulations show interesting DW dynamics due to interaction with roughness-induced pinning sites. Thus, notches need not be fabricated to implement multistate nonvolatile synapses. Such a strain-controlled synapse has an energy consumption of ~1 fJ and could thus be very attractive for implementing energy-efficient quantized neural networks, which have recently been shown to achieve near-equivalent classification accuracy to full-precision neural networks.
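For item 1, a minimal software sketch of crossbar-plus-thresholding-neuron inference, assuming a simple leaky integrate-and-fire neuron model and random conductances. The sizes, leak factor, and threshold are illustrative; in the actual devices the thresholding happens in the MTJ magnetization physics rather than in arithmetic.

```python
# Sketch of item 1's inference style: a synaptic crossbar performs a
# matrix-vector product and thresholding (spiking) neurons integrate the
# result. LIF parameters and array sizes are assumptions.
import numpy as np

rng = np.random.default_rng(1)
N_IN, N_OUT, T = 32, 8, 100
W = rng.random((N_OUT, N_IN))        # crossbar conductances (normalized)
v = np.zeros(N_OUT)                  # membrane state of the MTJ-like neurons
LEAK, V_TH = 0.9, 5.0                # leak factor and firing threshold (assumed)

spike_count = np.zeros(N_OUT)
for t in range(T):
    in_spikes = (rng.random(N_IN) < 0.2).astype(float)  # random input spikes
    v = LEAK * v + W @ in_spikes     # crossbar MAC + leaky integration
    fired = v >= V_TH
    spike_count += fired
    v[fired] = 0.0                   # reset after a spike

print("output rates:", spike_count / T)
```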
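For item 2, a sketch of the k-fold cross-validation protocol used to estimate accuracy from only about one hundred labelled samples. A logistic-regression stand-in replaces the fine-tuned FCN-16s, and synthetic features stand in for UAV image patches; everything here is illustrative, not the paper's pipeline.

```python
# Sketch of item 2's evaluation protocol: 5-fold cross-validation on a
# 100-sample dataset. Classifier and data are stand-ins (assumed).
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.random((100, 20))                  # 100 samples, 20 features (assumed)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)  # synthetic flooded / not-flooded label

scores = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    clf = LogisticRegression().fit(X[train_idx], y[train_idx])
    scores.append(clf.score(X[test_idx], y[test_idx]))

print(f"5-fold accuracy: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```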
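For item 3, a simplified sketch of STM-to-LTM transition logic: items live in a small short-term store and are promoted to long-term memory once they have been re-presented a threshold number of times. The threshold, capacity, and forgetting policy are assumptions standing in for the hardware algorithm.

```python
# Sketch of item 3's promotion-on-repetition idea. Volatile STM forgets
# under capacity pressure; repeated items transfer to nonvolatile LTM.
from collections import OrderedDict

STM_CAPACITY = 4        # capacitive STM cells (assumed)
LTM_THRESHOLD = 3       # repeats needed before STM-to-LTM transfer (assumed)

stm = OrderedDict()     # key -> repeat count, recency-ordered
ltm = set()             # nonvolatile (SHE-MTJ-like) store

def present(key):
    """Present an input pattern; promote it to LTM on repeated exposure."""
    if key in ltm:
        return
    stm[key] = stm.pop(key, 0) + 1      # refresh recency, bump repeat count
    if stm[key] >= LTM_THRESHOLD:
        del stm[key]
        ltm.add(key)                    # STM-to-LTM transfer event
    elif len(stm) > STM_CAPACITY:
        stm.popitem(last=False)         # volatile STM forgets the oldest

for k in ["A", "B", "A", "C", "A", "D", "E", "F", "B"]:
    present(k)
print("STM:", dict(stm), "LTM:", ltm)   # "A" ends up in LTM
```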
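For item 4, a toy tile-size search in the spirit of dynamic tiling: for a given convolution layer, enumerate tile shapes, discard those that exceed the on-chip buffer, and keep the candidate that best occupies the systolic array. The buffer size, candidate grid, and cost model are simplified assumptions, not FlexCNN's actual design-space exploration.

```python
# Toy design-space exploration for per-layer tiling (assumed cost model).
from itertools import product

BUFFER_WORDS = 64 * 1024      # on-chip buffer budget (assumed)
SA_ROWS, SA_COLS = 16, 16     # systolic array geometry (assumed)

def explore(height, width, c_in, c_out, k=3):
    best = None
    for th, tw, tci, tco in product([8, 16, 32, 64], repeat=4):
        th, tw = min(th, height), min(tw, width)
        tci, tco = min(tci, c_in), min(tco, c_out)
        # footprint: input tile (with halo) + weight tile + output tile
        words = (th + k - 1) * (tw + k - 1) * tci + k * k * tci * tco + th * tw * tco
        if words > BUFFER_WORDS:
            continue                      # tile does not fit on chip
        # utilization: how fully the tile occupies the SA's two dimensions
        util = min(tci / SA_ROWS, 1.0) * min(tco / SA_COLS, 1.0)
        if best is None or util > best[0]:
            best = (util, (th, tw, tci, tco))
    return best

print(explore(height=56, width=56, c_in=64, c_out=128))
```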
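For item 5, a toy model of the multistate synapse: a fixed-duration SOT pulse moves the domain wall a distance set by the strain level, and the wall settles at evenly spaced pinning sites, quantizing the weight. The velocities, jitter, and pinning pitch are illustrative assumptions, not micromagnetic results.

```python
# Toy model of item 5's voltage-selected, pinning-quantized weight update.
import numpy as np

rng = np.random.default_rng(5)
N_STATES = 5                             # five-state synapse (from the abstract)
TRACK_LEN = 1.0                          # normalized racetrack length
# assumed DW velocity per strain level (voltage-controlled PMA modulation)
VELOCITY = {0: 0.00, 1: 0.15, 2: 0.30, 3: 0.45, 4: 0.60}
PULSE_T = 1.0                            # fixed SOT pulse duration

def apply_pulse(position, strain_level):
    """Move the DW, then snap it to the nearest pinning site (state)."""
    position = min(position + VELOCITY[strain_level] * PULSE_T, TRACK_LEN)
    position += rng.normal(0, 0.01)      # thermal / roughness jitter (assumed)
    pitch = TRACK_LEN / (N_STATES - 1)
    state = int(round(np.clip(position, 0, TRACK_LEN) / pitch))
    return state * pitch, state / (N_STATES - 1)   # pinned position, weight

pos = 0.0
for level in [1, 2, 4]:
    pos, w = apply_pulse(pos, level)
    print(f"strain level {level}: DW at {pos:.2f}, weight {w:.2f}")
```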