The spatiotemporal nature of neuronal behavior in spiking neural networks (SNNs) makes SNNs promising for edge applications that require high energy efficiency. To realize SNNs in hardware, spintronic neuron implementations can bring advantages of scalability and energy efficiency. Domain wall (DW)-based magnetic tunnel junction (MTJ) devices are well suited for probabilistic neural networks given their intrinsic integrate-and-fire behavior with tunable stochasticity. Here, we present a scaled DW-MTJ neuron with voltage-dependent firing probability. The measured behavior was used to simulate an SNN that attains accuracy during learning comparable to that of an equivalent, but more complicated, multi-weight DW-MTJ device. The validation accuracy during training was also shown to be comparable to that of an ideal leaky integrate-and-fire device. However, during inference, the binary DW-MTJ neuron outperformed the other devices after Gaussian noise was introduced to the Fashion-MNIST classification task. This work shows that DW-MTJ devices can be used to construct noise-resilient networks suitable for neuromorphic computing on the edge.
Semi-supervised learning and inference in domain-wall magnetic tunnel junction (DW-MTJ) neural networks
Advances in machine intelligence have sparked interest in hardware accelerators to implement these algorithms, yet embedded electronics have stringent power, area, and speed requirements that may limit nonvolatile memory (NVM) integration. In this context, the development of fast nanomagnetic neural networks using minimal training data is attractive. Here, we extend an inference-only proposal using the intrinsic physics of domain-wall MTJ (DW-MTJ) neurons for online learning to implement fully unsupervised pattern recognition, using winner-take-all networks that contain either random or plastic synapses (weights). Meanwhile, a read-out layer trains in a supervised fashion. We find that our proposed design can approach state-of-the-art success on the task relative to competing memristive neural network proposals, while eliminating much of the area and energy overhead that would typically be required to build the neuronal layers with CMOS devices.
- Award ID(s): 1910800
- PAR ID: 10145308
- Date Published:
- Journal Name: SPIE Spintronics XII
- Volume: 11090
- Page Range / eLocation ID: 110903I
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Brain-inspired cognitive computing has so far followed two major approaches: one uses multi-layered artificial neural networks (ANNs) to perform pattern-recognition-related tasks, whereas the other uses spiking neural networks (SNNs) to emulate biological neurons in an attempt to be as efficient and fault-tolerant as the brain. While there has been considerable progress in the former area due to a combination of effective training algorithms and acceleration platforms, the latter is still in its infancy due to the lack of both. SNNs have a distinct advantage over their ANN counterparts in that they are capable of operating in an event-driven manner, thus consuming very low power. Several recent efforts have proposed various SNN hardware design alternatives; however, these designs still incur considerable energy overheads. In this context, this paper proposes a comprehensive design spanning the device, circuit, architecture and algorithm levels to build an ultra low-power architecture for SNN and ANN inference. For this, we use spintronics-based magnetic tunnel junction (MTJ) devices that have been shown to function as both neuro-synaptic crossbars and thresholding neurons and can operate at ultra low voltage and current levels. Using this MTJ-based neuron model and synaptic connections, we design a low power chip that has the flexibility to be deployed for inference of SNNs, ANNs, and SNN-ANN hybrid networks, a distinct advantage compared to prior works. We demonstrate the competitive performance and energy efficiency of the SNNs as well as hybrid models on a suite of workloads. Our evaluations show that the proposed design, NEBULA, is up to 7.9× more energy efficient than a state-of-the-art design, ISAAC, in the ANN mode. In the SNN mode, our design is about 45× more energy-efficient than a contemporary SNN architecture, INXS. Power comparison between NEBULA ANN and SNN modes indicates that the latter is at least 6.25× more power-efficient for the observed benchmarks.
- Binarized neural networks (BNNs) are a promising alternative to conventional convolutional neural networks for meeting the demand for human brain-inspired computing in applications with limited hardware and power resources, such as biomedical devices, IoT edge sensors, and other battery-operated devices. Using nonvolatile memory elements like MTJ devices in a LiM-based architecture can eliminate the need to access and use external memory, which can significantly reduce the power consumption and area overhead. In addition, with adiabatic-based designs, a significant part of the consumed power can be recovered to the power source, leading to a large reduction in power consumption, which is vital in applications with limited power and hardware resources. In this paper, an XNOR/XOR synapse and neuron are proposed using nonvolatile MTJ devices in a LiM architecture together with adiabatic-based circuits. The proposed design offers a 97% improvement in power consumption over its state-of-the-art counterparts. It also achieves at least 7% lower area than other counterparts, which makes the proposed design a promising candidate for hardware implementation of BNNs.
- In this paper, we develop a 6-input fracturable non-volatile Clockless LUT (C-LUT) using spin Hall effect (SHE)-based Magnetic Tunnel Junctions (MTJs) and provide a detailed comparison between the SHE-MTJ-based C-LUT and Spin Transfer Torque (STT)-MTJ-based C-LUT. The proposed C-LUT offers an attractive alternative for implementing combinational logic as well as sequential logic versus previous spin-based LUT designs in the literature. Foremost, C-LUT eliminates the sense amplifier typically employed by using a differential polarity dual MTJ design, as opposed to a static reference resistance MTJ. This realizes a much wider read margin and the Monte Carlo simulation of the proposed fracturable C-LUT indicates no read and write errors in the presence of a variety of process variations scenarios involving MOS transistors as well as MTJs. Additionally, simulation results indicate that the proposed C-LUT reduces the standby power dissipation by 5.4-fold compared to the SRAM-based LUT. Furthermore, the proposed SHE-MTJ-based C-LUT reduces the area by 1.3-fold and 2-fold compared to the SRAM-based LUT and the STT-MTJ-based C-LUT, respectively.
- Accelerating machine learning inference has been an active research area in recent years. In this context, field-programmable gate arrays (FPGAs) have demonstrated compelling performance by providing massive parallelism in deep neural networks (DNNs). Neural networks (NNs) are computationally intensive during inference, as they require massive amounts of multiplication and addition, which makes their implementations costly. Numerous studies have recently addressed this challenge to some extent using a combination of sparsity induction, quantization, and transformation of neurons or sub-networks into lookup tables (LUTs) on FPGAs. Gradient boosted decision trees (GBDTs) are a high-accuracy alternative to DNNs in a wide range of regression and classification tasks, particularly for tabular datasets. The basic building block of GBDTs is a decision tree, which resembles the structure of binary decision diagrams. FPGA design flows are heavily optimized to implement such a structure efficiently. In addition to decision trees, GBDTs perform simple operations during inference, including comparison and addition. We present TreeLUT as an open-source tool for implementing GBDTs using an efficient quantization scheme, hardware architecture, and pipelining strategy. It primarily utilizes LUTs with no BRAMs or DSPs on FPGAs, resulting in high efficiency. We show the effectiveness of TreeLUT using multiple classification datasets, commonly used to evaluate ultra-low area and latency architectures. Using these benchmarks, we compare our implementation results with existing DNN and GBDT methods, such as DWN, PolyLUT-Add, NeuraLUT, LogicNets, FINN, hls4ml, and others. Our results show that TreeLUT significantly improves hardware utilization, latency, and throughput at competitive accuracy compared to previous works. For instance, it achieves an accuracy of around 97% on the MNIST dataset while delivering around 4 to 101 times lower hardware cost in terms of area-delay product than recent LUT-based NNs.

