Title: Quantized non-volatile nanomagnetic domain wall synapse based autoencoder for efficient unsupervised network anomaly detection
Abstract: Anomaly detection in real time using autoencoders implemented on edge devices is exceedingly challenging due to limited hardware, energy, and computational resources. We show that these limitations can be addressed by designing an autoencoder with low-resolution non-volatile memory-based synapses and employing an effective quantized neural network learning algorithm. We further propose nanoscale ferromagnetic racetracks with engineered notches hosting magnetic domain walls (DW) as exemplary non-volatile memory-based autoencoder synapses, where limited-state (5-state) synaptic weights are manipulated by spin-orbit torque (SOT) current pulses that write different magnetoresistance states. The anomaly detection performance of the proposed autoencoder model is evaluated on the NSL-KDD dataset. Training of the autoencoder is made aware of the limited weight resolution and of DW device stochasticity, and yields anomaly detection performance comparable to that of an autoencoder with floating-point-precision weights. While the limited number of quantized states and the inherent stochasticity of DW synaptic weights in nanoscale devices are typically expected to degrade performance, our hardware-aware training algorithm is shown to leverage these imperfect device characteristics to improve anomaly detection accuracy (90.98%) over that obtained with memory-intensive floating-point synaptic weights. Furthermore, our DW-based approach demonstrates a reduction of at least three orders of magnitude in weight updates during training compared to the floating-point approach, implying a significant reduction in operating energy for our method. This work could stimulate the development of extremely energy-efficient non-volatile multi-state synapse-based processors that can perform real-time training and inference on the edge with unsupervised data.
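The hardware-aware training idea can be illustrated with a minimal sketch: keep a high-precision shadow copy of the weights, run the forward pass through the 5-state quantizer, and model each device write as stochastic. The level spacing, noise magnitude, and update rule below are illustrative assumptions, not the paper's exact device model or learning algorithm.

```python
# Minimal sketch of limited-state, device-aware training. Assumptions:
# 5 evenly spaced weight levels in [-1, 1] and Gaussian write noise;
# the paper's actual device model and learning rule may differ.
import numpy as np

LEVELS = np.linspace(-1.0, 1.0, 5)            # 5-state DW synapse levels

def quantize(w):
    """Snap each weight to the nearest of the 5 device states."""
    idx = np.abs(w[..., None] - LEVELS).argmin(axis=-1)
    return LEVELS[idx]

def device_write(w_target, sigma=0.02, rng=np.random.default_rng(0)):
    """Programming a DW state is stochastic: model it as additive noise
    followed by settling to the nearest notch-defined level."""
    return quantize(w_target + rng.normal(0.0, sigma, w_target.shape))

# Straight-through-style loop: forward with quantized device weights,
# update a high-precision shadow copy, rewrite the device afterwards.
w_shadow = np.random.uniform(-1, 1, (8, 4))   # high-precision shadow copy
w_device = device_write(quantize(w_shadow))   # what the hardware holds

x = np.random.randn(4)
y = w_device @ x                              # forward pass uses device states
grad = np.outer(np.sign(y), x) * 0.01         # placeholder gradient
w_shadow -= grad                              # update the shadow weights
w_device = device_write(quantize(w_shadow))   # rewrite device states
```

Because the shadow weights only occasionally cross a level boundary, the device needs to be rewritten far less often than a floating-point weight would be updated, which is consistent with the reported orders-of-magnitude reduction in weight updates.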
Award ID(s): 1954589
PAR ID: 10540660
Author(s) / Creator(s): ; ;
Publisher / Repository: IOPscience
Date Published:
Journal Name: Neuromorphic Computing and Engineering
Volume: 4
Issue: 2
ISSN: 2634-4386
Page Range / eLocation ID: 024012
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like This
  1. We propose energy-efficient voltage-induced strain control of a domain wall (DW) in a perpendicularly magnetized nanoscale racetrack on a piezoelectric substrate that can implement a multistate synapse for neuromorphic computing platforms. Here, strain generated in the piezoelectric is mechanically transferred to the racetrack and modulates the perpendicular magnetic anisotropy (PMA) in a system with significant interfacial Dzyaloshinskii-Moriya interaction (DMI). When different voltages are applied (i.e., different strains are generated) in conjunction with spin-orbit torque (SOT) from a fixed current flowing in the heavy metal layer for a fixed time, DWs translate different distances and thereby implement different synaptic weights. We have shown using micromagnetic simulations that five-state and three-state synapses can be implemented in a racetrack modeled with natural edge roughness and room-temperature thermal noise. These simulations show interesting DW dynamics arising from interaction with roughness-induced pinning sites; thus, notches need not be fabricated to implement multistate non-volatile synapses. Such a strain-controlled synapse has an energy consumption of ~1 fJ and could thus be very attractive for implementing energy-efficient quantized neural networks, which have recently been shown to achieve classification accuracy nearly equivalent to that of full-precision neural networks.
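As a rough illustration of this programming scheme: a voltage level selects a strain, a fixed SOT pulse then moves the wall, and roughness-induced pinning settles it at a discrete position. The pinning positions, jitter magnitude, and linear position-to-weight map below are hypothetical placeholders; the actual values come from the micromagnetic simulations.

```python
# Illustrative model (not from the paper) of a strain-gated DW synapse.
import numpy as np

PIN_SITES = np.array([0.0, 0.25, 0.5, 0.75, 1.0])  # assumed normalized DW positions

def program(voltage_level, jitter=0.03, rng=np.random.default_rng(1)):
    """Apply strain level 0..4 plus a fixed SOT pulse; roughness-induced
    pinning is modeled as small positional jitter before settling."""
    target = PIN_SITES[voltage_level] + rng.normal(0.0, jitter)
    return PIN_SITES[np.abs(PIN_SITES - target).argmin()]  # settle at a site

def weight_from_position(pos):
    """Map DW position to a synaptic weight in [-1, 1] (linear assumption)."""
    return 2.0 * pos - 1.0

w = weight_from_position(program(voltage_level=3))  # one of 5 weight values
```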
  2. Artificial neural networks (ANNs) are widely used in numerous artificial intelligence-based applications. However, the significant amount of data transferred between computing units and storage has limited the widespread deployment of ANNs for the artificial intelligence of things (AIoT) and power-constrained device applications. Among various ANN algorithms, quantized neural networks (QNNs) have therefore garnered considerable attention because they require fewer computational resources with minimal energy consumption. Herein, an oxide-based ternary charge-trap transistor (CTT) that provides three discrete states and non-volatile memory characteristics is introduced, both of which are desirable for QNN computing. By employing a differential pair of ternary CTTs, artificial synaptic segregation with multilevel quantized values for QNNs is demonstrated. The approach establishes a platform that combines the advantages of multiple states and robustness to noise for in-memory computing to achieve reliable QNN performance in hardware, thereby facilitating the development of energy-efficient AIoT.
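A minimal sketch of the differential encoding, assuming each CTT stores a conductance state in {0, 1, 2} (arbitrary units); the readout shown is a plain dot product rather than the paper's in-memory circuit.

```python
# Hedged sketch of a differential ternary-device synapse: two charge-trap
# transistors per weight, encoding a signed value as G_plus - G_minus.
import numpy as np

def synapse_weight(g_plus, g_minus):
    """Differential encoding yields 5 signed levels from two 3-state devices."""
    return g_plus - g_minus           # values in {-2, -1, 0, 1, 2}

g_plus  = np.array([2, 0, 1, 2])      # per-column states of the '+' devices
g_minus = np.array([0, 2, 1, 1])      # per-column states of the '-' devices
weights = synapse_weight(g_plus, g_minus)   # [ 2, -2,  0,  1]

x = np.array([0.5, -1.0, 0.25, 2.0])
y = weights @ x                       # multiply-accumulate along the column
```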
  3. Generative Adversarial Networks (GANs) have emerged as one of the most promising semi-supervised learning methods, in which two neural nets train themselves in a competitive environment. In this paper, to the best of our knowledge, we are the first to present a statistically trained Ternarized Generative Adversarial Network (TGAN) with fully ternarized weights (i.e., -1, 0, +1) to massively reduce the computation and storage requirements of conventional GAN structures. In the proposed TGAN, the computationally expensive convolution operations (i.e., multiplication and accumulation) in the forward paths of both the generator and the discriminator are converted into hardware-friendly addition/subtraction operations. Accordingly, we propose a Processing-in-Memory accelerator for TGAN, called PIM-TGAN, based on Spin-Orbit Torque Magnetic Random Access Memory (SOT-MRAM) computational sub-arrays to efficiently accelerate the training process of GANs within non-volatile memory. In addition, we propose a parallelism technique to further enhance the training efficiency of TGAN. Our device-to-architecture co-simulation results show that, with almost the same inception score as the baseline GAN with floating-point weights on different datasets, the proposed PIM-TGAN obtains on average ~25.6× better energy efficiency and a 22× speedup compared to a GPU platform, and 9.2× better energy efficiency and a 5.4× speedup over the best processing-in-ReRAM accelerators.
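The hardware-friendliness claim is easy to see in a sketch: with weights restricted to {-1, 0, +1}, every multiply-accumulate collapses into masked additions and subtractions (illustrative NumPy, not the SOT-MRAM sub-array implementation).

```python
# Ternary matrix-vector product using only adds/subtracts, no multiplies.
import numpy as np

def ternary_matvec(W, x):
    """Compute W @ x for W with entries in {-1, 0, +1}."""
    pos = (W == 1)                    # positions contributing +x_j
    neg = (W == -1)                   # positions contributing -x_j
    return (pos * x).sum(axis=1) - (neg * x).sum(axis=1)

W = np.array([[1, 0, -1], [-1, -1, 1]])
x = np.array([0.2, 0.7, -0.5])
assert np.allclose(ternary_matvec(W, x), W @ x)  # matches the full MAC
```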
  4. Spiking neural networks (SNNs) have received increasing attention due to their high biological plausibility and energy efficiency. The binary spike-based information propagation enables efficient sparse computation in event-based and static computer vision applications. However, the weight precision and especially the membrane potential precision remain high-precision values (e.g., 32 bits) in state-of-the-art SNN algorithms. Each neuron in an SNN stores the membrane potential over time and typically updates its value at every time step. Such frequent read/write operations on a high-precision membrane potential incur storage and memory-access overhead in SNNs, which undermines their compatibility with resource-constrained hardware. To resolve this inefficiency, prior works have explored time-step reduction and low-precision representation of the membrane potential at a limited scale and reported significant accuracy drops. Furthermore, while recent advances in on-device AI present pruning and quantization optimization across different architectures and datasets, simultaneous pruning and quantization is highly under-explored in SNNs. In this work, we present SpQuant-SNN, a fully quantized spiking neural network with ultra-low-precision weights, an ultra-low-precision membrane potential, and high spatial-channel sparsity, enabling end-to-end low precision with significantly reduced operations in the SNN. First, we propose an integer-only quantization scheme for the membrane potential with a stacked surrogate gradient function, a simple-yet-effective method that enables a smooth learning process for quantized SNN training. Second, we implement spatial-channel pruning with a membrane-potential prior, reducing the layer-wise computational complexity and floating-point operations (FLOPs) in SNNs. Finally, to further improve the accuracy of the low-precision and sparse SNN, we propose a self-adaptive learnable potential threshold for SNN training. Equipped with high biological adaptiveness, minimal computation, and low memory utilization, SpQuant-SNN achieves state-of-the-art performance across multiple SNN models for both event-based and static image datasets, including both image classification and object detection tasks. The proposed SpQuant-SNN achieves up to 13× memory reduction and >4.7× FLOPs reduction with ~1.8% accuracy degradation for both classification and object detection tasks, compared to the SOTA baseline.
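A minimal sketch of an integer-only membrane-potential update in this spirit; the threshold value, the shift-based leak, and the hard reset are placeholder assumptions rather than SpQuant-SNN's exact formulation.

```python
# Integer-only leaky integrate-and-fire (LIF) step: no floating point anywhere.
import numpy as np

THRESHOLD = 64                       # integer firing threshold (assumed)
DECAY_SHIFT = 1                      # leak as a cheap right-shift, V -> V/2

def lif_step(V, W_int, spikes_in):
    """One time step: integer synaptic integration, fire, and reset."""
    V = (V >> DECAY_SHIFT) + W_int @ spikes_in   # all-integer arithmetic
    spikes_out = (V >= THRESHOLD).astype(np.int32)
    V = np.where(spikes_out == 1, 0, V)          # hard reset on spike
    return V, spikes_out

V = np.zeros(3, dtype=np.int32)
W_int = np.array([[3, -2, 0, 5],
                  [1, 1, -4, 2],
                  [0, 6, -1, -3]], dtype=np.int32)  # quantized weights
spikes_in = np.array([1, 0, 1, 1], dtype=np.int32)  # binary input spikes
V, out = lif_step(V, W_int, spikes_in)
```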
  5. We present LBW-Net, an efficient optimization-based method for quantization and training of low bit-width convolutional neural networks (CNNs). Specifically, we quantize the weights to zero or powers of 2 by minimizing the Euclidean distance between full-precision weights and quantized weights during back-propagation (weight learning). We characterize the combinatorial nature of the low bit-width quantization problem. For 2-bit (ternary) CNNs, the quantization of N weights can be done by an exact formula in O(N log N) complexity. When the bit-width is 3 and above, we further propose a semi-analytical thresholding scheme with a single free parameter for quantization that is computationally inexpensive. The free parameter is determined by network retraining and object detection tests. LBW-Net has several desirable advantages over full-precision CNNs, including considerable memory savings, energy efficiency, and faster deployment. Our experiments on the PASCAL VOC dataset show that, compared with its 32-bit floating-point counterpart, the performance of the 6-bit LBW-Net is nearly lossless in object detection tasks and can even do better in real-world visual scenes, while empirically enjoying more than 4× faster deployment.
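A hedged sketch of the projection step this describes: snap each weight to zero or a signed power of two. Rounding in the log domain, as below, approximates the Euclidean-nearest power of two, and the zero threshold and exponent range are assumed hyperparameters; the paper instead derives an exact ternary formula and a semi-analytical threshold for bit-widths of 3 and above.

```python
# Approximate projection of weights onto {0} U {±2^k}, k in [exp_min, exp_max].
import numpy as np

def quantize_pow2(w, zero_thresh=0.05, exp_min=-4, exp_max=0):
    """Round |w| to a nearby power of 2; prune small weights to exactly zero."""
    exps = np.clip(np.round(np.log2(np.maximum(np.abs(w), 1e-12))),
                   exp_min, exp_max)
    q = np.sign(w) * (2.0 ** exps)
    return np.where(np.abs(w) < zero_thresh, 0.0, q)

w = np.array([0.9, -0.3, 0.02, -0.6])
print(quantize_pow2(w))    # [ 1.  , -0.25,  0.  , -0.5 ]
```

Restricting weights to powers of two is what buys the deployment speedup: each multiplication in the convolution can be replaced by a bit shift.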