skip to main content


The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 11:00 PM ET on Thursday, May 23 until 2:00 AM ET on Friday, May 24 due to maintenance. We apologize for the inconvenience.

Title: MnM: A Fast and Efficient Min/Max Searching in MRAM
In-Memory Computing (IMC) technology has been considered to be a promising approach to solve well-known memory-wall challenge for data intensive applications. In this paper, we are the first to propose MnM, a novel IMC system with innovative architecture/circuit designs for fast and efficient Min/Max searching computation in emerging Spin-Orbit Torque Magnetic Random Access Memory (SOT-MRAM). Our proposed SOT-MRAM based in-memory logic circuits are specially optimized to perform parallel, one-cycle XNOR logic that are heavily used in the Min/Max searching-in-memory algorithm. Our novel in-memory XNOR circuit also has an overhead of just two transistors per row when compared to most prior methodologies which typically use multiple sense amplifiers or complex CMOS logic gates. We also design all other required peripheral circuits for implementing complete Min/Max searching-in-MRAM computation. Our cross-layer comprehensive experiments on Dijkstra's algorithm and other sorting algorithms in real word datasets show that our MnM could achieve significant performance improvement over CPUs, GPUs, and other competing IMC platforms based on RRAM/MRAM/DRAM.  more » « less
Award ID(s):
2003749 2144751
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Great Lakes Symposium on VLSI
Page Range / eLocation ID:
39 to 44
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Due to the separate memory and computation units in traditional Von-Neumann architecture, massive data transfer dominates the overall computing system’s power and latency, known as the ‘Memory-Wall’ issue. Especially with ever-increasing deep learning-based AI model size and computing complexity, it becomes the bottleneck for state-of-the-art AI computing systems. To address this challenge, In-Memory Computing (IMC) based Neural Network accelerators have been widely investigated to support AI computing within memory. However, most of those works focus only on inference. The on-device training and continual learning have not been well explored yet. In this work, for the first time, we introduce on-device continual learning with STT-assisted-SOT (SAS) Magnetic Random Access Memory (MRAM) based IMC system. On the hardware side, we have fabricated a SAS-MRAM device prototype with 4 Magnetic Tunnel Junctions (MTJ, each at 100nm × 50nm) sharing a common heavy metal layer, achieving significantly improved memory writing and area efficiency compared to traditional SOT-MRAM. Next, we designed fully digital IMC circuits with our SAS-MRAM to support both neural network inference and on-device learning. To enable efficient on-device continual learning for new task data, we present an 8-bit integer (INT8) based continual learning algorithm that utilizes our SAS-MRAM IMC-supported bit-serial digital in-memory convolution operations to train a small parallel reprogramming Network (Rep-Net) while freezing the major backbone model. Extensive studies have been presented based on our fabricated SAS-MRAM device prototype, cross-layer device-circuit benchmarking and simulation, as well as the on-device continual learning system evaluation. 
    more » « less
  2. Recently, in-DRAM computing is becoming one promising technique to address the notorious ‘memory-wall’ issue for big data processing. In this work, for the first time, we propose a novel ‘Min/Max-in-memory’ algorithm based on iterative XNOR bit-wise comparison, which supports parallel inmemory searching for minimum and maximum of bulk data stored in DRAM as unsigned & signed integers, fixed-point and floating numbers. We then develop a new processing-in-DRAM architecture, called Max-PIM, that supports complete bit-wise Boolean logic and beyond. Differentiating from prior works, Max-PIM is optimized with one-cycle fast XNOR logicin-DRAM operation and in-memory data transpose, which are heavily used and keys to accelerate the proposed Min/Max-in-memory algorithm efficiently. Extensive experiments of utilizing Max-PIM in big data sorting and graph processing applications show that it could speed up ~ 50X and ~ 1000X than GPU and CPU, while only consuming 10% and 1% energy, respectively. Moreover, comparing with recent representative In-DRAM computing platforms, i.e., Ambit [1], DRISA [2], our design could speed up ~ 3X - 10X. 
    more » « less
  3. In this paper, we propose a novel Spin-Transfer Torque Magnetic Random-Access Memory (STT-MRAM) array design that could simultaneously work as non-volatile memory and implement a reconfigure in-memory logic operation without add-on logic circuits to the memory chip. The computed output could be simply read out like a typical MRAM bit-cell through the modified peripheral circuit. Such intrinsic in-memory computation can be used to process data locally and transfers the “cooked” data to the primary processing unit (i.e. CPU or GPU) for complex computation with high precision requirement. It greatly reduces power-hungry and long distance data communication, and further leads to extreme parallelism within memory. In this work, we further propose an in-memory edge extraction algorithm as a case study to demonstrate the efficiency of in memory preprocessing methodology. The simulation results show that our edge extraction method reduces data communication as much as 8x for grayscale image, thus greatly reducing system energy consumption. Meanwhile, the F-measure result shows only ∼10% degradation compared to conventional edge detection operator, such as Prewitt, Sobel and Roberts. 
    more » « less
  4. In this paper, we explore potentials of leveraging spin-based in-memory computing platform as an accelerator for Binary Convolutional Neural Networks (BCNN). Such platform can implement the dominant convolution computation based on presented Spin Orbit Torque Magnetic Random Access Memory (SOT-MRAM) array. The proposed array architecture could simultaneously work as non-volatile memory and a reconfigurable in-memory logic (AND, OR) without add-on logic circuits to memory chip as in conventional logic-in-memory designs. The computed logic output could be also simply read out like a normal MRAM bit-cell using the shared memory peripheral circuits. We employ such intrinsic in-memory computing architecture to efficiently process data within memory to greatly reduce power hungry and omit long distance data communication concerning state-of-the-art BCNN hardware. 
    more » « less
  5. In this paper, we propose an energy-efficient reconfigurable platform for in-memory processing based on novel 4-terminal spin Hall effect-driven domain wall motion devices that could be employed as both non-volatile memory cell and in-memory logic unit. The proposed designs lead to unity of memory and logic. The device to system level simulation results show that, with 28% area increase in memory structure, the proposed in-memory processing platform achieves a write energy ~15.6 fJ/bit with 79% reduction compared to that of SOT-MRAM counterpart while keeping the identical 1ns writing speed. In addition, the proposed in-memory logic scheme improves the operating energy by 61.3%, as compared with the recent non-volatile in-memory logic designs. An extensive reliability analysis is also performed over the proposed circuits. We employ Advanced Encryption Standard (AES) algorithm as a case study to elucidate the efficiency of the proposed platform at application level. Simulation results exhibit that the proposed platform can show up to 75.7% and 30.4% lower energy consumption compared to CMOS-ASIC and recent pipelined domain wall (DW) AES implementations, respectively. In addition, the AES Energy-Delay Product (EDP) can show 15.1% and 6.1% improvements compared to the DW-AES and CMOS-ASIC implementations, respectively. 
    more » « less