
Title: Max-PIM: Fast and Efficient Max/Min Searching in DRAM
Recently, in-DRAM computing has become a promising technique to address the notorious ‘memory-wall’ issue for big-data processing. In this work, for the first time, we propose a novel ‘Min/Max-in-memory’ algorithm based on iterative XNOR bit-wise comparison, which supports parallel in-memory searching for the minimum and maximum of bulk data stored in DRAM as unsigned and signed integers, fixed-point, and floating-point numbers. We then develop a new processing-in-DRAM architecture, called Max-PIM, that supports complete bit-wise Boolean logic and beyond. Differentiating from prior works, Max-PIM is optimized with a one-cycle fast XNOR logic-in-DRAM operation and in-memory data transpose, which are heavily used and are key to accelerating the proposed Min/Max-in-memory algorithm efficiently. Extensive experiments applying Max-PIM to big-data sorting and graph-processing applications show speedups of ~50X and ~1000X over GPU and CPU, respectively, while consuming only 10% and 1% of their energy. Moreover, compared with recent representative in-DRAM computing platforms, i.e., Ambit [1] and DRISA [2], our design achieves ~3X - 10X speedup.
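As a rough illustration of the iterative XNOR-based searching procedure summarized above, the minimal Python sketch below emulates a bit-serial maximum search over unsigned integers; the function name, bit width, and candidate-set bookkeeping are illustrative stand-ins, and the actual Max-PIM dataflow operates on bit-transposed data with row-parallel XNOR inside the DRAM arrays.

    # Minimal sketch (unsigned integers only): scan bits from MSB to LSB and use
    # XNOR against the ideal bit (1 for a max search) to prune the candidate set.
    def max_search_xnor(values, bit_width=8):
        candidates = set(range(len(values)))            # every element starts as a candidate
        for b in range(bit_width - 1, -1, -1):          # most significant bit first
            # Emulated row-parallel XNOR: XNOR(bit, 1) is 1 exactly when the stored bit is 1.
            matched = {i for i in candidates
                       if (((values[i] >> b) & 1) ^ 1) == 0}
            if matched:                                 # keep only matching candidates,
                candidates = matched                    # otherwise keep the current set
        return candidates                               # indices of the maximum value(s)

    print(max_search_xnor([13, 7, 42, 42, 5]))          # -> {2, 3}

A minimum search follows the same loop with the ideal bit set to 0; the sign, fixed-point, and floating-point handling described in the abstract is omitted from this sketch.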
Award ID(s):
2005209 2003749
Journal Name:
2021 58th ACM/IEEE Design Automation Conference (DAC)
Page Range / eLocation ID:
211 to 216
Sponsoring Org:
National Science Foundation
More Like this
  1. In-Memory Computing (IMC) technology has been considered a promising approach to solving the well-known memory-wall challenge for data-intensive applications. In this paper, we are the first to propose MnM, a novel IMC system with innovative architecture/circuit designs for fast and efficient Min/Max searching computation in emerging Spin-Orbit Torque Magnetic Random Access Memory (SOT-MRAM). Our proposed SOT-MRAM-based in-memory logic circuits are specially optimized to perform the parallel, one-cycle XNOR logic that is heavily used in the Min/Max searching-in-memory algorithm. Our novel in-memory XNOR circuit also has an overhead of just two transistors per row, compared to most prior methodologies, which typically use multiple sense amplifiers or complex CMOS logic gates. We also design all other required peripheral circuits for implementing the complete Min/Max searching-in-MRAM computation. Our comprehensive cross-layer experiments on Dijkstra's algorithm and other sorting algorithms with real-world datasets show that MnM achieves significant performance improvement over CPUs, GPUs, and other competing IMC platforms based on RRAM/MRAM/DRAM.
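     To indicate how such a searching primitive can slot into Dijkstra's algorithm, the hedged Python sketch below emulates the minimum search in software (XNOR against an ideal bit of 0) and uses it as the extract-min step; the function names, bit width, and adjacency-list format are assumptions for illustration, not MnM's actual interface.

        # Software stand-in for the in-memory minimum search: prune candidates
        # whose current bit differs from the ideal bit 0, scanning MSB to LSB.
        def min_search_xnor(values, bit_width=16):
            candidates = set(range(len(values)))
            for b in range(bit_width - 1, -1, -1):
                matched = {i for i in candidates if ((values[i] >> b) & 1) == 0}
                if matched:
                    candidates = matched
            return candidates                           # indices of the minimum value(s)

        def dijkstra(adj, src):
            INF = (1 << 16) - 1                         # "infinity" within the fixed bit width
            dist = [INF] * len(adj)
            dist[src], done = 0, set()
            while len(done) < len(adj):
                # extract-min replaced by the (emulated) parallel in-memory search
                masked = [d if i not in done else INF for i, d in enumerate(dist)]
                u = next(i for i in min_search_xnor(masked) if i not in done)
                done.add(u)
                for v, w in adj[u]:                     # relax edges leaving u
                    dist[v] = min(dist[v], dist[u] + w)
            return dist

        adj = [[(1, 4), (2, 1)], [(3, 1)], [(1, 2), (3, 5)], []]
        print(dijkstra(adj, 0))                         # -> [0, 3, 1, 4]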
  2. In this paper, for the first time, we propose a high-throughput and energy-efficient processing-in-DRAM-accelerated genome assembler called PIM-Assembler, based on an optimized and hardware-friendly genome assembly algorithm. PIM-Assembler can assemble large-scale DNA sequence datasets from all-pair overlaps. We first develop the PIM-Assembler platform, which harnesses DRAM as computational memory and transforms it into a fundamental processing unit for genome assembly. PIM-Assembler can perform efficient X(N)OR-based operations inside DRAM, incurring low cost on top of commodity DRAM designs (~5% of chip area). PIM-Assembler is then optimized through a correlated data partitioning and mapping methodology that allows local storage and processing of DNA short reads to fully exploit algorithm-level parallelism in genome assembly. The simulation results show that PIM-Assembler achieves on average 8.4× and 2.3× higher throughput for performing bulk bit-wise XNOR-based comparison operations compared with CPU and recent processing-in-DRAM platforms, respectively. For the comparison/addition-intensive genome assembly application, it reduces the execution time and power by ~5× and ~7.5× compared to GPU.
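     To give a rough feel for the kind of bulk XNOR-based comparison involved in overlap checking, the sketch below packs DNA bases into 2 bits each and tests whether a read suffix matches another read's prefix; the encoding, helper names, and per-pair comparison are illustrative assumptions rather than PIM-Assembler's actual data layout, which operates row-parallel inside DRAM.

        # Illustrative 2-bit base encoding; PIM-Assembler's real layout may differ.
        ENC = {'A': 0b00, 'C': 0b01, 'G': 0b10, 'T': 0b11}

        def pack(seq):
            """Pack a DNA string into an integer, 2 bits per base."""
            word = 0
            for base in seq:
                word = (word << 2) | ENC[base]
            return word

        def overlaps(read_a, read_b, k):
            """True if the length-k suffix of read_a equals the length-k prefix of read_b."""
            suffix, prefix = pack(read_a[-k:]), pack(read_b[:k])
            mask = (1 << (2 * k)) - 1
            xnor = ~(suffix ^ prefix) & mask          # in DRAM this comparison runs row-parallel
            return xnor == mask                       # every bit position matched

        print(overlaps("ACGTTG", "TTGCAA", 3))        # -> True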
  3. In this paper, we propose a Highly Flexible In-Memory (HieIM) computing platform using STT-MRAM, which can be leveraged to implement Boolean logic functions without sacrificing memory functionality. It can pre-process data within memory to further reduce the power-hungry, long-distance communication between memory and processing units found in von Neumann computing systems. HieIM can implement all the Boolean logic functions (AND/NAND, OR/NOR, XOR/XNOR) between any two cells in the same memory array, thus overcoming the ‘operand locality’ problem in contemporary in-memory computing platform designs. To investigate the performance of HieIM, we test in-memory bulk bit-wise Boolean logic operations using different vector datasets, which shows ~8x energy savings and ~5x speedup compared to a recent DRAM-based in-memory computing platform. We further implement an in-memory data encryption engine design based on HieIM as another case study. With the AES algorithm, it shows 51.5% and 68.9% lower energy consumption compared to CMOS-ASIC and CMOL-based implementations, respectively.
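     The toy behavioral model below illustrates the operand-locality-free idea: any two rows of an array can be combined with a chosen Boolean function and the result written to a third row; the row width, class name, and operation table are illustrative assumptions, not HieIM's circuit-level interface.

        ROW_BITS = 8
        ROW_MASK = (1 << ROW_BITS) - 1

        class BitwiseArray:
            """Toy model of a memory array supporting bulk logic between any two rows."""
            def __init__(self, rows):
                self.rows = [r & ROW_MASK for r in rows]

            def compute(self, op, src_a, src_b, dst):
                a, b = self.rows[src_a], self.rows[src_b]
                result = {"and": a & b, "or": a | b, "xor": a ^ b,
                          "nand": ~(a & b), "nor": ~(a | b), "xnor": ~(a ^ b)}[op] & ROW_MASK
                self.rows[dst] = result               # write back to an arbitrary destination row
                return result

        arr = BitwiseArray([0b00001100, 0b00001010, 0])
        print(bin(arr.compute("xnor", 0, 1, 2)))      # -> 0b11111001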
  4. In this paper, we propose MRIMA, a novel MRAM-based in-memory accelerator for non-volatile, flexible, and efficient in-memory computing. MRIMA transforms current Spin-Transfer Torque Magnetic Random Access Memory (STT-MRAM) arrays into massively parallel computational units capable of working as both non-volatile memory and in-memory logic. Instead of integrating complex logic units into cost-sensitive memory, MRIMA exploits hardware-friendly bit-line computing methods to implement complete Boolean logic functions between operands within a memory array in a single clock cycle, overcoming the multi-cycle logic issue in contemporary Processing-In-Memory (PIM) platforms. We present practical case studies to demonstrate MRIMA’s acceleration of binary-weight and low bit-width Convolutional Neural Networks (CNN) as well as data encryption. Our device-to-architecture co-simulation results on CNN acceleration demonstrate that MRIMA can obtain 1.7× better energy efficiency and 11.2× speed-up compared to ASICs, and 1.8× better energy efficiency and 2.4× speed-up over the best DRAM-based PIM solutions. As an AES in-memory encryption engine, MRIMA shows 77% and 21% lower energy consumption compared to CMOS-ASIC and a recent domain-wall-based design, respectively.
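     A short sketch of the XNOR-plus-popcount trick commonly used to map binary-weight CNN dot products onto bulk bit-wise logic; the packing convention and function name are assumptions for illustration, not MRIMA's exact mapping.

        def binary_dot(activations, weights, n):
            """Dot product of two {-1,+1} vectors packed as n-bit integers (bit 1 => +1)."""
            xnor = ~(activations ^ weights) & ((1 << n) - 1)   # 1 wherever the signs agree
            matches = bin(xnor).count("1")                     # popcount of the agreements
            return 2 * matches - n                             # agreements minus disagreements

        # (+1,-1,+1,+1) . (+1,+1,-1,+1) = 1 - 1 - 1 + 1 = 0
        print(binary_dot(0b1011, 0b1101, 4))                   # -> 0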
  5. In this paper, we propose ReDRAM, a reconfigurable DRAM-based processing-in-memory (PIM) accelerator that transforms the current DRAM architecture into massively parallel computational units, exploiting the high internal bandwidth of modern memory chips. ReDRAM uses the analog operation of DRAM sub-arrays and elevates it to implement a full set of 1- and 2-input bulk bit-wise operations (NOT, (N)AND, (N)OR, and even X(N)OR) between operands stored in the same bit-line, based on a new dual-row activation mechanism with a modest change to peripheral circuits such as sense amplifiers. ReDRAM can be leveraged to greatly reduce the energy consumption and latency of complex in-DRAM logic computations relative to state-of-the-art mechanisms based on triple-row activation, dual-contact cells, row initialization, NOR-style logic, etc. Extensive circuit-architecture simulations show that ReDRAM achieves on average 54× and 7.1× higher throughput for performing bulk bit-wise operations compared with CPU and GPU, respectively. Besides, ReDRAM outperforms recent processing-in-DRAM platforms with up to 3.7× better performance.
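     As a loose intuition for dual-row activation, the toy model below assumes two cells sharing a bit-line charge-share to an intermediate level and a sense amplifier with a shifted reference resolves the result as AND or OR; the voltage levels and thresholds are purely illustrative assumptions, not ReDRAM's published circuit parameters.

        def dual_row_sense(bit_a, bit_b, mode):
            """Behavioral-only model: two cells charge-share, the SA reference picks the function."""
            v = (bit_a + bit_b) / 2.0                    # charge sharing of the two accessed cells
            ref = 0.75 if mode == "and" else 0.25        # shifted reference: high -> AND, low -> OR
            return 1 if v > ref else 0

        for a in (0, 1):
            for b in (0, 1):
                print(a, b, dual_row_sense(a, b, "and"), dual_row_sense(a, b, "or"))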