skip to main content

Title: ReDRAM: A Reconfigurable Processing-in-DRAM Platform for Accelerating Bulk Bit-Wise Operations
In this paper, we propose ReDRAM, as a reconfigurable DRAM-based processing-in-memory (PIM) accelerator, which transforms current DRAM architecture to massively parallel computational units exploiting the high internal bandwidth of modern memory chips. ReDRAM uses the analog operation of DRAM sub-arrays and elevates it to implement a full set of 1- and 2-input bulk bit-wise operations (NOT, (N)AND, (N)OR, and even X(N)OR) between operands stored in the same bit-line, based on a new dual-row activation mechanism with a modest change to peripheral circuits such sense amplifiers. ReDRAM can be leveraged to greatly reduce energy consumption and latency of complex in-DRAM logic computations relying on state-of-the-art mechanisms based on triple-row activation, dual-contact cells, row initialization, NOR style, etc. The extensive circuit-architecture simulations show that ReDRAM achieves on average 54× and 7.1× higher throughput for performing bulk bit-wise operations compared with CPU and GPU, respectively. Besides, ReDRAM outperforms recent processing-in-DRAM platforms with up to 3.7× better performance.  more » « less
Award ID(s):
2005209 1740126 2003749 1908495
Author(s) / Creator(s):
Date Published:
Journal Name:
2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)
Page Range / eLocation ID:
1 to 8
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In this work, we propose a Parallel Processing-In-DRAM architecture named P-PIM leveraging the high density of DRAM to enable fast and flexible computation. P-PIM enables bulk bit-wise in-DRAM logic between operands in the same bit-line by elevating the analog operation of the memory sub-array based on a novel dual-row activation mechanism. With this, P-PIM can opportunistically perform a complete and inexpensive in-DRAM RowHammer (RH) self-tracking and mitigation technique to protect the memory unit against such a challenging security vulnerability. Our results show that P-PIM achieves ~72% higher energy efficiency than the fastest charge-sharing-based designs. As for the RH protection, with a worst-case slowdown of ~0.8%, P-PIM archives up to 71% energy-saving over the SRAM/CAM-based frameworks and about 90% saving over DRAM-based frameworks. 
    more » « less
  2. null (Ed.)
    In this paper, for the first time, we propose a high-throughput and energy-efficient Processing-in-DRAM-accelerated genome assembler called PIM-Assembler based on an optimized and hardware-friendly genome assembly algorithm. PIM-Assembler can assemble large-scale DNA sequence dataset from all-pair overlaps. We first develop PIM-Assembler platform that harnesses DRAM as computational memory and transforms it to a fundamental processing unit for genome assembly. PIM-Assembler can perform efficient X(N)OR-based operations inside DRAM incurring low cost on top of commodity DRAM designs (~5% of chip area). PIM-Assembler is then optimized through a correlated data partitioning and mapping methodology that allows local storage and processing of DNA short reads to fully exploit the genome assembly algorithm-level's parallelism. The simulation results show that PIM-Assembler achieves on average 8.4× and 2.3 wise× higher throughput for performing bulk bit-XNOR-based comparison operations compared with CPU and recent processing-in-DRAM platforms, respectively. As for comparison/addition-extensive genome assembly application, it reduces the execution time and power by ~5× and ~ 7.5× compared to GPU. 
    more » « less
  3. Recently, in-DRAM computing is becoming one promising technique to address the notorious ‘memory-wall’ issue for big data processing. In this work, for the first time, we propose a novel ‘Min/Max-in-memory’ algorithm based on iterative XNOR bit-wise comparison, which supports parallel inmemory searching for minimum and maximum of bulk data stored in DRAM as unsigned & signed integers, fixed-point and floating numbers. We then develop a new processing-in-DRAM architecture, called Max-PIM, that supports complete bit-wise Boolean logic and beyond. Differentiating from prior works, Max-PIM is optimized with one-cycle fast XNOR logicin-DRAM operation and in-memory data transpose, which are heavily used and keys to accelerate the proposed Min/Max-in-memory algorithm efficiently. Extensive experiments of utilizing Max-PIM in big data sorting and graph processing applications show that it could speed up ~ 50X and ~ 1000X than GPU and CPU, while only consuming 10% and 1% energy, respectively. Moreover, comparing with recent representative In-DRAM computing platforms, i.e., Ambit [1], DRISA [2], our design could speed up ~ 3X - 10X. 
    more » « less
  4. In this paper we propose a Highly Flexible InMemory (HieIM) computing platform using STT MRAM, which can be leveraged to implement Boolean logic functions without sacrificing memory functionality. It could pre-process data within memory to further reduce power hungry long distance communication between memory and processing units as in Von-Neumann computing system. HieIM can implement all the Boolean logic functions (AND/NAND, OR/NOR, XOR/XNOR) between any two cells in the same memory array, thus overcoming the `operand locality' problem in contemporary in-memory computing platform designs. To investigate the performance of HieIM, we test in-memory bulk bit-wise Boolean logic operations using different vector datasets, which shows ~ 8x energy saving and ~ 5x speedup compared to recent DRAM based in-memory computing platform. We further implement an in-memory data encryption engine design based on HieIM as another case study. With AES algorithm, it shows 51.5% and 68.9% lower energy consumption compared to CMOS-ASIC and CMOL based implementations, respectively. 
    more » « less
  5. Nowadays, research topics on AI accelerator designs have attracted great interest, where accelerating Deep Neural Network (DNN) using Processing-in-Memory (PIM) platforms is an actively-explored direction with great potential. PIM platforms, which simultaneously aims to address power- and memory-wall bottlenecks, have shown orders of performance enhancement in comparison to the conventional computing platforms with Von-Neumann architecture. As one direction of accelerating DNN in PIM, resistive memory array (aka. crossbar) has drawn great research interest owing to its analog current-mode weighted summation operation which intrinsically matches the dominant Multiplication-and-Accumulation (MAC) operation in DNN, making it one of the most promising candidates. An alternative direction for PIM-based DNN acceleration is through bulk bit-wise logic operations directly performed on the content in digital memories. Thanks to the high fault-tolerant characteristic of DNN, the latest algorithmic progression successfully quantized DNN parameters to low bit-width representations, while maintaining competitive accuracy levels. Such DNN quantization techniques essentially convert MAC operation to much simpler addition/subtraction or comparison operations, which can be performed by bulk bit-wise logic operations in a highly parallel fashion. In this paper, we build a comprehensive evaluation framework to quantitatively compare and analyze aforementioned PIM based analog and digital approaches for DNN acceleration. 
    more » « less