skip to main content


Title: ReDRAM: A Reconfigurable Processing-in-DRAM Platform for Accelerating Bulk Bit-Wise Operations
In this paper, we propose ReDRAM, as a reconfigurable DRAM-based processing-in-memory (PIM) accelerator, which transforms current DRAM architecture to massively parallel computational units exploiting the high internal bandwidth of modern memory chips. ReDRAM uses the analog operation of DRAM sub-arrays and elevates it to implement a full set of 1- and 2-input bulk bit-wise operations (NOT, (N)AND, (N)OR, and even X(N)OR) between operands stored in the same bit-line, based on a new dual-row activation mechanism with a modest change to peripheral circuits such sense amplifiers. ReDRAM can be leveraged to greatly reduce energy consumption and latency of complex in-DRAM logic computations relying on state-of-the-art mechanisms based on triple-row activation, dual-contact cells, row initialization, NOR style, etc. The extensive circuit-architecture simulations show that ReDRAM achieves on average 54× and 7.1× higher throughput for performing bulk bit-wise operations compared with CPU and GPU, respectively. Besides, ReDRAM outperforms recent processing-in-DRAM platforms with up to 3.7× better performance.  more » « less
Award ID(s):
2005209 1740126 2003749 1908495
PAR ID:
10179704
Author(s) / Creator(s):
;
Date Published:
Journal Name:
2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)
Page Range / eLocation ID:
1 to 8
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In this work, we propose a Parallel Processing-In-DRAM architecture named P-PIM leveraging the high density of DRAM to enable fast and flexible computation. P-PIM enables bulk bit-wise in-DRAM logic between operands in the same bit-line by elevating the analog operation of the memory sub-array based on a novel dual-row activation mechanism. With this, P-PIM can opportunistically perform a complete and inexpensive in-DRAM RowHammer (RH) self-tracking and mitigation technique to protect the memory unit against such a challenging security vulnerability. Our results show that P-PIM achieves ~72% higher energy efficiency than the fastest charge-sharing-based designs. As for the RH protection, with a worst-case slowdown of ~0.8%, P-PIM archives up to 71% energy-saving over the SRAM/CAM-based frameworks and about 90% saving over DRAM-based frameworks. 
    more » « less
  2. null (Ed.)
    In this paper, for the first time, we propose a high-throughput and energy-efficient Processing-in-DRAM-accelerated genome assembler called PIM-Assembler based on an optimized and hardware-friendly genome assembly algorithm. PIM-Assembler can assemble large-scale DNA sequence dataset from all-pair overlaps. We first develop PIM-Assembler platform that harnesses DRAM as computational memory and transforms it to a fundamental processing unit for genome assembly. PIM-Assembler can perform efficient X(N)OR-based operations inside DRAM incurring low cost on top of commodity DRAM designs (~5% of chip area). PIM-Assembler is then optimized through a correlated data partitioning and mapping methodology that allows local storage and processing of DNA short reads to fully exploit the genome assembly algorithm-level's parallelism. The simulation results show that PIM-Assembler achieves on average 8.4× and 2.3 wise× higher throughput for performing bulk bit-XNOR-based comparison operations compared with CPU and recent processing-in-DRAM platforms, respectively. As for comparison/addition-extensive genome assembly application, it reduces the execution time and power by ~5× and ~ 7.5× compared to GPU. 
    more » « less
  3. Recently, in-DRAM computing is becoming one promising technique to address the notorious ‘memory-wall’ issue for big data processing. In this work, for the first time, we propose a novel ‘Min/Max-in-memory’ algorithm based on iterative XNOR bit-wise comparison, which supports parallel inmemory searching for minimum and maximum of bulk data stored in DRAM as unsigned & signed integers, fixed-point and floating numbers. We then develop a new processing-in-DRAM architecture, called Max-PIM, that supports complete bit-wise Boolean logic and beyond. Differentiating from prior works, Max-PIM is optimized with one-cycle fast XNOR logicin-DRAM operation and in-memory data transpose, which are heavily used and keys to accelerate the proposed Min/Max-in-memory algorithm efficiently. Extensive experiments of utilizing Max-PIM in big data sorting and graph processing applications show that it could speed up ~ 50X and ~ 1000X than GPU and CPU, while only consuming 10% and 1% energy, respectively. Moreover, comparing with recent representative In-DRAM computing platforms, i.e., Ambit [1], DRISA [2], our design could speed up ~ 3X - 10X. 
    more » « less
  4. In this work, we present an efficient Processing in MRAM-Accelerated De Bruijn Graph-based DNA Assembly platform, named PANDA, based on an optimized and hardware-friendly genome assembly algorithm. PANDA is able to assemble large-scale DNA sequence datasets from all-pair overlaps. We first design a PANDA platform that exploits MRAM as computational memory and converts it to a potent processing unit for genome assembly. PANDA can not only execute efficient bulk bit-wise X(N)OR-based comparison/addition operations heavily required for the genome assembly task but also a full set of 2-/3-input logic operations inside the MRAM chip. We then develop a highly parallel and step-by-step hardware-friendly DNA assembly algorithm for PANDA that only requires the developed in-memory logic operations. The platform is then configured with a novel data partitioning and mapping technique that provides local storage and processing to utilize the algorithm level’s parallelism fully. The cross-layer simulation results demonstrate that PANDA reduces the run time and power by a factor of 18 and 11, respectively, compared with CPU. Moreover, speed-ups of up to 2.5 to 10× can be obtained over other recent processing in-memory platforms to perform the same task, like STT-MRAM, ReRAM, and DRAM. 
    more » « less
  5. In this paper, we propose a Flexible processing-in-DRAM framework named FlexiDRAM that supports the efficient implementation of complex bulk bitwise operations. This framework is developed on top of a new reconfigurable in-DRAM accelerator that leverages the analog operation of DRAM sub-arrays and elevates it to implement XOR2-MAJ3 operations between operands stored in the same bit-line. FlexiDRAM first generates an efficient XOR-MAJ representation of the desired logic and then appropriately allocates DRAM rows to the operands to execute any in-DRAM computation. We develop ISA and software support required to compute in-DRAM operation. FlexiDRAM transforms current memory architecture to a massively parallel computational unit and can be leveraged to significantly reduce the latency and energy consumption of complex workloads. Our extensive circuit-to-architecture simulation results show that averaged across two well-known deep learning workloads, FlexiDRAM achieves ∼15× energy-saving and 13× speedup over the GPU outperforming recent processing-in-DRAM platforms. 
    more » « less