
Title: Mitigating Voltage Drop in Resistive Memories by Dynamic RESET Voltage Regulation and Partition RESET
The emerging resistive random access memory (ReRAM) technology is regarded as one of the most promising alternatives to DRAM for main memory, owing to its better scalability, zero cell leakage, and short read latency. The cross-point (CP) array allows ReRAM to reach the theoretical minimum cell size of 4F² by placing a cell at each cross-point of a word-line and a bit-line. However, ReRAM CP arrays suffer from large sneak currents, which cause significant voltage drop and greatly prolong the array RESET latency. Although prior works reduce the voltage drop in CP arrays, they either substantially increase the array peripheral overhead or do not work well with wear leveling schemes. In this paper, we propose two array micro-architecture level techniques, dynamic RESET voltage regulation (DRVR) and partition RESET (PR), to mitigate voltage drop on both bit-lines and word-lines in ReRAM CP arrays. DRVR dynamically supplies a higher RESET voltage to cells that are far from the write driver, and therefore experience a larger voltage drop on the bit-line, so that all cells on a bit-line have approximately the same RESET latency. PR decides online how many and which cells to reset, partitioning the CP array into multiple equivalent circuits with smaller word-line resistance and voltage drop. Because DRVR and PR greatly reduce the array RESET latency, worst-case non-stop write traffic can issue writes faster, so the lifetime of ReRAM-based main memory under that traffic decreases significantly. To increase CP array endurance, we further upgrade DRVR to supply a lower RESET voltage to cells that suffer less voltage drop on a word-line. Our experimental results show that, compared to the combination of prior voltage drop reduction techniques, DRVR and PR improve system performance by 11.7% and reduce energy consumption by 46% on average, while still maintaining a main memory system lifetime of more than 10 years.
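To make the DRVR idea concrete, the following is a minimal sketch, not the authors' circuit model. It treats one bit-line as a resistive ladder in which the cell being RESET and every half-selected cell draw a constant current, computes the IR drop between the write driver and each cell, and then picks the lowest of a set of discrete driver levels that still delivers the target RESET voltage to that cell. The class `BitLine`, the function `drvr_voltage`, the wire resistance, currents, target voltage, and regulator levels are all hypothetical choices for illustration; the constant-current assumption ignores the nonlinear I-V behavior of real ReRAM cells and selectors, and word-line drop and PR are not modeled.

```python
# Toy model of bit-line IR drop and DRVR-style driver-voltage selection.
# All parameter values and the discrete regulator levels are hypothetical.
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BitLine:
    n_cells: int      # cells on the bit-line; cell 1 sits next to the write driver
    r_segment: float  # wire resistance between adjacent cells (ohms)
    i_reset: float    # current drawn by the selected cell during RESET (A), assumed constant
    i_sneak: float    # sneak current of each half-selected cell (A), assumed constant

    def drop_at(self, k: int) -> float:
        """IR drop (V) on the wire between the driver and cell k (1-based)."""
        if not 1 <= k <= self.n_cells:
            raise ValueError("cell index out of range")
        drop = 0.0
        for j in range(1, k + 1):
            # Segment j carries current for every cell at positions j..n_cells:
            # the selected cell k plus (n_cells - j) half-selected cells.
            current = self.i_reset + (self.n_cells - j) * self.i_sneak
            drop += self.r_segment * current
        return drop


def drvr_voltage(line: BitLine, k: int, v_cell: float, levels: Sequence[float]) -> float:
    """Return the lowest regulator level that still gives cell k at least v_cell."""
    needed = v_cell + line.drop_at(k)
    for v in sorted(levels):
        if v >= needed:
            return v
    raise ValueError(f"no regulator level can compensate the drop at cell {k}")


if __name__ == "__main__":
    line = BitLine(n_cells=512, r_segment=2.5, i_reset=50e-6, i_sneak=1e-6)
    levels = [2.0, 2.1, 2.2, 2.3, 2.4, 2.5]  # hypothetical regulator outputs (V)
    for k in (1, 128, 256, 512):
        v = drvr_voltage(line, k, 2.0, levels)
        print(f"cell {k:3d}: drop {line.drop_at(k):.3f} V -> drive {v:.1f} V")
```

With these made-up numbers the drop grows from about 1.4 mV at cell 1 to about 0.39 V at cell 512, so the far cell is driven at 2.4 V while the near cell needs only 2.1 V. A single fixed driver voltage sized for the far cell would over-stress the near cells, which is the endurance concern the abstract's upgraded DRVR addresses.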
Award ID(s): 1909509, 1908992
Journal Name: IEEE International Symposium on High Performance Computer Architecture
Page Range or eLocation-ID: 275 to 286
Sponsoring Org: National Science Foundation
More like this
  1. Magneto-Electric FET (MEFET) is a recently developed post-CMOS FET that offers intriguing characteristics for high-speed and low-power design in both logic and memory applications. In this article, we present MeF-RAM, a non-volatile cache memory design based on a 2-Transistor-1-MEFET (2T1M) memory bit-cell with separate read and write paths. We show that with proper co-design across the MEFET device, memory cell circuit, and array architecture, MeF-RAM is a promising candidate for fast non-volatile memory (NVM). To evaluate its cache performance in the memory system, we, for the first time, build a device-to-architecture cross-layer evaluation framework to quantitatively analyze and benchmark the MeF-RAM design against other memory technologies, including volatile memory (i.e., SRAM, eDRAM) and other popular emerging non-volatile memories (i.e., ReRAM, STT-MRAM, and SOT-MRAM). The experimental results for the PARSEC benchmark suite indicate that, as an L2 cache memory, MeF-RAM reduces the Energy Area Latency (EAT) product on average by ~98% and ~70% compared with typical 6T-SRAM and 2T1R SOT-MRAM counterparts, respectively.
  2. In this paper, we propose ReDRAM, a reconfigurable DRAM-based processing-in-memory (PIM) accelerator that transforms the current DRAM architecture into massively parallel computational units, exploiting the high internal bandwidth of modern memory chips. ReDRAM uses the analog operation of DRAM sub-arrays and elevates it to implement a full set of 1- and 2-input bulk bit-wise operations (NOT, (N)AND, (N)OR, and even X(N)OR) between operands stored in the same bit-line, based on a new dual-row activation mechanism with a modest change to peripheral circuits such as sense amplifiers. ReDRAM can greatly reduce the energy consumption and latency of complex in-DRAM logic computations compared with state-of-the-art mechanisms based on triple-row activation, dual-contact cells, row initialization, NOR style, etc. Extensive circuit-architecture simulations show that ReDRAM achieves on average 54× and 7.1× higher throughput for bulk bit-wise operations than CPU and GPU, respectively. In addition, ReDRAM outperforms recent processing-in-DRAM platforms with up to 3.7× better performance.
  3. One of the most challenging obstacles to realizing exascale computing is minimizing the energy consumption of L2 cache, main memory, and the interconnects to that memory. For promising cryogenic computing schemes using Josephson junction superconducting logic, this obstacle is exacerbated by the cryogenic system requirements, which expose the technology's lack of high-density, high-speed, and power-efficient memory. Here we demonstrate an array of cryogenic memory cells consisting of a non-volatile three-terminal magnetic tunnel junction element driven by the spin Hall effect, combined with a superconducting heater-cryotron bit-select element. The write energy of these memory elements is roughly 8 pJ with a bit-select element, designed to achieve a minimum overhead power consumption of about 30%. Individual magnetic memory cells measured at 4 K show reliable switching with write error rates below 10⁻⁶, and a 4 × 4 array can be fully addressed with bit-select error rates of 10⁻⁶. This demonstration is a first step towards a full cryogenic memory architecture targeting energy and performance specifications appropriate for superconducting high-performance and quantum computing control systems, which require significant memory resources operating at 4 K.

  4. Eye tracking has become an essential human-machine interaction modality for providing immersive experiences in numerous virtual and augmented reality (VR/AR) applications that demand high throughput (e.g., 240 FPS), a small form factor, and enhanced visual privacy. However, existing eye tracking systems are still limited by (1) a large form factor, largely due to the bulky lens-based cameras they adopt; (2) the high communication cost between the camera and the backend processor; and (3) potential visual privacy concerns, which prevent their wider application. To this end, we propose, develop, and validate a lensless FlatCam-based eye tracking algorithm and accelerator co-design framework, dubbed EyeCoD, that enables eye tracking systems with a much smaller form factor and higher system efficiency without sacrificing tracking accuracy, paving the way for next-generation eye tracking solutions. At the system level, we advocate using lensless FlatCams instead of lens-based cameras to meet the small form-factor needs of mobile eye tracking systems, which also leaves room for a dedicated sensing-processor co-design that reduces the camera-processor communication latency. At the algorithm level, EyeCoD integrates a predict-then-focus pipeline that first predicts the region of interest (ROI) via segmentation and then focuses only on the ROI to estimate gaze directions, greatly reducing redundant computation and data movement. At the hardware level, we further develop a dedicated accelerator that (1) integrates a novel workload orchestration between the segmentation and gaze estimation models, (2) leverages intra-channel reuse opportunities for depth-wise layers, (3) uses input feature-wise partitioning to save activation memory, and (4) employs a sequential-write-parallel-read input buffer to relieve the bandwidth requirement on the activation global buffer. On-silicon measurements and extensive experiments validate that EyeCoD consistently reduces both communication and computation costs, yielding overall system speedups of 10.95×, 3.21×, and 12.85× over CPUs, GPUs, and a prior-art eye tracking processor called CIS-GEP, respectively, while maintaining tracking accuracy. Code is available at
  5. Phase change memory devices become practical for non-volatile storage at small dimensions due to reduced power and predictable device operation. In larger cells, devices can be locally melted due to filament formation, and liquid filaments can be retained in parts of the cell for a long time even if most or all of the cells are initially amorphized during long fall times. The complex amorphization and crystallization dynamics make these large cells more unpredictable and enable their use as physically unclonable functions (PUFs) [1,2]. Computational analysis of the complex amorphization-crystallization dynamics in phase change memory devices with large geometries is important for understanding the evolution of phase distributions and temperature profiles during programming. In this work, we conduct electrothermal finite element simulations of the reset operation on a large Ge2Sb2Te5 (GST) cell using the framework we have developed in COMSOL Multiphysics [3]-[9], and analyze the complex dynamics of amorphization, nucleation, and growth during electrical stress. We input voltage waveforms measured from the electrical characterization of on-oxide GST line cells with bottom metal contact pads and Si3N4 capping. A 2D polycrystalline model of the experimentally measured cells (~360 nm wide, ~400 nm long, and ~50 nm thick) is constructed in the simulations. Access devices are modeled using SPICE models. The simulations capture some of the interplay between current fluctuations and the changes in device resistance caused by heating and phase changes.
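The ReDRAM entry (item 2 in the list above) builds bulk (N)AND/(N)OR operations on dual-row activation. The sketch below is a simplified charge-sharing model under our own assumptions, not ReDRAM's actual circuit: two cells on the same bit-line share charge with a bit-line precharged to VDD/2, the settled voltage depends only on how many of the two cells store 1, and a sense amplifier whose reference is shifted up or down resolves that voltage as AND or OR. The capacitance values and the helper names `shared_level`, `dual_row_op`, and `dual_row_xor` are hypothetical, and deriving XOR from the AND and OR results is one illustrative option; the abstract does not say how ReDRAM computes X(N)OR.

```python
# Simplified functional model of dual-row activation logic in a DRAM sub-array.
# Capacitances and the XOR composition are illustrative assumptions.
from typing import List

VDD = 1.0


def shared_level(a: int, b: int, c_cell: float = 1.0, c_bitline: float = 4.0) -> float:
    """Bit-line voltage after two cells (storing a and b) share charge with a
    bit-line precharged to VDD/2, by charge conservation."""
    charge = c_bitline * VDD / 2 + c_cell * VDD * (a + b)
    return charge / (c_bitline + 2 * c_cell)


def dual_row_op(row_a: List[int], row_b: List[int], op: str) -> List[int]:
    """Bulk AND/OR over two rows by sensing against a shifted reference."""
    low, mid, high = shared_level(0, 0), shared_level(0, 1), shared_level(1, 1)
    # With exactly one '1' the bit-line settles at VDD/2, so a normal VDD/2
    # reference cannot decide; moving the reference above or below it yields AND or OR.
    references = {"AND": (mid + high) / 2, "OR": (low + mid) / 2}
    if op not in references:
        raise ValueError("op must be 'AND' or 'OR'")
    ref = references[op]
    return [1 if shared_level(a, b) > ref else 0 for a, b in zip(row_a, row_b)]


def dual_row_xor(row_a: List[int], row_b: List[int]) -> List[int]:
    """XOR as OR AND NOT(AND), combined outside the sense amplifiers (assumption)."""
    ors = dual_row_op(row_a, row_b, "OR")
    ands = dual_row_op(row_a, row_b, "AND")
    return [o & (1 - n) for o, n in zip(ors, ands)]


if __name__ == "__main__":
    a, b = [0, 0, 1, 1], [0, 1, 0, 1]
    print(dual_row_op(a, b, "AND"), dual_row_op(a, b, "OR"), dual_row_xor(a, b))
```

In this model NAND and NOR follow by inverting the sensed AND and OR values; how the real design produces the inverted outputs is not stated in the abstract.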