

Title: Architecting Optically Controlled Phase Change Memory

Phase Change Memory (PCM) is an attractive candidate for main memory, as it offers non-volatility and zero leakage power while providing higher cell densities, longer data retention time, and higher capacity scaling compared to DRAM. In PCM, data is stored in the crystalline or amorphous state of the phase change material. The typical electrically controlled PCM (EPCM), however, suffers from longer write latency and higher write energy than DRAM, as well as limited multi-level cell (MLC) capacity. These challenges limit the performance of data-intensive applications running on computing systems with EPCMs.

Recently, researchers demonstrated optically controlled PCM (OPCM) cells with support for 5 bits/cell, in contrast to 2 bits/cell in EPCM. These OPCM cells can be accessed directly with optical signals that are multiplexed in high-bandwidth-density silicon-photonic links. The higher MLC capacity in OPCM and the direct cell access using optical signals enable higher read/write throughput and lower energy per access than EPCM. However, because the cells are accessed directly with optical signals, OPCM systems cannot be designed using a conventional memory architecture. We need a complete redesign of the memory architecture that is tailored to the properties of OPCM technology.
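The density gap between 2 bits/cell and 5 bits/cell can be made concrete with a little arithmetic. The sketch below is only illustrative: the function names and the 64-byte payload size are assumptions chosen for the example and are not taken from the article. It computes how many distinguishable states a cell must hold and how many cells are needed to store one payload.

```python
import math

# Assumption for illustration only: a 64-byte payload (a common cache-line size).
PAYLOAD_BITS = 64 * 8


def levels_per_cell(bits_per_cell: int) -> int:
    """Number of distinguishable states a cell must support to hold this many bits."""
    return 2 ** bits_per_cell


def cells_needed(payload_bits: int, bits_per_cell: int) -> int:
    """Minimum number of cells needed to store payload_bits (rounded up)."""
    return math.ceil(payload_bits / bits_per_cell)


if __name__ == "__main__":
    # Bits/cell figures are the ones quoted in the abstract.
    for name, bits in (("EPCM", 2), ("OPCM", 5)):
        print(
            f"{name}: {bits} bits/cell, "
            f"{levels_per_cell(bits)} levels, "
            f"{cells_needed(PAYLOAD_BITS, bits)} cells per {PAYLOAD_BITS}-bit payload"
        )
```

Under these assumptions, a 5-bit cell must resolve 32 levels instead of 4, and the same payload occupies roughly 2.5 times fewer cells. This says nothing about latency or energy, which depend on the access protocol.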

This article presents the design of a unified network and main memory system called COSMOS that combines OPCM and silicon-photonic links to achieve high memory throughput. COSMOS is composed of a hierarchical multi-banked OPCM array with novel read and write access protocols. COSMOS uses an Electrical-Optical-Electrical (E-O-E) control unit to map standard DRAM read/write commands (sent in the electrical domain) from the memory controller onto optical signals that access the OPCM cells. Our evaluation of a 2.5D-integrated system containing a processor and COSMOS demonstrates a 2.14× average speedup across graph and HPC workloads compared to an EPCM system. COSMOS consumes 3.8× lower read energy-per-bit and 5.97× lower write energy-per-bit compared to EPCM. COSMOS is the first non-volatile memory that provides performance and energy consumption comparable to DDR5, in addition to increased bit density, higher area efficiency, and improved scalability.

 
Award ID(s):
2131127
NSF-PAR ID:
10466868
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
ACM
Date Published:
Journal Name:
ACM Transactions on Architecture and Code Optimization
Volume:
19
Issue:
4
ISSN:
1544-3566
Page Range / eLocation ID:
1 to 26
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The emerging resistive random access memory (ReRAM) technology has been deemed one of the most promising alternatives to DRAM in main memories, due to its better scalability, zero cell leakage, and short read latency. The cross-point (CP) array enables ReRAM to obtain the theoretical minimum 4F^2 cell size by placing a cell at the cross-point of a word-line and a bit-line. However, ReRAM CP arrays suffer from large sneak current, resulting in a significant voltage drop that greatly prolongs the array RESET latency. Although prior works reduce the voltage drop in CP arrays, they either substantially increase the array peripheral overhead or cannot work well with wear leveling schemes. In this paper, we propose two array micro-architecture level techniques, dynamic RESET voltage regulation (DRVR) and partition RESET (PR), to mitigate voltage drop on both bit-lines and word-lines in ReRAM CP arrays. DRVR dynamically provides higher RESET voltage to the cells that are far from the write driver and thus encounter larger voltage drop on a bit-line, so that all cells on a bit-line share approximately the same latency during RESETs. PR decides online how many and which cells to reset, partitioning the CP array into multiple equivalent circuits with smaller word-line resistance and voltage drop. Because DRVR and PR greatly reduce the array RESET latency, the ReRAM-based main memory lifetime under worst-case non-stop write traffic decreases significantly. To increase the CP array endurance, we further upgrade DRVR by providing lower RESET voltage to the cells suffering from less voltage drop on a word-line. Our experimental results show that, compared to the combination of prior voltage drop reduction techniques, our DRVR and PR improve the system performance by 11.7% and decrease the energy consumption by 46% on average, while still maintaining a >10-year main memory system lifetime. 
  2. Photonic network-on-chip (PNoC) architectures employ photonic links with dense wavelength-division multiplexing (DWDM) to enable high throughput on-chip transfers. Unfortunately, increasing the DWDM degree (i.e., using a larger number of wavelengths) to achieve a higher aggregated data rate in photonic links and, hence, higher throughput in PNoCs, requires sophisticated and costly laser sources along with extra photonic hardware. This extra hardware can introduce undesired noise to the photonic link and increase the bit error rate (BER), power, and area consumption of PNoCs. To mitigate these issues, the use of 4-pulse amplitude modulation (4-PAM) signaling, instead of the conventional on-off keying (OOK) signaling, can halve the wavelength signals utilized in photonic links for achieving the target aggregate data rate while reducing the overhead of crosstalk noise, BER, and photonic hardware. There are various designs of 4-PAM modulators reported in the literature. For example, the signal superposition (SS)–, electrical digital-to-analog converter (EDAC)–, and optical digital-to-analog converter (ODAC)–based designs of 4-PAM modulators have been reported. However, it is yet to be explored how these SS-, EDAC-, and ODAC-based 4-PAM modulators can be utilized to design DWDM-based photonic links and PNoC architectures. In this article, we provide a systematic analysis of the SS, EDAC, and ODAC types of 4-PAM modulators from prior work with regard to their applicability and utilization overheads. We then present a heuristic-based search method to employ these 4-PAM modulators for designing DWDM-based SS, EDAC, and ODAC types of 4-PAM photonic links with two different design goals: (i) to attain the desired BER of 10^-9 at the expense of higher optical power and lower aggregate data rate and (ii) to attain maximum aggregate data rate with the desired BER of 10^-9 at the expense of longer packet transfer latency. 
We then employ our designed 4-PAM SS–, 4-PAM EDAC–, 4-PAM ODAC–, and conventional OOK modulator–based photonic links to constitute corresponding variants of the well-known CLOS and SWIFT PNoC architectures. We eventually compare our designed SS-, EDAC-, and ODAC-based variants of 4-PAM links and PNoCs with the conventional OOK links and PNoCs in terms of performance and energy efficiency in the presence of inter-channel crosstalk. From our link-level and PNoC-level evaluation, we have observed that the 4-PAM EDAC–based variants of photonic links and PNoCs exhibit better performance and energy efficiency compared with the OOK-, 4-PAM SS–, and 4-PAM ODAC–based links and PNoCs. 
  3. Row hammer attacks exploit electrical interactions between neighboring memory cells in high-density dynamic random-access memory (DRAM) to induce memory errors. By rapidly and repeatedly accessing DRAMs with specific patterns, an adversary with limited privilege on the target machine may trigger bit flips in memory regions that he has no permission to access directly. In this paper, we explore row hammer attacks in cross-VM settings, in which a malicious VM exploits bit flips induced by row hammer attacks to crack memory isolation enforced by virtualization. To do so with high fidelity, we develop novel techniques to determine the physical address mapping in DRAM modules at runtime (to improve the effectiveness of double-sided row hammer attacks), methods to exhaustively hammer a large fraction of physical memory from a guest VM (to collect exploitable vulnerable bits), and innovative approaches to break Xen paravirtualized memory isolation (to access arbitrary physical memory of the shared machine). Our study also suggests that the demonstrated row hammer attacks are applicable in modern public clouds where Xen paravirtualization technology is adopted. This shows that the presented cross-VM row hammer attacks are of practical importance. 
  4. Abstract

    Scalable programmable photonic integrated circuits (PICs) can potentially transform the current state of classical and quantum optical information processing. However, traditional means of programming, including thermo-optic, free carrier dispersion, and Pockels effect, result in either large device footprints or high static energy consumption, significantly limiting their scalability. While chalcogenide-based non-volatile phase-change materials (PCMs) could mitigate these problems thanks to their strong index modulation and zero static power consumption, they often suffer from large absorptive loss, low cyclability, and lack of multilevel operation. Here, we report a wide-bandgap PCM antimony sulfide (Sb2S3)-clad silicon photonic platform simultaneously achieving low loss (<1.0 dB), high extinction ratio (>10 dB), high cyclability (>1600 switching events), and 5-bit operation. These Sb2S3-based devices are programmed via on-chip silicon PIN diode heaters within a sub-ms timescale, with a programming energy density of ~10 fJ/nm^3. Remarkably, Sb2S3 is programmed into fine intermediate states by applying multiple identical pulses, providing controllable multilevel operations. Through dynamic pulse control, we achieve 5-bit (32 levels) operations, rendering 0.50 ± 0.16 dB per step. Using this multilevel behavior, we further trim random phase error in a balanced Mach-Zehnder interferometer.

     
  5. Abstract: Spin switch (SS) is a promising spintronic device that exhibits compactness, low power, non-volatility, and input-output isolation by leveraging the giant spin Hall effect, spin transfer torque, and dipolar coupling. In this paper, we propose a novel device-to-architecture co-design for an in-memory computing platform using coterminous SS (IMCS2), which can simultaneously work as non-volatile memory and reconfigurable in-memory logic (AND/NAND, OR/NOR, and XOR/XNOR) without adding logic circuits to the memory chip. The computed logic output can be read out like a normal magnetic random access memory bit cell using the shared memory peripheral circuits. Such intrinsic in-memory logic can be used to process data within memory, greatly reducing the power-hungry, long-distance data communication of the conventional von Neumann computing system. The IMCS2-based in-memory bulk bitwise Boolean vector operation shows ~9x energy saving and ~3x speedup compared with a DRAM-based in-memory computing platform. We further employ in-memory multiplication to evaluate the performance of the proposed in-memory computing platform for vector-vector multiplication with different vector sizes. 