Title: DeepNVM: A Framework for Modeling and Analysis of Non-Volatile Memory Technologies for Deep Learning Applications
Non-volatile memory (NVM) technologies such as spin-transfer torque magnetic random access memory (STT-MRAM) and spin-orbit torque magnetic random access memory (SOT-MRAM) have significant advantages compared to conventional SRAM due to their non-volatility, higher cell density, and scalability features. While previous work has investigated several architectural implications of NVM for generic applications, in this work we present DeepNVM, a framework to characterize, model, and analyze NVM-based caches in GPU architectures for deep learning (DL) applications by combining technology-specific circuit-level models and the actual memory behavior of various DL workloads. We present both iso-capacity and iso-area performance and energy analysis for systems whose last-level caches rely on conventional SRAM and emerging STT-MRAM and SOT-MRAM technologies. In the iso-capacity case, STT-MRAM and SOT-MRAM provide up to 4.2× and 5× energy-delay product (EDP) reduction and 2.4× and 3× area reduction compared to conventional SRAM, respectively. Under iso-area assumptions, STT-MRAM and SOT-MRAM provide 2.3× EDP reduction on average across all workloads when compared to SRAM. Our comprehensive cross-layer framework is demonstrated on STT-/SOT-MRAM technologies and can be used for the characterization, modeling, and analysis of any NVM technology for last-level caches in GPU platforms for deep learning applications.
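For readers unfamiliar with the metric, the EDP figures in the abstract can be read against the conventional definition of energy-delay product. The sketch below uses symbols introduced here for illustration ($E$ for energy, $t$ for delay, $R_{\mathrm{EDP}}$ for the reduction factor); the abstract does not state how energy and delay are measured for each workload, so this is the generic definition rather than the paper's exact formulation.

```latex
\mathrm{EDP} = E \cdot t,
\qquad
R_{\mathrm{EDP}}
  = \frac{\mathrm{EDP}_{\mathrm{SRAM}}}{\mathrm{EDP}_{\mathrm{NVM}}}
  = \frac{E_{\mathrm{SRAM}}\, t_{\mathrm{SRAM}}}{E_{\mathrm{NVM}}\, t_{\mathrm{NVM}}}
```

Under this reading, an "up to 4.2×" EDP reduction for STT-MRAM corresponds to $R_{\mathrm{EDP}} \le 4.2$ across the evaluated workloads, and a technology can reach a given $R_{\mathrm{EDP}}$ through lower energy, lower delay, or a combination of the two.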
Award ID(s):
1815780 1815899
NSF-PAR ID:
10125756
Journal Name:
2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)
Page Range / eLocation ID:
1295 to 1298
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The emergence of embedded magnetic random-access memory (MRAM) and its integration in mainstream semiconductor manufacturing technology have created an unprecedented opportunity for engineering computing systems with improved performance, energy efficiency, lower cost, and unconventional computing capabilities. While the initial interest in the existing generation of MRAM—which is based on the spin-transfer torque (STT) effect in ferromagnetic tunnel junctions—was driven by its nonvolatile data retention and lower cost of integration compared to embedded Flash (eFlash), the focus of MRAM research and development efforts is increasingly shifting toward alternative write mechanisms (beyond STT) and new materials (beyond ferromagnets) in recent years. This has been driven by the need for better speed vs density and speed vs endurance trade-offs to make MRAM applicable to a wider range of memory markets, as well as to utilize the potential of MRAM in various unconventional computing architectures that utilize the physics of nanoscale magnets. In this Perspective, we offer an overview of spin–orbit torque (SOT) as one of these beyond-STT write mechanisms for the MRAM devices. We discuss, specifically, the progress in developing SOT-MRAM devices with perpendicular magnetization. Starting from basic symmetry considerations, we discuss the requirement for an in-plane bias magnetic field which has hindered progress in developing practical SOT-MRAM devices. We then discuss several approaches based on structural, magnetic, and chiral symmetry-breaking that have been explored to overcome this limitation and realize bias-field-free SOT-MRAM devices with perpendicular magnetization. We also review the corresponding material- and device-level challenges in each case. We then present a perspective of the potential of these devices for computing and security applications beyond their use in the conventional memory hierarchy. 
  2. Magneto-Electric FET (MEFET) is a recently developed post-CMOS FET, which offers intriguing characteristics for high-speed and low-power design in both logic and memory applications. In this article, we present MeF-RAM, a non-volatile cache memory design based on 2-Transistor-1-MEFET (2T1M) memory bit-cell with separate read and write paths. We show that with proper co-design across MEFET device, memory cell circuit, and array architecture, MeF-RAM is a promising candidate for fast non-volatile memory (NVM). To evaluate its cache performance in the memory system, we, for the first time, build a device-to-architecture cross-layer evaluation framework to quantitatively analyze and benchmark the MeF-RAM design with other memory technologies, including both volatile memory (i.e., SRAM, eDRAM) and other popular non-volatile emerging memory (i.e., ReRAM, STT-MRAM, and SOT-MRAM). The experiment results for the PARSEC benchmark suite indicate that, as an L2 cache memory, MeF-RAM reduces Energy Area Latency (EAT) product on average by ~98% and ~70% compared with typical 6T-SRAM and 2T1R SOT-MRAM counterparts, respectively.
  3. The HSC-FPGA offers an intriguing and feasible architecture for the next generation of configurable fabrics, which allows embracing the advantages of both CMOS and beyond-CMOS technologies without requiring significant modification to the routing structure, programming paradigms, and synthesis tool-chain of commercial FPGAs. In the HSC-FPGA, the intrinsic characteristics of magnetic random access memory (MRAM)-look-up table (LUT) circuits are used to implement sequential logic, while combinational logic circuits are implemented by static random access memory (SRAM)-LUTs. Fabric-level simulation results for the developed HSC-FPGA show that it can achieve at least 18%, 70%, and 15% reduction in terms of area, standby power, and read power consumption, respectively, for various ISCAS-89 and ITC-99 benchmark circuits compared to conventional SRAM-based FPGAs. The power consumption values can be further decreased by the power-gating allowed by the non-volatility feature of MRAM-LUTs. Moreover, the benefits of increased heterogeneity for reconfigurable computing are extended to realizing probabilistic computing paradigms within a fabric, which is enabled by probabilistic spin logic devices. The cooperating strengths of technology-heterogeneity and heterogeneity in computing paradigm in the proposed HSC-FPGA are leveraged to develop energy-efficient and reliability-aware training and evaluation circuits for deep belief networks with memristive crossbar arrays and p-bit based probabilistic neurons.
  4. We offer a perspective on the prospects of ultrafast spintronics and opto-magnetism as a pathway to high-performance, energy-efficient, and non-volatile embedded memory in digital integrated circuit applications. Conventional spintronic devices, such as spin-transfer-torque magnetoresistive random-access memory (STT-MRAM) and spin–orbit torque MRAM, are promising due to their non-volatility, energy-efficiency, and high endurance. STT-MRAMs are now entering the commercial market; however, their write speed is limited to the nanosecond timescale. Improving the write speed of spintronic devices can significantly increase their usefulness as viable alternatives to existing CMOS-based devices. In this article, we discuss recent studies that advance the field of ultrafast spintronics and opto-magnetism. An optimized ferromagnet–ferrimagnet exchange-coupled magnetic stack, which can serve as the free layer of a magnetic tunnel junction (MTJ), can be optically switched in as little as ∼3 ps. Integration of ultrafast magnetic switching of a similar stack into an MTJ device has enabled electrical readout of the switched state using a relatively larger tunneling magnetoresistance ratio. Purely electronic ultrafast spin–orbit torque induced switching of a ferromagnet has been demonstrated using ∼6 ps long charge current pulses. We conclude our Perspective by discussing some of the challenges that remain to be addressed to accelerate ultrafast spintronics technologies toward practical implementation in high-performance digital information processing systems.
  5. In this paper, we propose a novel Spin-Transfer Torque Magnetic Random-Access Memory (STT-MRAM) array design that can simultaneously work as non-volatile memory and implement reconfigurable in-memory logic operations without adding logic circuits to the memory chip. The computed output can be read out like a typical MRAM bit-cell through the modified peripheral circuit. Such intrinsic in-memory computation can be used to process data locally and transfer the “cooked” data to the primary processing unit (i.e., CPU or GPU) for complex computation with high precision requirements. It greatly reduces power-hungry, long-distance data communication and further enables extreme parallelism within memory. In this work, we further propose an in-memory edge extraction algorithm as a case study to demonstrate the efficiency of the in-memory preprocessing methodology. The simulation results show that our edge extraction method reduces data communication by as much as 8× for grayscale images, thus greatly reducing system energy consumption. Meanwhile, the F-measure result shows only ∼10% degradation compared to conventional edge detection operators such as Prewitt, Sobel, and Roberts.