Title: SCART: Predicting STT-RAM Cache Retention Times Using Machine Learning
Prior studies have shown that the retention time of non-volatile spin-transfer torque RAM (STT-RAM) can be relaxed in order to reduce STT-RAM's write energy and latency. However, since different applications may require different retention times, STT-RAM retention times must be critically explored to satisfy various applications' needs. This process can be challenging due to the exploration overhead, and is exacerbated by the fact that STT-RAM caches are emerging and not readily available for design-time exploration. This paper explores using known and easily obtainable statistics (e.g., SRAM statistics) to predict appropriate STT-RAM retention times, in order to minimize the exploration overhead. We propose an STT-RAM Cache Retention Time (SCART) model, which uses machine learning to enable design-time or runtime prediction of right-provisioned STT-RAM retention times for latency or energy optimization. Experimental results show that, on average, SCART can reduce latency and energy by 20.34% and 29.12%, respectively, compared to a homogeneous retention time, while reducing the exploration overhead by 52.58% compared to prior work.
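The abstract does not include code; the sketch below is a minimal illustration of the kind of supervised model SCART describes, assuming a regressor trained offline on SRAM cache statistics labeled with the best retention times found by exhaustive search. The feature names (miss rate, writes per kilo-instruction, block lifetime) and all values are hypothetical placeholders, not data from the paper.

```python
# Minimal sketch of SCART-style retention time prediction (not the authors' code).
# Assumption: a labeled dataset mapping per-application SRAM cache statistics
# to the best-performing STT-RAM retention times, gathered offline.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical per-application SRAM statistics:
# [miss rate, writes per kilo-instruction, avg cache-block lifetime (us)]
X_train = np.array([
    [0.02, 12.0,  9.5],
    [0.10,  3.0, 55.0],
    [0.05,  7.5, 20.0],
])
# Hypothetical best retention times (us) found by exhaustive exploration
y_train = np.array([10.0, 100.0, 25.0])

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# At design time (or runtime), predict a right-provisioned retention time
# for a new application from its SRAM statistics alone, avoiding a full
# exploration of the retention time design space.
new_app_stats = np.array([[0.04, 9.0, 15.0]])
predicted_retention_us = model.predict(new_app_stats)[0]
print(f"Predicted retention time: {predicted_retention_us:.1f} us")
```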
Award ID(s):
1844952
PAR ID:
10156101
Date Published:
Journal Name:
2019 Tenth International Green and Sustainable Computing Conference (IGSC)
Page Range / eLocation ID:
1 to 7
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
1. Relaxed-retention (or volatile) spin-transfer torque RAM (STT-RAM) has been widely studied as a way to reduce STT-RAM's write energy and latency overheads. Given a relaxed-retention STT-RAM level one (L1) cache, we analyze the impact of dynamic voltage and frequency scaling (DVFS), a common optimization in modern processors, on STT-RAM L1 cache design. Our analysis reveals that, apart from the fact that different applications may require different retention times, the clock frequency, which is typically ignored in most STT-RAM studies, may also significantly impact applications' retention time needs. Based on our findings, we propose an asymmetric-retention core (ARC) design for multicore architectures. ARC features retention time heterogeneity to specialize STT-RAM retention times to applications' needs. We also propose a runtime prediction model to determine the best core on which to run an application, based on the application's characteristics, its retention time requirements, and the available DVFS settings. Results reveal that the proposed approach can reduce the average cache energy by 20.19% and the overall processor energy by 7.66%, compared to a homogeneous STT-RAM cache design.
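As a rough illustration of the ARC idea (not the authors' model), the sketch below assumes each core has a fixed STT-RAM L1 retention time and that an application's retention requirement scales with the DVFS clock period; the core names, retention values, and scaling rule are all hypothetical.

```python
# Minimal sketch of ARC-style core selection (illustrative only).
# Assumption: lowering the clock frequency stretches cache-block lifetimes,
# so the retention requirement grows in proportion to the clock period.

CORES = {            # hypothetical retention times per core, in microseconds
    "core0": 10.0,
    "core1": 26.5,
    "core2": 75.0,
    "core3": 100.0,
}

def required_retention_us(base_need_us: float, base_freq_ghz: float,
                          dvfs_freq_ghz: float) -> float:
    """Scale a retention requirement measured at base_freq_ghz to the
    chosen DVFS setting."""
    return base_need_us * (base_freq_ghz / dvfs_freq_ghz)

def pick_core(base_need_us: float, base_freq_ghz: float,
              dvfs_freq_ghz: float) -> str:
    need = required_retention_us(base_need_us, base_freq_ghz, dvfs_freq_ghz)
    # Choose the core with the smallest retention time that still satisfies
    # the need: shorter retention means cheaper writes.
    viable = {c: t for c, t in CORES.items() if t >= need}
    return min(viable, key=viable.get) if viable else "core3"

# A 20 us requirement at 2 GHz becomes 40 us at 1 GHz -> core2 is selected.
print(pick_core(base_need_us=20.0, base_freq_ghz=2.0, dvfs_freq_ghz=1.0))
```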
2. With the increase in GPU register file (RF) size, its power consumption has also increased. Since the RF sits at the highest level of the cache hierarchy, building it with memories that have high write latency/energy (e.g., spin-transfer torque RAM) can lead to large energy losses. In this paper, we present a spin-orbit torque RAM (SOT-RAM) based RF design that provides higher energy efficiency than SRAM and STT-RAM RFs while maintaining the same performance as an SRAM RF. To further improve the energy efficiency of the SOT-RAM based RF, we propose avoiding redundant bit-writes to the RF. Compared to an SRAM RF, the SOT-RAM RF saves 18.6% energy, and by using our technique for avoiding redundant writes, the energy savings can be increased to 44.3% without harming performance.
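The redundant-write-avoidance idea lends itself to a short illustration. The sketch below, built around a hypothetical write_register helper, shows the bit-level differential write the abstract alludes to: only the bits that differ from the stored value are actually written, since bit-writes dominate the energy cost.

```python
# Minimal sketch of redundant bit-write avoidance (illustrative only).

def write_register(rf: list, idx: int, new_value: int, width: int = 32) -> int:
    """Write new_value into rf[idx]; return how many bit-cells were
    actually flipped (the differential write cost)."""
    diff = (rf[idx] ^ new_value) & ((1 << width) - 1)  # changed bits only
    bits_written = bin(diff).count("1")
    rf[idx] = new_value
    return bits_written

rf = [0xFFFF0000]
# Only 4 of the 32 bits differ, so only 4 bit-writes are performed instead
# of rewriting the whole word.
print(write_register(rf, 0, 0xFFFF000F))  # -> 4
```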
3. The Magneto-Electric FET (MEFET) is a recently developed post-CMOS FET that offers intriguing characteristics for high-speed and low-power design in both logic and memory applications. In this article, we present MeF-RAM, a non-volatile cache memory design based on a 2-transistor-1-MEFET (2T1M) memory bit-cell with separate read and write paths. We show that with proper co-design across the MEFET device, memory cell circuit, and array architecture, MeF-RAM is a promising candidate for fast non-volatile memory (NVM). To evaluate its cache performance in the memory system, we build, for the first time, a device-to-architecture cross-layer evaluation framework to quantitatively analyze and benchmark the MeF-RAM design against other memory technologies, including both volatile memories (i.e., SRAM and eDRAM) and popular emerging non-volatile memories (i.e., ReRAM, STT-MRAM, and SOT-MRAM). Experimental results for the PARSEC benchmark suite indicate that, as an L2 cache, MeF-RAM reduces the Energy-Area-Latency (EAT) product on average by ~98% and ~70% compared with typical 6T-SRAM and 2T1R SOT-MRAM counterparts, respectively.
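For concreteness, the sketch below shows how an Energy-Area-Latency (EAT) product comparison works. The per-access numbers are invented purely for illustration; they are not results or parameters from the paper.

```python
# Minimal sketch of an EAT product comparison (illustrative numbers only).

def eat(energy_nj: float, area_mm2: float, latency_ns: float) -> float:
    """EAT product: a single figure of merit where lower is better
    across energy, area, and latency simultaneously."""
    return energy_nj * area_mm2 * latency_ns

# Hypothetical per-access L2 figures for two technologies
sram = eat(energy_nj=0.50, area_mm2=4.0, latency_ns=2.0)
mef  = eat(energy_nj=0.05, area_mm2=1.0, latency_ns=1.6)

reduction = 100.0 * (1.0 - mef / sram)
print(f"EAT reduction vs. SRAM: {reduction:.0f}%")  # ~98% with these numbers
```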
4. Spin-Transfer Torque RAMs (STTRAMs) have been shown to offer much promise for implementing emerging cache architectures. This paper studies the viability of STTRAM caches for mobile workloads from the perspective of energy and latency. Specifically, we explore the benefits of reduced-retention STTRAM caches for mobile applications. We analyze the characteristics of mobile applications' cache blocks and how those characteristics dictate the appropriate retention time for mobile device caches. We show that, due to their inherently interactive nature, mobile applications' execution characteristics, and hence their STTRAM cache design requirements, differ from those of other kinds of applications. We also explore various STTRAM cache designs, in both single-core and multicore systems and at different cache levels, that can efficiently satisfy mobile applications' execution requirements, maximizing energy savings without introducing substantial latency overhead.
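As one hedged illustration of how cache-block characteristics might dictate a retention time (the paper's exact methodology may differ), the sketch below provisions for a high percentile of hypothetical block lifetimes rather than the worst case, reflecting the short-burst behavior of interactive mobile workloads.

```python
# Minimal sketch of choosing a retention time from cache-block lifetimes
# (illustrative; lifetimes below are invented, not measured data).
import numpy as np

# Hypothetical block lifetimes (microseconds) sampled from a mobile workload
lifetimes_us = np.array([3, 5, 8, 9, 12, 15, 18, 22, 40, 250])

# Provision for the 95th percentile rather than the maximum: a few
# long-lived blocks should not force a long (write-expensive) retention
# time on the whole cache.
retention_us = np.percentile(lifetimes_us, 95)
print(f"Chosen retention time: {retention_us:.0f} us")
```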
5. Network quantization is one of the most hardware-friendly techniques for enabling the deployment of convolutional neural networks (CNNs) on low-power mobile devices. Recent network quantization techniques quantize each weight kernel in a convolutional layer independently for higher inference accuracy, since the weight kernels in a layer exhibit different variances and hence different amounts of redundancy. The quantization bitwidth, or bit number (QBN), directly determines the inference accuracy, latency, energy, and hardware overhead. To effectively reduce redundancy and accelerate CNN inference, different weight kernels should be quantized with different QBNs. However, prior works use only one QBN to quantize each convolutional layer or the entire CNN, because the design space of searching for a QBN per weight kernel is too large. Hand-crafting heuristics for the kernel-wise QBN search is so involved that even domain experts obtain only sub-optimal results, and even DDPG-based deep reinforcement learning (DRL) agents struggle to find a kernel-wise QBN configuration that achieves reasonable inference accuracy. In this paper, we propose AutoQ, a hierarchical-DRL-based kernel-wise network quantization technique that automatically searches for a QBN for each weight kernel and chooses a separate QBN for each activation layer. Compared to models quantized by state-of-the-art DRL-based schemes, on average, the same models quantized by AutoQ reduce inference latency by 54.06% and inference energy consumption by 50.69%, while achieving the same inference accuracy. The sketch after this abstract makes the kernel-wise idea concrete.
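The sketch below applies a different QBN to each weight kernel of one convolutional layer using plain uniform symmetric quantization; the per-kernel bitwidths are hypothetical stand-ins for what AutoQ's hierarchical-DRL agent would search for, and the search itself is not shown.

```python
# Minimal sketch of kernel-wise weight quantization (illustrative only;
# AutoQ's hierarchical-DRL bitwidth search is not implemented here).
import numpy as np

def quantize_kernel(w: np.ndarray, qbn: int) -> np.ndarray:
    """Uniform symmetric quantization of one weight kernel to qbn bits."""
    levels = 2 ** (qbn - 1) - 1          # e.g. 127 representable levels for 8 bits
    scale = np.abs(w).max() / levels     # per-kernel scale factor
    if scale == 0:
        scale = 1.0                      # all-zero kernel: nothing to scale
    q = np.clip(np.round(w / scale), -levels, levels)
    return q * scale                     # dequantized values used at inference

# A conv layer as (out_channels, in_channels, kH, kW); each output-channel
# kernel gets its own QBN, e.g. one chosen per kernel by a search agent.
layer = np.random.randn(4, 3, 3, 3)
qbns = [8, 4, 6, 3]                      # hypothetical per-kernel bitwidths
quantized = np.stack([quantize_kernel(k, b) for k, b in zip(layer, qbns)])
print(quantized.shape)  # (4, 3, 3, 3)
```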