Title: The Colored Refresh Server for DRAM
Bounding each task’s worst-case execution time (WCET) accurately is essential for real-time systems to determine if all deadlines can be met. Yet, access latencies to Dynamic Random Access Memory (DRAM) vary significantly due to DRAM refresh, which blocks access to memory cells. Variations further increase as DRAM density grows. This work contributes the “Colored Refresh Server” (CRS), a uniprocessor scheduling paradigm that partitions DRAM into two distinctly colored groups such that refreshes of one color occur in parallel with the execution of real-time tasks of the other color. By executing tasks in phase with periodic DRAM refreshes of the opposing color, memory requests no longer suffer from refresh interference. Experimental results confirm that refresh overhead …
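The two-color idea in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that time is divided into equal slots and that the refreshed color simply alternates from slot to slot; the names `Task`, `refreshing_color`, and `runnable` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass

NUM_COLORS = 2  # the abstract describes two colored DRAM groups


@dataclass
class Task:
    name: str
    color: int  # DRAM group that holds this task's memory (hypothetical field)


def refreshing_color(slot: int) -> int:
    """Color whose DRAM group is refreshed in this slot (assumed to alternate)."""
    return slot % NUM_COLORS


def runnable(tasks, slot):
    """Tasks whose memory is NOT being refreshed during this slot."""
    busy = refreshing_color(slot)
    return [t for t in tasks if t.color != busy]


tasks = [Task("A", 0), Task("B", 1)]
# For each of four slots, list the tasks that may run without refresh interference.
schedule = [[t.name for t in runnable(tasks, s)] for s in range(4)]
```

Under these assumptions, each task only ever runs while the other color is being refreshed, which is the property the abstract attributes to executing tasks in phase with refreshes of the opposing color.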
Award ID(s):
1813004
PAR ID:
10120544
Author(s) / Creator(s):
;
Date Published:
Journal Name:
IEEE International Symposium on Real-Time Computing (ISORC)
Page Range / eLocation ID:
27 to 34
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Real-time systems with hard timing constraints require known upper bounds on each task’s worst-case execution time (WCET) to determine if all deadlines can be met. One challenge in predictable execution is that Dynamic Random Access Memory (DRAM) cells must be refreshed periodically to maintain data validity, yet memory remains blocked during refresh, which results in overly pessimistic WCET bounds. This work contributes “Colored Refresh” to hide DRAM refresh overhead while preserving real-time schedulability for cyclic executives, which are widely used in highly critical systems. Colored Refresh partitions DRAM memory at rank granularity such that refreshes rotate round-robin from rank to rank. Real-time tasks are assigned different ranks via colored memory allocation. By cooperatively scheduling real-time tasks and refresh operations, memory requests no longer suffer from refresh interference. This reduces memory access latencies for tasks irrespective of DRAM density and size. Hence, Colored Refresh reduces a task’s WCET and makes its execution more predictable.
  2. As the capacity of DRAM continues to grow, the refresh operation is rapidly becoming the performance and power-efficiency bottleneck. In addition, restore time, the time allotted for recharging cells after an access, has an increasingly negative impact on performance. To tackle these problems, in this paper, we propose an in-situ charge detection and adaptive data restoration DRAM (CDAR-DRAM) architecture, which can dynamically adjust the refresh rate and also relax the constraints on restore time. The proposed CDAR-DRAM employs a low-cost skewed-inverter-based detector, which can reduce the excessive timing margins that prior work added to guarantee the functionality of leaky DRAM cells under the worst-case temperature condition. Moreover, an adaptive DRAM refresh and restore scheme is proposed, which can switch automatically between two modes: (i) a refresh mode that supports an adaptive refresh rate, and (ii) a restore mode that dynamically relaxes the constraints on restore time for cells having sufficient charge. Using transistor- and architecture-level simulations, we evaluate the CDAR-DRAM in an 8-core system across different workloads. Compared with the prior art, the proposed architecture achieves a 9.4% improvement in system performance and a 14.3% reduction in energy consumption, without requiring the time-consuming profiling process that many prior works employed.
  3. In this paper, we present RT-Gang: a novel real-time gang scheduling framework that enforces a one-gang-at-a-time policy. We find that, in a multicore platform, co-scheduling multiple parallel real-time tasks would require highly pessimistic worst-case execution time (WCET) and schedulability analysis, even when there are enough cores, due to contention in shared hardware resources such as cache and DRAM controller. In RT-Gang, all threads of a parallel real-time task form a real-time gang and the scheduler globally enforces the one-gang-at-a-time scheduling policy to guarantee tight and accurate task WCET. To minimize under-utilization, we integrate a state-of-the-art memory bandwidth throttling framework to allow safe execution of best-effort tasks. Specifically, any idle cores, if they exist, are used to schedule best-effort tasks, but their maximum memory bandwidth usage is strictly throttled to tightly bound interference to real-time gang tasks. We implement RT-Gang in the Linux kernel and evaluate it on two representative embedded multicore platforms using both synthetic and real-world DNN workloads. The results show that RT-Gang dramatically improves system predictability and that the overhead is negligible.
  4. Papadopoulos, Alessandro V. (Ed.)
    Temporal isolation is one of the most significant challenges that must be addressed before Multi-Processor Systems-on-Chip (MPSoCs) can be widely adopted in mixed-criticality systems with both time-sensitive real-time (RT) applications and performance-oriented non-real-time (NRT) applications. Specifically, the main memory subsystem is one of the most prevalent causes of interference, performance degradation and loss of isolation. Existing memory bandwidth regulation mechanisms use static, dynamic, or predictive DRAM bandwidth management techniques to restore the execution time of an application under contention as close as possible to the execution time in isolation. In this paper, we propose a novel distribution-driven regulation whose goal is to achieve a timeliness objective formulated as a constraint on the probability of meeting a certain target execution time for the RT applications. Using existing interconnect-level Performance Monitoring Units (PMU), we can observe the Cumulative Distribution Function (CDF) of the per-request memory latency. Regulation is then triggered to enforce first-order stochastic dominance with respect to a desired reference. Consequently, it is possible to enforce that the overall observed execution time random variable is dominated by the reference execution time. The mechanism requires no prior information about the contending application and treats the DRAM subsystem as a black box. We provide a full-stack implementation of our mechanism on a Commercial Off-The-Shelf (COTS) platform (Xilinx Ultrascale+ MPSoC), evaluate it using real and synthetic benchmarks, experimentally validate that the timeliness objectives are met for the RT applications, and demonstrate that it is able to provide 2.2x more overall throughput for NRT applications compared to DRAM bandwidth management-based regulation approaches.
  5. The Intel Optane DC Persistent Memory Module (AEP), which is the first commercially available Non-Volatile Memory (NVM) product, offers performance comparable to DRAM while providing larger capacities and data persistence. Existing studies that replace DRAM with NVM or hybridize the two are either emulator-based or focused on how to improve the energy efficiency of writes. Unfortunately, the energy efficiency of the real AEP system is less explored. Based on real AEP, we observe that even though it eliminates DRAM-like refresh energy consumption, AEP consumes significantly different amounts of energy at different performance levels. Specifically, requests separated by time intervals (dispersed) fare worse in both performance and energy efficiency than requests issued without time intervals (compact). This disparity and the potential for exploiting parallelism motivate us to propose Sprint-AEP, an energy-efficiency-oriented scheduling method for AEP-equipped servers. Sprint-AEP fully activates just enough AEPs to serve most of the requests by deferring write requests and prefetching the hottest data. The remaining AEPs stay in idle mode with low idle power to save energy. In addition, we utilize read parallelism to accelerate the sync and prefetching processes. Compared with energy-unaware AEP usage, our experimental results show that Sprint-AEP saves up to 26% energy with little performance degradation.
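The first-order stochastic dominance check mentioned in item 4 can be sketched on empirical latency samples. This is only an illustration, not the authors' mechanism: it assumes the goal is that observed latencies are no worse than a reference, meaning the observed empirical CDF lies at or above the reference CDF at every latency value. The function names `ecdf` and `latency_within_reference` are hypothetical.

```python
from bisect import bisect_right


def ecdf(samples):
    """Return the empirical CDF of the samples as a function of x."""
    ordered = sorted(samples)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def latency_within_reference(observed, reference):
    """True if the observed latencies are stochastically no larger than the reference.

    Both CDFs are step functions that only jump at sample values,
    so comparing them at the union of all sample values is sufficient.
    """
    f_obs, f_ref = ecdf(observed), ecdf(reference)
    points = sorted(set(observed) | set(reference))
    return all(f_obs(x) >= f_ref(x) for x in points)


ok = latency_within_reference([10, 12, 15], [11, 14, 20])
violated = latency_within_reference([11, 14, 20], [10, 12, 15])
```

In a regulator built along these lines, a failed check would be the trigger for throttling the contending applications; how the actual mechanism reacts is not described in the abstract.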