skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Statistical caching for near memory management
Modern GPUs often use near memory or high-bandwidth memory, which may be managed as cache when the application data is too large to fit in the near memory. Unlike CPU caches, the near memory cache has a much larger size. A recent approach is statistical caching, which shows near optimal results when managing large memory for file caching. The prior work is ideal and not practical. This paper outlines two extensions. It first formulates a new caching algorithm called least expected use (LEU) replacement and shows, through examples, that the statistical solution automatically integrates two otherwise disparate policies. Then the paper describes a system design to implement LEU. To position the new design for discussion, the paper draws parallels with two familiar ideas, branch prediction and spectral analysis, and considers a set of opportunities and challenges of achieving statistical caching in near memory.  more » « less
Award ID(s):
1909099 1717877
PAR ID:
10176036
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Proceedings of the International Symposium on Memory Systems
Page Range / eLocation ID:
411 to 416
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Erek Petrank and Steve Blackburn (Ed.)
    Cache replacement policies typically use some form of statistics on past access behavior. As a common limitation, how- ever, the extent of the history being recorded is limited to either just the data in cache or, more recently, a larger but still finite-length window of accesses, because the cost of keeping a long history can easily outweigh its benefit. This paper presents a statistical method to keep track of instruction pointer-based access reuse intervals of arbitrary length and uses this information to identify the Least Ex- pected Use (LEU) blocks for replacement. LEU uses dynamic sampling supported by novel hardware that maintains a state to record arbitrarily long reuse intervals. LEU is evaluated using the Cache Replacement Championship simulator, tested on PolyBench and SPEC, and compared with five policies including a recent technique that approximates optimal caching using a fixed-length history. By maintaining statistics for an arbitrary history, LEU outperforms previous techniques for a broad range of scientific kernels, whose data reuses are longer than those in traces traditionally used in computer architecture studies. 
    more » « less
  2. Caching techniques are widely used in today’s computing infrastructure from virtual memory management to server cache and memory cache. This paper builds on two observa- tions. First, the space utilization in cache can be improved by varying the cache size based on dynamic application demand. Second, it is easier to predict application behavior statistically than precisely. This paper presents a new variable-size cache that uses statistical knowledge of program behavior to maximize the cache performance. We measure performance using data access traces from real-world workloads, including Memcached traces from Facebook and storage traces from Microsoft Research. In an offline setting, the new cache is demonstrated to outperform even OPT, the optimal fixed- size cache which makes use of precise knowledge of program behavior. 
    more » « less
  3. CPU cache has been used to bridge the processor-memory performance gap to enable high-performance computing. As the cache is of limited capacity, for its maximum efficacy it should (1) avoid caching data that are less likely to be accessed and (2) identify and cache data that would otherwise cost a program multiple memory accesses to reach. Unfortunately, existing cache architectures are inadequate on these two efforts. First, to cost-effectively exploit the spatial locality, they adopt a relatively large and fixed-size cache line as the caching unit. Thus, much of the space in a cache line can be wasted when the data locality is weak. Second, for easy use, the cache is designed to be transparent to programs, which hinders programs from fully exploiting its performance potentials. To address these problems, we propose a high-performance Software Defined Cache (SDC) architecture providing a simple and generic key-value abstraction that allows (1) caching data at a granularity smaller than a cache line, and (2) enabling programs to explicitly insert, retrieve, and invalidate data in the cache with new instructions. By providing a program with the ability of explicitly using the cache as a lookaside key-value buffer, SDC enables a much more efficient cache without disruptively changing the existing cache organization and without substantially increasing hardware cost. We have prototyped SDC on the gem5 simulator and evaluated it with various data index structures and workloads. Experiment results show that SDC can improve the cache performance for the workloads by up to 5.3× over current cache design. 
    more » « less
  4. This paper formulates the cache-aided multi-user Private Information Retrieval (MuPIR) problem, including K u cache-equipped users, each of which wishes to retrieve a desired message efficiently from N distributed databases with access to K independent messages. Privacy of the users’ demands requires that any individual database can not learn anything about the demands of the users. The load of this problem is defined as the average number of downloaded bits per desired message bit. The goal is to find the optimal memory-load trade-off while preserving the demand privacy. Besides the formulation of the MuPIR problem, the contribution of this paper is two-fold. First, we characterize the optimal memory-load trade-off for a system with N = 2 databases, K = 2 messages and K u = 2 users demanding distinct messages; Second, a product design with order optimality guarantee is proposed. In addition, the product design can achieve the optimal load when the cache memory is large enough. The product design embeds the well-known Sun-Jafar PIR scheme into coded caching, in order to benefit from the coded caching gain while preserving the privacy of the users’ demands. 
    more » « less
  5. In virtual machine (VM) allocation systems, caching repetitive and similar VM allocation requests and associated resolution rules is crucial for reducing computational costs and meeting strict latency requirements. While modern allocation systems distribute requests among multiple allocator agents and use caching to improve performance, current schedulers often neglect the cache state and latency considerations when assigning each new request to an agent. Due to the high variance in costs of cache hits and misses and the associated processing overheads of updating the caches, simple load-balancing and cache-aware mechanisms result in high latencies. We introduce Kamino, a high-performance, latencydriven and cache-aware request scheduling system aimed at minimizing end-to-end latencies. Kamino employs a novel scheduling algorithm grounded in theory which uses partial indicators from the cache state to assign each new request to the agent with the lowest estimated latency. Evaluation of Kamino using a high-fidelity simulator on large-scale production workloads shows a 42% reduction in average request latencies. Our deployment of Kamino in the control plane of a large public cloud confirms these improvements, with a 33% decrease in cache miss rates and a 17% reduction in memory usage 
    more » « less