skip to main content

Attention:

The NSF Public Access Repository (PAR) system and access will be unavailable from 8:00 PM ET on Friday, March 21 until 8:00 AM ET on Saturday, March 22 due to maintenance. We apologize for the inconvenience.


Title: Statistical caching for near memory management
Modern GPUs often use near memory or high-bandwidth memory, which may be managed as cache when the application data is too large to fit in the near memory. Unlike CPU caches, the near memory cache has a much larger size. A recent approach is statistical caching, which shows near optimal results when managing large memory for file caching. The prior work is ideal and not practical. This paper outlines two extensions. It first formulates a new caching algorithm called least expected use (LEU) replacement and shows, through examples, that the statistical solution automatically integrates two otherwise disparate policies. Then the paper describes a system design to implement LEU. To position the new design for discussion, the paper draws parallels with two familiar ideas, branch prediction and spectral analysis, and considers a set of opportunities and challenges of achieving statistical caching in near memory.  more » « less
Award ID(s):
1909099 1717877
PAR ID:
10176036
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Proceedings of the International Symposium on Memory Systems
Page Range / eLocation ID:
411 to 416
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Caching techniques are widely used in today’s computing infrastructure from virtual memory management to server cache and memory cache. This paper builds on two observa- tions. First, the space utilization in cache can be improved by varying the cache size based on dynamic application demand. Second, it is easier to predict application behavior statistically than precisely. This paper presents a new variable-size cache that uses statistical knowledge of program behavior to maximize the cache performance. We measure performance using data access traces from real-world workloads, including Memcached traces from Facebook and storage traces from Microsoft Research. In an offline setting, the new cache is demonstrated to outperform even OPT, the optimal fixed- size cache which makes use of precise knowledge of program behavior. 
    more » « less
  2. Erek Petrank and Steve Blackburn (Ed.)
    Cache replacement policies typically use some form of statistics on past access behavior. As a common limitation, how- ever, the extent of the history being recorded is limited to either just the data in cache or, more recently, a larger but still finite-length window of accesses, because the cost of keeping a long history can easily outweigh its benefit. This paper presents a statistical method to keep track of instruction pointer-based access reuse intervals of arbitrary length and uses this information to identify the Least Ex- pected Use (LEU) blocks for replacement. LEU uses dynamic sampling supported by novel hardware that maintains a state to record arbitrarily long reuse intervals. LEU is evaluated using the Cache Replacement Championship simulator, tested on PolyBench and SPEC, and compared with five policies including a recent technique that approximates optimal caching using a fixed-length history. By maintaining statistics for an arbitrary history, LEU outperforms previous techniques for a broad range of scientific kernels, whose data reuses are longer than those in traces traditionally used in computer architecture studies. 
    more » « less
  3. CPU cache has been used to bridge the processor-memory performance gap to enable high-performance computing. As the cache is of limited capacity, for its maximum efficacy it should (1) avoid caching data that are less likely to be accessed and (2) identify and cache data that would otherwise cost a program multiple memory accesses to reach. Unfortunately, existing cache architectures are inadequate on these two efforts. First, to cost-effectively exploit the spatial locality, they adopt a relatively large and fixed-size cache line as the caching unit. Thus, much of the space in a cache line can be wasted when the data locality is weak. Second, for easy use, the cache is designed to be transparent to programs, which hinders programs from fully exploiting its performance potentials. To address these problems, we propose a high-performance Software Defined Cache (SDC) architecture providing a simple and generic key-value abstraction that allows (1) caching data at a granularity smaller than a cache line, and (2) enabling programs to explicitly insert, retrieve, and invalidate data in the cache with new instructions. By providing a program with the ability of explicitly using the cache as a lookaside key-value buffer, SDC enables a much more efficient cache without disruptively changing the existing cache organization and without substantially increasing hardware cost. We have prototyped SDC on the gem5 simulator and evaluated it with various data index structures and workloads. Experiment results show that SDC can improve the cache performance for the workloads by up to 5.3× over current cache design. 
    more » « less
  4. This paper formulates the cache-aided multi-user Private Information Retrieval (MuPIR) problem, including K u cache-equipped users, each of which wishes to retrieve a desired message efficiently from N distributed databases with access to K independent messages. Privacy of the users’ demands requires that any individual database can not learn anything about the demands of the users. The load of this problem is defined as the average number of downloaded bits per desired message bit. The goal is to find the optimal memory-load trade-off while preserving the demand privacy. Besides the formulation of the MuPIR problem, the contribution of this paper is two-fold. First, we characterize the optimal memory-load trade-off for a system with N = 2 databases, K = 2 messages and K u = 2 users demanding distinct messages; Second, a product design with order optimality guarantee is proposed. In addition, the product design can achieve the optimal load when the cache memory is large enough. The product design embeds the well-known Sun-Jafar PIR scheme into coded caching, in order to benefit from the coded caching gain while preserving the privacy of the users’ demands. 
    more » « less
  5. Content delivery networks (CDNs) distribute much of the Internet content by caching and serving the objects requested by users. A major goal of a CDN is to maximize the hit rates of its caches, thereby enabling faster content downloads to the users. Content caching involves two components: an admission algorithm to decide whether to cache an object and an eviction algorithm to determine which object to evict from the cache when it is full. In this paper, we focus on cache admission and propose a novel algorithm called RL-Cache that uses model-free reinforcement learning (RL) to decide whether or not to admit a requested object into the CDN’s cache. Unlike prior approaches that use a small set of criteria for decision making, RL-Cache weights a large set of features that include the object size, recency, and frequency of access. We develop a publicly available implementation of RL-Cache and perform an evaluation using production traces for the image, video, and web traffic classes from Akamai’s CDN. The evaluation shows that RL-Cache improves the hit rate in comparison with the state of the art and imposes only a modest resource overhead on the CDN servers. Further, RL-Cache is robust enough that it can be trained in one location and executed on request traces of the same or different traffic classes in other locations of the same geographic region. The paper also reports extensive analyses of the RL-Cache sensitivity to its features and hyperparameter values. The analyses validate the made design choices and reveal interesting insights into the RL-Cache behavior. 
    more » « less