Title: Leaking Information Through Cache LRU States
The Least-Recently Used (LRU) cache replacement policy and its variants are widely deployed in modern processors. This paper shows for the first time in detail that the LRU states of caches can be used to leak information: any access to a cache by a sender will modify the LRU state, and the receiver is able to observe this through a timing measurement. This paper presents LRU timing-based channels both when the sender and the receiver have shared memory, e.g., shared library data pages, and when they are separate processes without shared memory. In addition, the new LRU timing-based channels are demonstrated on both Intel and AMD processors in scenarios where the sender and the receiver share the cache, in both hyper-threaded and time-sliced settings. The transmission rate of the LRU channels can be up to 600 Kbps per cache set in the hyper-threaded setting. Unlike the majority of existing cache channels, which require the sender to trigger cache misses, the new LRU channels work with the sender having only cache hits, making the channel faster and more stealthy. This paper also demonstrates that the new LRU channels can be used in transient execution attacks, e.g., Spectre. Further, this paper shows that the LRU channels pose threats to existing secure cache designs, demonstrating that they affect the secure PL cache. The paper finishes by discussing and evaluating possible defenses.
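As a rough illustration of the receiver's side of such a channel, the following C sketch times a walk over one cache set. This is a minimal sketch, assuming an 8-way set-associative cache; WAYS, probe_set, and the precomputed eviction-set addresses are hypothetical, and a real attack must also calibrate a hit/miss timing threshold.

    #include <stdint.h>
    #include <x86intrin.h>              /* __rdtscp, _mm_lfence */

    #define WAYS 8                      /* assumed associativity of the target set */

    /* Time a single load; the fence keeps the measurement ordered. */
    static inline uint64_t timed_access(volatile uint8_t *p)
    {
        unsigned aux;
        uint64_t t0 = __rdtscp(&aux);
        (void)*p;                       /* the measured load */
        uint64_t t1 = __rdtscp(&aux);
        _mm_lfence();
        return t1 - t0;
    }

    /* The receiver first fills one cache set with its own WAYS lines,
     * then lets the sender run. A sender *hit* on a line mapping to
     * the same set updates the LRU state, changing which of the
     * receiver's lines the replacement policy evicts next; on
     * re-probing, that shows up as an extra slow (miss) access. */
    uint64_t probe_set(volatile uint8_t *addrs[WAYS])
    {
        uint64_t total = 0;
        for (int i = 0; i < WAYS; i++)
            total += timed_access(addrs[i]);
        return total;                   /* a high total decodes as '1' */
    }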
Award ID(s):
1651945 1813797
PAR ID:
10167508
Author(s) / Creator(s):
Date Published:
Journal Name:
International Symposium on High-Performance Computer Architecture
Page Range / eLocation ID:
139 to 152
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Based on improvements to an existing three-step model for cache timing-based attacks, this work presents 88 Strong types of theoretical timing-based vulnerabilities in processor caches. It also presents and implements a new benchmark suite that can be used to test whether a processor cache is vulnerable to one of the attacks. In total, there are 1094 automatically-generated test programs which cover the 88 Strong theoretical vulnerabilities. The benchmark suite generates the Cache Timing Vulnerability Score (CTVS), which can be used to evaluate how vulnerable a specific cache implementation is to different attacks. A smaller CTVS means the design is more secure. Evaluation is conducted on commodity Intel and AMD processors and shows how differences in processor implementations can result in vulnerability to different types of attacks. Further, the benchmarks and the CTVS can be used in simulation to help designers of new secure processors and caches evaluate their designs' susceptibility to cache timing-based attacks.
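    To make the three-step pattern concrete, one generated test might, in spirit, look like the following C sketch of a Flush+Reload-style case. It is only a sketch: shared_line, victim_step, and the interpretation of the returned latency are hypothetical placeholders, not the benchmark suite's actual code.

        #include <stdint.h>
        #include <x86intrin.h>          /* _mm_clflush, _mm_mfence, __rdtscp */

        extern volatile uint8_t shared_line[64];  /* line shared with the victim (assumed) */
        extern void victim_step(void);            /* victim's secret-dependent access (assumed) */

        /* Three-step pattern: step 1 puts the line into a known state,
         * step 2 is the (possible) victim access, and step 3 is a timed
         * access that reveals whether step 2 touched the line. */
        uint64_t three_step_test(void)
        {
            unsigned aux;
            _mm_clflush((const void *)shared_line);  /* step 1: known (invalid) state */
            _mm_mfence();
            victim_step();                           /* step 2: victim runs */
            uint64_t t0 = __rdtscp(&aux);
            (void)shared_line[0];                    /* step 3: timed reload */
            uint64_t t1 = __rdtscp(&aux);
            return t1 - t0;              /* fast reload => victim accessed the line */
        }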
  2. Timing side channels have been used to extract cryptographic keys and sensitive documents even from trusted enclaves. Specifically, cache side channels created by reuse of shared code or data in the memory hierarchy have been exploited by several known attacks, e.g., Evict+Reload for recovering an RSA key and Spectre variants for leaking speculatively loaded data. In this paper, we present TimeCache, a cache design that incorporates knowledge of prior cache line accesses to eliminate cache side channels due to reuse of shared software (code and data). Our goal is to retain the benefits of a shared cache: allowing each process access to the entire cache, and cache occupancy by a single copy of shared software. We achieve our goal by implementing per-process cache line visibility, so that a process does not benefit from cached data brought in by another process until it has incurred a corresponding miss penalty. Our design achieves low overhead by using a novel combination of timestamps and a hardware design that allows efficient parallel comparisons of the timestamps. The solution works at all cache levels without the need to limit the number of security domains, and defends against an attacker process running on the same core, on another hyperthread, or on another core. Our implementation in the gem5 simulator demonstrates that the system is able to defend against RSA key extraction. We evaluate performance using SPEC2006 and PARSEC and observe the overhead of TimeCache to be 1.13% on average. Delay due to first-access misses adds the majority of the overhead, with the security-context bookkeeping incurred at context-switch time contributing 0.02% of the 1.13%.
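    A toy software model of the per-process visibility rule might look like the C below. This is not the paper's hardware mechanism: in particular, the fixed per-process array is an artificial bound that the paper's timestamp-comparison design explicitly avoids.

        #include <stdbool.h>
        #include <stdint.h>

        #define MAX_PROCS 64    /* artificial bound for this toy model only */

        typedef struct {
            uint64_t fill_time;             /* global time when the line was filled */
            uint64_t paid_at[MAX_PROCS];    /* 0 = process has not paid a miss yet */
        } line_meta_t;

        /* A process is served at hit latency only if it has already paid
         * a miss-equivalent delay for this fill of the line; otherwise
         * the access is delayed like a miss and the payment recorded. */
        bool serve_at_hit_latency(line_meta_t *line, int pid, uint64_t now)
        {
            if (line->paid_at[pid] != 0 && line->paid_at[pid] >= line->fill_time)
                return true;                /* already visible to this process */
            line->paid_at[pid] = now;       /* charge the first-access penalty */
            return false;                   /* serve with miss-equivalent delay */
        }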
  3. In-memory key-value caches are widely used as a performance-critical layer in web applications, disk-based storage, and distributed systems. The Least Recently Used (LRU) replacement policy has become the de facto standard in those systems since it exploits workload locality well. However, the LRU implementation can be costly due to the rigid data structure needed to maintain object priority, as well as the locks for updating object order. Redis, one of the most effective and widely deployed commercial systems, adopts an approximated LRU policy, where the least recently used item from a small, randomly sampled set of items is chosen for eviction. This random sampling-based policy is lightweight and flexible. We observe that a significant miss ratio gap can exist between exact LRU and random sampling-based LRU under different sampling sizes K. Therefore, existing LRU miss ratio curve (MRC) construction techniques cannot be directly applied without loss of accuracy. In this paper, we introduce a new probabilistic stack algorithm named KRR to accurately model random sampling-based LRU, and extend it to handle both fixed- and variable-size objects in key-value caches. We present an efficient stack update algorithm that significantly reduces the expected running time of KRR. To improve the performance of in-memory multi-tenant key-value caches that use random sampling-based replacement, we propose kRedis, a reference locality- and latency-aware memory partitioning scheme. kRedis guides the memory allocation among the tenants and dynamically customizes K to better exploit the locality of each individual tenant. Evaluation results over diverse workloads show that our model generates accurate miss ratio curves for both fixed and variable object size workloads, and enables practical, low-overhead online MRC prediction. Equipped with KRR, kRedis delivers up to a 50.2% average access latency reduction and up to a 262.8% throughput improvement compared to Redis. Furthermore, compared with pRedis, a state-of-the-art design for memory allocation in Redis, kRedis shows up to 24.8% and 61.8% improvements in average access latency and throughput, respectively.
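    For reference, sampled eviction of this kind is simple enough to sketch in a few lines of C. The pool array, n, and the rand()-based sampling are illustrative; Redis itself additionally keeps a small pool of eviction candidates across calls.

        #include <stdint.h>
        #include <stdlib.h>

        typedef struct {
            uint64_t last_access;   /* value of an LRU clock at the last touch */
            /* key, value, ... */
        } object_t;

        /* Approximated LRU: sample K random objects and evict the least
         * recently used one among the sample. As K grows toward n this
         * approaches exact LRU; a small K keeps eviction cheap and avoids
         * maintaining any global ordering structure. */
        object_t *pick_victim(object_t *pool, size_t n, int K)
        {
            object_t *victim = NULL;
            for (int i = 0; i < K; i++) {
                object_t *cand = &pool[rand() % n];
                if (victim == NULL || cand->last_access < victim->last_access)
                    victim = cand;
            }
            return victim;
        }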
  4. Despite its success in many areas, deep learning is a poor fit for use in hardware predictors because these models are impractically large and slow, but this paper shows how we can use deep learning to help design a new cache replacement policy. We first show that, for cache replacement, a powerful LSTM learning model can provide better accuracy in an offline setting than current hardware predictors. We then analyze and interpret this LSTM model, deriving a key insight that allows us to design a simple online model that matches the offline model's accuracy at orders of magnitude lower cost. The result is the Glider cache replacement policy, which we evaluate on a set of 33 memory-intensive programs from the SPEC 2006, SPEC 2017, and GAP (graph-processing) benchmark suites. In a single-core setting, Glider outperforms top finishers from the 2nd Cache Replacement Championship, reducing the miss rate over LRU by 8.9%, compared to reductions of 7.1% for Hawkeye, 6.5% for MPPPB, and 7.5% for SHiP++. On a four-core system, Glider improves IPC over LRU by 14.7%, compared with improvements of 13.6% (Hawkeye), 13.2% (MPPPB), and 11.4% (SHiP++).
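    Very loosely, a simple online model of this kind is a perceptron-style integer classifier over a short history of load PCs. The toy C sketch below conveys only that idea; the table size, history length, hash, and threshold are illustrative choices, not the paper's configuration.

        #include <stdint.h>

        #define HIST  5         /* recent load PCs tracked (illustrative) */
        #define TABLE 2048      /* weight table size (illustrative) */

        static int8_t   weights[TABLE];
        static uint64_t pc_hist[HIST];

        static inline unsigned idx(uint64_t pc) { return (pc ^ (pc >> 13)) % TABLE; }

        /* Sum one small weight per hashed recent PC and compare against a
         * threshold to classify a line as cache-friendly or cache-averse. */
        int predict_friendly(void)
        {
            int sum = 0;
            for (int i = 0; i < HIST; i++)
                sum += weights[idx(pc_hist[i])];
            return sum >= 0;    /* above threshold => predict reuse */
        }

        /* Saturating update once the line's actual reuse is known. */
        void train(int was_reused)
        {
            for (int i = 0; i < HIST; i++) {
                unsigned j = idx(pc_hist[i]);
                if (was_reused  && weights[j] <  127) weights[j]++;
                if (!was_reused && weights[j] > -128) weights[j]--;
            }
        }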
  5. Last-Level Cache (LLC) attacks typically exploit timing side channels in hardware, and thus rely heavily on timers for their operation. Many proposed defenses against such side-channel attacks capitalize on this reliance. This paper presents PRIME+ABORT, a new cache attack that bypasses these defenses by not depending on timers for its function. Instead of a timing side channel, PRIME+ABORT leverages Intel TSX hardware, widely available in both server- and consumer-grade processors. This work shows that PRIME+ABORT not only is invulnerable to important classes of defenses but also outperforms state-of-the-art LLC PRIME+PROBE attacks in both accuracy and efficiency, achieving a maximum detection speed (in events per second) 3× higher than LLC PRIME+PROBE on Intel's Skylake architecture while producing fewer false positives.
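    The core trick is compact enough to sketch in C with the TSX RTM intrinsics (build with -mrtm). This is a sketch under stated assumptions: eviction_set is a hypothetical, precomputed set of addresses mapping to one LLC set, and a real attack must filter aborts caused by interrupts or transaction capacity using the status code.

        #include <stdint.h>
        #include <immintrin.h>      /* _xbegin, _xend, _XBEGIN_STARTED, _mm_pause */

        #define WAYS 16             /* assumed LLC associativity */

        /* PRIME+ABORT in miniature: read every line of an eviction set
         * inside a TSX transaction so the lines join the transactional
         * read set. If the victim touches the monitored cache set and
         * evicts one of them, the hardware aborts the transaction,
         * signaling the access without any timer. */
        int prime_and_wait(volatile uint8_t *eviction_set[WAYS], uint64_t spin)
        {
            unsigned status = _xbegin();
            if (status == _XBEGIN_STARTED) {
                for (int i = 0; i < WAYS; i++)
                    (void)*eviction_set[i];     /* prime: fill the set */
                while (spin--)                  /* wait for the victim */
                    _mm_pause();
                _xend();
                return 0;                       /* committed: set untouched */
            }
            return 1;   /* aborted: a line was evicted, likely by the victim */
        }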