skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Non-Fusion Based Coherent Cache Randomization Using Cross-Domain Accesses
Randomization has proven to be a good defense against conflictbased side-channel attacks in a shared cache. It improves security by assigning a unique randomization scheme to each security domain, e.g., though a different hashing function. However, if two domains have shared data, the domains must be fused in order to guarantee correctness (i.e., data coherence). Such domain fusion significantly reduces the effectiveness of randomization and weakens its security protection. We propose randomization with sharing (RAWS), which enables secure cross-domain accesses while enforcing cache coherence (and thus data coherence). Based on RAWS, we design a non-fusion based inter-domain coherence protocol (NF-IDCP). NF-IDCP enables cache coherence by looking up and flushing multiple cache lines associated with shared-writable data during their cross-domain accesses. Furthermore, NF-IDCP uses constant-delay banking to securely reduce the latency of the cache line flushes. We also use a secure tag-based filter (STF) to reduce flush costs, for example, by explicitly storing the exact cache locations to be flushed. The security evaluation shows that conflict attacks on the optimized NF-IDCP structures cannot leak conflict observations at a meaningful rate. Attack simulations using CacheFX demonstrate that domain fusion significantly retards the protection provided by randomization schemes. Performance overhead of SPECrate 2017 and PARSEC 3.0 benchmarks is evaluated on ZSim, a microarchitectural simulator. To study the performance impact on realistic workloads, such as Firefox, Chromium and X Server, we use a cache simulator built on top of PANDA, a full-system emulator. Across all configurations, the average performance overhead is less than 5%, and the hardware overhead is less than 3% compared to a domainfused randomization.  more » « less
Award ID(s):
2106771
PAR ID:
10541497
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
ACM
Date Published:
ISBN:
9798400704826
Format(s):
Medium: X
Location:
Singapore Singapore
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Timing side channels have been used to extract cryptographic keys and sensitive documents even from trusted enclaves. Specifically, cache side channels created by reuse of shared code or data in the memory hierarchy have been exploited by several known attacks, e.g., evict+reload for recovering an RSA key and Spectre variants for leaking speculatively loaded data.In this paper, we present TimeCache, a cache design that incorporates knowledge of prior cache line access to eliminate cache side channels due to reuse of shared software (code and data). Our goal is to retain the benefits of a shared cache of allowing each process access to the entire cache and of cache occupancy by a single copy of shared software. We achieve our goal by implementing per-process cache line visibility so that the processes do not benefit from cached data brought in by another process until they have incurred a corresponding miss penalty. Our design achieves low overhead by using a novel combination of timestamps and a hardware design to allow efficient parallel comparisons of the timestamps. The solution works at all the cache levels without the need to limit the number of security domains, and defends against an attacker process running on the same core, on a another hyperthread, or on another core.Our implementation in the gem5 simulator demonstrates that the system is able to defend against RSA key extraction. We evaluate performance using SPEC2006 and PARSEC and observe the overhead of TimeCache to be 1.13% on average. Delay due to first access misses adds the majority of the overhead, with the security context bookkeeping incurred at the time of a context switch contributing 0.02% of the 1.13%. 
    more » « less
  2. Abstract—Recent work has demonstrated the security risk associated with micro-architecture side-channels. The cache timing side-channel is a particularly popular target due to its availability and high leakage bandwidth. Existing proposals for defending cache side-channel attacks either degrade cache performance and/or limit cache sharing, hence, should only be invoked when the system is under attack. A lightweight monitoring mechanism that detects malicious micro-architecture manipulation in realistic environments is essential for the judicious deployment of these defense mechanisms. In this paper, we propose PREDATOR, a cache side-channel attack detector that identifies cache events caused by an attacker. To detect side-channel attacks in noisy environments, we take advantage of the observation that, unlike non-specific noises, an active attacker alters victim’s micro-architectural states on security critical accesses and thus causes the victim extra cache events on those accesses. PREDATOR uses precise performance counters to collect detailed victim’s access information and analyzes location-based deviations. PREDATOR is capable of detecting five different attacks with high accuracy and limited performance overhead in complex noisy execution environments. PREDATOR remains effective even when the attacker slows the attack rate by 256 times. Furthermore, PREDATOR is able to accurately report details about the attack such as the instruction that accesses the attacked data. In the case of GnuPG RSA [20], PREDATOR can pinpoint the square/multiply operations in the Modulo-Reduce algorithm; and in the case of OpenSSL AES [45], it can identify the accesses to the Te-Table. 
    more » « less
  3. Persistent memory presents a great opportunity for crash-consistent computing in large-scale computing systems. The ability to recover data upon power outage or crash events can significantly improve the availability of large-scale systems, while improving the performance of persistent data applications (e.g., database applications). However, persistent memory suffers from high write latency and requires specific programming model (e.g., Intel’s PMDK) to guarantee crash consistency, which results in long latency to persist data. To mitigate these problems, recent standards advocate for sufficient back-up power that can flush the whole cache hierarchy to the persistent memory upon detection of an outage, i.e., extending the persistence domain to include the cache hierarchy. In the secure NVM with extended persistent domain(EPD), in addition to flushing the cache hierarchy, extra actions need to be taken to protect the flushed cache data. These extra actions of secure operation could cause significant burden on energy costs and battery size. We demonstrate that naive implementations could lead to significantly expanding the required power holdup budget (e.g., 10.3x more operations than EPD system without secure memory support). The significant overhead is caused by memory accesses of secure metadata. In this paper, we present Horus, a novel EPD-aware secure memory implementation. Horus reduces the overhead during draining period of EPD system by reducing memory accesses of secure metadata. Experiment result shows that Horus reduces the draining time by 5x, compared with the naive baseline design. 
    more » « less
  4. Abstract—Protection of cache hierarchies from side-channel attacks is critical for building secure systems, particularly the ones using Trusted Execution Environments (TEEs). In this pa- per, we consider the problem of efficient and secure fine-grained partitioning of cache hierarchies and propose a framework, called Secure Hierarchies for TEEs (TEE-SHirT). In the context of a three-level cache system, TEE-SHirT consists of partitioned shared last-level cache, partitioned private L2 caches, and non- partitioned L1 caches that are flushed on context switches and system calls. Efficient and correct partitioning requires careful design. Towards this goal, TEE-SHirT makes three contributions: 1) we demonstrate how the hardware structures used for holding cache partitioning metadata can be effectively virtualized to avoid flushing of cache partition content on context switches and system calls; 2) we show how to support multi-threaded enclaves in TEE- SHirT, addressing the issues of coherency and consistency that arise with both intra-core and inter-core data sharing; 3) we develop a formal security model for TEE-SHirT to rigorously reason about the security of our design. 
    more » « less
  5. Graphics processing units (GPUs) are becoming default accelerators in many domains such as high-performance computing (HPC), deep learning, and virtual/augmented reality. Recently, GPUs have also shown significant speedups for a variety of security-sensitive applications such as encryptions. These speedups have largely benefited from the high memory bandwidth and compute throughput of GPUs. One of the key features to optimize the memory bandwidth consumption in GPUs is intra-warp memory access coalescing, which merges memory requests originating from different threads of a single warp into as few cache lines as possible. However, this coalescing feature is also shown to make the GPUs prone to the correlation timing attacks as it exposes the relationship between the execution time and the number of coalesced accesses. Consequently, an attacker is able to correctly reveal an AES private key via repeatedly gathering encrypted data and execution time on a GPU. In this work, we propose a series of defense mechanisms to alleviate such timing attacks by carefully trading off performance for improved security. Specifically, we propose to randomize the coalescing logic such that the attacker finds it hard to guess the correct number of coalesced accesses generated. To this end, we propose to randomize: a) the granularity (called as subwarp) at which warp threads are grouped together for coalescing, and b) the threads selected by each subwarp for coalescing. Such randomization techniques result in three mechanisms: fixed-sized subwarp (FSS), random-sized subwarp (RSS), and random-threaded subwarp (RTS). We find that the combination of these security mechanisms offers 24- to 961-times improvement in the security against the correlation timing attacks with 5 to 28% performance degradation. Online copy: http://adwaitjog.github.io/docs/pdf/rcoal-hpca18.pdf 
    more » « less