Emerging Non-Volatile Memories (NVMs) are expected to be included in future main memory, providing the opportunity to host important data persistently in main memory. However, achieving persistency requires that programs be written with failure-safety in mind. Many persistency models and techniques have been proposed to help the programmer reason about failure-safety. They require that the programmer eagerly flush data out of caches to make it persistent. Eager persistency comes with a large overhead because it adds many instructions to the program for flushing cache lines and incurs costly stalls at barriers to wait for data to become durable. To reduce these overheads, we propose Lazy Persistency (LP), a software persistency technique that allows caches to slowly send dirty blocks to the NVMM through natural evictions. With LP, there are no additional writes to NVMM, no decrease in write endurance, and no performance degradation from cache line flushes and barriers. Persistency failures are discovered using software error detection (checksum), and the system recovers from them by recomputing inconsistent results. We describe the properties and design of LP and demonstrate how it can be applied to loop-based kernels popularly used in scientific computing. We evaluate LP and compare it to the state-of-the-art Eager Persistency technique from prior work. Compared to it, LP reduces the execution time and write amplification overheads from 9% and 21% to only 1% and 3%, respectively.
more »
« less
BBB: Simplifying Persistent Programming using Battery-Backed Buffers
Non-volatile memory (NVM) is poised to augment or replace DRAM as main memory. With the right abstraction and support, non-volatile main memory (NVMM) can provide an alternative to the storage system to host long-lasting persistent data. However, keeping persistent data in memory requires programs to be written such that data is crash consistent (i.e. it can be recovered after failure). Critical to supporting crash recovery is the guarantee of ordering of when stores become durable with respect to program order. Strict persistency, which requires persist order to coincide with program order of stores, is simple and intuitive but generally thought to be too slow. More relaxed persistency models are available but demand higher programming complexity, e.g. they require the programmer to insert persist barriers correctly in their program. We identify the source of strict persistency inefficiency as the gap between the point of visibility (PoV) which is the cache, and the point of persistency (PoP) which is the memory. In this paper, we propose a new approach to close the PoV/PoP gap which we refer to as Battery-Backed Buffer (BBB). The key idea of BBB is to provide a battery-backed persist buffer (bbPB) in each core next to the L1 data cache (L1D). A store value is allocated in the bbPB as it is written to cache, becoming part of the persistence domain. If a crash occurs, battery ensures bbPB can be fully drained to NVMM. BBB simplifies persistent programming as the programmer does not need to insert persist barriers or flushes. Furthermore, our BBB design achieves nearly identical results to eADR in terms of performance and number of NVMM writes, while requiring two orders of magnitude smaller energy and time to drain.
more »
« less
- Award ID(s):
- 1717486
- PAR ID:
- 10297726
- Date Published:
- Journal Name:
- 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA)
- Page Range / eLocation ID:
- 111 to 124
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Persistent memory presents a great opportunity for crash-consistent computing in large-scale computing systems. The ability to recover data upon power outage or crash events can significantly improve the availability of large-scale systems, while improving the performance of persistent data applications (e.g., database applications). However, persistent memory suffers from high write latency and requires specific programming model (e.g., Intel’s PMDK) to guarantee crash consistency, which results in long latency to persist data. To mitigate these problems, recent standards advocate for sufficient back-up power that can flush the whole cache hierarchy to the persistent memory upon detection of an outage, i.e., extending the persistence domain to include the cache hierarchy. In the secure NVM with extended persistent domain(EPD), in addition to flushing the cache hierarchy, extra actions need to be taken to protect the flushed cache data. These extra actions of secure operation could cause significant burden on energy costs and battery size. We demonstrate that naive implementations could lead to significantly expanding the required power holdup budget (e.g., 10.3x more operations than EPD system without secure memory support). The significant overhead is caused by memory accesses of secure metadata. In this paper, we present Horus, a novel EPD-aware secure memory implementation. Horus reduces the overhead during draining period of EPD system by reducing memory accesses of secure metadata. Experiment result shows that Horus reduces the draining time by 5x, compared with the naive baseline design.more » « less
-
Hardware Transactional Memory (HTM) simplifies concurrent programming and can accelerate multithreaded execution through lock elision. Non-Volatile Memory (NVM) combines the speed and byte addressability of DRAM with the durability of storage, enabling the construction of high-performance, persistent data structures. Unfortunately, the write-back instructions typically needed to ensure post-crash consistency in NVM cause HTM transactions to abort, precluding the straightforward combination of HTM and persistent data structures. The problem goes away on machines with persistent caches, but these require special battery-backed circuitry and are far from commonplace.To combine HTM and persistent data structures, we advocate for buffered durable linearizability (BDL), a relaxed correctness criterion that enables recovery to a "recent" consistent state in the wake of a crash, allowing writes-back to occur outside transactions.Significantly, BDL retains the persistence guarantees of storage systems—such as databases backed by disks or flash—that have relied on buffering for decades.The combination of HTM and buffered durability enables three separate usage scenarios. First, we add durability to an existing HTM-based structure (a van Emde Boas tree due to Khalaji et al.); second, we use HTM to simplify an existing persistent structure (a skiplist due to Wang et al.); third, we "back port" an HTM-based structure optimized for persistent caches (a hash table due to Zhang et al.) to work well on more conventional processors. The first two scenarios yield several-fold improvements in throughput; the third sees very little slowdown.more » « less
-
null (Ed.)The performance of persistent applications is severely hurt by current secure processor architectures. Persistent applications use long-latency flush instructions and memory fences to make sure that writes to persistent data reach the persistency domain in a way that is crash consistent. Recently introduced features like Intel's Asynchronous DRAM Refresh (ADR) make the on-chip Write Pending Queue (WPQ) part of the persistency domain and help reduce the penalty of persisting data since data only needs to reach the on-chip WPQ to be considered persistent. However, when persistent applications run on secure processors, for the sake of securing memory many cycles are added to the critical path of their write operations before they ever reach the persistent WPQ, preventing them from fully exploiting the performance advantages of the persistent WPQ. Our goal in this work is to make it feasible for secure persistent applications to benefit more from the on-chip persistency domain. We propose Dolos, an architecture that prioritizes persisting data without sacrificing security in order to gain a significant performance boost for persistent applications. Dolos achieves this goal by an additional minor security unit, Mi-SU, that utilizes a much faster secure process that protects only the WPQ. Thus, the secure operation latency in the critical path of persist operations is reduced and hence persistent transactions can complete earlier. Dolos retains a conventional major security unit for protecting memory that occurs off the critical path after inserting secured data into the WPQ. To evaluate our design, we implemented our architecture in the GEM5 simulator and analyzed the performance of 6 benchmarks from the WHISPER suite. Dolos improves their performance by 1.66x on average.more » « less
-
Byte-addressable non-volatile memory (NVM) is a promising technology that provides near-DRAM performance with scalable memory capacity. However, it requires atomic data durability to ensure memory persistency. Therefore, many techniques, including logging and shadow paging, have been proposed. However, most of them either introduce extra write traffic to NVM or suffer from significant performance overhead on the critical path of program execution, or even both. In this paper, we propose a transparent and efficient hardware-assisted out-of-place update (HOOP) mechanism that supports atomic data durability, without incurring much extra writes and performance overhead. The key idea is to write the updated data to a new place in NVM, while retaining the old data until the updated data becomes durable. To support this, we develop a lightweight indirection layer in the memory controller to enable efficient address translation and adaptive garbage collection for NVM. We evaluate HOOP with a variety of popular data structures and data-intensive applications, including key-value stores and databases. Our evaluation shows that HOOP achieves low critical-path latency with small write amplification, which is close to that of a native system without persistence support. Compared with state-of-the-art crash-consistency techniques, it improves application performance by up to 1.7×, while reducing the write amplification by up to 2.1×. HOOP also demonstrates scalable data recovery capability on multi-core systems.more » « less
An official website of the United States government

