skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Recipe: converting concurrent DRAM indexes to persistent-memory indexes
We present Recipe, a principled approach for converting concurrent DRAM indexes into crash-consistent indexes for persistent memory (PM). The main insight behind Recipe is that isolation provided by a certain class of concurrent in-memory indexes can be translated with small changes to crash-consistency when the same index is used in PM. We present a set of conditions that enable the identification of this class of DRAM indexes, and the actions to be taken to convert each index to be persistent. Based on these conditions and conversion actions, we modify five different DRAM indexes based on B+ trees, tries, radix trees, and hash tables to their crash-consistent PM counterparts. The effort involved in this conversion is minimal, requiring 30--200 lines of code. We evaluated the converted PM indexes on Intel DC Persistent Memory, and found that they outperform state-of-the-art, hand-crafted PM indexes in multi-threaded workloads by up-to 5.2x. For example, we built P-CLHT, our PM implementation of the CLHT hash table by modifying only 30 LOC. When running YCSB workloads, P-CLHT performs up to 2.4x better than Cacheline-Conscious Extendible Hashing (CCEH), the state-of-the-art PM hash table.  more » « less
Award ID(s):
1751277
PAR ID:
10180593
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the 27th ACM Symposium on Operating Systems Principles
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Persistent memory (PM) brings important opportunities for improving data storage including the widely used hash tables. However, PM is not friendly to small writes, which causes existing PM hashes to suffer from high hardware write amplification. Hybrid memory offers the performance and concurrency of DRAM and the durability and capacity of PM, but existing hybrid memory hashes cannot deliver high performance, low DRAM footprint, and fast recovery at the same time. This paper proposes WALSH, a flat hash with novel log-structured separate chaining designs to optimize the performance while ensuring low DRAM footprint and fast recovery. To address the overhead of hash resizing and garbage collection (GC), WALSH further proposes partial resizing/GC mechanisms and a 4-phase protocol for concurrent hash operations. As a result, WALSH is the first flat index for hybrid memory with embedded write aggregation ability. A comprehensive evaluation shows that WALSH substantially outperforms state-of-the-art hybrid memory hashes; e.g., its insert throughput is up to 2.4X that of related works while saving more than 87% of DRAM. WALSH also provides efficient recovery; e.g., it can recover a dataset with 1 billion objects in just a few seconds. 
    more » « less
  2. Our extensive experiments reveal that existing key-value stores (KVSs) achieve high performance at the expense of a huge memory footprint that is often impractical or unacceptable. Even with the emerging ultra-fast byte-addressable persistent memory (PM), KVSs fall far short of delivering the high performance promised by PM's superior I/O bandwidth. To find the root causes and bridge the huge performance/memory-footprint gap, we revisit the architectural features of two representative indexing mechanisms (single-stage and multi-stage) and propose a three-stage KVS called FluidKV. FluidKV effectively consolidates these indexes by fast and seamlessly running incoming key-value request stream from the write-concurrent frontend stage to the memory-efficient backend stage across an intermediate stage. FluidKV also designs important enabling techniques, such as thread-exclusive logging, PM-friendly KV-block structures, and dual-grained indexes, to fully utilize both parallel-processing and high-bandwidth capabilities of ultra-fast storage hardware while reducing the overhead. We implemented a FluidKV prototype and evaluated it under a variety of workloads. The results show that FluidKV outperforms the state-of-the-art PM-aware KVSs, including ListDB and FlatStore with different indexes, by up to 9× and 3.9× in write and read throughput respectively, while cutting up to 90% of the DRAM footprint. 
    more » « less
  3. Disaggregated memory architecture decouples computing and memory resources into separate pools connected via high-speed interconnect technologies, offering substantial advantages in scalability and resource utilization. However, this architecture also poses unique challenges in designing effective index structures and concurrency protocols due to increased remote memory access overhead and its shared-everything nature. In this paper, we present DART, a lock-free two-layer hashed Adaptive Radix Tree (ART) designed to minimize remote memory access while ensuring high concurrency and crash consistency in the disaggregated memory architecture. DART incorporates a hash-based Express Skip Table at its upper layer, which reduces the round trips of remote memory access during index traversal. In the base layer, DART employs an Adaptive Hashed Layout within ART nodes, confining remote memory accesses during in-node searches to small hash buckets. By further leveraging Decoupled Metadata Organization, DART achieves lock-free atomic updates, enabling high scalability and ensuring crash consistency. Our evaluation demonstrates that DART outperforms state-of-the-art counterparts by up to 5.8X in YCSB workloads. 
    more » « less
  4. Hardware Transactional Memory (HTM) simplifies concurrent programming and can accelerate multithreaded execution through lock elision. Non-Volatile Memory (NVM) combines the speed and byte addressability of DRAM with the durability of storage, enabling the construction of high-performance, persistent data structures. Unfortunately, the write-back instructions typically needed to ensure post-crash consistency in NVM cause HTM transactions to abort, precluding the straightforward combination of HTM and persistent data structures. The problem goes away on machines with persistent caches, but these require special battery-backed circuitry and are far from commonplace.To combine HTM and persistent data structures, we advocate for buffered durable linearizability (BDL), a relaxed correctness criterion that enables recovery to a "recent" consistent state in the wake of a crash, allowing writes-back to occur outside transactions.Significantly, BDL retains the persistence guarantees of storage systems—such as databases backed by disks or flash—that have relied on buffering for decades.The combination of HTM and buffered durability enables three separate usage scenarios. First, we add durability to an existing HTM-based structure (a van Emde Boas tree due to Khalaji et al.); second, we use HTM to simplify an existing persistent structure (a skiplist due to Wang et al.); third, we "back port" an HTM-based structure optimized for persistent caches (a hash table due to Zhang et al.) to work well on more conventional processors. The first two scenarios yield several-fold improvements in throughput; the third sees very little slowdown. 
    more » « less
  5. null (Ed.)
    We present a fully lock-free variant of our recent Montage system for persistent data structures. The variant, nbMontage, adds persistence to almost any nonblocking concurrent structure without introducing significant overhead or blocking of any kind. Like its predecessor, nbMontage is buffered durably linearizable: it guarantees that the state recovered in the wake of a crash will represent a consistent prefix of pre-crash execution. Unlike its predecessor, nbMontage ensures wait-free progress of the persistence frontier, thereby bounding the number of recent updates that may be lost on a crash, and allowing a thread to force an update of the frontier (i.e., to perform a sync operation) without the risk of blocking. As an extra benefit, the helping mechanism employed by our wait-free sync significantly reduces its latency. Performance results for nonblocking queues, skip lists, trees, and hash tables rival custom data structures in the literature – dramatically faster than achieved with prior general-purpose systems, and generally within 50% of equivalent non-persistent structures placed in DRAM. 
    more » « less