skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: DINOMO: An Elastic, Scalable, High-Performance Key-Value Store for Disaggregated Persistent Memory
We present Dinomo, a novel key-value store for disaggregated persistent memory (DPM). Dinomo is the first key-value store for DPM that simultaneously achieves high common-case performance, scalability, and lightweight online reconfiguration. We observe that previously proposed key-value stores for DPM had architectural limitations that prevent them from achieving all three goals simultaneously. Dinomo uses a novel combination of techniques such as ownership partitioning, disaggregated adaptive caching, selective replication, and lock-free and log-free indexing to achieve these goals. Compared to a state-of-the-art DPM key-value store, Dinomo achieves at least 3.8× better throughput at scale on various workloads and higher scalability, while providing fast reconfiguration.  more » « less
Award ID(s):
1751277
PAR ID:
10469317
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
VLDB
Date Published:
Format(s):
Medium: X
Location:
Vancouver, BC, Canada
Sponsoring Org:
National Science Foundation
More Like this
  1. Disaggregated memory systems achieve resource utilization efficiency and system scalability by distributing computation and memory resources into distinct pools of nodes. RDMA is an attractive solution to support high-throughput communication between different disaggregated resource pools. However, existing RDMA solutions face a dilemma: one-sided RDMA completely bypasses computation at memory nodes, but its communication takes multiple round trips; two-sided RDMA achieves one-round-trip communication but requires non-trivial computation for index lookups at memory nodes, which violates the principle of disaggregated memory. This work presents Outback, a novel indexing solution for key-value stores with a one-round-trip RDMA-based network that does not incur computation-heavy tasks at memory nodes. Outback is the first to utilize dynamic minimal perfect hashing and separates its index into two components: one memory-efficient and compute-heavy component at compute nodes and the other memory-heavy and compute-efficient component at memory nodes. We implement a prototype of Outback and evaluate its performance in a public cloud. The experimental results show that Outback achieves higher throughput than both the state-of-the-art one-sided RDMA and two-sided RDMA-based in-memory KVS by 1.06--5.03×, due to the unique strength of applying a separated perfect hashing index. 
    more » « less
  2. While permissioned blockchains enable a family of data center applications, existing systems suffer from imbalanced loads across compute and memory, exacerbating the underutilization of cloud resources. This paper presents FlexChain , a novel permissioned blockchain system that addresses this challenge by physically disaggregating CPUs, DRAM, and storage devices to process different blockchain workloads efficiently. Disaggregation allows blockchain service providers to upgrade and expand hardware resources independently to support a wide range of smart contracts with diverse CPU and memory demands. Moreover, it ensures efficient resource utilization and hence prevents resource fragmentation in a data center. We have explored the design of XOV blockchain systems in a disaggregated fashion and developed a tiered key-value store that can elastically scale its memory and storage. Our design significantly speeds up the execution stage. We have also leveraged several techniques to parallelize the validation stage in FlexChain to further improve the overall blockchain performance. Our evaluation results show that FlexChain can provide independent compute and memory scalability, while incurring at most 12.8% disaggregation overhead. FlexChain achieves almost identical throughput as the state-of-the-art distributed approaches with significantly lower memory and CPU consumption for compute-intensive and memory-intensive workloads respectively. 
    more » « less
  3. Grove is a concurrent separation logic library for verifying distributed systems. Grove is the first to handle time-based leases, including their interaction with reconfiguration, crash recovery, thread-level concurrency, and unreliable networks. This paper uses Grove to verify several distributed system components written in Go, including GroveKV, a realistic distributed multi-threaded key-value store. GroveKV supports reconfiguration, primary/backup replication, and crash recovery, and uses leases to execute read-only requests on any replica. GroveKV achieves high performance (67-73% of Redis on a single core), scales with more cores and more backup replicas (achieving about 2× the throughput when going from 1 to 3 servers), and can safely execute reads while reconfiguring. 
    more » « less
  4. Disaggregated memory architecture decouples computing and memory resources into separate pools connected via high-speed interconnect technologies, offering substantial advantages in scalability and resource utilization. However, this architecture also poses unique challenges in designing effective index structures and concurrency protocols due to increased remote memory access overhead and its shared-everything nature. In this paper, we present DART, a lock-free two-layer hashed Adaptive Radix Tree (ART) designed to minimize remote memory access while ensuring high concurrency and crash consistency in the disaggregated memory architecture. DART incorporates a hash-based Express Skip Table at its upper layer, which reduces the round trips of remote memory access during index traversal. In the base layer, DART employs an Adaptive Hashed Layout within ART nodes, confining remote memory accesses during in-node searches to small hash buckets. By further leveraging Decoupled Metadata Organization, DART achieves lock-free atomic updates, enabling high scalability and ensuring crash consistency. Our evaluation demonstrates that DART outperforms state-of-the-art counterparts by up to 5.8X in YCSB workloads. 
    more » « less
  5. null (Ed.)
    Passive remote memory remains the holy grail of disaggregation. Most existing systems for disaggregated memory either use remote memory simply as a backing store, or design special-purpose data structures that require some amount of processing co-resident with the remote memory to manage and apply updates. The few proposals for truly passive remote memory perform well only with read-mostly workloads, rapidly deteriorating in the face of even low levels of write contention. We propose to leverage in-network devices (specifically, a programmable top-of-rack switch) to serialize remote memory accesses and resolve any write conflicts in flight. Our prototype is able to completely avoid write contention in the recently published Clover disaggregated key/value store, delivering a performance boost of almost 50% on our testbed under a mixed read/write workload. 
    more » « less