Modern NVMe solid state drives offer significantly higher bandwidth and low latency than prior storage devices. Current key-value stores struggle to fully utilize the bandwidth of such devices. This paper presents SplinterDB, a new key-value store explicitly designed for NVMe solid state drives. SplinterDB is designed around a novel data structure (the STBε-tree), that exposes I/O and CPU concurrency and reduces write amplification without sacrificing query performance. STBε-tree combines ideas from log-structured merge trees and Bε-trees to reduce write amplification and CPU costs of compaction. The SplinterDB memtable and cache are designed to be highly concurrent and to reduce cache misses. We evaluate SplinterDB on a number of micro- and macro-benchmarks, and show that SplinterDB outperforms RocksDB, a state-of-the-art key-value store, by a factor of 6–10x on insertions and 2–2.6x on point queries, while matching RocksDB on small range queries. Furthermore, SplinterDB reduces write amplification by 2x compared to RocksDB.
more »
« less
SplinterDB: Closing the Bandwidth Gap for NVMe Key-Value Stores
Modern NVMe solid state drives offer significantly higher bandwidth and low latency than prior storage devices. Current key-value stores struggle to fully utilize the bandwidth of such devices. This paper presents SplinterDB, a new key-value store explicitly designed for NVMe solid state drives. SplinterDB is designed around a novel data structure (the STBε-tree), that exposes I/O and CPU concurrency and reduces write amplification without sacrificing query performance. STBε-tree combines ideas from log-structured merge trees and Bε-trees to reduce write amplification and CPU costs of compaction. The SplinterDB memtable and cache are designed to be highly concurrent and to reduce cache misses. We evaluate SplinterDB on a number of micro- and macro-benchmarks, and show that SplinterDB outperforms RocksDB, a state-of-the-art key-value store, by a factor of 6–10x on insertions and 2–2.6x on point queries, while matching RocksDB on small range queries. Furthermore, SplinterDB reduces write amplification by 2x compared to RocksDB.
more »
« less
- Award ID(s):
- 1637458
- PAR ID:
- 10192336
- Date Published:
- Journal Name:
- 2020 USENIX Annual Technical Conference
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Modern NVMe solid state drives offer significantly higher bandwidth and lower latency than prior storage devices. Cur- rent key-value stores struggle to fully utilize the bandwidth of such devices. This paper presents SplinterDB, a new key- value store explicitly designed for NVMe solid-state-drives. SplinterDB is designed around a novel data structure (the STBe-tree) that exposes I/O and CPU concurrency and re- duces write amplification without sacrificing query perfor- mance. STBe-tree combines ideas from log-structured merge trees and Be-trees to reduce write amplification and CPU costs of compaction. The SplinterDB memtable and cache are designed to be highly concurrent and to reduce cache misses. We evaluate SplinterDB on a number of micro- and macro-benchmarks, and show that SplinterDB outperforms RocksDB, a state-of-the-art key-value store, by a factor of 6–10⇥ on insertions and 2–2.6⇥ on point queries, while matching RocksDB on small range queries. Furthermore, SplinterDB reduces write amplification by 2⇥ compared to RocksDB.more » « less
-
Computer systems utilizing byte-addressable Non-Volatile Memory ( NVM ) as memory/storage can provide low-latency data persistence. The widely used key-value stores using Log-Structured Merge Tree ( LSM-Tree ) are still beneficial for NVM systems in aspects of the space and write efficiency. However, the significant write amplification introduced by the leveled compaction of LSM-Tree degrades the write performance of the key-value store and shortens the lifetime of the NVM devices. The existing studies propose new compaction methods to reduce write amplification. Unfortunately, they result in a relatively large read amplification. In this article, we propose NVLSM, a key-value store for NVM systems using LSM-Tree with new accumulative compaction. By fully utilizing the byte-addressability of NVM, accumulative compaction uses pointers to accumulate data into multiple floors in a logically sorted run to reduce the number of compactions required. We have also proposed a cascading searching scheme for reads among the multiple floors to reduce read amplification. Therefore, NVLSM reduces write amplification with small increases in read amplification. We compare NVLSM with key-value stores using LSM-Tree with two other compaction methods: leveled compaction and fragmented compaction. Our evaluations show that NVLSM reduces write amplification by up to 67% compared with LSM-Tree using leveled compaction without significantly increasing the read amplification. In write-intensive workloads, NVLSM reduces the average latency by 15.73%–41.2% compared to other key-value stores.more » « less
-
Key-value (KV) software has proven useful to a wide variety of applications including analytics, time-series databases, and distributed file systems. To satisfy the requirements of diverse workloads, KV stores have been carefully tailored to best match the performance characteristics of underlying solid-state block devices. Emerging KV storage device is a promising technology for both simplifying the KV software stack and improving the performance of persistent storage-based applications. However, while providing fast, predictable put and get operations, existing KV storage devices don’t natively support range queries which are critical to all three types of applications described above. In this paper, we present KVRangeDB, a software layer that enables processing range queries for existing hash-based KV solid-state disks (KVSSDs). As an effort to adapt to the performance characteristics of emerging KVSSDs, KVRangeDB implements log-structured merge tree key index that reduces compaction I/O, merges keys when possible, and provides separate caches for indexes and values. We evaluated the KVRangeDB under a set of representative workloads, and compared its performance with two existing database solutions: a Rocksdb variant ported to work with the KVSSD, and Wisckey, a key-value database that is carefully tuned for conventional block devices. On filesystem aging workloads, KVRangeDB outperforms Wisckey by 23.7x in terms of throughput and reduce CPU usage and external write amplifications by 14.3x and 9.8x, respectively.more » « less
-
Many modern key-value stores, such as RocksDB, rely on log-structured merge trees (LSMs). Originally designed for spinning disks, LSMs optimize for write performance by only making sequential writes. But this optimization comes at the cost of reads: LSMs must rely on expensive compaction jobs and Bloom filters---all to maintain reasonable read performance. For NVMe SSDs, we argue that trading off read performance for write performance is no longer always needed. With enough parallelism, NVMe SSDs have comparable random and sequential access performance. This change makes update-in-place designs, which traditionally provide excellent read performance, a viable alternative to LSMs. In this paper, we close the gap between log-structured and update-in-place designs on modern SSDs with the help of new components that take advantage of data and workload patterns. Specifically, we explore three key ideas: (A) record caching for efficient point operations, (B) page grouping for high-performance range scans, and (C) insert forecasting to reduce the reorganization costs of accommodating new records. We evaluate these ideas by implementing them in a prototype update-in-place key-value store called TreeLine. On YCSB, we find that TreeLine outperforms RocksDB and LeanStore by 2.20× and 2.07× respectively on average across the point workloads, and by up to 10.95× and 7.52× overall.more » « less
An official website of the United States government

