
Title: SplinterDB: Closing the Bandwidth Gap for NVMe Key-Value Stores
Modern NVMe solid state drives offer significantly higher bandwidth and lower latency than prior storage devices, yet current key-value stores struggle to fully utilize the bandwidth of such devices. This paper presents SplinterDB, a new key-value store explicitly designed for NVMe solid state drives. SplinterDB is designed around a novel data structure, the STBε-tree, which exposes I/O and CPU concurrency and reduces write amplification without sacrificing query performance. The STBε-tree combines ideas from log-structured merge trees and Bε-trees to reduce both write amplification and the CPU cost of compaction. The SplinterDB memtable and cache are designed to be highly concurrent and to reduce cache misses. We evaluate SplinterDB on a number of micro- and macro-benchmarks and show that SplinterDB outperforms RocksDB, a state-of-the-art key-value store, by a factor of 6–10x on insertions and 2–2.6x on point queries, while matching RocksDB on small range queries. Furthermore, SplinterDB reduces write amplification by 2x compared to RocksDB.
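To make the data-structure claim above concrete, here is a toy sketch of the buffering-and-batched-flush idea that LSM-trees and Bε-trees share and that the STBε-tree builds on: writes are absorbed in an in-memory buffer and written to storage as large sorted runs, which is what keeps write amplification low. The names (ToyBufferedStore, memtable_limit) are hypothetical; this is not SplinterDB's implementation, only an illustration under those assumptions.

```python
# Toy sketch (not SplinterDB's STBε-tree): illustrates the general buffering
# idea shared by LSM-trees and Bε-trees. Writes are absorbed in an in-memory
# buffer (a "memtable") and flushed to storage as large sorted runs, so data
# reaches storage in big sequential batches rather than small in-place updates.

class ToyBufferedStore:
    def __init__(self, memtable_limit=4):
        self.memtable = {}               # in-memory buffer for recent writes
        self.memtable_limit = memtable_limit
        self.runs = []                   # sorted runs "on storage", newest first
        self.bytes_written_to_storage = 0
        self.user_bytes = 0

    def put(self, key, value):
        self.user_bytes += len(key) + len(value)
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Write the whole buffer as one sorted run: one large sequential write
        # instead of many small random writes.
        run = sorted(self.memtable.items())
        self.bytes_written_to_storage += sum(len(k) + len(v) for k, v in run)
        self.runs.insert(0, run)
        self.memtable = {}

    def get(self, key):
        # Point query: check the memtable, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:
            for k, v in run:             # a real store would binary-search and use filters
                if k == key:
                    return v
        return None

    def write_amplification(self):
        # Ratio of bytes physically written to bytes of user data. Compaction
        # (merging runs) would add to the numerator; it is omitted here, as is
        # data still sitting in the memtable.
        return self.bytes_written_to_storage / max(self.user_bytes, 1)


store = ToyBufferedStore()
for i in range(10):
    store.put(f"key{i:02d}", f"val{i}")
print(store.get("key03"), store.write_amplification())
```

The actual STBε-tree additionally exposes I/O and CPU concurrency and compacts these runs; the sketch omits compaction, filters, and the highly concurrent memtable and cache the abstract mentions.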
Authors:
Award ID(s):
1751277
Publication Date:
NSF-PAR ID:
10180595
Journal Name:
2020 USENIX Annual Technical Conference
Sponsoring Org:
National Science Foundation
More Like this
  1. Key-value (KV) stores play an increasingly critical role in supporting diverse large-scale applications in modern data centers hosting terabytes of KV items, which may even reside on a single server due to virtualization. The combination of the ever-growing volume of KV items and storage/application consolidation is driving a trend of high storage density for KV stores. Shingled Magnetic Recording (SMR) represents a promising technology for increasing disk capacity, but it comes at the cost of poor random write performance and severe I/O amplification. Applications and software working with SMR devices need to be designed and optimized in an SMR-friendly manner. In this work, we present SEALDB, a Log-Structured Merge tree (LSM-tree) based key-value store that is specifically optimized for and works well with SMR drives by adequately addressing the poor random writes and severe I/O amplification issues. First, SEALDB concatenates the SSTables produced by each compaction and groups them into sets. Taking sets as the basic unit for compactions, SEALDB improves compaction efficiency by mitigating random I/Os. Second, SEALDB creates varying-size bands on HM-SMR drives, named dynamic bands. Dynamic bands not only accommodate the storage of sets, but also eliminate the auxiliary write amplification from SMR drives. We demonstrate the advantages of SEALDB via extensive experiments with various workloads. Overall, SEALDB delivers impressive performance improvement. Compared with LevelDB, SEALDB is 3.42× faster on random load due to improved compaction efficiency and eliminated auxiliary write amplification on SMR drives.
  2. Computer systems utilizing byte-addressable Non-Volatile Memory (NVM) as memory/storage can provide low-latency data persistence. The widely used key-value stores based on Log-Structured Merge Trees (LSM-Trees) are still beneficial for NVM systems in terms of space and write efficiency. However, the significant write amplification introduced by the leveled compaction of LSM-Trees degrades the write performance of the key-value store and shortens the lifetime of NVM devices. Existing studies propose new compaction methods to reduce write amplification; unfortunately, they result in relatively large read amplification. In this article, we propose NVLSM, a key-value store for NVM systems using an LSM-Tree with a new accumulative compaction. By fully utilizing the byte-addressability of NVM, accumulative compaction uses pointers to accumulate data into multiple floors in a logically sorted run, reducing the number of compactions required. We also propose a cascading search scheme for reads among the multiple floors to reduce read amplification. As a result, NVLSM reduces write amplification with only small increases in read amplification. We compare NVLSM with key-value stores using LSM-Trees with two other compaction methods: leveled compaction and fragmented compaction. Our evaluations show that NVLSM reduces write amplification by up to 67% compared with an LSM-Tree using leveled compaction, without significantly increasing read amplification. In write-intensive workloads, NVLSM reduces average latency by 15.73%–41.2% compared to other key-value stores.
  3. The emergence of Intel's Optane DC persistent memory (Optane Pmem) has drawn much interest in building persistent key-value (KV) stores that take advantage of its high throughput and low latency. A major challenge in these efforts stems from the fact that Optane Pmem is essentially a hybrid storage device with two distinct properties. On one hand, it is a high-speed byte-addressable device similar to DRAM. On the other hand, writes to the Optane media are conducted in units of 256 bytes, much like a block storage device. Existing KV store designs for persistent memory do not take the latter property into account, leading to high write amplification and constraining both write and read throughput. Meanwhile, directly reusing a KV store design intended for block devices, such as LSM-based ones, would cause much higher read latency due to the former property. In this paper, we propose ChameleonDB, a KV store designed specifically for this important hybrid memory/storage device by considering and exploiting these two properties in one design. It uses an LSM-tree structure to efficiently admit writes with low write amplification, and an in-DRAM hash table to bypass the LSM-tree's multiple levels for fast reads. In the meantime, ChameleonDB may choose to opportunistically maintain the LSM multi-level structure in the background to achieve short recovery time after a system crash. ChameleonDB's hybrid structure is designed to absorb sudden bursts of a write workload, which helps avoid long-tail read latency. Our experimental results show that ChameleonDB improves write throughput by 3.3× and reduces read latency by around 60% compared with a legacy LSM-tree based KV store design. ChameleonDB provides performance competitive with KV stores that use a fully in-DRAM index, while using much less DRAM space. Compared with CCEH, a persistent hash table design, ChameleonDB provides 6.4× higher write throughput.
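As a rough illustration of the hybrid design described in item 3 above (the names ToyHashIndexedLog, log, and index are hypothetical, and this is not ChameleonDB's code), the sketch below pairs a write-friendly, append-style storage layout with an in-DRAM hash table so that point reads bypass any multi-level search:

```python
# Minimal sketch of the idea in item 3: writes go to a storage-side log
# (sequential and write-friendly), while a small in-DRAM hash table maps each
# key directly to its latest location, so point reads skip multi-level lookups.

class ToyHashIndexedLog:
    def __init__(self):
        self.log = []          # stands in for data laid out on persistent memory
        self.index = {}        # in-DRAM hash table: key -> position in log

    def put(self, key, value):
        self.log.append((key, value))      # sequential append
        self.index[key] = len(self.log) - 1

    def get(self, key):
        pos = self.index.get(key)          # one hash lookup, one data read
        if pos is None:
            return None
        return self.log[pos][1]


db = ToyHashIndexedLog()
db.put("a", "1")
db.put("a", "2")                           # update: index now points at the newer entry
print(db.get("a"))                         # -> "2"
```

The trade-off, as the abstract notes, is recovering the in-DRAM index after a crash, which is why ChameleonDB can opportunistically maintain the LSM multi-level structure in the background.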