skip to main content


Title: KV-Fresh: Freshness Authentication for Outsourced Multi-Version Key-Value Stores
Data outsourcing is a promising technical paradigm to facilitate cost-effective real-time data storage, processing, and dissemination. In such a system, a data owner proactively pushes a stream of data records to a third-party cloud server for storage, which in turn processes various types of queries from end users on the data owner’s behalf. This paper considers outsourced multi-version key-value stores that have gained increasing popularity in recent years, where a critical security challenge is to ensure that the cloud server returns both authentic and fresh data in response to end users’ queries. Despite several recent attempts on authenticating data freshness in outsourced key-value stores, they either incur excessively high communication cost or can only offer very limited real-time guarantee. To fill this gap, this paper introduces KV-Fresh, a novel freshness authentication scheme for outsourced key-value stores that offers strong real-time guarantee. KV-Fresh is designed based on a novel data structure, Linked Key Span Merkle Hash Tree, which enables highly efficient freshness proof by embedding chaining relationship among records generated at different time. Detailed simulation studies using a synthetic dataset generated from real data confirm the efficacy and efficiency of KV-Fresh.  more » « less
Award ID(s):
1933069 1514381 1824355 1619251
NSF-PAR ID:
10202848
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
IEEE International Conference on Computer Communications (INFOCOM)
Page Range / eLocation ID:
1638 to 1647
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Data outsourcing is a promising technical paradigm to facilitate cost-effective real-time data storage, processing, and dissemination. In such a system, a data owner proactively pushes a stream of data records to a third-party cloud server for storage, which in turn processes various types of queries from end users on the data owner’s behalf. This paper considers outsourced multi-version key-value stores that have gained increasing popularity in recent years, where a critical security challenge is to ensure that the cloud server returns both authentic and fresh data in response to end users’ queries. Despite several recent attempts on authenticating data freshness in outsourced key value stores, they either incur excessively high communication cost or can only offer very limited real-time guarantee. To fill this gap, this paper introduces KV-Fresh, a novel freshness authentication scheme for outsourced key-value stores that offers strong real-time guarantee. KV-Fresh is designed based on a novel data structure, Linked Key Span Merkle Hash Tree, which enables highly efficient freshness proof by embedding chaining relationship among records generated at different time. Detailed simulation studies using a synthetic dataset generated from real data confirm the efficacy and efficiency of KV-Fresh. 
    more » « less
  2. Key-value (KV) stores play an increasingly critical role in supporting diverse large-scale applications in modern data centers hosting terabytes of KV items which even might reside on a single server due to virtualization purpose. The combination of ever growing volume of KV items and storage/application consolidation is driving a trend of high storage density for KV stores. Shingled Magnetic Recording (SMR) represents a promising technology for increasing disk capacity, but it comes at a cost of poor random write performance and severe I/O amplification. Applications/software working with SMR devices need to be designed and optimized in an SMR-friendly manner. In this work, we present SEALDB, a Log-Structured Merge tree (LSM-tree) based key-value store that is specifically op- timized for and works well with SMR drives via adequately addressing the poor random writes and severe I/O amplification issues. First, for LSM-trees, SEALDB concatenates SSTables of each compaction, and groups them into sets. Taking sets as the basic unit for compactions, SEALDB improves compaction efficiency by mitigating random I/Os. Second, SEALDB creates varying size bands on HM-SMR drives, named dynamic bands. Dynamic bands not only accommodate the storage of sets, but also eliminate the auxiliary write amplification from SMR drives. We demonstrate the advantages of SEALDB via extensive experiments in various workloads. Overall, SEALDB delivers impressive performance improvement. Compared with LevelDB, SEALDB is 3.42× faster on random load due to improved compaction efficiency and eliminated auxiliary write amplification on SMR drives. 
    more » « less
  3. Key-value (KV) software has proven useful to a wide variety of applications including analytics, time-series databases, and distributed file systems. To satisfy the requirements of diverse workloads, KV stores have been carefully tailored to best match the performance characteristics of underlying solid-state block devices. Emerging KV storage device is a promising technology for both simplifying the KV software stack and improving the performance of persistent storage-based applications. However, while providing fast, predictable put and get operations, existing KV storage devices don’t natively support range queries which are critical to all three types of applications described above. In this paper, we present KVRangeDB, a software layer that enables processing range queries for existing hash-based KV solid-state disks (KVSSDs). As an effort to adapt to the performance characteristics of emerging KVSSDs, KVRangeDB implements log-structured merge tree key index that reduces compaction I/O, merges keys when possible, and provides separate caches for indexes and values. We evaluated the KVRangeDB under a set of representative workloads, and compared its performance with two existing database solutions: a Rocksdb variant ported to work with the KVSSD, and Wisckey, a key-value database that is carefully tuned for conventional block devices. On filesystem aging workloads, KVRangeDB outperforms Wisckey by 23.7x in terms of throughput and reduce CPU usage and external write amplifications by 14.3x and 9.8x, respectively. 
    more » « less
  4. Host-managed shingled magnetic recording drives (HMSMR) give a capacity advantage to harness the explosive growth of data. Applications where data is sequentially written and randomly read, such as key-value stores based on Log-Structured Merge Trees (LSM-trees), make the HMSMR an ideal solution due to its capacity, predictable performance, and economical cost. However, building an LSMtree based KV store on HM-SMR drives presents severe challenges in maintaining the performance and space efficiency due to the redundant cleaning processes for applications and storage devices (i.e., compaction and garbage collections). To eliminate the overhead of on-disk garbage collections (GC) and improve compaction efficiency, this paper presents GearDB, a GC-free KV store tailored for HMSMR drives. GearDB proposes three new techniques: a new on-disk data layout, compaction windows, and a novel gear compaction algorithm. We implement and evaluate GearDB with LevelDB on a real HM-SMR drive. Our extensive experiments have shown that GearDB achieves both good performance and space efficiency, i.e., on average 1:71 faster than LevelDB in random write with a space efficiency of 89.9%. 
    more » « less
  5. Millions of sensors, mobile applications and machines now generate billions of events. Specialized many-core key-value stores (KVSs) can ingest and index these events at high rates (over 100 Mops/s on one machine) if events are generated on the same machine; however, to be practical and cost-effective they must ingest events over the network and scale across cloud resources elastically. We present Shadowfax, a new distributed KVS based on FASTER, that transparently spans DRAM, SSDs, and cloud blob storage while serving 130 Mops/s/VM over commodity Azure VMs using conventional Linux TCP. Beyond high single-VM performance, Shadowfax uses a unique approach to distributed reconfiguration that avoids any server-side key ownership checks or cross-core coordination both during normal operation and migration. Hence, Shadowfax can shift load in 17 s to improve system throughput by 10 Mops/s with little disruption. Compared to the state-of-the-art, it has 8x better throughput (than Seastar+memcached) and avoids costly I/O to move cold data during migration. On 12 machines, Shadowfax retains its high throughput to perform 930 Mops/s, which, to the best of our knowledge, is the highest reported throughput for a distributed KVS used for large-scale data ingestion and indexing. 
    more » « less