Optimizing LSM-based Key-Value Stores (LSM-KVS) for disaggregated storage is essential to achieve better resource utilization, performance, and flexibility. Most existing studies focus on offloading compaction to the storage nodes to mitigate the performance penalties caused by heavy network traffic between compute and storage. However, several critical issues remain unaddressed, including the strong dependency between offloaded compaction and the LSM-KVS, resource load balancing, compaction scheduling, and complex transient errors. To address these issues and limitations, in this paper we propose CaaS-LSM, a novel disaggregated LSM-KVS built on the idea of Compaction-as-a-Service. CaaS-LSM makes three key contributions. First, CaaS-LSM decouples compaction from the LSM-KVS and executes it statelessly, ensuring high flexibility and avoiding coordination overhead with the LSM-KVS. Second, CaaS-LSM introduces a performance- and resource-optimized control plane that improves performance and resource utilization via an adaptive run-time scheduling and management strategy. Third, CaaS-LSM addresses different levels of transient and execution errors via sophisticated error-handling logic. We implement a prototype of CaaS-LSM based on RocksDB and evaluate it with different LSM-based distributed databases (Kvrocks and Nebula). In the storage-disaggregated setup, CaaS-LSM achieves up to 8x higher throughput and up to 98% lower P99 latency than a conventional LSM-KVS, and up to 61% higher throughput than state-of-the-art LSM-KVS optimized for disaggregated storage.
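The key enabler of this decoupling is making each compaction job self-describing, so any stateless worker can execute it without coordinating with the LSM-KVS instance. The following minimal sketch illustrates the idea under assumed interfaces; the job descriptor, the shared-storage stand-in, and all names are hypothetical, not CaaS-LSM's actual API:

```python
# Illustrative sketch of a stateless compaction job (hypothetical descriptor,
# not CaaS-LSM's actual interface). SSTs are modeled as sorted key/value
# lists on "shared storage" (a dict) so the example runs standalone.
from dataclasses import dataclass

SHARED_STORAGE: dict[str, list[tuple[str, str]]] = {}  # path -> sorted entries

@dataclass
class CompactionJob:
    job_id: str
    input_ssts: list[str]  # paths of input SSTs, newest first
    output_level: int      # target LSM level
    output_prefix: str     # where the worker writes the merged SST

def run_compaction(job: CompactionJob) -> str:
    """Any stateless worker can run this: the job carries all needed state."""
    merged: dict[str, str] = {}
    for path in reversed(job.input_ssts):          # apply oldest first...
        merged.update(dict(SHARED_STORAGE[path]))  # ...so newer keys win
    out = f"{job.output_prefix}/L{job.output_level}-{job.job_id}.sst"
    SHARED_STORAGE[out] = sorted(merged.items())
    return out

# Usage: two overlapping SSTs compacted into one at the next level.
SHARED_STORAGE["/L0/a.sst"] = [("k1", "v1-new"), ("k3", "v3")]
SHARED_STORAGE["/L1/b.sst"] = [("k1", "v1-old"), ("k2", "v2")]
job = CompactionJob("42", ["/L0/a.sst", "/L1/b.sst"], 2, "/L2")
print(run_compaction(job), SHARED_STORAGE["/L2/L2-42.sst"])
```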
Smash: Flexible, Fast, and Resource-efficient Placement and Lookup of Distributed Storage
Large-scale distributed storage systems, such as object stores, usually apply hashing-based placement and lookup methods to achieve scalability and resource efficiency. However, when object locations are determined by hash values, placement becomes inflexible, failing to optimize or satisfy application requirements such as load balance, failure tolerance, parallelism, and network/system performance. This work presents a novel solution that achieves the best of both worlds: placement flexibility together with cost-effectiveness and scalability. The proposed method, Smash, is an object placement and lookup scheme that achieves full placement flexibility, balanced load, low resource cost, and short latency. Smash utilizes a recent space-efficient data structure and applies it to object-location lookups. We implement Smash as a prototype system and evaluate it in a public cloud. The analysis and experimental results show that Smash achieves full placement flexibility, fast storage operations, fast recovery from node dynamics, and lower DRAM cost (under 60% of that of existing hash-based solutions such as Ceph and MapX).
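To see why hashing forecloses placement flexibility, compare a hash-derived location with an explicit location map. The sketch below is illustrative only: a plain dict stands in for Smash's space-efficient lookup structure, and all names are hypothetical:

```python
# Illustrative contrast (not Smash's actual design): hash-based placement
# fixes an object's node as a pure function of its key, while an explicit
# location map permits arbitrary, policy-driven placement at some DRAM cost.
# Smash's contribution is making that map space-efficient; a plain dict is
# used here purely for illustration.
import hashlib

NODES = ["node-a", "node-b", "node-c"]

def hash_placement(obj_id: str) -> str:
    """Location derived from the key: scalable, but the object is immovable."""
    digest = hashlib.sha256(obj_id.encode()).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]

class FlexiblePlacement:
    """Location is looked up, so any object can be placed (or moved) anywhere,
    e.g., to balance load or tolerate failures."""
    def __init__(self) -> None:
        self.location: dict[str, str] = {}  # stand-in for Smash's structure
    def place(self, obj_id: str, node: str) -> None:
        self.location[obj_id] = node
    def lookup(self, obj_id: str) -> str:
        return self.location[obj_id]

fp = FlexiblePlacement()
fp.place("obj-1", "node-c")  # chosen by policy, not by hash
print(hash_placement("obj-1"), fp.lookup("obj-1"))
```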
- PAR ID: 10424890
- Date Published:
- Journal Name: Proceedings of the ACM on Measurement and Analysis of Computing Systems
- Volume: 7
- Issue: 2
- ISSN: 2476-1249
- Page Range / eLocation ID: 1 to 22
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
In-memory data management systems, such as key-value stores, have become an essential infrastructure in today's big-data processing and cloud computing. They rely on efficient index structures to access data. While unordered indexes, such as hash tables, can perform point searches in O(1) time, they cannot be used in the many scenarios where range queries must be supported. Many ordered indexes, such as the B+ tree and the skip list, have an O(log N) lookup cost, where N is the number of keys in the index. For an ordered index hosting billions of keys, a lookup may take more than 30 key comparisons, an order of magnitude more expensive than on a hash table. With the availability of large memories and fast networks in today's data centers, this O(log N) time takes a heavy toll on applications that rely on ordered indexes. In this paper we introduce a new ordered index structure, named Wormhole, that takes O(log L) worst-case time to look up a key of length L. The low cost is achieved by simultaneously leveraging the strengths of three indexing structures, namely the hash table, the prefix tree, and the B+ tree, to orchestrate a single fast ordered index. Wormhole's range operations can be performed by a linear scan of a list after an initial lookup. This improvement in access efficiency does not come at the price of compromised space efficiency: Wormhole's index space is comparable to that of the B+ tree and the skip list. Experimental results show that Wormhole outperforms the skip list, the B+ tree, ART, and Masstree by up to 8.4x, 4.9x, 4.3x, and 6.6x in key lookup throughput, respectively.
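Wormhole's headline cost can be illustrated by its central trick: binary search on prefix length over a hashed set of prefixes, where each probe is a single O(1) hash lookup. The sketch below shows only this O(log L) descent under simplifying assumptions; it is not the full Wormhole structure, which further attaches the matched trie node to B+-tree-like sorted leaves:

```python
# Minimal sketch of Wormhole's central trick (not the full structure):
# store all prefixes of a set of anchor keys in a hash set, then find the
# longest stored prefix of a lookup key by binary search on prefix LENGTH.
# Each probe is one O(1) hash lookup, so the search costs O(log L) for a
# key of length L, independent of the number of keys N.

def build_prefix_set(anchors: list[str]) -> set[str]:
    prefixes = set()
    for a in anchors:
        for i in range(len(a) + 1):
            prefixes.add(a[:i])  # prefix-closed set ("" included)
    return prefixes

def longest_prefix_len(prefixes: set[str], key: str) -> int:
    lo, hi = 0, len(key)          # invariant: key[:lo] is in the set
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if key[:mid] in prefixes: # membership is monotone in prefix length
            lo = mid
        else:
            hi = mid - 1
    return lo

# In Wormhole, the matched trie node then leads to a sorted leaf holding the
# key's neighborhood, enabling ordered operations; only the descent is shown.
prefixes = build_prefix_set(["apple", "apply", "banana"])
print(longest_prefix_len(prefixes, "applesauce"))  # 5 ("apple")
```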
-
The imbalanced I/O load on large parallel file systems affects the parallel I/O performance of high-performance computing (HPC) applications. One of the main reasons for I/O imbalance is the lack of a global view of system-wide resource consumption. While approaches to address the problem already exist, the diversity of HPC workloads combined with different file striping patterns prevents their widespread adoption. In addition, load-balancing techniques should be transparent to client applications. To address these issues, we propose Tarazu, an end-to-end control plane where clients transparently and adaptively write to a set of selected I/O servers to achieve balanced data placement. Our control plane leverages real-time load statistics for global data placement on distributed storage servers, while our design employs trace-based optimization techniques to minimize the latency of I/O load requests between clients and servers and to handle multiple striping patterns in files. We evaluate the proposed system on an experimental cluster with two common use cases: the synthetic I/O benchmark IOR and the scientific application I/O kernel HACC-I/O. We also use a discrete-time simulator with real HPC application traces from emerging workloads running on the Summit supercomputer to validate the effectiveness and scalability of Tarazu in large-scale storage environments. The results show improvements in load balancing and read performance of up to 33% and 43%, respectively, compared to the state of the art.
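A minimal sketch of the underlying idea, load-aware server selection from global statistics, is given below. It is far simpler than Tarazu's actual control plane, and the server names and load values are hypothetical:

```python
# Illustrative sketch of load-aware placement (hypothetical, much simpler
# than Tarazu's control plane): clients consult global per-server load
# statistics and stripe a new file over the least-loaded I/O servers,
# instead of the fixed/default striping that causes imbalance.
import heapq

def pick_stripe_targets(load: dict[str, float], stripe_count: int) -> list[str]:
    """Choose the stripe_count least-loaded servers for a new file."""
    return heapq.nsmallest(stripe_count, load, key=load.get)

# Global load statistics, e.g., gathered by server-side monitoring.
load = {"ost0": 0.92, "ost1": 0.15, "ost2": 0.40, "ost3": 0.08}
targets = pick_stripe_targets(load, stripe_count=2)
print(targets)  # ['ost3', 'ost1'] -- placement adapts to runtime load
```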
-
Data deduplication relies on a chunk index to identify redundancy among incoming chunks. As backup data scales, it is impractical to maintain the entire chunk index in memory, so an index lookup must search a portion of the on-storage index, causing a dramatic regression in index lookup throughput. Existing studies propose searching a subset of the whole index (a partial index) to limit storage I/Os and guarantee high index lookup throughput. However, several core factors in designing a partial index have not been fully exploited. In this paper, we first comprehensively investigate the trade-offs of using different meta-groups, sampling methods, and meta-group selection policies for a partial index. We then propose a Collaborative Partial Index (CPI), which takes advantage of two meta-groups, recipe segments and container catalogs, to achieve more efficient and effective unique-chunk identification. CPI further introduces a hook-entry sharing technique and a two-stage eviction policy to reduce memory usage without hurting the deduplication ratio. In our evaluation, under the same constraints on memory usage and storage I/O, CPI achieves a 1.21x-2.17x higher deduplication ratio than state-of-the-art partial indexing schemes. Alternatively, CPI achieves 1.8x-4.98x higher index lookup throughput than the others when the same deduplication ratio is reached. Compared with full indexing, CPI's maximum deduplication ratio is only 4.07% lower, while its throughput is 37.1x-122.2x that of full indexing, depending on the storage I/O constraints in our evaluation cases.
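The following sketch illustrates the general partial-indexing pattern that CPI builds on, not CPI's actual design: only sampled fingerprints ("hook entries") live in memory, and a hook hit fetches its whole meta-group so that nearby duplicates are caught without further storage I/O. All names and the sampling rate are assumptions:

```python
# Illustrative sketch of partial indexing for deduplication (hypothetical,
# not CPI's design): only SAMPLED chunk fingerprints are kept in the memory
# index; a hit on a hook pulls its whole meta-group (e.g., a container
# catalog) into a cache, exploiting locality to find nearby duplicates.
SAMPLE_MASK = 0b111  # keep roughly 1/8 of fingerprints as hooks (assumed)

class PartialIndex:
    def __init__(self, meta_groups: dict[int, set[str]]) -> None:
        self.meta_groups = meta_groups           # group_id -> fingerprints
        self.hooks: dict[str, int] = {}          # sampled fp -> group_id
        self.cache: set[str] = set()             # fps from fetched groups
        for gid, fps in meta_groups.items():
            for fp in fps:
                if hash(fp) & SAMPLE_MASK == 0:  # sampling rule
                    self.hooks[fp] = gid

    def is_duplicate(self, fp: str) -> bool:
        if fp in self.cache:                     # found via locality, no I/O
            return True
        gid = self.hooks.get(fp)
        if gid is not None:                      # hook hit: one storage read
            self.cache |= self.meta_groups[gid]
            return True
        return False  # may be a missed duplicate: the deduplication-ratio
                      # cost that partial indexing trades for throughput
```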
-
Cloud object storage such as AWS S3 is cost-effective and highly elastic but relatively slow, while high-performance cloud storage such as AWS ElastiCache is expensive and provides limited elasticity. We present a new cloud storage service called ServerlessMemory, which stores data in the memory of serverless functions. ServerlessMemory employs a sliding-window-based memory management strategy, inspired by the garbage collection mechanisms of programming language runtimes, to effectively segregate hot and cold data, and it provides fine-grained elasticity, good performance, and an extremely low-cost pay-per-access model. We then design and implement InfiniStore, a persistent and elastic cloud storage system that seamlessly couples the function-based ServerlessMemory layer with a persistent, inexpensive cloud object store layer. InfiniStore remains durable despite function failures by using a fast parallel recovery scheme built on the auto-scaling functionality of a FaaS (Function-as-a-Service) platform. We evaluate InfiniStore extensively using both microbenchmarks and two real-world applications. The results show that InfiniStore outperforms AWS ElastiCache and Anna for objects larger than 10 MB, and it achieves 26.25% and 97.24% tenant-side cost reductions compared to InfiniCache and ElastiCache, respectively.
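A minimal sketch of the two-tier get/put pattern that InfiniStore embodies is given below, with in-memory dicts standing in for function memory and the object store. The interfaces and the naive FIFO eviction are hypothetical simplifications; the real system uses a sliding-window policy:

```python
# Illustrative sketch of a two-tier store in the spirit of InfiniStore
# (hypothetical interfaces): a fast in-function-memory tier backed by a
# durable, cheaper object-store tier. Gets try memory first; misses fall
# back to the object store and repopulate memory.
class TieredStore:
    def __init__(self, memory_capacity: int) -> None:
        self.memory: dict[str, bytes] = {}        # stand-in for function memory
        self.object_store: dict[str, bytes] = {}  # stand-in for, e.g., S3
        self.capacity = memory_capacity

    def put(self, key: str, value: bytes) -> None:
        self.object_store[key] = value  # durable copy first
        self._admit(key, value)

    def get(self, key: str) -> bytes:
        if key in self.memory:              # fast path: memory tier
            return self.memory[key]
        value = self.object_store[key]      # slow path: object store
        self._admit(key, value)             # repopulate for next access
        return value

    def _admit(self, key: str, value: bytes) -> None:
        if len(self.memory) >= self.capacity:          # naive FIFO eviction;
            self.memory.pop(next(iter(self.memory)))   # the real system uses
        self.memory[key] = value                       # a sliding window
```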