
Title: Chameleon: An Adaptive Wear Balancer for Flash Clusters
NAND flash-based Solid State Devices (SSDs) offer the desirable features of high performance, energy efficiency, and fast-growing capacity. Thus, the use of SSDs is increasing in distributed storage systems. A key obstacle in this context is that the natural imbalance in distributed I/O workloads can result in wear imbalance across the SSDs in a distributed setting. This, in turn, can have a significant impact on the reliability, performance, and lifetime of the storage deployment. Extant load balancers for storage systems do not consider SSD wear imbalance when placing data, as the main design goal of such balancers is to extract higher performance. Consequently, data migration is the only common technique for tackling wear imbalance, where existing data is moved from highly loaded servers to the least loaded ones. In this paper, we explore an innovative holistic approach, Chameleon, that employs data redundancy techniques such as replication and erasure coding, coupled with endurance-aware write offloading, to mitigate wear imbalance in distributed SSD-based storage. Chameleon aims to balance wear among different flash servers while meeting the desirable objectives of extending the life of flash servers, improving I/O performance, and avoiding bottlenecks. Evaluation with a 50-node SSD cluster shows that Chameleon reduces the wear distribution deviation by 81% while improving write performance by up to 33%.
Award ID(s):
1717660 1702474 1547804
Publication Date:
Journal Name:
the 32nd IEEE International Parallel and Distributed Processing Symposium (IPDPS)
Sponsoring Org:
National Science Foundation
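The abstract above describes coupling replication and erasure coding with endurance-aware write offloading so that new writes favor less-worn flash servers. As a rough illustration only, and not Chameleon's actual policy, the Python sketch below scores candidate servers by an assumed per-server wear estimate and a load indicator; the class, field names, and weighting are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FlashServer:
    name: str
    wear_fraction: float  # fraction of rated P/E cycles already consumed (0.0-1.0); assumed to be reported by the server
    queue_depth: int      # outstanding writes, used here as a rough load indicator

def choose_write_targets(servers: List[FlashServer], replicas: int = 3,
                         wear_weight: float = 0.7) -> List[FlashServer]:
    """Pick `replicas` servers for a new write: score each server by a weighted
    mix of consumed endurance and normalized load, preferring low scores."""
    max_queue = max(s.queue_depth for s in servers) or 1
    def score(s: FlashServer) -> float:
        return wear_weight * s.wear_fraction + (1 - wear_weight) * (s.queue_depth / max_queue)
    return sorted(servers, key=score)[:replicas]

if __name__ == "__main__":
    cluster = [
        FlashServer("ssd-node-01", wear_fraction=0.82, queue_depth=4),
        FlashServer("ssd-node-02", wear_fraction=0.35, queue_depth=9),
        FlashServer("ssd-node-03", wear_fraction=0.40, queue_depth=2),
        FlashServer("ssd-node-04", wear_fraction=0.10, queue_depth=6),
    ]
    # Writes are steered toward less-worn, less-loaded servers.
    print([s.name for s in choose_write_targets(cluster, replicas=2)])
```

With the weighting above, writes skew toward servers with the most remaining endurance while still backing off from heavily loaded ones; the actual system also uses replication and erasure coding to keep such redirected writes recoverable.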
More Like this
  1. Using flash-based solid state drives (SSDs) as main memory has been proposed as a practical solution towards scaling memory capacity for data-intensive applications. However, almost all existing approaches rely on the paging mechanism to move data between SSDs and host DRAM. This inevitably incurs significant performance overhead and extra I/O traffic. Thanks to the byte-addressability supported by the PCIe interconnect and the internal memory in SSD controllers, it is feasible to access SSDs in both byte and block granularity today. Exploiting the benefits of SSD's byte-accessibility in today's memory-storage hierarchy is, however, challenging as it lacks systems support and abstractions for programs. In this paper, we present FlatFlash, an optimized unified memory-storage hierarchy, to efficiently use byte-addressable SSD as part of the main memory. We extend the virtual memory management to provide a unified memory interface so that programs can access data across SSD and DRAM in byte granularity seamlessly. We propose a lightweight, adaptive page promotion mechanism between SSD and DRAM to gain benefits from both the byte-addressable large SSD and fast DRAM concurrently and transparently, while avoiding unnecessary page movements. Furthermore, we propose an abstraction of byte-granular data persistence to exploit the persistence nature of SSDs, upon which we rethink the design primitives of crash consistency of several representative software systems that require data persistence, such as file systems and databases. Our evaluation with a variety of applications demonstrates that, compared to the current unified memory-storage systems, FlatFlash improves the performance for memory-intensive applications by up to 2.3x, reduces the tail latency for latency-critical applications by up to 2.8x, scales the throughput for transactional database by up to 3.0x, and decreases the meta-data persistence overhead for file systems by up to 18.9x. FlatFlash also improves the cost-effectiveness by up to 3.8x compared to DRAM-only systems, while enhancing the SSD lifetime significantly.
  2. Although flash cells wear out, a typical SSD has enough cells and sufficiently sophisticated firmware that its lifetime generally exceeds the expected lifetime of its host system. Even under heavy use, SSDs last for years and can be replaced upon failure. On a smartphone, in contrast, the hardware is more limited and we show that, under heavy use, one can easily, and more quickly, wear out smartphone flash storage. Consequently, a simple, unprivileged, malicious application can render a smartphone unbootable ("bricked") in a few weeks with no warning signs to the user. This bleak result becomes more worrisome when considering the fact that smartphone users generally believe it is safe to try out new applications. To combat this problem, we study the I/O behavior of a wide range of Android applications. We find that high-volume write bursts exist, yet none of the applications we checked sustains an average write rate that is high enough to damage the device (under reasonable usage assumptions backed by the literature). We therefore propose a rate-limiting algorithm for write activity that (1) prevents such attacks, (2) accommodates "normal" bursts, and (3) ensures that the smartphone drive lifetime is longer than a preconfigured lower bound (i.e., its warranty); a token-bucket sketch of this idea appears after this list. In terms of user experience, our design only requires that, in the worst case of an app that issues continuous, unsustainable, and unusual writes, the user decides whether to shorten the phone's life or rate limit the problematic app.
  3. Flash-based solid state drives (SSDs) have established themselves as a higher-performance alternative to hard disk drives in cloud and mobile environments. Nevertheless, SSDs remain a performance bottleneck of computer systems due to their high I/O access latency. A common approach for improving the access latency is prefetching. Prefetching predicts future block accesses and preloads them into main memory ahead of time. In this paper, we discuss the challenges of prefetching in SSDs, explain why prior approaches fail to achieve high accuracy, and present a neural network based prefetching approach that significantly outperforms the state-of-the-art. To achieve high performance, we address the challenges of prefetching in very large sparse address spaces, as well as prefetching in a timely manner by predicting ahead of time. We collect I/O trace files from several real-world applications running on cloud servers and show that our proposed approach consistently outperforms the existing stride prefetchers by up to 800× and prior prefetching approaches based on Markov chains by up to 8×. Furthermore, we propose an address mapping learning technique to demonstrate the applicability of our approach to previously unseen SSD workloads and perform a hyperparameter sensitivity study.
  4. Log-based data management systems use storage as if it were an append-only medium, transforming random writes into sequential writes, which delivers significant benefits when logs are persisted on hard disks. Although solid-state drives (SSDs) offer improved random write capabilities, sequential writes continue to be advantageous due to locality and space efficiency. However, the inherent properties of flash-based SSDs induce major disadvantages when used with a random write block interface, causing write amplification, uneven wear, log stacking, and garbage collection overheads. To eliminate these disadvantages, Zoned Namespace (ZNS) SSDs have recently been introduced. They offer increased capacity, reduced write amplification, and open up data placement and garbage collection to the host through zones, which have sequential-write semantics and must be explicitly reset. We explain how the new ZNS Zone Append primitive, which supports pushing fine-grained data placement onto the device, along with our proposal for “Group Append”, which enables sub-block sized appends, could benefit log-structured data management systems. We explore advantages of ZNS SSDs with Zone Append, Group Append, and computational storage in four log-based data management areas: (i) log-based file systems, (ii) LSM trees such as RocksDB, (iii) database systems, and (iv) event logs/shared logs. Furthermore, we propose research directions for each of these data management systems using ZNS SSDs.
  5. The freshness of web page indices is the key to improving the searching quality of search engines. In Baidu, the major search engine in China, we have developed DirectLoad, an index updating system for efficiently delivering web-scale indices to nationwide data centers. However, web-scale index updating suffers from increasingly high data volumes during network transmission and inefficient I/O transactions due to slow disk operations. DirectLoad accelerates the index updating streams from two aspects: 1) DirectLoad effectively cuts down the overwhelmingly high volume of indices in transmission by removing the redundant data across versions, and mutates regular operations in a key-value storage system for successful accesses to the deduplicated datasets. 2) DirectLoad significantly improves the I/O efficiency by replacing the LSM-tree with a memory-resident table (memtable) and append-only files (AOFs) on disk; a toy illustration of this memtable-plus-AOF write path appears after this list. Specifically, the write amplification stemming from sorting operations on disk is eliminated, and a lazy garbage collection policy further improves the I/O performance at the software level. In addition, DirectLoad directly manipulates the SSD native interfaces to remove the write amplification at the hardware level. In practice, 63% of the updating bandwidth has been saved due to the deduplication, and the write throughput to SSDs is increased by 3x. The index updating cycle of our production workloads has been compressed from 15 days to 3 days after deploying DirectLoad. In this paper, we show the effectiveness and efficiency of an in-memory index updating system, which is disruptive to the framework in a conventional memory hierarchy. We hope that this work contributes a strong case study in the system research literature.
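Item 2 in the list above proposes a rate-limiting algorithm that absorbs normal write bursts while keeping the sustained write rate below a wear budget. The token-bucket sketch below is only a generic illustration of that idea, not the paper's algorithm; the parameters `bytes_per_second` and `burst_bytes` are made-up placeholders.

```python
import time

class WriteRateLimiter:
    """Token-bucket limiter: tokens refill at a sustained byte rate, and the
    bucket capacity determines how large a burst can pass without throttling."""
    def __init__(self, bytes_per_second: float, burst_bytes: float):
        self.rate = bytes_per_second
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last = time.monotonic()

    def admit(self, nbytes: int) -> None:
        """Block (sleep) until a write of `nbytes` fits within the budget."""
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= nbytes:
                self.tokens -= nbytes
                return
            # Not enough budget yet: wait roughly long enough to accumulate it.
            time.sleep((nbytes - self.tokens) / self.rate)

# Example: cap an app at ~5 MB/s sustained while allowing ~50 MB bursts (illustrative numbers).
limiter = WriteRateLimiter(bytes_per_second=5 * 1024 * 1024, burst_bytes=50 * 1024 * 1024)
limiter.admit(4096)  # a 4 KiB write passes immediately while the bucket is full
```

Because bursts only draw down the bucket, "normal" bursty apps are unaffected, while an app that writes continuously above the sustained rate is eventually slowed to that rate, which is the property item 2 needs to bound drive wear.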
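Item 5 replaces an on-disk LSM-tree with a memory-resident table backed by append-only files so that index updates stay sequential on the SSD. The toy sketch below shows what such a memtable-plus-AOF write path can look like under an assumed file name and a trivial JSON record format; it is not DirectLoad's implementation.

```python
import json

class AppendOnlyStore:
    """Toy memtable + append-only file (AOF) write path: updates land in an
    in-memory dict and are appended sequentially to a log, so the write path
    performs no on-disk sorting or compaction."""
    def __init__(self, path: str = "index.aof"):  # hypothetical file name
        self.memtable = {}
        self.log = open(path, "a", encoding="utf-8")

    def put(self, key: str, value: str) -> None:
        self.memtable[key] = value
        # Sequential append only; random writes never hit the device.
        self.log.write(json.dumps({"k": key, "v": value}) + "\n")
        self.log.flush()

    def get(self, key: str):
        return self.memtable.get(key)

store = AppendOnlyStore()
store.put("doc:42", "index-entry-bytes")
print(store.get("doc:42"))
```

Keeping lookups in memory and persisting only sequential appends is what removes the sorting-induced write amplification that item 5 attributes to the LSM-tree it replaces.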