Cloud storage systems generally add redundancy in storing content files such that K files are replicated or erasure coded and stored on N > K nodes. In addition to providing reliability against failures, the redundant copies can be used to serve a larger volume of content access requests. A request for one of the files can either be sent to a systematic node or to one of the repair groups. In this paper, we seek to maximize the service capacity region, that is, the set of request arrival rates for the K files that can be supported by a coded storage system. We explore two aspects of this problem: 1) how to optimally split incoming requests between systematic nodes and repair groups for a given erasure code, and 2) how to choose an underlying erasure code that maximizes the achievable service capacity region. In particular, we consider MDS and Simplex codes. Our analysis demonstrates that erasure coding makes the system more robust to skews in file popularity than simply replicating a file at multiple servers, and that coding and replication together can make the capacity region larger than either alone.
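To make the request-splitting idea concrete, here is a minimal sketch (a toy model of ours, not code or notation from the paper): two files a and b are stored on three unit-rate servers holding a, b, and a+b, and a request is served either by the file's systematic node or jointly by the other two nodes acting as its repair group. The script checks whether a pair of arrival rates admits a feasible split; the service rates, the routing model, and all identifiers are illustrative assumptions.

```python
# Toy service-capacity-region check: files a and b stored on three servers
# holding [a], [b], and [a+b]; each server works at rate MU. A request is
# served by the file's systematic node, or by the other two nodes together
# (the repair group). Illustrative sketch only; not the paper's model or code.
from scipy.optimize import linprog

MU = 1.0  # assumed service rate of every node

def in_capacity_region(lam_a, lam_b):
    """Return True if arrival rates (lam_a, lam_b) admit a feasible split."""
    # Variables x = [a_sys, a_rep, b_sys, b_rep]:
    #   a_sys: file-a requests served by node [a]
    #   a_rep: file-a requests decoded from nodes [b] and [a+b]
    #   b_sys, b_rep: likewise for file b
    A_eq = [[1, 1, 0, 0],          # all file-a demand is routed somewhere
            [0, 0, 1, 1]]          # all file-b demand is routed somewhere
    b_eq = [lam_a, lam_b]
    A_ub = [[1, 0, 0, 1],          # load on node [a]  : a_sys + b_rep
            [0, 1, 1, 0],          # load on node [b]  : a_rep + b_sys
            [0, 1, 0, 1]]          # load on node [a+b]: a_rep + b_rep
    b_ub = [MU, MU, MU]
    res = linprog(c=[0, 0, 0, 0], A_ub=A_ub, b_ub=b_ub,
                  A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 4)
    return res.status == 0         # status 0: a feasible routing exists

if __name__ == "__main__":
    print(in_capacity_region(1.5, 0.5))  # True: the parity node absorbs the skew
    print(in_capacity_region(1.5, 1.5))  # False: total demand exceeds capacity
```

In this toy setting the coded node lets the system serve demand skewed toward one file, which is the robustness effect the abstract attributes to erasure coding.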
Repair Rates for Multiple Descriptions on Distributed Storage
In a traditional distributed storage system, a source can be restored perfectly when a certain subset of servers is contacted. The coding is independent of the contents of the source. This paper instead considers a lossy source coding version of this problem, where the more servers are contacted, the higher the quality of the restored source. An example could be video stored on distributed storage. In information theory, this is called the multiple description problem, where the distortion depends on the number of descriptions received. The problem considered in this paper is how to restore system operation when one of the servers fails and a new server replaces it, that is, repair. The requirement is that the distortions in the restored system should be no larger than in the original system. The question is how many extra bits are needed for repair. We find an achievable rate and show that it is optimal in certain cases. One conclusion is that it is necessary to design the multiple description codes with repair in mind; simply reusing an existing multiple description code results in unnecessarily high repair rates.
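To pin down what repair must preserve, here is a minimal formalization in our own notation (the paper's exact setup may differ): whatever distortion each subset of descriptions guaranteed before a failure must still be guaranteed after the failed server is rebuilt, and the quantity of interest is the extra rate needed to rebuild it.

```latex
% Illustrative formalization; notation is ours, not necessarily the paper's.
% L descriptions of a source sequence X^n are stored on L servers, description i
% at rate R_i. Receiving the subset S of descriptions gives a reconstruction
% \hat{X}^n_S with distortion at most D_S. If server i fails, it is rebuilt from
% the surviving servers using an extra repair rate R_i^{rep}, and the restored
% system must meet the same distortion guarantees:
\[
  \mathbb{E}\!\left[ d\!\left( X^n, \hat{X}^n_S \right) \right] \;\le\; D_S + \epsilon
  \qquad \text{for all } S \subseteq \{1,\dots,L\},
\]
% both before and after repair. The question studied is the smallest achievable
% repair rate R_i^{rep}.
```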
- Award ID(s): 1908957
- PAR ID: 10464176
- Date Published:
- Journal Name: Entropy
- Volume: 24
- Issue: 5
- ISSN: 1099-4300
- Page Range / eLocation ID: 612
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- This paper considers the MapReduce-like coded distributed computing framework originally proposed by Li et al., which uses coding techniques when distributed computing servers exchange their computed intermediate values in order to reduce the overall traffic load. In their original model, servers are connected via an error-free common communication bus allowing broadcast transmissions. However, this assumption is one of the major limitations for practical implementations, since real-world data centers may have network topologies far more involved than a single broadcast bus. We formulate a topological coded distributed computing problem, where the computing servers communicate with each other through a switch network. Using a special instance of fat-tree topologies, referred to as the t-ary fat-tree proposed by Al-Fares et al., which can be built from inexpensive switches, we propose a coded distributed computing scheme that achieves the optimal max-link communication load (defined as the maximum load over all links) over any network topology. (The broadcast-bus tradeoff of Li et al. that this framework builds on is recalled in the note after this list.)
- Codes over rings, especially over Galois rings, have been extensively studied for nearly three decades due to their similarity to linear codes over finite fields. A distributed storage system uses a linear code to encode a large file across several nodes. If one of the nodes fails, a linear exact repair scheme efficiently recovers the failed node by accessing and downloading data from the rest of the servers of the storage system. In this paper, we develop a linear repair scheme for free maximum distance separable (MDS) codes over Galois rings, which coincide with the free maximum distance with respect to rank (MDR) codes. In particular, we give a linear repair scheme for full-length Reed–Solomon codes over a Galois ring.
- We consider the storage–retrieval rate trade-off in private information retrieval (PIR) systems using a Shannon-theoretic approach. Our focus is mostly on the canonical two-message two-database case, for which a coding scheme based on random codebook generation and the binning technique is proposed. This coding scheme reveals a hidden connection between PIR and the classic multiple description source coding problem. We first show that when the retrieval rate is kept optimal, the proposed non-linear scheme can achieve better performance than any linear scheme. Moreover, a non-trivial storage–retrieval rate trade-off can be achieved beyond space-sharing between this extreme point and the other optimal extreme point, which is achieved by the retrieve-everything strategy. We further show that, with a method akin to the expurgation technique, one can extract a zero-error PIR code from the random code. Outer bounds are also studied and compared to establish the superiority of the non-linear codes over linear codes.
- NAND flash-based Solid State Devices (SSDs) offer the desirable features of high performance, energy efficiency, and fast-growing capacity. Thus, the use of SSDs is increasing in distributed storage systems. A key obstacle in this context is that the natural imbalance in distributed I/O workloads can result in wear imbalance across the SSDs in a distributed setting. This, in turn, can have a significant impact on the reliability, performance, and lifetime of the storage deployment. Extant load balancers for storage systems do not consider SSD wear imbalance when placing data, as the main design goal of such balancers is to extract higher performance. Consequently, data migration is the only common technique for tackling wear imbalance, where existing data is moved from highly loaded servers to the least loaded ones. In this paper, we explore an innovative holistic approach, Chameleon, that employs data redundancy techniques such as replication and erasure coding, coupled with endurance-aware write offloading, to mitigate wear imbalance in distributed SSD-based storage. Chameleon aims to balance the wear among different flash servers while meeting the desirable objectives of extending the life of flash servers, improving I/O performance, and avoiding bottlenecks. Evaluation with a 50-node SSD cluster shows that Chameleon reduces the wear distribution deviation by 81% while improving the write performance by up to 33%.
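As noted in the first related item above, here is the broadcast-bus tradeoff of Li et al. that the topological coded computing work builds on, recalled from memory and stated only for orientation (it concerns the total load on a shared bus, not the max-link load studied in that paper).

```latex
% Coded distributed computing on an error-free broadcast bus (Li et al.):
% K servers, computation load r, i.e. each Map task is computed at r servers.
% The optimal communication load, normalized by the total intermediate-value
% size, is commonly stated as
\[
  L^{*}(r) \;=\; \frac{1}{r}\left(1 - \frac{r}{K}\right),
  \qquad r \in \{1,\dots,K\},
\]
% a factor-r improvement over the uncoded load 1 - r/K, obtained by multicasting
% coded combinations of intermediate values that several reducers need.
```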