skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: On the service capacity region of accessing erasure coded content
Cloud storage systems generally add redundancy in storing content files such that K files are replicated or erasure coded and stored on N > K nodes. In addition to providing reliability against failures, the redundant copies can be used to serve a larger volume of content access requests. A request for one of the files can be either be sent to a systematic node, or one of the repair groups. In this paper, we seek to maximize the service capacity region, that is, the set of request arrival rates for the K files that can be supported by a coded storage system. We explore two aspects of this problem: 1) for a given erasure code, how to optimally split incoming requests between systematic nodes and repair groups, and 2) choosing an underlying erasure code that maximizes the achievable service capacity region. In particular, we consider MDS and Simplex codes. Our analysis demonstrates that erasure coding makes the system more robust to skews in file popularity than simply replicating a file at multiple servers, and that coding and replication together can make the capacity region larger than either alone.  more » « less
Award ID(s):
1717314 1718658
PAR ID:
10073796
Author(s) / Creator(s):
; ; ; ; ; ; ;
Date Published:
Journal Name:
2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
Page Range / eLocation ID:
17 to 24
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The prevalence of disaggregated storage in public clouds has led to increased latency in modern OLAP cloud databases, particularly when handling ad-hoc and highly-selective queries on large objects. To address this, cloud databases have adopted computation pushdown, executing query predicates closer to the storage layer. However, existing pushdown solutions are ine!cient in erasure-coded storage. Cloud storage employs erasure coding that partitions analytics file objects into fixed-sized blocks and distributes them across storage nodes. Consequently, when a speci"c part of the object is queried, the storage system must reassemble the object across nodes, incurring significant network latency. In this work, we present Fusion, an object store for analytics that is optimized for query pushdown on erasure-coded data. It co-designs its erasure coding and file placement topologies, taking into account popular analytics file formats (e.g., Parquet). Fusion employs a novel stripe construction algorithm that prevents fragmentation of computable units within an object, and minimizes storage overhead during erasure coding. Compared to existing erasure-coded stores, Fusion improves median and tail latency by 64% and 81%, respectively, on TPC-H, and up to 40% and 48% respectively, on real-world SQL queries. Fusion achieves this while incurring a modest 1.2% storage overhead compared to the optimal. 
    more » « less
  2. Coded multicasting has been shown to be a promising approach to significantly improve the performance of content delivery networks with multiple caches downstream of a common multicast link. However, the schemes that have been shown to achieve order-optimal performance require content items to be partitioned into several packets that grows exponentially with the number of caches, leading to codes of exponential complexity that jeopardize their promising performance benefits. In this paper, we address this crucial performance-complexity tradeoff in a heterogeneous caching network setting, where edge caches with possibly different storage capacity collect multiple content requests that may follow distinct demand distributions. We extend the asymptotic (in the number of packets per file) analysis of shared link caching networks to heterogeneous network settings, and present novel coded multicast schemes, based on local graph coloring, that exhibit polynomial-time complexity in all the system parameters, while preserving the asymptotically proven multiplicative caching gain even for finite file packetization. We further demonstrate that the packetization order (the number of packets each file is split into) can be traded-off with the number of requests collected by each cache, while preserving the same multiplicative caching gain. Simulation results confirm the superiority of the proposed schemes and illustrate the interesting request aggregation vs. packetization order tradeoff within several practical settings. Our results provide a compelling step towards the practical achievability of the promising multiplicative caching gain in next generation access networks. 
    more » « less
  3. Extreme-scale science applications are highly innovative and constantly evolving. They are expected to generate data in the petabyte and exabyte ranges. Erasure coding has been widely adopted for data storage in data center networks, where the data are encoded and stored in multiple locations. Therefore, an efficient data retrieval service is needed to transfer encoded data from selected multiple stored nodes to a single destination. Elastic optical networks are a promising backbone technology for data center communication due to their capability to efficiently and flexibly allocate the huge optical bandwidth to heterogeneous traffic demands. In this paper, the erasure-coded multi-sourced data retrieval routing and scheduling problem is studied for static traffic in elastic optical networks, and the objective is to minimize the total transmission completion time of all the requests. An integer linear programming formulation and low-complexity heuristic are proposed. Furthermore, analytical lower bounds are derived and a meta-heuristic, Tabu Search, is adopted to solve the problem. Numerical results are presented to show the effectiveness of the proposed methods. 
    more » « less
  4. In the shared-link coded caching problem, formulated by Maddah-Ali and Niesen (MAN), each cache-aided user demands one file (i.e., single file retrieval). This paper generalizes the MAN problem so as to allow users to request scalar linear functions (aka, linear combinations with scalar coefficients) of the files. We propose a novel coded delivery scheme, based on MAN uncoded cache placement, that allows for the decoding of arbitrary scalar linear functions of the files on arbitrary finite fields. Surprisingly, it is shown that the load for cache-aided scalar linear function retrieval depends on the number of linearly independent functions that are demanded, akin to the cache-aided single-file retrieval problem where the load depends on the number of distinct file requests. The proposed scheme is proved to be optimal under the constraint of uncoded cache placement, in terms of worst-case load, and within a factor 2 otherwise. 
    more » « less
  5. The most commonly used setting in the coded caching literature consists of the following four elements: (i) homogeneous file sizes, (ii) homogeneous cache sizes, (iii) user-independent homogeneous file popularity (i.e., all users share the same file preference), and (iv) worst-case rate analysis. While recent results have relaxed some of these assumptions, deeper understanding of the full heterogeneity setting is still much needed since traditional caching schemes place little assumptions on file/cache sizes and almost always allow each user to have his/her own file preference through individualized file request prediction. Taking a microscopic approach, this paper characterizes the exact capacity of the smallest 2-user/2-file (N = K = 2) problem but under the most general setting that simultaneously allows for (i) heterogeneous files sizes, (ii) heterogeneous cache sizes, (iii) user-dependent file popularity, and (iv) average-rate analysis. Solving completely the case of N = K = 2 could shed further insights on the performance and complexity of optimal coded caching with full heterogeneity for arbitrary N and K. 
    more » « less