Title: Heuristics for Analyzing Download Time in MDS Coded Storage Systems
There has been growing interest, in both theory and practice, in using the redundancy available in storage systems to mitigate stragglers in content download. This paper is concerned with MDS coded storage systems and studies the (n, k) data access model. When k = n, the system is equivalent to a fork-join queue, which is known to be notoriously hard to analyze, while the system with k = 1 has previously been shown to be equivalent to an M/G/1 queue. We argue here that the system with k = 2 is of practical interest, and then present a method that approximates the system as an M/G/1 queue. The approximated download time is shown to be more accurate than the bounds available in the literature. We also note that the presented method can be used to approximate systems that employ other newly designed and deployed storage codes.
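As a quick illustration of the (n, k) access rule only (this sketch is not code from the paper, and it does not reproduce the paper's M/G/1 approximation of the queued system), the following Python snippet estimates the mean time for a single request in an otherwise idle system to collect the k fastest of n parallel responses, assuming i.i.d. unit-rate exponential response times:

    import random

    def mean_kth_of_n(n, k, rate=1.0, trials=100_000):
        # The download completes when the k-th fastest of n parallel
        # responses arrives (single request, idle system).
        total = 0.0
        for _ in range(trials):
            times = sorted(random.expovariate(rate) for _ in range(n))
            total += times[k - 1]
        return total / trials

    print(mean_kth_of_n(4, 2))  # exact mean: 1/4 + 1/3 = 0.5833...

For exponential response times the exact mean is (H_n − H_{n−k})/rate by the memoryless spacings of exponential order statistics; for (n, k) = (4, 2) at unit rate this is 1/4 + 1/3 ≈ 0.583, which the simulation should reproduce.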
Award ID(s):
1717314 1718658
PAR ID:
10073554
Author(s) / Creator(s):
Date Published:
Journal Name:
2018 IEEE International Symposium on Information Theory (ISIT)
Page Range / eLocation ID:
1929 to 1933
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Private information retrieval (PIR) allows a user to retrieve a desired message out of K possible messages from N databases (DBs) without revealing the identity of the desired message. In this work, we consider the problem of PIR from uncoded storage-constrained DBs. Each DB has a storage capacity of μKL bits, where L is the size of each message in bits and μ ∈ [1/N, 1] is the normalized storage. In the storage-constrained PIR problem, there are two key challenges: a) construction of communication-efficient schemes through storage content design at each DB that allow download-efficient PIR; and b) characterizing the optimal download cost via information-theoretic lower bounds. The novel aspect of this work is to characterize the optimal download cost of PIR with storage-constrained DBs for any value of storage. In particular, for any (N, K), we show that the optimal tradeoff between storage (μ) and download cost (D(μ)) is given by the lower convex hull of the pairs (t/N, 1 + 1/t + 1/t^2 + ⋯ + 1/t^(K-1)) for t = 1, 2, ..., N. The main contribution of this paper is the converse proof, i.e., obtaining lower bounds on the download cost for PIR as a function of the available storage.
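As a worked illustration of the tradeoff just characterized (not from the paper itself), the sketch below computes the corner points and evaluates the lower convex hull by linear interpolation between consecutive corners; this suffices here because the download cost at the corners is convex and decreasing in t:

    def corner_points(N, K):
        # Corner points of the optimal tradeoff: for t = 1..N,
        # mu = t/N and D = 1 + 1/t + 1/t^2 + ... + 1/t^(K-1).
        return [(t / N, sum(1 / t**i for i in range(K))) for t in range(1, N + 1)]

    def optimal_download(mu, N, K):
        # Lower convex hull evaluated at mu: interpolate between the
        # two neighboring corners (D is convex decreasing in t).
        pts = corner_points(N, K)
        if mu <= pts[0][0]:
            return pts[0][1]
        for (m0, d0), (m1, d1) in zip(pts, pts[1:]):
            if m0 <= mu <= m1:
                return d0 + (d1 - d0) * (mu - m0) / (m1 - m0)
        return pts[-1][1]

    print(corner_points(2, 3))           # [(0.5, 3.0), (1.0, 1.75)]
    print(optimal_download(0.75, 2, 3))  # 2.375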
  2. Private information retrieval (PIR) allows a user to retrieve a desired message out of K possible messages from N databases without revealing the identity of the desired message. There has been significant recent progress on understanding fundamental information-theoretic limits of PIR, and in particular the download cost of PIR for several variations. The majority of existing works, however, assume the presence of replicated databases, each storing all K messages. In this work, we consider the problem of PIR from storage-constrained databases. Each database has a storage capacity of μKL bits, where K is the number of messages, L is the size of each message in bits, and μ ∈ [1/N, 1] is the normalized storage. In the storage-constrained PIR problem, there are two key design questions: a) how to store content across each database under storage constraints; and b) construction of schemes that allow efficient PIR through storage-constrained databases. The main contribution of this work is a general achievable scheme for PIR from storage-constrained databases for any value of storage. In particular, for any (N, K), with normalized storage μ = t/N, where the parameter t can take integer values t ∈ {1, 2, ..., N}, we show that our proposed PIR scheme achieves a download cost of (1 + 1/t + 1/t^2 + ⋯ + 1/t^(K-1)). The extreme case when μ = 1 (i.e., t = N) corresponds to the setting of replicated databases with full storage. For this extremal setting, our scheme recovers the information-theoretically optimal download cost characterized by Sun and Jafar as (1 + 1/N + ⋯ + 1/N^(K-1)). For the other extreme, when μ = 1/N (i.e., t = 1), the proposed scheme achieves a download cost of K. The most interesting aspect of the result is that for intermediate values of storage, i.e., 1/N < μ < 1, the proposed scheme can strictly outperform memory-sharing between the extreme values of storage.
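A small numerical check of the final claim above, with hypothetical parameters N = 4 and K = 3 chosen purely for illustration, comparing the scheme's cost at intermediate storage against memory-sharing between the two extremes:

    def scheme_cost(t, K):
        # Download cost of the proposed scheme at mu = t/N.
        return sum(1 / t**i for i in range(K))

    def memory_sharing_cost(mu, N, K):
        # Time-share the two extreme schemes: mu = 1/N (cost K) and
        # mu = 1 (replicated, cost 1 + 1/N + ... + 1/N^(K-1)).
        lo_mu, hi_mu = 1 / N, 1.0
        lo_c, hi_c = float(K), scheme_cost(N, K)
        return lo_c + (hi_c - lo_c) * (mu - lo_mu) / (hi_mu - lo_mu)

    N, K = 4, 3
    for t in range(2, N):  # intermediate storage points 1 < t < N
        mu = t / N
        print(mu, scheme_cost(t, K), memory_sharing_cost(mu, N, K))

At μ = 1/2 (t = 2) this prints 1.75 for the scheme versus 2.4375 for memory-sharing, consistent with the claimed strict improvement.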
  3. The Gittins scheduling policy minimizes the mean response time in the single-server M/G/1 queue in a wide variety of settings. Most famously, Gittins is optimal when preemption is allowed and service requirements are unknown but drawn from a known distribution. Gittins is also optimal much more generally, adapting to any amount of available information and any preemption restrictions. However, scheduling to minimize mean response time in a multiserver setting, specifically the central-queue M/G/k, is a much more difficult problem. In this work we give the first general analysis of Gittins in the M/G/k. Specifically, we show that under extremely general conditions, Gittins's mean response time in the M/G/k is at most its mean response time in the M/G/1 plus an O(log(1/(1 − ρ))) additive term, where ρ is the system load. A consequence of this result is that Gittins is heavy-traffic optimal in the M/G/k if the service requirement distribution S satisfies E[S^2 (log S)^+] < ∞. This is the most general result on minimizing mean response time in the M/G/k to date. To prove our results, we combine properties of the Gittins policy and Palm calculus in a novel way. Notably, our technique overcomes the limitations of tagged-job methods used in prior scheduling analyses.
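For a sense of scale of the additive term above (the constants hidden by the O(·) are not stated in the abstract), the gap grows only logarithmically even as the load approaches 1:

    import math

    # The gap term log(1/(1 - rho)) grows slowly as load rho -> 1.
    for rho in (0.5, 0.9, 0.99, 0.999):
        print(rho, math.log(1 / (1 - rho)))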
  4. We consider the private information retrieval (PIR) problem from decentralized uncoded caching databases. There are two phases in our problem setting: a caching phase and a retrieval phase. In the caching phase, the system contains a data center holding all K files, each of size L bits, and several databases, each with a storage capacity of μKL bits. Each database independently chooses μKL bits out of the total KL bits from the data center to cache, through the same probability distribution, in a decentralized manner. In the retrieval phase, a user (retriever) accesses N databases in addition to the data center, and wishes to retrieve a desired file privately. We characterize the optimal normalized download cost to be D* = ∑_{n=1}^{N+1} C(N, n−1) μ^(n−1) (1 − μ)^(N+1−n) (1 + 1/n + ⋯ + 1/n^(K−1)). We show that the uniform and random caching scheme originally proposed for decentralized coded caching by Maddah-Ali and Niesen, together with the Sun and Jafar retrieval scheme originally proposed for PIR from replicated databases, surprisingly results in the lowest normalized download cost. This is the decentralized counterpart of the recent result of Attia, Kumar, and Tandon for the centralized case. The converse proof contains several ingredients, such as an interference lower bound, an induction lemma, replacing query and answer-string random variables with the contents of distributed databases, the nature of decentralized uncoded caching databases, and bit marginalization of joint caching distributions.
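A direct transcription of the download cost expression above as a Python sketch (the parameter values in the example are illustrative only, not from the paper):

    from math import comb

    def d_star(mu, N, K):
        # D* = sum over n = 1..N+1 of
        #   C(N, n-1) * mu^(n-1) * (1-mu)^(N+1-n)   (Binomial(N, mu) weight)
        #   * (1 + 1/n + ... + 1/n^(K-1))           (replicated-PIR cost term)
        total = 0.0
        for n in range(1, N + 2):
            weight = comb(N, n - 1) * mu**(n - 1) * (1 - mu)**(N + 1 - n)
            total += weight * sum(1 / n**i for i in range(K))
        return total

    print(d_star(0.5, 3, 2))  # example: N = 3 databases, K = 2 messages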
  5. Multiple known algorithmic paradigms (backtracking, local search, and the polynomial method) only yield a 2^(n(1 − 1/O(k)))-time algorithm for k-SAT in the worst case. For this reason, it has been hypothesized that the worst-case k-SAT problem cannot be solved in 2^(n(1 − f(k)/k)) time for any unbounded function f. This hypothesis has been called the "Super-Strong ETH," modeled after the ETH and the Strong ETH. We give two results on the Super-Strong ETH: 1. It has also been hypothesized that k-SAT is hard to solve for randomly chosen instances near the "critical threshold," where the clause-to-variable ratio is 2^k ln 2 − Θ(1). We give a randomized algorithm which refutes the Super-Strong ETH for the case of random k-SAT and planted k-SAT for any clause-to-variable ratio. For example, given any random k-SAT instance F with n variables and m clauses, our algorithm decides satisfiability for F in 2^(n(1 − Ω(log k)/k)) time, with high probability (over the choice of the formula and the randomness of the algorithm). It turns out that a well-known algorithm from the literature on SAT algorithms does the job: the PPZ algorithm of Paturi, Pudlák, and Zane [17]; a minimal sketch follows below. 2. The Unique k-SAT problem is the special case where there is at most one satisfying assignment. Improving prior reductions, we show that the Super-Strong ETHs for Unique k-SAT and k-SAT are equivalent. More precisely, we show the time complexities of Unique k-SAT and k-SAT are very tightly correlated: if Unique k-SAT is in 2^(n(1 − f(k)/k)) time for an unbounded f, then k-SAT is in 2^(n(1 − f(k)(1 − ε)/k)) time for every ε > 0.
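Since the abstract credits the PPZ algorithm, here is a minimal Python sketch of one PPZ round on a CNF formula given as lists of signed integers (positive for a variable, negative for its negation); this toy version omits the failure handling on conflicting forced values and the repetition-count analysis from the literature.

    import random

    def lit_value(lit, assignment):
        # Truth value of a literal under a partial assignment (None if unset).
        v = assignment.get(abs(lit))
        return None if v is None else (v if lit > 0 else not v)

    def ppz_round(clauses, n):
        # Visit variables in a random order. If some clause has the current
        # variable as its only unset literal (all other literals false),
        # its value is forced; otherwise guess uniformly at random.
        # (A full implementation would abort the round on conflicting forces.)
        assignment = {}
        for v in random.sample(range(1, n + 1), n):
            forced = None
            for clause in clauses:
                vals = [lit_value(l, assignment) for l in clause]
                if True in vals:
                    continue  # clause already satisfied
                unset = [l for l, val in zip(clause, vals) if val is None]
                if len(unset) == 1 and abs(unset[0]) == v:
                    forced = unset[0] > 0  # satisfy the unit clause
            assignment[v] = forced if forced is not None else random.random() < 0.5
        return assignment

    def satisfies(clauses, assignment):
        return all(any(lit_value(l, assignment) for l in c) for c in clauses)

    clauses, n = [[1, 2], [-1, 3]], 3  # (x1 OR x2) AND (NOT x1 OR x3)
    for _ in range(10_000):
        a = ppz_round(clauses, n)
        if satisfies(clauses, a):
            print(a)
            break

Repeating the round roughly 2^(n(1 − 1/k)) times gives the classical PPZ success guarantee for k-SAT, which is what the result above sharpens in the random and planted settings.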