We study common randomness where two parties have access to i.i.d. samples from a known random source, and wish to generate a shared random key using limited (or no) communication with the largest possible probability of agreement. This problem is at the core of secret key generation in cryptography, with connections to communication under uncertainty and locality sensitive hashing. We take the approach of treating correlated sources as a critical resource, and ask whether common randomness can be generated resource-efficiently. We consider two notable sources in this setup arising from correlated bits and correlated Gaussians. We design the first explicit schemes that use only a polynomial number of samples (in the key length) so that the players can generate shared keys that agree with constant probability using optimal communication. The best previously known schemes were both non-constructive and used an exponential number of samples. In the amortized setting, we characterize the largest achievable ratio of key length to communication in terms of the external and internal information costs, two well-studied quantities in theoretical computer science. In the relaxed setting where the two parties merely wish to improve the correlation between the generated keys of length k, we show that theremore »
Compression in a Distributed Setting
Motivated by an attempt to understand the formation and development of (human) language, we introduce a "distributed compression" problem. In our problem a sequence of pairs of players from a set of K players are chosen and tasked to communicate messages drawn from an unknown distribution Q. Arguably languages are created and evolve to compress frequently occurring messages, and we focus on this aspect. The only knowledge that players have about the distribution Q is from previously drawn samples, but these samples differ from player to player. The only common knowledge between the players is restricted to a common prior distribution P and some constant number of bits of information (such as a learning algorithm). Letting T_eps denote the number of iterations it would take for a typical player to obtain an eps-approximation to Q in total variation distance, we ask whether T_eps iterations suffice to compress the messages down roughly to their entropy and give a partial positive answer.
We show that a natural uniform algorithm can compress the communication down to an average cost per message of O(H(Q) + log (D(P || Q) + O(1)) in $\tilde{O}(T_eps)$ iterations while allowing for O(eps)-error, where D(. || .) denotes the KL-divergence more »
- Publication Date:
- NSF-PAR ID:
- 10026314
- Journal Name:
- Innovations in Theoretical Computer Science (ITCS)
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
We study the problem of estimating the covariance matrix of a high-dimensional distribution when a small constant fraction of the samples can be arbitrarily corrupted. Recent work gave the first polynomial time algorithms for this problem with near-optimal error guarantees for several natural structured distributions. Our main contribution is to develop faster algorithms for this problem whose running time nearly matches that of computing the empirical covariance. Given N = Ω(d^2/\eps^2) samples from a d-dimensional Gaussian distribution, an \eps-fraction of which may be arbitrarily corrupted, our algorithm runs in time O(d^{3.26}/ poly(\eps)) and approximates the unknown covariance matrix to optimal error up to a logarithmic factor. Previous robust algorithms with comparable error guarantees all have runtimes Ω(d^{2ω}) when \eps = Ω(1), where ω is the exponent of matrix multiplication. We also provide evidence that improving the running time of our algorithm may require new algorithmic techniques.
-
We study the fundamental problem of high-dimensional mean estimation in a robust model where a constant fraction of the samples are adversarially corrupted. Recent work gave the first polynomial time algorithms for this problem with dimension-independent error guarantees for several families of structured distributions. In this work, we give the first nearly-linear time algorithms for high-dimensional robust mean estimation. Specifically, we focus on distributions with (i) known covariance and sub-gaussian tails, and (ii) unknown bounded covariance. Given N samples on R^d, an \eps-fraction of which may be arbitrarily corrupted, our algorithms run in time eO(Nd)/poly(\eps) and approximate the true mean within the information-theoretically optimal error, up to constant factors. Previous robust algorithms with comparable error guarantees have running times \Omega(Nd^2), for \eps= O(1) Our algorithms rely on a natural family of SDPs parameterized by our current guess ν for the unknown mean μ. We give a win-win analysis establishing the following: either a near-optimal solution to the primal SDP yields a good candidate for μ — independent of our current guess ν — or a near-optimal solution to the dual SDP yields a new guess ν0 whose distance from μ is smaller by a constant factor. We exploit the specialmore »
-
We present decidability results for a sub-class of “non-interactive” simulation problems, a well-studied class of problems in information theory. A non-interactive simulation problem is specified by two distributions P(x, y) and Q(u, v): The goal is to determine if two players, Alice and Bob, that observe sequences Xn and Y n respectively where {(Xi, Yi)}n i=1 are drawn i.i.d. from P(x, y) can generate pairs U and V respectively (without communicating with each other) with a joint distribution that is arbitrarily close in total variation to Q(u, v). Even when P and Q are extremely simple: e.g., P is uniform on the triples {(0, 0), (0, 1), (1, 0)} and Q is a “doubly symmetric binary source”, i.e., U and V are uniform ±1 variables with correlation say 0.49, it is open if P can simulate Q. In this work, we show that whenever P is a distribution on a finite domain and Q is a 2 × 2 distribution, then the non-interactive simulation problem is decidable: specifically, given δ > 0 the algorithm runs in time bounded by some function of P and δ and either gives a non-interactive simulation protocol that is δ-close to Q or asserts thatmore »
-
Consider an algorithm performing a computation on a huge random object (for example a random graph or a "long" random walk). Is it necessary to generate the entire object prior to the computation, or is it possible to provide query access to the object and sample it incrementally "on-the-fly" (as requested by the algorithm)? Such an implementation should emulate the random object by answering queries in a manner consistent with an instance of the random object sampled from the true distribution (or close to it). This paradigm is useful when the algorithm is sub-linear and thus, sampling the entire object up front would ruin its efficiency. Our first set of results focus on undirected graphs with independent edge probabilities, i.e. each edge is chosen as an independent Bernoulli random variable. We provide a general implementation for this model under certain assumptions. Then, we use this to obtain the first efficient local implementations for the Erdös-Rényi G(n,p) model for all values of p, and the Stochastic Block model. As in previous local-access implementations for random graphs, we support Vertex-Pair and Next-Neighbor queries. In addition, we introduce a new Random-Neighbor query. Next, we give the first local-access implementation for All-Neighbors queries inmore »