Search for: All records

Creators/Authors contains: "Ghazi, Badih"


  1. We study the role of interaction in the Common Randomness Generation (CRG) and Secret Key Generation (SKG) problems. In the CRG problem, two players, Alice and Bob, receive samples $X_1, X_2, \ldots$ and $Y_1, Y_2, \ldots$ respectively, with the pairs $(X_1, Y_1), (X_2, Y_2), \ldots$ drawn independently from a known probability distribution $\mu$. They wish to communicate so as to agree on $L$ bits of randomness. The SKG problem is the restriction of the CRG problem to the case where the key is required to be close to random even to an eavesdropper who can listen to their communication (but does not have access to Alice's and Bob's inputs). In this work, we study the relationship between the amount of communication and the number of rounds of interaction in both the CRG and SKG problems. Specifically, we construct a family of distributions $\mu = \mu_{r,n,L}$, parametrized by integers $r$, $n$, and $L$, such that for every $r$ there exists a constant $b = b(r)$ for which CRG (respectively SKG) is feasible when $(X_i, Y_i) \sim \mu_{r,n,L}$ with $r+1$ rounds of communication, each consisting of $O(\log n)$ bits, but when restricted to $r/2 - 2$ rounds of interaction, the total communication must exceed $\Omega(n/\log^b n)$ bits. Prior to our work, no such separations were known for $r \ge 2$.
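
To make the CRG model concrete, here is a toy one-round protocol in Python for a doubly symmetric binary source: Alice's candidate key is her first $L$ bits, and she sends a handful of random parity checks so that Bob can detect disagreement. This is purely an illustration of the model, not the construction from the paper; all names and parameters are ours.

```python
import random

def sample_source(n, p=0.02, rng=random):
    """n i.i.d. pairs from a doubly symmetric binary source:
    Y_i equals X_i except with flip probability p."""
    xs = [rng.randint(0, 1) for _ in range(n)]
    ys = [x ^ (rng.random() < p) for x in xs]
    return xs, ys

def toy_one_round_crg(xs, ys, L, num_checks=16):
    """Toy one-round CRG protocol (illustration only): Alice's key is her
    first L bits; she sends num_checks random parities of it, and Bob
    accepts iff his first L bits satisfy the same parities."""
    key_a, key_b = xs[:L], ys[:L]
    shared = random.Random(0)          # public randomness for the checks
    for _ in range(num_checks):
        idx = [i for i in range(L) if shared.random() < 0.5]
        if sum(key_a[i] for i in idx) % 2 != sum(key_b[i] for i in idx) % 2:
            return None                # abort: the keys demonstrably differ
    return key_a, key_b

xs, ys = sample_source(1000)
out = toy_one_round_crg(xs, ys, L=64)
print("agreed" if out and out[0] == out[1] else "aborted or disagreed")
```

This toy protocol communicates only a constant number of bits but aborts with probability roughly $1 - (1-p)^L$; the paper's point is that, for the distributions $\mu_{r,n,L}$, agreeing cheaply inherently requires many rounds of interaction.
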
  2. We introduce a new technique for reducing the dimension of the ambient space of low-degree polynomials in the Gaussian space while preserving their relative correlation structure. As an application, we obtain an explicit upper bound on the dimension of an $\epsilon$-optimal noise-stable Gaussian partition. In fact, we address the more general problem of upper bounding the number of samples needed to $\epsilon$-approximate any joint distribution that can be non-interactively simulated from a correlated Gaussian source. Our results significantly improve (from Ackermann-like to "merely" exponential) the upper bounds recently proved on the above problems by De, Mossel & Neeman [CCC 2017, SODA 2018 resp.] and imply decidability of the larger-alphabet case of the gap non-interactive simulation problem posed by Ghazi, Kamath & Sudan [FOCS 2016]. Our technique of dimension reduction for low-degree polynomials is simple, can be seen as a generalization of the Johnson-Lindenstrauss lemma, and may be of independent interest.
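
Since the technique is described as a generalization of the Johnson-Lindenstrauss lemma, a minimal numpy sketch of the classical JL random projection may help fix ideas; the dimensions and the pairs checked below are arbitrary illustrative choices.

```python
import numpy as np

def jl_project(points, k, seed=0):
    """Project the rows of `points` from dimension d down to k using a
    random Gaussian matrix scaled by 1/sqrt(k), as in the classical
    Johnson-Lindenstrauss lemma."""
    rng = np.random.default_rng(seed)
    d = points.shape[1]
    proj = rng.normal(size=(d, k)) / np.sqrt(k)
    return points @ proj

rng = np.random.default_rng(1)
pts = rng.normal(size=(50, 10_000))    # 50 points in dimension 10,000
low = jl_project(pts, k=1_000)

# Pairwise distances are preserved up to small distortion.
for i, j in [(0, 1), (2, 3), (4, 5)]:
    before = np.linalg.norm(pts[i] - pts[j])
    after = np.linalg.norm(low[i] - low[j])
    print(f"pair ({i},{j}): distortion {after / before:.3f}")
```

The paper's contribution is a version of this idea that works for low-degree polynomials of Gaussians while preserving their correlation structure, which is substantially more delicate than preserving pairwise distances.
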
  3. We study common randomness, where two parties have access to i.i.d. samples from a known random source and wish to generate a shared random key using limited (or no) communication with the largest possible probability of agreement. This problem is at the core of secret key generation in cryptography, with connections to communication under uncertainty and locality-sensitive hashing. We take the approach of treating correlated sources as a critical resource and ask whether common randomness can be generated resource-efficiently. We consider two notable sources in this setup, arising from correlated bits and correlated Gaussians. We design the first explicit schemes that use only a polynomial number of samples (in the key length) so that the players can generate shared keys that agree with constant probability using optimal communication. The best previously known schemes were both non-constructive and used an exponential number of samples. In the amortized setting, we characterize the largest achievable ratio of key length to communication in terms of the external and internal information costs, two well-studied quantities in theoretical computer science. In the relaxed setting where the two parties merely wish to improve the correlation between the generated keys of length $k$, we show that there are no interactive protocols using $o(k)$ bits of communication having agreement probability even as small as $2^{-o(k)}$. For the related communication problem where the players wish to compute a joint function $f$ of their inputs using i.i.d. samples from a known source, we give a simultaneous message passing protocol using $2^{O(c)}$ bits, where $c$ is the interactive randomized public-coin communication complexity of $f$. This matches the lower bound shown previously, while the best previously known upper bound was doubly exponential in $c$. Our schemes reveal a new connection between common randomness and unbiased error-correcting codes, e.g., dual-BCH codes and their analogues in Euclidean space. [https://epubs.siam.org/doi/10.1137/1.9781611975031.120]
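
As a concrete (folklore) baseline for the correlated Gaussian source, here is a SimHash-style sketch: each player takes the sign of the inner product of their vector with a public random direction, and the two bits agree with probability exactly $1 - \arccos(\rho)/\pi$. This only illustrates the source and the one-bit agreement question; it is not the paper's explicit dual-BCH-based scheme.

```python
import numpy as np

def correlated_gaussians(d, rho, rng):
    """One sample each for Alice and Bob: coordinatewise standard normal
    pairs with correlation rho."""
    x = rng.normal(size=d)
    y = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=d)
    return x, y

rng = np.random.default_rng(0)
rho, d, trials = 0.9, 200, 20_000
agree = 0
for _ in range(trials):
    x, y = correlated_gaussians(d, rho, rng)
    w = rng.normal(size=d)             # public random direction
    agree += (x @ w >= 0) == (y @ w >= 0)
print(f"empirical agreement: {agree / trials:.3f}")
print(f"predicted:           {1 - np.arccos(rho) / np.pi:.3f}")
```

Extending one-bit agreement to long keys with constant agreement probability, a polynomial number of samples, and optimal communication is exactly where the explicit code constructions come in.
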
  4. Several well-studied models of access to data samples, including statistical queries, local differential privacy, and low-communication algorithms, rely on queries that provide information about a function of a single sample. (For example, a statistical query (SQ) gives an estimate of $\mathbb{E}_{x\sim D}[q(x)]$ for any choice of the query function $q: X \rightarrow \mathbb{R}$, where $D$ is an unknown data distribution.) Yet some data analysis algorithms rely on properties of functions that depend on multiple samples. Such algorithms would be naturally implemented using $k$-wise queries, each of which is specified by a function $q: X^k \rightarrow \mathbb{R}$. Hence it is natural to ask whether algorithms using $k$-wise queries can solve learning problems more efficiently and by how much. Blum, Kalai, and Wasserman (2003) showed that for any weak PAC learning problem over a fixed distribution, the complexity of learning with $k$-wise SQs is smaller than the (unary) SQ complexity by a factor of at most $2^k$. We show that for more general problems over distributions the picture is substantially richer. For every $k$, the complexity of distribution-independent PAC learning with $k$-wise queries can be exponentially larger than learning with $(k+1)$-wise queries. We then give two approaches for simulating a $k$-wise query using unary queries. The first approach exploits the structure of the problem that needs to be solved. It generalizes and strengthens (exponentially) the results of Blum et al. It allows us to derive strong lower bounds for learning DNF formulas and stochastic constraint satisfaction problems that hold against algorithms using $k$-wise queries. The second approach exploits the $k$-party communication complexity of the $k$-wise query function.
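
A minimal sketch of the two query models being compared, with each oracle simulated by an empirical mean plus bounded noise (the helper names and the way $k$-tuples are formed here are illustrative assumptions, not the paper's formalism):

```python
import random

def sq_oracle(samples, q, tau, rng):
    """Unary statistical query: an estimate of E[q(x)] within tolerance tau,
    simulated here as the empirical mean plus bounded noise."""
    est = sum(q(x) for x in samples) / len(samples)
    return est + rng.uniform(-tau, tau)

def kwise_sq_oracle(samples, q, k, tau, rng):
    """k-wise statistical query: q takes a k-tuple of independent samples.
    Disjoint consecutive blocks of i.i.d. samples give i.i.d. k-tuples."""
    tuples = [tuple(samples[i:i + k])
              for i in range(0, len(samples) - k + 1, k)]
    est = sum(q(t) for t in tuples) / len(tuples)
    return est + rng.uniform(-tau, tau)

rng = random.Random(0)
data = [rng.randint(0, 1) for _ in range(10_000)]        # i.i.d. fair bits
print(sq_oracle(data, lambda x: x, 0.01, rng))           # ~ E[x] = 0.5
print(kwise_sq_oracle(data, lambda t: t[0] ^ t[1], 2, 0.01, rng))  # ~ 0.5
```

Note that a 2-wise query such as the XOR of two samples is not a function of any single sample, which is what makes the question of simulating $k$-wise queries with unary ones nontrivial.
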
  5. Motivated by an attempt to understand the formation and development of (human) language, we introduce a "distributed compression" problem. In our problem, pairs of players from a set of $K$ players are chosen in sequence and tasked to communicate messages drawn from an unknown distribution $Q$. Arguably, languages are created and evolve to compress frequently occurring messages, and we focus on this aspect. The only knowledge that players have about the distribution $Q$ comes from previously drawn samples, but these samples differ from player to player. The only common knowledge between the players is restricted to a common prior distribution $P$ and some constant number of bits of information (such as a learning algorithm). Letting $T_\varepsilon$ denote the number of iterations it would take for a typical player to obtain an $\varepsilon$-approximation to $Q$ in total variation distance, we ask whether $T_\varepsilon$ iterations suffice to compress the messages down roughly to their entropy, and we give a partial positive answer. We show that a natural uniform algorithm can compress the communication down to an average cost per message of $O(H(Q) + \log(D(P \| Q) + O(1)))$ in $\tilde{O}(T_\varepsilon)$ iterations while allowing for $O(\varepsilon)$-error, where $D(\cdot \| \cdot)$ denotes the KL-divergence between distributions. For large divergences, this compares favorably with the static algorithm that ignores all samples and compresses down to $H(Q) + D(P \| Q)$ bits, while not requiring the $T_\varepsilon \cdot K$ iterations it would take for the players to develop optimal but separate compressions for each pair of players. Along the way, we introduce a "data-structural" view of the task of communicating with a natural language and show that our natural algorithm can also be implemented by an efficient data structure, whose storage is comparable to the storage requirements of $Q$ and whose query complexity is comparable to the lengths of the messages to be compressed. Our results give a plausible mathematical analogy to the mechanisms by which human languages are created and evolve, and in particular highlight the possibility of coordination towards a joint task (agreeing on a language) while engaging in distributed learning.
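
The static benchmark can be recovered from the standard mismatched-coding identity: an optimal code for the prior $P$ spends about $\log(1/P(x))$ bits on message $x$, so its expected cost on messages actually drawn from $Q$ decomposes as follows (written with the divergence in the order $D(Q \| P)$, the convention under which the identity is exact):

```latex
\mathbb{E}_{x \sim Q}\!\left[\log \frac{1}{P(x)}\right]
  = \sum_{x} Q(x) \log \frac{1}{Q(x)} + \sum_{x} Q(x) \log \frac{Q(x)}{P(x)}
  = H(Q) + D(Q \,\|\, P).
```
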
  6. We present decidability results for a subclass of "non-interactive" simulation problems, a well-studied class of problems in information theory. A non-interactive simulation problem is specified by two distributions $P(x, y)$ and $Q(u, v)$: the goal is to determine whether two players, Alice and Bob, observing sequences $X^n$ and $Y^n$ respectively, where $\{(X_i, Y_i)\}_{i=1}^{n}$ are drawn i.i.d. from $P(x, y)$, can generate pairs $U$ and $V$ respectively (without communicating with each other) with a joint distribution that is arbitrarily close in total variation to $Q(u, v)$. Even when $P$ and $Q$ are extremely simple, e.g., $P$ is uniform on the triples $\{(0, 0), (0, 1), (1, 0)\}$ and $Q$ is a "doubly symmetric binary source", i.e., $U$ and $V$ are uniform $\pm 1$ variables with correlation, say, $0.49$, it is open whether $P$ can simulate $Q$. In this work, we show that whenever $P$ is a distribution on a finite domain and $Q$ is a $2 \times 2$ distribution, the non-interactive simulation problem is decidable: specifically, given $\delta > 0$, the algorithm runs in time bounded by some function of $P$ and $\delta$ and either gives a non-interactive simulation protocol that is $\delta$-close to $Q$ or asserts that no protocol gets $O(\delta)$-close to $Q$. The main challenge for such a result is determining explicit (computable) convergence bounds on the number $n$ of samples that need to be drawn from $P(x, y)$ to get $\delta$-close to $Q$. We invoke contemporary results from the analysis of Boolean functions, such as the invariance principle and a regularity lemma, to obtain such explicit bounds.
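
To see what a non-interactive strategy looks like on the open instance above, the sketch below exactly computes the joint distribution of $(U, V) = (f(X^n), g(Y^n))$ for one toy choice of local functions $f, g$ (the parity of each player's bits, as $\pm 1$); all choices here are illustrative.

```python
from itertools import product

# Source P: uniform over {(0,0), (0,1), (1,0)}.
SUPPORT = [(0, 0), (0, 1), (1, 0)]

def joint_outputs(n, f, g):
    """Exact distribution of (U, V) = (f(X^n), g(Y^n)) under P^n,
    for local (non-interactive) strategies f and g."""
    dist = {}
    for seq in product(SUPPORT, repeat=n):
        xs = tuple(a for a, _ in seq)
        ys = tuple(b for _, b in seq)
        uv = (f(xs), g(ys))
        dist[uv] = dist.get(uv, 0.0) + (1 / 3) ** n
    return dist

def correlation(dist):
    """Pearson correlation of the +/-1-valued outputs."""
    eu = sum(p * u for (u, _), p in dist.items())
    ev = sum(p * v for (_, v), p in dist.items())
    euv = sum(p * u * v for (u, v), p in dist.items())
    var_u = 1 - eu**2                  # since u takes values in {-1, +1}
    var_v = 1 - ev**2
    return (euv - eu * ev) / (var_u * var_v) ** 0.5

parity = lambda bits: 1 - 2 * (sum(bits) % 2)   # parity of own bits, as +/-1
print(correlation(joint_outputs(4, parity, parity)))
```

For this particular choice the correlation is tiny and the marginals are biased; whether any local strategies achieve uniform $\pm 1$ marginals with correlation $0.49$ from this source is exactly the open instance mentioned above.
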
  7. Establishing the complexity of {\em Bounded Distance Decoding} for Reed-Solomon codes is a fundamental open problem in coding theory, explicitly asked by Guruswami and Vardy (IEEE Trans. Inf. Theory, 2005). The problem is motivated by the large current gap between the regime where it is NP-hard and the regime where it is efficiently solvable (i.e., the Johnson radius). We show the first NP-hardness results for asymptotically smaller decoding radii than the maximum likelihood decoding radius of Guruswami and Vardy. Specifically, for Reed-Solomon codes of length $N$ and dimension $K = O(N)$, we show that it is NP-hard to decode more than $N - K - c\frac{\log N}{\log\log N}$ errors (with $c > 0$ an absolute constant). Moreover, we show that the problem is NP-hard under quasipolynomial-time reductions for an error amount $> N - K - c\log N$ (with $c > 0$ an absolute constant). An alternative natural reformulation of the Bounded Distance Decoding problem for Reed-Solomon codes is as a {\em Polynomial Reconstruction} problem. In this view, our results show that it is NP-hard to decide whether there exists a degree-$K$ polynomial passing through $K + c\frac{\log N}{\log\log N}$ points from a given set of points $(a_1, b_1), (a_2, b_2), \ldots, (a_N, b_N)$. Furthermore, it is NP-hard under quasipolynomial-time reductions to decide whether there is a degree-$K$ polynomial passing through $K + c\log N$ many points. These results follow from the NP-hardness of a generalization of the classical Subset Sum problem to higher moments, called {\em Moments Subset Sum}, whose hardness was a known open problem and which may be of independent interest. We further reveal a strong connection with the well-studied Prouhet-Tarry-Escott problem in number theory, which turns out to capture a main barrier to extending our techniques. We believe the Prouhet-Tarry-Escott problem deserves further study in the theoretical computer science community.
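
The Prouhet-Tarry-Escott problem asks for two distinct multisets of integers whose sums of $j$-th powers agree for all $j$ up to a given degree. A quick, purely illustrative Python check of a classical size-4 ideal solution:

```python
def pte_equal(A, B, k):
    """Prouhet-Tarry-Escott condition: equal sums of j-th powers, j = 1..k."""
    return all(sum(a**j for a in A) == sum(b**j for b in B)
               for j in range(1, k + 1))

# A classical ideal PTE solution of size 4: power sums agree for j = 1, 2, 3
# (and necessarily disagree at j = 4).
A, B = [0, 4, 7, 11], [1, 2, 9, 10]
assert pte_equal(A, B, 3) and not pte_equal(A, B, 4)
for j in (1, 2, 3):
    print(j, sum(a**j for a in A), sum(b**j for b in B))
```
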