skip to main content

Title: Estimating Sparse Discrete Distributions Under Privacy and Communication Constraints
We consider the problem of estimating sparse discrete distributions under local differential privacy (LDP) and communication constraints. We characterize the sample complexity for sparse estimation under LDP constraints up to a constant factor, and the sample complexity under communication constraints up to a logarithmic factor. Our upper bounds under LDP are based on the Hadamard Response, a private coin scheme that requires only one bit of communication per user. Under communication constraints we propose public coin schemes based on random hashing functions. Our tight lower bounds are based on recently proposed method of chi squared contractions.  more » « less
Award ID(s):
1815893 1846300 1657471
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Proceedings of Machine Learning Research
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In a recent work (Ghazi et al., SODA 2016), the authors with Komargodski and Kothari initiated the study of communication with contextual uncertainty, a setup aiming to understand how efficient communication is possible when the communicating parties imperfectly share a huge context. In this setting, Alice is given a function f and an input string x, and Bob is given a function g and an input string y. The pair (x,y) comes from a known distribution mu and f and g are guaranteed to be close under this distribution. Alice and Bob wish to compute g(x,y) with high probability. The lack of agreement between Alice and Bob on the function that is being computed captures the uncertainty in the context. The previous work showed that any problem with one-way communication complexity k in the standard model (i.e., without uncertainty, in other words, under the promise that f=g) has public-coin communication at most O(k(1+I)) bits in the uncertain case, where I is the mutual information between x and y. Moreover, a lower bound of Omega(sqrt{I}) bits on the public-coin uncertain communication was also shown. However, an important question that was left open is related to the power that public randomness brings to uncertain communication. Can Alice and Bob achieve efficient communication amid uncertainty without using public randomness? And how powerful are public-coin protocols in overcoming uncertainty? Motivated by these two questions: - We prove the first separation between private-coin uncertain communication and public-coin uncertain communication. Namely, we exhibit a function class for which the communication in the standard model and the public-coin uncertain communication are O(1) while the private-coin uncertain communication is a growing function of n (the length of the inputs). This lower bound (proved with respect to the uniform distribution) is in sharp contrast with the case of public-coin uncertain communication which was shown by the previous work to be within a constant factor from the certain communication. This lower bound also implies the first separation between public-coin uncertain communication and deterministic uncertain communication. Interestingly, we also show that if Alice and Bob imperfectly share a sequence of random bits (a setup weaker than public randomness), then achieving a constant blow-up in communication is still possible. - We improve the lower-bound of the previous work on public-coin uncertain communication. Namely, we exhibit a function class and a distribution (with mutual information I approx n) for which the one-way certain communication is k bits but the one-way public-coin uncertain communication is at least Omega(sqrt{k}*sqrt{I}) bits. Our proofs introduce new problems in the standard communication complexity model and prove lower bounds for these problems. Both the problems and the lower bound techniques may be of general interest. 
    more » « less
  2. We study robust testing and estimation of discrete distributions in the strong contamination model. Our results cover both centralized setting and distributed setting with general local information constraints including communication and LDP constraints. Our technique relates the strength of manipulation attacks to the earth-mover distance using Hamming distance as the metric between messages (samples) from the users. In the centralized setting, we provide optimal error bounds for both learning and testing. Our lower bounds under local information constraints build on the recent lower bound methods in distributed inference. In the communication constrained setting, we develop novel algorithms based on random hashing and an L1-L1 isometry. 
    more » « less
  3. We consider worker skill estimation for the single-coin Dawid-Skene crowdsourcing model. In practice, skill-estimation is challenging because worker assignments are sparse and irregular due to the arbitrary and uncontrolled availability of workers. We formulate skill estimation as a rank-one correlation-matrix completion problem, where the observed components correspond to observed label correlation between workers. We show that the correlation matrix can be successfully recovered and skills are identifiable if and only if the sampling matrix (observed components) does not have a bipartite connected component. We then propose a projected gradient descent scheme and show that skill estimates converge to the desired global optima for such sampling matrices. Our proof is original and the results are surprising in light of the fact that even the weighted rank-one matrix factorization problem is NP-hard in general. Next, we derive sample complexity bounds in terms of spectral properties of the signless Laplacian of the sampling matrix. Our proposed scheme achieves state-of-art performance on a number of real-world datasets. 
    more » « less
  4. Ruiz, Francisco ; Dy, Jennifer ; van de Meent, Jan-Willem (Ed.)
    We study discrete distribution estimation under user-level local differential privacy (LDP). In user-level $\varepsilon$-LDP, each user has $m\ge1$ samples and the privacy of all $m$ samples must be preserved simultaneously. We resolve the following dilemma: While on the one hand having more samples per user should provide more information about the underlying distribution, on the other hand, guaranteeing the privacy of all $m$ samples should make the estimation task more difficult. We obtain tight bounds for this problem under almost all parameter regimes. Perhaps surprisingly, we show that in suitable parameter regimes, having $m$ samples per user is equivalent to having $m$ times more users, each with only one sample. Our results demonstrate interesting phase transitions for $m$ and the privacy parameter $\varepsilon$ in the estimation risk. Finally, connecting with recent results on shuffled DP, we show that combined with random shuffling, our algorithm leads to optimal error guarantees (up to logarithmic factors) under the central model of user-level DP in certain parameter regimes. We provide several simulations to verify our theoretical findings. 
    more » « less
  5. SPRINGER (Ed.)
    In this work we study the problem of minimizing the round complexity for securely evaluating multiparty functionalities while making black-box use of polynomial time assumptions. In Eurocrypt 2016, Garg et al. showed that assuming all parties have access to a broadcast channel, then at least four rounds of communication are required to securely realize non-trivial functionalities in the plain model. A sequence of works follow-up the result of Garg et al. matching this lower bound under a variety of assumptions. Unfortunately, none of these works make black-box use of the underlying cryptographic primitives. In Crypto 2021, Ishai, Khurana, Sahai, and Srinivasan came closer to matching the four-round lower bound, obtaining a five-round protocol that makes black-box use of oblivious transfer and PKE with pseudorandom public keys. In this work, we show how to realize any input-less functionality (e.g., coin-tossing, generation of key-pairs, and so on) in four rounds while making black-box use of two-round oblivious transfer. As an additional result, we construct the first four-round MPC protocol for generic functionalities that makes black-box use of the underlying primitives, achieving security against non-aborting adversaries. Our protocols are based on a new primitive called list two-party computation. This primitive offers relaxed security compared to the standard notion of secure two-party computation. Despite this relaxation, we argue that this tool suffices for our applications. List two-party computation is of independent interest, as we argue it can also be used for the generation of setups, like oblivious transfer correlated randomness, in three rounds. Prior to our work, generating such a setup required at least four rounds of interactions or a trusted third party. 
    more » « less