skip to main content

Attention:

The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 10:00 PM ET on Friday, December 8 until 2:00 AM ET on Saturday, December 9 due to maintenance. We apologize for the inconvenience.


Search for: All records

Creators/Authors contains: "Sun, Ziteng"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Ruiz, Francisco ; Dy, Jennifer ; van de Meent, Jan-Willem (Ed.)
    We study discrete distribution estimation under user-level local differential privacy (LDP). In user-level $\varepsilon$-LDP, each user has $m\ge1$ samples and the privacy of all $m$ samples must be preserved simultaneously. We resolve the following dilemma: While on the one hand having more samples per user should provide more information about the underlying distribution, on the other hand, guaranteeing the privacy of all $m$ samples should make the estimation task more difficult. We obtain tight bounds for this problem under almost all parameter regimes. Perhaps surprisingly, we show that in suitable parameter regimes, having $m$ samples per user is equivalent to having $m$ times more users, each with only one sample. Our results demonstrate interesting phase transitions for $m$ and the privacy parameter $\varepsilon$ in the estimation risk. Finally, connecting with recent results on shuffled DP, we show that combined with random shuffling, our algorithm leads to optimal error guarantees (up to logarithmic factors) under the central model of user-level DP in certain parameter regimes. We provide several simulations to verify our theoretical findings. 
    more » « less
    Free, publicly-accessible full text available April 14, 2024
  2. Ruiz, Francisco ; Dy, Jennifer ; van de Meent, Jan-Willem (Ed.)
    We study the sample complexity of causal structure learning on a two-variable system with observational and experimental data. Specifically, for two variables X and Y, we consider the classical scenario where either X causes Y , Y causes X, or there is an unmeasured confounder between X and Y. We show that if X and Y are over a finite domain of size k and are significantly correlated, the minimum number of interventional samples needed is sublinear in k. We give a tight characterization of the tradeoff between observational and interventional data when the number of observational samples is sufficiently large. We build upon techniques for closeness testing and for non-parametric density estimation in different regimes of observational data. Our hardness results are based on carefully constructing causal models whose marginal and interventional distributions form hard instances of canonical results on property testing. 
    more » « less
    Free, publicly-accessible full text available April 14, 2024
  3. Loh, Po-Ling ; Raginsky, Maxim (Ed.)
  4. We obtain tight minimax rates for the problem of distributed estimation of discrete distributions under communication constraints, where n users observing m samples each can broadcast only ℓ bits. Our main result is a tight characterization (up to logarithmic factors) of the error rate as a function of m, ℓ, the domain size, and the number of users under most regimes of interest. 
    more » « less
  5. We study robust testing and estimation of discrete distributions in the strong contamination model. Our results cover both centralized setting and distributed setting with general local information constraints including communication and LDP constraints. Our technique relates the strength of manipulation attacks to the earth-mover distance using Hamming distance as the metric between messages (samples) from the users. In the centralized setting, we provide optimal error bounds for both learning and testing. Our lower bounds under local information constraints build on the recent lower bound methods in distributed inference. In the communication constrained setting, we develop novel algorithms based on random hashing and an L1-L1 isometry. 
    more » « less
  6. We consider the problem of estimating sparse discrete distributions under local differential privacy (LDP) and communication constraints. We characterize the sample complexity for sparse estimation under LDP constraints up to a constant factor, and the sample complexity under communication constraints up to a logarithmic factor. Our upper bounds under LDP are based on the Hadamard Response, a private coin scheme that requires only one bit of communication per user. Under communication constraints we propose public coin schemes based on random hashing functions. Our tight lower bounds are based on recently proposed method of chi squared contractions. 
    more » « less
  7. Ranzato, M ; Beygelzimer, A ; Dauphin, Y ; Liang, P. S. ; Wortman Vaughan, J (Ed.)
  8. We develop differentially private methods for estimating various distributional properties. Given a sample from a discrete distribution p, some functional f, and accuracy and privacy parameters alpha and epsilon, the goal is to estimate f(p) up to accuracy alpha, while maintaining epsilon-differential privacy of the sample. We prove almost-tight bounds on the sample size required for this problem for several functionals of interest, including support size, support coverage, and entropy. We show that the cost of privacy is negligible in a variety of settings, both theoretically and experimentally. Our methods are based on a sensitivity analysis of several state-of-the-art methods for estimating these properties with sublinear sample complexities 
    more » « less