skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Graphical-model based estimation and inference for differential privacy
Many privacy mechanisms reveal high-level information about a data distribution through noisy measurements. It is common to use this information to estimate the answers to new queries. In this work, we provide an approach to solve this estimation problem efficiently using graphical models, which is particularly effective when the distribution is high-dimensional but the measurements are over low-dimensional marginals. We show that our approach is far more efficient than existing estimation techniques from the privacy literature and that it can improve the accuracy and scalability of many state-of-the-art mechanisms.  more » « less
Award ID(s):
1749854
PAR ID:
10199764
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of Machine Learning Research
Volume:
97
ISSN:
2640-3498
Page Range / eLocation ID:
4435-4444
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Many differentially private algorithms for answering database queries involve a step that reconstructs a discrete data distribution from noisy measurements. This provides consistent query answers and reduces error, but often requires space that grows exponentially with dimension. Private-PGM is a recent approach that uses graphical models to represent the data distribution, with complexity proportional to that of exact marginal inference in a graphical model with structure determined by the co-occurrence of variables in the noisy measurements. Private-PGM is highly scalable for sparse measurements, but may fail to run in high dimensions with dense measurements. We overcome the main scalability limitation of Private-PGM through a principled approach that relaxes consistency constraints in the estimation objective. Our new approach works with many existing private query answering algorithms and improves scalability or accuracy with no privacy cost. 
    more » « less
  2. We propose a general approach for differentially private synthetic data generation, that consists of three steps: (1) select a collection of low-dimensional marginals, (2) measure those marginals with a noise addition mechanism, and (3) generate synthetic data that preserves the measured marginals well. Central to this approach is Private-PGM, a post-processing method that is used to estimate a high-dimensional data distribution from noisy measurements of its marginals. We present two mechanisms, NIST-MST and MST, that are instances of this general approach. NIST-MST was the winning mechanism in the 2018 NIST differential privacy synthetic data competition, and MST is a new mechanism that can work in more general settings, while still performing comparably to NIST-MST. We believe our general approach should be of broad interest, and can be adopted in future mechanisms for synthetic data generation. 
    more » « less
  3. The rapid pace of urbanization and socioeconomic development encourage people to spend more time together and therefore monitoring of human dynamics is of great importance, especially for facilities of elder care and involving multiple activities. Traditional approaches are limited due to their high deployment costs and privacy concerns (e.g., camera-based surveillance or sensor-attachment-based solutions). In this work, we propose to provide a fine-grained comprehensive view of human dynamics using existing WiFi infrastructures often available in many indoor venues. Our approach is low-cost and device-free, which does not require any active human participation. Our system aims to provide smart human dynamics monitoring through participant number estimation, human density estimation and walking speed and direction derivation. A semi-supervised learning approach leveraging the non-linear regression model is developed to significantly reduce training efforts and accommodate different monitoring environments. We further derive participant number and density estimation based on the statistical distribution of Channel State Information (CSI) measurements. In addition, people's walking speed and direction are estimated by using a frequency-based mechanism. Extensive experiments over 12 months demonstrate that our system can perform fine-grained effective human dynamic monitoring with over 90% accuracy in estimating participants number, density, and walking speed and direction at various indoor environments. 
    more » « less
  4. Differential privacy has emerged as a leading theoretical framework for privacy-preserving data gathering and analysis. It allows meaningful statistics to be collected for a population without revealing ``too much'' information about any individual member of the population. For software profiling, this machinery allows profiling data from many users of a deployed software system to be collected and analyzed in a privacy-preserving manner. Such a solution is appealing to many stakeholders, including software users, software developers, infrastructure providers, and government agencies. We propose an approach for differentially-private collection of frequency vectors from software executions. Frequency information is reported with the addition of random noise drawn from the Laplace distribution. A key observation behind the design of our scheme is that event frequencies are closely correlated due to the static code structure. Differential privacy protections must account for such relationships; otherwise, a seemingly-strong privacy guarantee is actually weaker than it appears. Motivated by this observation, we propose a novel and general differentially-private profiling scheme when correlations between frequencies can be expressed through linear inequalities. Using a linear programming formulation, we show how to determine the magnitude of random noise that should be added to achieve meaningful privacy protections under such linear constraints. Next, we develop an efficient instance of this general machinery for an important subclass of constraints. Instead of LP, our solution uses a reachability analysis of a constraint graph. As an exemplar, we employ this approach to implement differentially-private method frequency profiling for Android apps. Any differentially-private scheme has to balance two competing aspects: privacy and accuracy. Through an experimental study to characterize these trade-offs, we (1) show that our proposed randomization achieves much higher accuracy compared to related prior work, (2) demonstrate that high accuracy and high privacy protection can be achieved simultaneously, and (3) highlight the importance of linear constraints in the design of the randomization. These promising results provide evidence that our approach is a good candidate for privacy-preserving frequency profiling of deployed software. 
    more » « less
  5. Bun, Mark (Ed.)
    Given a differentially private unbiased estimate q̃ = q(D) +ν of a statistic q(D), we wish to obtain unbiased estimates of functions of q(D), such as 1/q(D), solely through post-processing of q̃, with no further access to the confidential dataset D. To this end, we adapt the deconvolution method used for unbiased estimation in the statistical literature, deriving unbiased estimators for a broad family of twice-differentiable functions - those that are tempered distributions - when the privacy-preserving noise ν is drawn from the Laplace distribution (Dwork et al., 2006). We further extend this technique to functions other than tempered distributions, deriving approximately optimal estimators that are unbiased for values in a user-specified interval (possibly extending to ± ∞). We use these results to derive an unbiased estimator for private means when the size n of the dataset is not publicly known. In a numerical application, we find that a mechanism that uses our estimator to return an unbiased sample size and mean outperforms a mechanism that instead uses the previously known unbiased privacy mechanism for such means (Kamath et al., 2023). We also apply our estimators to develop unbiased transformation mechanisms for per-record differential privacy, a privacy concept in which the privacy guarantee is a public function of a record’s value (Seeman et al., 2024). Our mechanisms provide stronger privacy guarantees than those in prior work (Finley et al., 2024) by using Laplace, rather than Gaussian, noise. Finally, using a different approach, we go beyond Laplace noise by deriving unbiased estimators for polynomials under the weak condition that the noise distribution has sufficiently many moments. 
    more » « less