A Privacy Preserving Algorithm to Release Sparse High-dimensional Histograms

Li, Bai; Karwa, Vishesh; Slavković, Aleksandra; Steorts, Rebecca Carter

doi:10.29012/jpc.657

Citation Details

A Privacy Preserving Algorithm to Release Sparse High-dimensional Histograms

Differential privacy has emerged as a popular model to provably limit privacy risks associated with a given data release. However releasing high dimensional synthetic data under differential privacy remains a challenging problem. In this paper, we study the problem of releasing synthetic data in the form of a high dimensional histogram under the constraint of differential privacy.We develop an $$(\epsilon, \delta)$$-differentially private categorical data synthesizer called \emph{Stability Based Hashed Gibbs Sampler} (SBHG). SBHG works by combining a stability based sparse histogram estimation algorithm with Gibbs sampling and feature selection to approximate the empirical joint distribution of a discrete dataset. SBHG offers a competitive alternative to state-of-the art synthetic data generators while preserving the sparsity structure of the original dataset, which leads to improved statistical utility as illustrated on simulated data. Finally, to study the utility of the resulting synthetic data sets generated by SBHG, we also perform logistic regression using the synthetic datasets and compare the classification accuracy with those from using the original dataset. more »

Award ID(s):: 1947919

PAR ID:: 10125680

Author(s) / Creator(s):: Li, Bai; Karwa, Vishesh; Slavković, Aleksandra; Steorts, Rebecca Carter

Date Published:: 2018-12-29

Journal Name:: Journal of Privacy and Confidentiality

Volume:: 8

Issue:: 1

ISSN:: 2575-8527

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Journal Article:
https://doi.org/10.29012/jpc.657

More Like this