Title: Construction of Differentially Private Empirical Distributions from a Low-Order Marginals Set Through Solving Linear Equations with 𝑙2 Regularization
We introduce a new algorithm, Construction of dIfferentially Private Empirical Distributions from a low-order marginal set tHrough solving linear Equations with 𝑙2 Regularization (CIPHER), that produces differentially private empirical joint distributions from a set of low-order marginals. CIPHER is conceptually simple and requires no more than decomposing joint probabilities via basic probability rules to construct a set of linear equations, and then solving those equations. Compared to full-dimensional histogram (FDH) sanitization, CIPHER has drastically lower computational storage and memory requirements, which is practically attractive, especially considering that the high-order signals preserved by FDH sanitization are likely just sample randomness and rarely of interest. Our experiments demonstrate that CIPHER outperforms the multiplicative weighting exponential mechanism in preserving original information and has similar or superior cost-normalized utility to FDH sanitization at the same privacy budget.
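The abstract above describes CIPHER at a high level; the minimal sketch below (our illustration, not the authors' released code) shows the core idea on a toy two-variable domain: noisy low-order marginals define linear constraints on the joint probability vector, and an 𝑙2-regularized (ridge) least-squares solve recovers a differentially private empirical joint distribution. The noise calibration, the exact equation set, and the post-processing step are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy example: joint distribution over two binary variables (X, Y).
# The 4 cells of the joint are ordered (0,0), (0,1), (1,0), (1,1).
true_joint = np.array([0.4, 0.1, 0.2, 0.3])

# Low-order marginals expressed as linear constraints A @ p = b.
# Rows: P(X=0), P(Y=0), and the normalization constraint sum(p) = 1.
A = np.array([
    [1, 1, 0, 0],   # P(X=0) = p00 + p01
    [1, 0, 1, 0],   # P(Y=0) = p00 + p10
    [1, 1, 1, 1],   # probabilities sum to one
], dtype=float)
b_exact = A @ true_joint

# Sanitize the released marginals with the Laplace mechanism
# (scale = sensitivity / epsilon; sensitivity taken here as 1/n for proportions).
n, epsilon = 1000, 1.0
b_private = b_exact + rng.laplace(scale=1.0 / (n * epsilon), size=b_exact.shape)

# l2-regularized least squares: argmin_p ||A p - b||^2 + lam * ||p||^2,
# solved in closed form via the normal equations.
lam = 1e-3
p_hat = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ b_private)

# Post-process into a valid probability vector (clip negatives, renormalize).
p_hat = np.clip(p_hat, 0, None)
p_hat /= p_hat.sum()

print("true joint:      ", true_joint)
print("CIPHER-style fit:", np.round(p_hat, 3))
```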
Authors:
Award ID(s):
1717417 1546373
Publication Date:
NSF-PAR ID:
10311803
Journal Name:
Intelligent Computing: Proceedings of the 2021 Computing Conference, Volume 3
Volume:
3
Sponsoring Org:
National Science Foundation
More Like this
  1. Protection of individual privacy is a common concern when releasing and sharing data and information. Differential privacy (DP) formalizes privacy in probabilistic terms without making assumptions about the background knowledge of data intruders, and thus provides a robust concept for privacy protection. Practical applications of DP involve development of differentially private mechanisms to generate sanitized results at a pre-specified privacy budget. For the sanitization of statistics with publicly known bounds, such as proportions and correlation coefficients, the bounding constraints need to be incorporated in the differentially private mechanisms. There has been little work on examining the consequences of the bounding constraints on the accuracy of sanitized results and on the statistical inferences of the population parameters based on the sanitized results. In this paper, we formalize the differentially private truncated and boundary inflated truncated (BIT) procedures for releasing statistics with publicly known bounding constraints. The impacts of the truncated and BIT Laplace procedures on the statistical accuracy and validity of sanitized statistics are evaluated both theoretically and empirically via simulation studies. (A toy sketch of the truncated and BIT Laplace procedures appears after this list.)
  2. Differential privacy has emerged as a popular model to provably limit privacy risks associated with a given data release. However, releasing high-dimensional synthetic data under differential privacy remains a challenging problem. In this paper, we study the problem of releasing synthetic data in the form of a high-dimensional histogram under the constraint of differential privacy. We develop an (ε, δ)-differentially private categorical data synthesizer called the Stability Based Hashed Gibbs Sampler (SBHG). SBHG works by combining a stability-based sparse histogram estimation algorithm with Gibbs sampling and feature selection to approximate the empirical joint distribution of a discrete dataset. SBHG offers a competitive alternative to state-of-the-art synthetic data generators while preserving the sparsity structure of the original dataset, which leads to improved statistical utility, as illustrated on simulated data. Finally, to study the utility of the synthetic datasets generated by SBHG, we also perform logistic regression using the synthetic datasets and compare the classification accuracy with that obtained using the original dataset. (A toy sketch of the stability-based sparse histogram step appears after this list.)
  3. During the past decade, differential privacy has become the gold standard for protecting the privacy of individuals. However, verifying that a particular program provides differential privacy often remains a manual task to be completed by an expert in the field. Language-based techniques have been proposed for fully automating proofs of differential privacy via type system design; however, these results have lagged behind advances in differentially private algorithms, leaving a noticeable gap between the programs that can be automatically verified and those providing state-of-the-art bounds on privacy. We propose Duet, an expressive higher-order language, linear type system and tool for automatically verifying differential privacy of general-purpose higher-order programs. In addition to general-purpose programming, Duet supports encoding machine learning algorithms such as stochastic gradient descent, as well as common auxiliary data analysis tasks such as clipping, normalization and hyperparameter tuning, each of which is particularly challenging to encode in a statically verified differential privacy framework. We present a core design of the Duet language and linear type system, and complete key proofs about privacy for well-typed programs. We then show how to extend Duet to support realistic machine learning applications and recent variants of differential privacy which result in improved accuracy for many practical differentially private algorithms. Finally, we implement several differentially private machine learning algorithms in Duet which have never before been automatically verified by a language-based tool, and we present experimental results which demonstrate the benefits of Duet's language design in terms of the accuracy of trained machine learning models. (A plain-Python illustration of the clipping step mentioned here appears after this list.)
  4. Many social networks contain sensitive relational information. One approach to protect the sensitive relational information while offering flexibility for social network research and analysis is to release synthetic social networks at a pre-specified privacy risk level, given the original observed network. We propose the DP-ERGM procedure, which synthesizes networks satisfying differential privacy (DP) via the exponential random graph model (ERGM). We apply DP-ERGM to a college student friendship network and compare its preservation of original network information in the generated private networks with two other approaches: differentially private DyadWise Randomized Response (DWRR) and Sanitization of the Conditional probability of Edge given Attribute classes (SCEA). The results suggest that DP-ERGM preserves the original information significantly better than DWRR and SCEA in both network statistics and inferences from ERGMs and latent space models. In addition, DP-ERGM satisfies node DP, a stronger notion of privacy than the edge DP that DWRR and SCEA satisfy. (A toy sketch of dyadwise randomized response appears after this list.)
  5. The ANSI X9.17/X9.31 pseudorandom number generator design was first standardized in 1985, with variants incorporated into numerous cryptographic standards over the next three decades. The design uses timestamps together with a statically keyed block cipher to produce pseudorandom output. It has been known since 1998 that the key must remain secret in order for the output to be secure. However, neither the FIPS 140-2 standardization process nor NIST's later descriptions of the algorithm specified any process for key generation. We performed a systematic study of publicly available FIPS 140-2 certifications for hundreds of products that implemented the ANSI X9.31 random number generator, and found twelve whose certification documents disclosed the use of static, hard-coded keys in source code, leaving the implementation vulnerable to an attacker who can learn this key from the source code or binary. In order to demonstrate the practicality of such an attack, we develop a full passive decryption attack against FortiGate VPN gateway products using FortiOS v4 that recovers the private key in seconds. We measure the prevalence of this vulnerability on the visible Internet using active scans, and demonstrate state recovery and full private key recovery in the wild. Our work highlights the extent to which the validation and certification process has failed to provide even modest security guarantees. (A toy sketch of the X9.31 generator structure appears after this list.)
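For item 1 above, here is a minimal sketch, in Python, of the two bounded-release procedures as we understand them: truncation draws from the Laplace distribution restricted to the publicly known bounds (via rejection sampling), while boundary inflated truncation (BIT) moves out-of-bound sanitized values to the nearest boundary. The sensitivity choice and the exact definitions in the paper may differ; function names are ours.

```python
import numpy as np

rng = np.random.default_rng(1)

def truncated_laplace(value, scale, lower, upper, rng):
    """Resample Laplace noise until the sanitized value falls in [lower, upper]
    (i.e., draw from the Laplace distribution truncated to the bounds)."""
    while True:
        out = value + rng.laplace(scale=scale)
        if lower <= out <= upper:
            return out

def bit_laplace(value, scale, lower, upper, rng):
    """Boundary inflated truncation: out-of-bound sanitized values are set to
    the nearest boundary, inflating the probability mass at the bounds."""
    out = value + rng.laplace(scale=scale)
    return min(max(out, lower), upper)

# Example: sanitize a sample proportion (bounded in [0, 1]).
n, epsilon = 200, 0.5
p_observed = 0.03                      # close to the lower bound
scale = 1.0 / (n * epsilon)            # Laplace scale = sensitivity / epsilon

print("truncated:", truncated_laplace(p_observed, scale, 0.0, 1.0, rng))
print("BIT:      ", bit_laplace(p_observed, scale, 0.0, 1.0, rng))
```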
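For item 2 above, a minimal sketch of the stability-based sparse histogram estimation step that SBHG builds on (the Gibbs sampling and feature selection components are not shown): Laplace noise is added only to non-empty cells, and a noisy count is released only if it clears a threshold calibrated to (ε, δ). The constants follow the common textbook construction and are assumptions here, not necessarily SBHG's.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(2)

def stability_sparse_histogram(records, epsilon, delta, rng):
    """(epsilon, delta)-DP sparse histogram: add Laplace noise only to cells
    that are non-empty in the data, and release a noisy count only if it
    clears a stability threshold; empty cells are released as zero."""
    scale = 2.0 / epsilon
    threshold = 1.0 + (2.0 / epsilon) * np.log(2.0 / delta)
    noisy = {}
    for cell, count in Counter(records).items():
        c = count + rng.laplace(scale=scale)
        if c > threshold:
            noisy[cell] = c
    return noisy

# Example: categorical records over a large domain, most cells empty.
records = ["a"] * 60 + ["b"] * 25 + ["c"] * 2 + ["d"]
print(stability_sparse_histogram(records, epsilon=1.0, delta=1e-5, rng=rng))
```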
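For item 3 above, the abstract mentions that Duet can encode stochastic gradient descent with clipping. The snippet below is not Duet code; it is a plain NumPy illustration of the clip-average-noise step such a program performs, with the parameter names and the Gaussian calibration chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)

def clip_and_noise(per_example_grads, clip_norm, noise_multiplier, rng):
    """One privatized gradient step in the DP-SGD style: clip each per-example
    gradient to l2 norm <= clip_norm, average, and add Gaussian noise scaled
    to the clipping bound. (Plain NumPy illustration; not Duet syntax.)"""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    mean = np.mean(clipped, axis=0)
    sigma = noise_multiplier * clip_norm / len(per_example_grads)
    return mean + rng.normal(scale=sigma, size=mean.shape)

grads = [rng.normal(size=5) for _ in range(32)]   # toy per-example gradients
print(clip_and_noise(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng))
```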
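For item 4 above, a minimal sketch of dyadwise randomized response for edge-level privacy: each dyad's observed state is kept with probability e^ε / (1 + e^ε) and flipped otherwise. This is the standard randomized-response calibration; the DWRR procedure compared in the paper may differ in its details.

```python
import numpy as np

rng = np.random.default_rng(4)

def dyadwise_randomized_response(adj, epsilon, rng):
    """Edge-DP randomized response on each dyad: keep the observed edge state
    with probability exp(eps) / (1 + exp(eps)), otherwise flip it."""
    keep_prob = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    n = adj.shape[0]
    out = adj.copy()
    for i in range(n):
        for j in range(i + 1, n):          # undirected graph: upper triangle only
            if rng.random() > keep_prob:
                out[i, j] = out[j, i] = 1 - adj[i, j]
    return out

# Toy undirected friendship network on 5 nodes.
A = np.array([[0, 1, 0, 0, 1],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [1, 0, 0, 1, 0]])
print(dyadwise_randomized_response(A, epsilon=1.0, rng=rng))
```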
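For item 5 above, a minimal sketch of the ANSI X9.17/X9.31 iteration described in the abstract, with a hash-based stand-in for the block cipher so the example runs on the standard library alone (a real implementation uses a block cipher such as 3DES or AES). The comments note why a static, known key is fatal: one output block plus a guessable timestamp lets an attacker recover the internal state.

```python
import hashlib

def block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in 16-byte PRF so the sketch runs with the standard library only;
    a real X9.31 implementation uses a keyed block cipher here."""
    return hashlib.sha256(key + block).digest()[:16]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def x931_step(key: bytes, state: bytes, timestamp: bytes):
    """One iteration of the ANSI X9.17/X9.31 design:
        I = E_K(T);  output R = E_K(I xor V);  next state V' = E_K(I xor R).
    If an attacker knows the static key K and can guess the timestamp T,
    a single output block R lets them recover V' and predict all future
    outputs -- the core of the attack described in the abstract."""
    i = block_encrypt(key, timestamp)
    r = block_encrypt(key, xor(i, state))
    v_next = block_encrypt(key, xor(i, r))
    return r, v_next

key = b"hard-coded key!!"          # the vulnerability: a static, known key
state = b"initial seed 16B"
out, state = x931_step(key, state, b"2021-07-06 00:00")
print(out.hex())
```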