NSF PAR Search | NSF Public Access Repository

Reconstruction Attacks on Aggressive Relaxations of Differential Privacy

https://doi.org/10.29012/jpc.871

Protivash, Prottay; Durrell, John; Kifer, Daniel; Ding, Zeyu; Zhang, Danfeng (August 2024, Journal of Privacy and Confidentiality)

Differential privacy is a widely accepted formal privacy definition that allows aggregate information about a dataset to be released while controlling privacy leakage for individuals whose records appear in the data. Due to the unavoidable tension between privacy and utility, there have been many works trying to relax the requirements of differential privacy to achieve greater utility. One class of relaxation, which is gaining support outside the privacy community, is embodied by the definitions of individual differential privacy (IDP) and bootstrap differential privacy (BDP). Classical differential privacy defines a set of neighboring database pairs and achieves its privacy guarantees by requiring that each pair of neighbors should be nearly indistinguishable to an attacker. The privacy definitions we study, however, aggressively reduce the set of neighboring pairs that are protected. To a non-expert, IDP and BDP can seem very appealing as they echo the same types of privacy explanations that are associated with differential privacy, and also experimentally achieve dramatically better utility. However, we show that they allow a significant portion of the dataset to be reconstructed using algorithms that have arbitrarily low privacy loss under their privacy accounting rules. With the non-expert in mind, we demonstrate these attacks using the preferred mechanisms of these privacy definitions. In particular, we design a set of queries that, when protected by these mechanisms with high noise settings (i.e., with claims of very low privacy loss), yield more precise information about the dataset than if they were not protected at all. The specific attacks here can be defeated and we give examples of countermeasures. However, the defenses are either equivalent to using differential privacy or to ad-hoc methods tailored specifically to the attack (with no guarantee that they protect against other attacks). Thus, the defenses emphasize the deficiencies of these privacy definitions.
Full Text Available
Answering Private Linear Queries Adaptively Using the Common Mechanism

https://doi.org/10.14778/3594512.3594519

Xiao, Yingtai; Wang, Guanhong; Zhang, Danfeng; Kifer, Daniel (April 2023, Proceedings of the VLDB Endowment)

When analyzing confidential data through a privacy filter, a data scientist often needs to decide which queries will best support their intended analysis. For example, an analyst may wish to study noisy two-way marginals in a dataset produced by a mechanism M1. But, if the data are relatively sparse, the analyst may choose to examine noisy one-way marginals, produced by a mechanism M2, instead. Since the choice of whether to use M1 or M2 is data-dependent, a typical differentially private workflow is to first split the privacy loss budget ρ into two parts, ρ1 and ρ2, then use the first part ρ1 to determine which mechanism to use, and the remainder ρ2 to obtain noisy answers from the chosen mechanism. In a sense, the first step seems wasteful because it takes away part of the privacy loss budget that could have been used to make the query answers more accurate. In this paper, we consider the question of whether the choice between M1 and M2 can be performed without wasting any privacy loss budget. For linear queries, we propose a method for decomposing M1 and M2 into three parts: (1) a mechanism M* that captures their shared information, (2) a mechanism M′1 that captures information that is specific to M1, (3) a mechanism M′2 that captures information that is specific to M2. Running M* and M′1 together is completely equivalent to running M1 (both in terms of query answer accuracy and total privacy cost ρ). Similarly, running M* and M′2 together is completely equivalent to running M2. Since M* will be used no matter what, the analyst can use its output to decide whether to subsequently run M′1 (thus recreating the analysis supported by M1) or M′2 (recreating the analysis supported by M2), without wasting privacy loss budget.
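The cost of the budget-splitting workflow described above can be illustrated with a minimal Python sketch. It is not the paper's method. It assumes that ρ is a zero-concentrated differential privacy (zCDP) parameter and that answers are released with the Gaussian mechanism, for which a query of L2 sensitivity Δ and noise standard deviation σ costs ρ = Δ²/(2σ²). The function names and the 20% selection fraction are hypothetical.

```python
import math
from typing import Tuple


def gaussian_sigma(rho: float, sensitivity: float = 1.0) -> float:
    """Noise standard deviation of a Gaussian mechanism that costs `rho`.

    Assumption: `rho` is a zCDP parameter, so rho = sensitivity**2 / (2 * sigma**2).
    """
    if rho <= 0:
        raise ValueError("rho must be positive")
    return sensitivity / math.sqrt(2.0 * rho)


def split_budget(rho: float, selection_fraction: float) -> Tuple[float, float]:
    """Split `rho` into a selection part (rho_1) and an answering part (rho_2)."""
    if not 0.0 < selection_fraction < 1.0:
        raise ValueError("selection_fraction must lie strictly between 0 and 1")
    rho_1 = rho * selection_fraction
    return rho_1, rho - rho_1


rho = 1.0
# Hypothetical choice: spend 20% of the budget on deciding which mechanism to run.
rho_1, rho_2 = split_budget(rho, 0.2)

sigma_full = gaussian_sigma(rho)           # noise if the whole budget answered queries
sigma_after_split = gaussian_sigma(rho_2)  # noise once rho_1 has been spent on selection

print(f"sigma with full budget: {sigma_full:.4f}")
print(f"sigma after the split:  {sigma_after_split:.4f}")
```

Under these assumptions the answers are noisier after the split, because only ρ2 is left for them. That extra noise is the loss the abstract calls wasteful.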
Full Text Available
Free gap estimates from the exponential mechanism, sparse vector, noisy max and related algorithms

https://doi.org/10.1007/s00778-022-00728-2

Ding, Zeyu; Wang, Yuxin; Xiao, Yingtai; Wang, Guanhong; Zhang, Danfeng; Kifer, Daniel (February 2022, The VLDB Journal)

Full Text Available
DPGen: Automated Program Synthesis for Differential Privacy

https://doi.org/10.1145/3460120.3484781

Wang, Yuxin; Ding, Zeyu; Xiao, Yingtai; Kifer, Daniel; Zhang, Danfeng (November 2021, Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security)

Full Text Available
Optimizing fitness-for-use of differentially private linear queries

https://doi.org/10.14778/3467861.3467864

Xiao, Yingtai; Ding, Zeyu; Wang, Yuxin; Zhang, Danfeng; Kifer, Daniel (June 2021, Proceedings of the VLDB Endowment)
In practice, differentially private data releases are designed to support a variety of applications. A data release is fit for use if it meets target accuracy requirements for each application. In this paper, we consider the problem of answering linear queries under differential privacy subject to per-query accuracy constraints. Existing practical frameworks like the matrix mechanism do not provide such fine-grained control (they optimize total error, which allows some query answers to be more accurate than necessary, at the expense of other queries that become no longer useful). Thus, we design a fitness-for-use strategy that adds privacy-preserving Gaussian noise to query answers. The covariance structure of the noise is optimized to meet the fine-grained accuracy requirements while minimizing the cost to privacy.
Full Text Available
DPGen: Automated Program Synthesis for Differential Privacy

Wang, Yuxin; Ding, Zeyu; Xiao, Yingtai; Kifer, Daniel; Zhang, Danfeng (January 2021, The ACM Conference on Computer and Communications Security (CCS))
Differential privacy has become a de facto standard for releasing data in a privacy-preserving way. Creating a differentially private algorithm is a process that often starts with a noise-free (nonprivate) algorithm. The designer then decides where to add noise, and how much of it to add. This can be a non-trivial process – if not done carefully, the algorithm might either violate differential privacy or have low utility. In this paper, we present DPGen, a program synthesizer that takes in non-private code (without any noise) and automatically synthesizes its differentially private version (with carefully calibrated noise). Under the hood, DPGen uses novel algorithms to automatically generate a sketch program with candidate locations for noise, and then optimize privacy proof and noise scales simultaneously on the sketch program. Moreover, DPGen can synthesize sophisticated mechanisms that adaptively process queries until a specified privacy budget is exhausted. When evaluated on standard benchmarks, DPGen is able to generate differentially private mechanisms that optimize simple utility functions within 120 seconds. It is also powerful enough to synthesize adaptive privacy mechanisms.
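To make the starting point concrete, the sketch below pairs a noise-free counting function with a hand-written noisy version, which is the kind of transformation the abstract says a designer usually performs manually. It is not DPGen output and does not use DPGen's algorithms. It assumes a counting query with sensitivity 1 (adding or removing one record changes the count by at most 1) and Laplace noise with scale 1/ε. All function names are hypothetical.

```python
import random
from typing import Sequence


def count_above(data: Sequence[float], threshold: float) -> int:
    """Noise-free (non-private) count of records above `threshold`."""
    return sum(1 for x in data if x > threshold)


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count_above(
    data: Sequence[float], threshold: float, epsilon: float, rng: random.Random
) -> float:
    """Hand-written noisy version: the count plus Laplace noise of scale 1 / epsilon.

    Assumption: the count has sensitivity 1, so this scale targets epsilon-DP.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return count_above(data, threshold) + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)
ages = [23.0, 35.0, 41.0, 58.0, 62.0]
print(count_above(ages, 40.0))                    # exact count
print(private_count_above(ages, 40.0, 0.5, rng))  # noisy count
```

Here the location and scale of the noise are chosen by hand. According to the abstract, DPGen is meant to automate those two choices.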
Full Text Available
CheckDP: An Automated and Integrated Approach for Proving Differential Privacy or Finding Precise Counterexamples

https://doi.org/10.1145/3372297.3417282

Wang, Yuxin; Ding, Zeyu; Kifer, Daniel; Zhang, Danfeng (October 2020, CCS '20: Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security)
We propose CheckDP, an automated and integrated approach for proving or disproving claims that a mechanism is differentially private. CheckDP can find counterexamples for mechanisms with subtle bugs for which prior counterexample generators have failed. Furthermore, it was able to automatically generate proofs for correct mechanisms for which no formal verification was reported before. CheckDP is built on static program analysis, allowing it to be more efficient and precise in catching infrequent events than sampling-based counterexample generators (which run mechanisms hundreds of thousands of times to estimate their output distribution). Moreover, its sound approach also allows automatic verification of correct mechanisms. When evaluated on standard benchmarks and newer privacy mechanisms, CheckDP generates proofs (for correct mechanisms) and counterexamples (for incorrect mechanisms) within 70 seconds without any false positives or false negatives.
Full Text Available
Free gap information from the differentially private sparse vector and noisy max mechanisms

https://doi.org/10.14778/3368289.3368295

Ding, Zeyu; Wang, Yuxin; Zhang, Danfeng; Kifer, Daniel (November 2019, Proceedings of the VLDB Endowment)
Noisy Max and Sparse Vector are selection algorithms for differential privacy and serve as building blocks for more complex algorithms. In this paper we show that both algorithms can release additional information for free (i.e., at no additional privacy cost). Noisy Max is used to return the approximate maximizer among a set of queries. We show that it can also release for free the noisy gap between the approximate maximizer and runner-up. This free information can improve the accuracy of certain subsequent counting queries by up to 50%. Sparse Vector is used to return a set of queries that are approximately larger than a fixed threshold. We show that it can adaptively control its privacy budget (use less budget for queries that are likely to be much larger than the threshold) in order to increase the number of queries it can process. These results follow from a careful privacy analysis.
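The Noisy Max variant that also reports the gap can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's reference implementation. It assumes that each query has sensitivity 1 and that Laplace noise with scale 2/ε is added to every answer. The exact calibration and the privacy proof should be taken from the paper. The function names are hypothetical.

```python
import random
from typing import Sequence, Tuple


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_max_with_gap(
    answers: Sequence[float], epsilon: float, rng: random.Random
) -> Tuple[int, float]:
    """Return the index of the noisy maximizer and its noisy gap to the runner-up.

    Assumption: sensitivity-1 queries, Laplace noise of scale 2 / epsilon.
    """
    if len(answers) < 2:
        raise ValueError("need at least two queries to define a gap")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = 2.0 / epsilon
    noisy = [a + laplace_noise(scale, rng) for a in answers]
    # Rank the query indices by noisy answer, largest first.
    order = sorted(range(len(noisy)), key=noisy.__getitem__, reverse=True)
    winner, runner_up = order[0], order[1]
    return winner, noisy[winner] - noisy[runner_up]


rng = random.Random(0)
index, gap = noisy_max_with_gap([10.0, 50.0, 20.0], epsilon=1.0, rng=rng)
print(index, gap)
```

Classical Noisy Max returns only the index. The gap is computed from noisy values the algorithm has already drawn, and the abstract states that releasing it adds no privacy cost.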
Full Text Available
Differentially Private Confidence Intervals for Empirical Risk Minimization

https://doi.org/10.29012/jpc.660

Wang, Yue; Kifer, Daniel; Lee, Jaewoo (March 2019, Journal of Privacy and Confidentiality)

The process of data mining with differential privacy produces results that are affected by two types of noise: sampling noise due to data collection and privacy noise that is designed to prevent the reconstruction of sensitive information. In this paper, we consider the problem of designing confidence intervals for the parameters of a variety of differentially private machine learning models. The algorithms can provide confidence intervals that satisfy differential privacy (as well as the more recently proposed concentrated differential privacy) and can be used with existing differentially private mechanisms that train models using objective perturbation and output perturbation.
Full Text Available
A Simple Baseline for Travel Time Estimation using Large-scale Trip Data

https://doi.org/10.1145/3293317

Wang, Hongjian; Tang, Xianfeng; Kuo, Yu-Hsuan; Kifer, Daniel; Li, Zhenhui (January 2019, ACM Transactions on Intelligent Systems and Technology)

Full Text Available