Differential privacy (DP) is a widely used notion for reasoning about privacy when publishing aggregate data. In this paper, we observe that certain DP mechanisms are amenable to a posteriori privacy analysis that exploits the fact that some outputs leak less information about the input database than others. To exploit this phenomenon, we introduce output differential privacy (ODP) and a new composition experiment, and leverage these new constructs to obtain significant privacy budget savings and improved privacy–utility tradeoffs under composition. All of this comes at no cost in terms of privacy; we do not weaken the privacy guarantee. To demonstrate the applicability of our a posteriori privacy analysis techniques, we analyze two well-known mechanisms: the Sparse Vector Technique and the Propose-Test-Release framework. We then show how our techniques can be used to save privacy budget in more general contexts: when a differentially private iterative mechanism terminates before its maximal number of iterations is reached, and when the output of a DP mechanism provides unsatisfactory utility. Examples of the former include iterative optimization algorithms, whereas examples of the latter include training a machine learning model with a large generalization error. Our techniques can be applied beyond the current paper to refine the analysis of existing DP mechanisms or guide the design of future mechanisms.
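The Sparse Vector Technique referenced above is a standard DP primitive; as a point of reference, the sketch below shows its classic AboveThreshold form (not the paper's a posteriori analysis). The function name and the even split of the privacy budget are illustrative choices, not taken from the paper.

```python
import numpy as np

def above_threshold(queries, database, threshold, epsilon, sensitivity=1.0):
    """Minimal sketch of the classic Sparse Vector Technique (AboveThreshold).

    Scans a sequence of sensitivity-bounded queries and reports the index of
    the first query whose noisy answer exceeds a noisy threshold, consuming
    `epsilon` in total no matter how many queries are scanned.
    """
    eps1, eps2 = epsilon / 2.0, epsilon / 2.0        # illustrative budget split
    noisy_threshold = threshold + np.random.laplace(scale=sensitivity / eps1)
    for i, q in enumerate(queries):
        noisy_answer = q(database) + np.random.laplace(scale=2.0 * sensitivity / eps2)
        if noisy_answer >= noisy_threshold:
            return i        # the only informative ("above") output
    return None             # every query was (noisily) below the threshold
```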
Continuous Release of Data Streams under both Centralized and Local Differential Privacy
We study the problem of publishing a stream of real-valued data while satisfying differential privacy (DP). One major challenge is that the maximal possible value in the stream can be quite large, leading to enormous DP noise and poor utility. To reduce the maximal value and the noise, one approach is to estimate a threshold so that values above it can be truncated. The intuition is that, in many scenarios, only a few values are large, so truncation does not change the original data much. We develop such a method that finds a suitable threshold under DP. Given the threshold, we then propose an online hierarchical method and several post-processing techniques. Building on these ideas, we formalize the steps in a framework for the private publishing of streaming data. Our framework consists of three components: a threshold optimizer that privately estimates the threshold, a perturber that adds calibrated noise to the stream, and a smoother that improves the result using post-processing. Within our framework, we also design an algorithm satisfying the more stringent setting of local DP. Using four real-world datasets, we demonstrate that our mechanism outperforms the state of the art by 6 to 10 orders of magnitude in terms of utility (measured by the mean squared error of answering a random range query, the typical scenario).
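To make the truncation idea concrete, here is a minimal sketch of the perturber step, under the simplifying assumption of event-level DP (each individual affects a single timestep). The threshold optimizer, the online hierarchical method, and the smoother from the framework are not reproduced, and all names are illustrative.

```python
import numpy as np

def truncate_and_perturb(stream, threshold, epsilon):
    """Sketch of truncation plus calibrated Laplace perturbation.

    Clipping each value to the (privately estimated) threshold reduces the
    per-timestep sensitivity from the worst-case maximum to `threshold`;
    Laplace noise calibrated to that reduced sensitivity is then added.
    The paper's full framework additionally uses an online hierarchical
    decomposition and post-processing smoothing, omitted here.
    """
    released = []
    for x in stream:
        clipped = min(x, threshold)
        released.append(clipped + np.random.laplace(scale=threshold / epsilon))
    return released
```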
- PAR ID: 10322945
- Editor(s): Kim, Yongdae; Kim, Jong; Vigna, Giovanni; Shi, Elaine
- Date Published:
- Journal Name: CCS '21: Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Large volumes of high-dimensional and heterogeneous data appear in practical applications and are often published to third parties for data analysis, recommendations, targeted advertising, and reliable predictions. However, publishing these data may disclose personal sensitive information, raising growing concerns about privacy violations. Privacy-preserving data publishing has therefore received considerable attention in recent years. Unfortunately, the differentially private publication of high-dimensional data remains a challenging problem. In this paper, we propose a differentially private high-dimensional data publication mechanism (DP2-Pub) that runs in two phases: a Markov-blanket-based attribute clustering phase and an invariant post-randomization (PRAM) phase. Specifically, splitting attributes into several low-dimensional clusters with high intra-cluster cohesion and low inter-cluster coupling helps obtain a reasonable allocation of the privacy budget, while a double-perturbation mechanism satisfying local differential privacy facilitates an invariant PRAM that loses no statistical information and thus significantly preserves data utility. We also extend the DP2-Pub mechanism to the scenario of a semi-honest server, which satisfies local differential privacy. We conduct extensive experiments on four real-world datasets; the results demonstrate that our mechanism significantly improves the utility of the published data while satisfying differential privacy.
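As a rough illustration of the PRAM building block used in the second phase, the sketch below perturbs a single categorical attribute with a row-stochastic transition matrix. Constructing an invariant matrix and the double-perturbation step of DP2-Pub are beyond this sketch, and all names are hypothetical.

```python
import numpy as np

def pram_perturb(values, categories, transition_matrix, rng=None):
    """Sketch of the PRAM building block: each categorical value is replaced
    by a value drawn from the corresponding row of a transition matrix.

    `transition_matrix[i][j]` is the probability of reporting category j when
    the true category is i (each row must sum to 1). An *invariant* PRAM
    additionally chooses the matrix so that the original marginal distribution
    is a fixed point (p @ P == p), which is what lets statistics be preserved;
    deriving such a matrix is omitted here.
    """
    rng = rng or np.random.default_rng()
    index = {c: i for i, c in enumerate(categories)}
    return [categories[rng.choice(len(categories), p=transition_matrix[index[v]])]
            for v in values]
```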
Differential privacy (DP) is among the most widely used techniques to protect user data privacy. A common way to utilize private data under DP is to take an input dataset and synthesize a new dataset that preserves features of the input while satisfying DP. A trade-off always exists between the strength of privacy protection and the utility of the final output: stronger privacy protection requires more randomness, so the outputs usually have larger variance and can be far from optimal. In this paper, we summarize the metric we proposed for the NIST "A Better Meter Stick for Differential Privacy" competition [26], MarGinal Difference (MGD), which measures the utility of a synthesized dataset. Our metric is based on the earth mover's distance. We introduce new features into the metric so that it is not affected by the small random noise that is unavoidable in the DP context and instead focuses on significant differences. We show that our metric reflects range-query error better than other existing metrics. We also introduce an efficient computation method based on min-cost flow to alleviate the high computational cost of the earth mover's distance.
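For intuition, the sketch below computes the earth mover's (1-Wasserstein) distance between the one-way marginals of an original and a synthesized attribute using SciPy. The full MGD metric's noise-aware features and its min-cost-flow formulation over richer marginals are not reproduced; the helper name and the binning choice are illustrative.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def marginal_emd(original_col, synthetic_col, bins=32):
    """Earth mover's distance between one attribute's marginal in the
    original data and in the synthesized data (the per-marginal quantity a
    metric like MGD builds on)."""
    hist_orig, edges = np.histogram(original_col, bins=bins, density=True)
    hist_synth, _ = np.histogram(synthetic_col, bins=edges, density=True)
    centers = (edges[:-1] + edges[1:]) / 2.0
    # Weights are normalized internally by scipy, so histogram scaling is fine.
    return wasserstein_distance(centers, centers,
                                u_weights=hist_orig, v_weights=hist_synth)
```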
Large corporations, government entities, and institutions such as hospitals and census bureaus routinely collect our personal and sensitive information in order to provide services. A key technological challenge is designing algorithms for these services that produce useful results while maintaining the privacy of the individuals whose data are being shared. Differential privacy (DP) is a cryptographically motivated and mathematically rigorous approach to this challenge. Under DP, a randomized algorithm provides privacy guarantees by approximating the desired functionality, leading to a privacy–utility trade-off. Strong (pure DP) privacy guarantees are often costly in terms of utility. Motivated by the need for a mechanism with a better privacy–utility trade-off, we propose Gaussian FM, an improvement to the functional mechanism (FM) that offers higher utility at the expense of a weakened (approximate) DP guarantee. We show analytically that the proposed Gaussian FM algorithm can use orders-of-magnitude smaller noise than existing FM algorithms. We further extend Gaussian FM to decentralized-data settings by incorporating the CAPE protocol and propose capeFM, which can offer the same level of utility as its centralized counterpart for a range of parameter choices. We show empirically that our proposed algorithms outperform existing state-of-the-art approaches on synthetic and real datasets.
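As a rough sketch of the underlying idea of a functional mechanism (perturbing the coefficients of the training objective rather than the model output), the snippet below adds Gaussian noise to a coefficient vector using the classic Gaussian-mechanism calibration. The paper's tighter, FM-specific sensitivity analysis is not reproduced, and the function name is illustrative.

```python
import numpy as np

def gaussian_perturb_coefficients(coeffs, sensitivity, epsilon, delta, rng=None):
    """Add Gaussian noise to the coefficients of a (polynomial approximation
    of a) training objective; the noisy objective is then minimized as usual.

    The noise scale below is the textbook Gaussian-mechanism calibration
    (valid for epsilon < 1), not the paper's sharper FM-specific bound.
    """
    rng = rng or np.random.default_rng()
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return coeffs + rng.normal(scale=sigma, size=np.shape(coeffs))
```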
Differentially Private Stochastic Gradient Descent (DP-SGD) has become a widely used technique for safeguarding sensitive information in deep learning applications. Unfortunately, DP-SGD's per-sample gradient clipping and uniform noise addition during training can significantly degrade model utility and fairness. We observe that the average gradient norm of the latest variant, DP-SGD-Global-Adapt, stays the same throughout training; even when it is integrated with the existing linear-decay noise multiplier, it offers little or no advantage. Moreover, its upper clipping threshold grows exponentially toward the end of training, potentially harming convergence. Other algorithms, including DP-PSAC, Auto-S, DP-SGD-Global, and DP-F, achieve utility and fairness similar to or worse than DP-SGD, as our experiments demonstrate. To overcome these problems and improve utility and fairness, we develop DP-SGD-Global-Adapt-V2-S, which uses a step-decay noise multiplier and an upper clipping threshold that is likewise decayed step-wise. With a privacy budget of 1, DP-SGD-Global-Adapt-V2-S improves accuracy by 0.9795%, 0.6786%, and 4.0130% on MNIST, CIFAR10, and CIFAR100, respectively, and reduces the privacy cost gap by 89.8332% and 60.5541% on the unbalanced MNIST and Thinwall datasets, respectively. Finally, we develop mathematical expressions to compute the privacy budget using truncated concentrated differential privacy (tCDP) for DP-SGD-Global-Adapt-V2-T and DP-SGD-Global-Adapt-V2-S.
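For context, the sketch below shows a generic DP-SGD step with per-sample clipping and Gaussian noise, plus a simple step-decay schedule of the kind the proposed variant applies to its noise multiplier and upper clipping threshold. It is not the authors' V2-S algorithm, and all names and defaults are illustrative.

```python
import numpy as np

def dpsgd_step(per_sample_grads, clip_norm, noise_multiplier, lr, params, rng=None):
    """One generic DP-SGD step: clip each per-sample gradient to `clip_norm`,
    average, add Gaussian noise scaled by `noise_multiplier * clip_norm`,
    and take a gradient step."""
    rng = rng or np.random.default_rng()
    clipped = []
    for g in per_sample_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    noisy_mean = (np.sum(clipped, axis=0)
                  + rng.normal(scale=noise_multiplier * clip_norm, size=params.shape)
                  ) / len(per_sample_grads)
    return params - lr * noisy_mean

def step_decay(initial, decay_factor, step_size, epoch):
    """Step-wise decay schedule, e.g. for the noise multiplier or the upper
    clipping threshold: multiply by `decay_factor` every `step_size` epochs."""
    return initial * (decay_factor ** (epoch // step_size))
```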