f-DP has recently been proposed as a generalization of differential privacy allowing a lossless analysis of composition, post-processing, and privacy amplification via subsampling. In the setting of f-DP, we propose the concept of a canonical noise distribution (CND), the first mechanism designed for an arbitrary f-DP guarantee. The notion of CND captures whether an additive privacy mechanism perfectly matches the privacy guarantee of a given f. We prove that a CND always exists, and give a construction that produces a CND for any f. We show that private hypothesis tests are intimately related to CNDs, allowing for the release of private p-values at no additional privacy cost as well as the construction of uniformly most powerful (UMP) tests for binary data, within the general f-DP framework. We apply our techniques to the problem of difference of proportions testing, and construct a UMP unbiased (UMPU) "semi-private" test which upper bounds the performance of any f-DP test. Using this as a benchmark we propose a private test, based on the inversion of characteristic functions, which allows for optimal inference for the two population parameters and is nearly as powerful as the semi-private UMPU. When specialized to the case of (ϵ,0)-DP, we show empirically that our proposed test is more powerful than any (ϵ/sqrt(2))-DP test and has more accurate type I errors than the classic normal approximation test.
more »
« less
The Test of Tests: A Framework for Differentially Private Hypothesis Testing
We present a generic framework for creating differentially private versions of any hypothesis test in a black-box way. We analyze the resulting tests analytically and experimentally. Most crucially, we show good practical performance for small data sets, showing that at ϵ = 1 we only need 5-6 times as much data as in the fully public setting. We compare our work to the one existing framework of this type, as well as to several individually-designed private hypothesis tests. Our framework is higher power than other generic solutions and at least competitive with (and often better than) individually-designed tests.
more »
« less
- Award ID(s):
- 1817245
- PAR ID:
- 10483524
- Publisher / Repository:
- Proceedings of Machine Learning Research
- Date Published:
- Journal Name:
- Proceedings of the 40th International Conference on Machine Learning
- Volume:
- 202
- Page Range / eLocation ID:
- 16131-16151
- Format(s):
- Medium: X
- Location:
- Honolulu, Hawaii
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Hypothesis tests are a crucial statistical tool for data mining and are the workhorse of scientific research in many fields. Here we study differentially private tests of independence between a categorical and a continuous variable. We take as our starting point traditional nonparametric tests, which require no distributional assumption (e.g., normality) about the data distribution. We present private analogues of the Kruskal-Wallis, Mann-Whitney, and Wilcoxon signed-rank tests, as well as the parametric one-sample t-test. These tests use novel test statistics developed specifically for the private setting. We compare our tests to prior work, both on parametric and nonparametric tests. We find that in all cases our new nonparametric tests achieve large improvements in statistical power, even when the assumptions of parametric tests are met.more » « less
-
A private learner is trained on a sample of labeled points and generates a hypothesis that can be used for predicting the labels of newly sampled points while protecting the privacy of the training set [Kasiviswannathan et al., FOCS 2008]. Past research uncovered that private learners may need to exhibit significantly higher sample complexity than non-private learners as is the case of learning of one-dimensional threshold functions [Bun et al., FOCS 2015, Alon et al., STOC 2019]. We explore prediction as an alternative to learning. A predictor answers a stream of classification queries instead of outputting a hypothesis. Earlier work has considered a private prediction model with a single classification query [Dwork and Feldman, COLT 2018]. We observe that when answering a stream of queries, a predictor must modify the hypothesis it uses over time, and in a manner that cannot rely solely on the training set. We introduce private everlasting prediction taking into account the privacy of both the training set and the (adaptively chosen) queries made to the predictor. We then present a generic construction of private everlasting predictors in the PAC model. The sample complexity of the initial training sample in our construction is quadratic (up to polylog factors) in the VC dimension of the concept class. Our construction allows prediction for all concept classes with finite VC dimension, and in particular threshold functions over infinite domains, for which (traditional) private learning is known to be impossible.more » « less
-
Data that is gathered adaptively --- via bandit algorithms, for example --- exhibits bias. This is true both when gathering simple numeric valued data --- the empirical means kept track of by stochastic bandit algorithms are biased downwards --- and when gathering more complicated data --- running hypothesis tests on complex data gathered via contextual bandit algorithms leads to false discovery. In this paper, we show that this problem is mitigated if the data collection procedure is differentially private. This lets us both bound the bias of simple numeric valued quantities (like the empirical means of stochastic bandit algorithms), and correct the p-values of hypothesis tests run on the adaptively gathered data. Moreover, there exist differentially private bandit algorithms with near optimal regret bounds: we apply existing theorems in the simple stochastic case, and give a new analysis for linear contextual bandits. We complement our theoretical results with experiments validating our theory.more » « less
-
Private set intersection (PSI) enables two parties to jointly compute the intersection of their private sets without revealing any extra information to each other. In this work, we focus on the unbalanced setting where one party (a powerful server) holds a significantly larger set than the other party (a resource-limited client). We present a new protocol for this setting that achieves a better balance between low client-side storage and efficient online processing. We first formalize a general framework to transform Private Information Retrieval (PIR) into PSI with techniques used in prior works. Building upon recent advancements in Private Information Retrieval (PIR), specifically the SimplePIR construction (Henzinger et al., USENIX Security'23), combined with our tailored techniques, our construction shows a great improvement in online efficiency. Concretely, when the client holds a single element, our protocol achieves more than 100x faster computation and over 4x lower communication compared to the state-of-the-art unbalanced PSI based on leveled fully homomorphic encryption (Chen et al., CCS'21). The client-side storage is only in the order of tens of megabytes, even for a gigabyte-sized set on the server. Moreover, since the framework is generic, any future improvement in PIR can further improve our construction.more » « less
An official website of the United States government
