Motivated by settings in which predictive models may be required to be non-discriminatory with respect to certain attributes (such as race), but even collecting the sensitive attribute may be forbidden or restricted, we initiate the study of fair learning under the constraint of differential privacy. Our first algorithm is a private implementation of the equalized odds post-processing approach of (Hardt et al., 2016). This algorithm is appealingly simple, but must be able to use protected group membership explicitly at test time, which can be viewed as a form of “disparate treatment”. Our second algorithm is a differentially private version of the oracle-efficient in-processing approach of (Agarwal et al., 2018) which is more complex but need not have access to protected group membership at test time. We identify new tradeoffs between fairness, accuracy, and privacy that emerge only when requiring all three properties, and show that these tradeoffs can be milder if group membership may be used at test time. We conclude with a brief experimental evaluation.
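As a concrete illustration of the first approach: the only statistics that the equalized-odds post-processing of Hardt et al. (2016) needs from the data are group-conditional error counts, so privatizing those counts with Laplace noise before solving the mixing linear program makes the whole post-processing step differentially private. The sketch below shows only that privatized-statistics step; the function and parameter names are ours and it is not the authors' exact procedure.

```python
import numpy as np

def noisy_group_rates(y_true, y_pred, group, epsilon, rng=None):
    """Return differentially private estimates of group-conditional TPR/FPR.

    Under add/remove neighboring datasets, each individual contributes to
    exactly one of the eight (group, label, prediction) counts, so the count
    vector has L1 sensitivity 1 and Laplace noise of scale 1/epsilon suffices.
    """
    rng = np.random.default_rng() if rng is None else rng
    rates = {}
    for a in (0, 1):                      # protected group
        stats = {}
        for y in (0, 1):                  # true label
            for yhat in (0, 1):           # predicted label
                count = np.sum((group == a) & (y_true == y) & (y_pred == yhat))
                stats[(y, yhat)] = count + rng.laplace(scale=1.0 / epsilon)
        # Clamp noisy denominators away from zero before forming ratios.
        pos = max(stats[(1, 1)] + stats[(1, 0)], 1e-6)
        neg = max(stats[(0, 1)] + stats[(0, 0)], 1e-6)
        rates[a] = {"tpr": stats[(1, 1)] / pos, "fpr": stats[(0, 1)] / neg}
    return rates
```

The equalized-odds mixing probabilities of Hardt et al. (2016) would then be obtained by solving their linear program on these noisy rates; since everything downstream touches only differentially private statistics, the privacy guarantee is preserved by post-processing.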
Bun, Mark; Livni, Roi; Moran, Shay (61st Annual IEEE Symposium on Foundations of Computer Science, FOCS 2020)
We prove that every concept class with finite Littlestone dimension can be learned by an (approximate) differentially private algorithm. This answers an open question of Alon et al. (STOC 2019), who proved the converse statement (this question was also asked by Neel et al. (FOCS 2019)). Together these two results yield an equivalence between online learnability and private PAC learnability.

We introduce a new notion of algorithmic stability called “global stability” which is essential to our proof and may be of independent interest. We also discuss an application of our results to boosting the privacy and accuracy parameters of differentially private learners.
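For intuition, a hedged paraphrase of the stability notion (the notation is ours): global stability asks that the learner output some single hypothesis with noticeable probability over the draw of the sample.

```latex
% A is (n, \eta)-globally stable with respect to D if some hypothesis h satisfies
\exists\, h :\qquad \Pr_{S \sim D^{n}}\bigl[\, A(S) = h \,\bigr] \;\ge\; \eta .
```

Such a frequently output hypothesis can then be selected and released with differential privacy (for instance via a stable-histogram step over candidate hypotheses), which is how the notion connects global stability to private learning.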
Kearns, Neel, Roth, and Wu [ICML 2018] recently proposed a notion of rich subgroup fairness intended to bridge the gap between statistical and individual notions of fairness. Rich subgroup fairness picks a statistical fairness constraint (say, equalizing false positive rates across protected groups), but then asks that this constraint hold over an exponentially or infinitely large collection of subgroups defined by a class of functions with bounded VC dimension. They give an algorithm guaranteed to learn subject to this constraint, under the condition that it has access to oracles for perfectly learning absent a fairness constraint.

In this paper, we undertake an extensive empirical evaluation of the algorithm of Kearns et al. On four real datasets for which fairness is a concern, we investigate the basic convergence of the algorithm when instantiated with fast heuristics in place of learning oracles, measure the tradeoffs between fairness and accuracy, and compare this approach with the recent algorithm of Agarwal, Beygelzimer, Dudik, Langford, and Wallach [ICML 2018], which implements weaker and more traditional marginal fairness constraints defined by individual protected attributes.

We find that, in general, the Kearns et al. algorithm converges quickly, that large gains in fairness can be obtained with mild costs to accuracy, and that optimizing accuracy subject only to marginal fairness leads to classifiers with substantial subgroup unfairness. We also provide a number of analyses and visualizations of the dynamics and behavior of the Kearns et al. algorithm. Overall we find this algorithm to be effective on real data, and rich subgroup fairness to be a viable notion in practice.
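To make the subgroup-auditing step concrete, here is a minimal non-private heuristic sketch (our own simplification with hypothetical names, not the authors' implementation): given a classifier's predictions, search for a subgroup, defined by a simple function of the features, whose false-positive rate deviates most from the overall false-positive rate, weighting the deviation by the subgroup's size.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def audit_fp_disparity(X, y_true, y_pred):
    """Heuristic auditor sketch: look for a linear-threshold subgroup whose
    false-positive rate differs most from the overall false-positive rate,
    weighted by subgroup size.  Illustrative only; the actual algorithm
    reduces subgroup discovery to cost-sensitive classification.
    """
    neg = y_true == 0                       # FP rates are defined on true negatives
    base_fpr = y_pred[neg].mean()
    # Fit a score that is high on negatives whose inclusion raises the disparity.
    target = y_pred[neg] - base_fpr
    g = LinearRegression().fit(X[neg], target)
    member = g.predict(X) > 0               # candidate subgroup over all points
    sub_neg = member & neg
    if sub_neg.sum() == 0:
        return member, 0.0
    # Size-weighted false-positive-rate disparity of the candidate subgroup.
    disparity = sub_neg.mean() * abs(y_pred[sub_neg].mean() - base_fpr)
    return member, disparity
```

In the full algorithm this auditor plays against a cost-sensitive learner in a zero-sum game, and the empirical evaluation tracks how quickly that game converges when both players are instantiated with fast heuristics of this kind.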
Naor, Moni; Nissim, Kobbi; Stemmer, Uri; Yan, Chao (Proceedings of the 37th Conference on Neural Information Processing Systems, NeurIPS 2023)
A private learner is trained on a sample of labeled points and generates a hypothesis that can be used for predicting the labels of newly sampled points while protecting the privacy of the training set [Kasiviswanathan et al., FOCS 2008]. Past research uncovered that private learners may need to exhibit significantly higher sample complexity than non-private learners, as is the case for learning one-dimensional threshold functions [Bun et al., FOCS 2015; Alon et al., STOC 2019].
We explore prediction as an alternative to learning. A predictor answers a stream of classification queries instead of outputting a hypothesis. Earlier work has considered a private prediction model with a single classification query [Dwork and Feldman, COLT 2018]. We observe that when answering a stream of queries, a predictor must modify the hypothesis it uses over time, and in a manner that cannot rely solely on the training set.
We introduce private everlasting prediction, taking into account the privacy of both the training set and the (adaptively chosen) queries made to the predictor. We then present a generic construction of private everlasting predictors in the PAC model. The sample complexity of the initial training sample in our construction is quadratic (up to polylog factors) in the VC dimension of the concept class. Our construction allows prediction for all concept classes with finite VC dimension, and in particular threshold functions over infinite domains, for which (traditional) private learning is known to be impossible.
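For contrast with the everlasting setting, the single-query private-prediction model mentioned above can be illustrated with a standard subsample-and-aggregate vote. This is a sketch with names of our choosing, not the paper's construction.

```python
import numpy as np

def private_predict(models, x, epsilon, rng=None):
    """Answer one binary classification query with pure differential privacy
    via a noisy majority vote over models trained on disjoint shards of the
    training data (subsample-and-aggregate).  Changing one training example
    changes at most one model, hence at most one vote, so the vote margin has
    sensitivity 1 and Laplace noise of scale 1/epsilon suffices.
    """
    rng = np.random.default_rng() if rng is None else rng
    votes = np.array([m(x) for m in models], dtype=float)   # each m(x) in {0, 1}
    noisy_margin = votes.sum() - len(models) / 2 + rng.laplace(scale=1.0 / epsilon)
    return int(noisy_margin > 0)
```

Answering a long stream of queries this way would see the privacy guarantee degrade with the number of queries under naive composition; as the abstract notes, a predictor serving many queries must eventually modify the hypothesis it uses, which is precisely what private everlasting prediction is designed to accommodate.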
Narayanan, S. (Conference on Learning Theory, COLT 2022)
We provide improved differentially private algorithms for identity testing of high-dimensional distributions. Specifically, for d-dimensional Gaussian distributions with known covariance Σ, we can test whether the distribution comes from N(μ∗,Σ) for some fixed μ∗ or from some N(μ,Σ) with total variation distance at least α from N(μ∗,Σ) with (ε,0)-differential privacy, using only
Õ(d^{1/2}/α^2 + d^{1/3}/(α^{4/3}·ε^{2/3}) + 1/(α·ε))
samples if the algorithm is allowed to be computationally inefficient, and only
Õ(d^{1/2}/α^2 + d^{1/4}/(α·ε))
samples for a computationally efficient algorithm. We also provide a matching lower bound showing that our computationally inefficient algorithm has optimal sample complexity. We also extend our algorithms to various related problems, including mean testing of Gaussians with bounded but unknown covariance, uniformity testing of product distributions over {−1,1}^d, and tolerant testing.
Our results improve over the previous best work of Canonne et al. (2020) for both computationally efficient and inefficient algorithms, and even our computationally efficient algorithm matches the optimal non-private sample complexity of O(√d/α^2) in many standard parameter settings.
In addition, our results show that, surprisingly, private identity testing of d-dimensional Gaussians can be done with fewer samples than private identity testing of discrete distributions over a domain of size d (Acharya et al., 2018), which refutes a conjectured lower bound of Canonne et al. (2020).
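To ground the problem statement (this is not the paper's algorithm, which is considerably more refined), a naive ε-DP mean-identity test for the known-covariance Gaussian case can be sketched as follows; the clipping radius and rejection threshold are illustrative choices of ours.

```python
import numpy as np

def dp_gaussian_mean_identity_test(X, mu_star, Sigma, alpha, epsilon,
                                    clip=3.0, rng=None):
    """Naive pure-DP identity test sketch: whiten by the known covariance,
    clip each sample to an l2 ball of radius `clip`, compare the empirical
    mean to mu_star, and add Laplace noise scaled to the statistic's
    sensitivity.  Baseline illustration only; the rejection threshold below
    is heuristic and the sample complexity is far from the paper's bounds.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    L = np.linalg.cholesky(Sigma)
    Z = np.linalg.solve(L, (X - mu_star).T).T            # whitened residuals
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    Z = Z * np.minimum(1.0, clip / np.maximum(norms, 1e-12))   # l2 clipping
    stat = np.linalg.norm(Z.mean(axis=0))
    sensitivity = 2.0 * clip / n          # one row moves the clipped mean by <= 2*clip/n
    noisy_stat = stat + rng.laplace(scale=sensitivity / epsilon)
    threshold = np.sqrt(d / n) + alpha / 2               # heuristic
    return noisy_stat > threshold
```

The point of the bounds quoted above is that such a clip-and-noise baseline is far from optimal: the paper's algorithms add only lower-order privacy terms to the non-private Õ(√d/α^2) rate in many parameter regimes.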
Kaplan, Haim; Mansour, Yishay; Moran, Shay; Nissim, Kobbi; Stemmer, Uri (37th Conference on Neural Information Processing Systems, NeurIPS 2023)
In this work we revisit an interactive variant of joint differential privacy, recently introduced by Naor et al. [2023], and generalize it towards handling online processes in which existing privacy definitions seem too restrictive. We study basic properties of this definition and demonstrate that it satisfies suitable variants of group privacy, composition, and post-processing.

In order to demonstrate the advantages of this privacy definition compared to traditional forms of differential privacy, we consider the basic setting of online classification. We show that any (possibly non-private) learning rule can be effectively transformed to a private learning rule with only a polynomial overhead in the mistake bound. This demonstrates a stark contrast with traditional forms of differential privacy, such as the one studied by Golowich and Livni [2021], where only a doubly exponential overhead in the mistake bound is known (via an information-theoretic upper bound).
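For readers unfamiliar with the mistake-bound model referenced here, the underlying (non-private) interaction is the standard online classification protocol sketched below (interface names are ours); the paper's transformation wraps any learner of this form so that the entire interaction satisfies the interactive joint-DP variant, at the cost of only a polynomial blow-up in the mistake count.

```python
from typing import Callable, Iterable, Tuple

def run_online_classification(learner_predict: Callable[[object], int],
                              learner_update: Callable[[object, int], None],
                              stream: Iterable[Tuple[object, int]]) -> int:
    """Mistake-bound online classification: at each round the learner sees an
    example, commits to a prediction, then observes the true label.  Returns
    the total number of mistakes made over the stream.
    """
    mistakes = 0
    for x, y in stream:
        y_hat = learner_predict(x)
        if y_hat != y:
            mistakes += 1
        learner_update(x, y)
    return mistakes
```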
Jagielski, M.; Kearns, M.; Mao, J.; Oprea, A.; Roth, A.; Sharifi-Malvajerdi, S.; Ullman, J. Differentially Private Fair Learning. International Conference on Machine Learning. https://par.nsf.gov/biblio/10100415