Kearns, Neel, Roth, and Wu [ICML 2018] recently proposed a notion
of rich subgroup fairness intended to bridge the gap between statistical and individual notions of fairness. Rich subgroup fairness picks
a statistical fairness constraint (say, equalizing false positive rates
across protected groups), but then asks that this constraint hold
over an exponentially or infinitely large collection of subgroups defined by a class of functions with bounded VC dimension. They give
an algorithm guaranteed to learn subject to this constraint, under
the condition that it has access to oracles for perfectly learning absent a fairness constraint. In this paper, we undertake an extensive
empirical evaluation of the algorithm of Kearns et al. On four real
datasets for which fairness is a concern, we investigate the basic
convergence of the algorithm when instantiated with fast heuristics
in place of learning oracles, measure the tradeoffs between fairness
and accuracy, and compare this approach with the recent algorithm
of Agarwal, Beygelzimer, Dudík, Langford, and Wallach [ICML
2018], which implements weaker and more traditional marginal
fairness constraints defined by individual protected attributes. We
find that, in general, the Kearns et al. algorithm converges quickly, that large gains in fairness can be obtained with mild costs to accuracy, and that optimizing accuracy subject only to marginal fairness leads to classifiers with substantial subgroup unfairness. We also
provide a number of analyses and visualizations of the dynamics
and behavior of the Kearns et al. algorithm. Overall we find this
algorithm to be effective on real data, and rich subgroup fairness to
be a viable notion in practice.
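To make the constraint concrete, here is a minimal sketch of what auditing a fixed classifier for rich subgroup fairness involves: measuring the worst size-weighted false-positive-rate gap over a supplied collection of subgroup-defining functions. This is not the Kearns et al. implementation, which additionally retrains against the most-violating subgroup via a learning oracle; all names and the example subgroup class are hypothetical.

```python
import numpy as np

def fp_rate(y_true, y_pred, mask):
    """False-positive rate of y_pred among the true negatives selected by mask."""
    neg = mask & (y_true == 0)
    return y_pred[neg].mean() if neg.any() else 0.0

def worst_subgroup_fp_gap(y_true, y_pred, attrs, subgroup_fns):
    """Largest size-weighted FP-rate gap between any subgroup and the population.

    Each g in subgroup_fns maps the protected-attribute matrix to a boolean
    membership vector; weighting by subgroup mass mirrors the rich subgroup
    fairness definition, under which violations on vanishingly small groups
    count proportionally less.
    """
    base = fp_rate(y_true, y_pred, np.ones(len(y_true), dtype=bool))
    return max(g(attrs).mean() * abs(fp_rate(y_true, y_pred, g(attrs)) - base)
               for g in subgroup_fns)

# Example subgroup class: conjunctions over two binary protected attributes.
subgroups = [lambda a: a[:, 0] == 1,
             lambda a: a[:, 1] == 1,
             lambda a: (a[:, 0] == 1) & (a[:, 1] == 1)]
```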
Distribution-Specific Auditing for Subgroup Fairness
We study the problem of auditing classifiers for statistical subgroup fairness. Kearns et al. [2018] showed that the problem of auditing combinatorial subgroup fairness is as hard as agnostic learning. Essentially all work on remedying statistical measures of discrimination against subgroups assumes access to an oracle for this problem, despite the fact that no efficient algorithms are known for it. If we assume the data distribution is Gaussian, or even merely log-concave, then a recent line of work has discovered efficient agnostic learning algorithms for halfspaces. Unfortunately, the reduction of Kearns et al. was formulated in terms of weak, "distribution-free" learning, and thus did not establish a connection for families such as log-concave distributions. In this work, we give positive and negative results on auditing for Gaussian distributions. On the positive side, we present an alternative approach to leverage these advances in agnostic learning and thereby obtain the first polynomial-time approximation scheme (PTAS) for auditing nontrivial combinatorial subgroup fairness: we show how to audit statistical notions of fairness over homogeneous halfspace subgroups when the features are Gaussian. On the negative side, we find that under cryptographic assumptions, no polynomial-time algorithm can guarantee any nontrivial auditing, even under Gaussian feature distributions, for general halfspace subgroups.
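As a rough illustration of the quantity such an audit approximates, here is a hedged Monte-Carlo sketch: it scans random homogeneous halfspaces for the subgroup whose false-positive rate deviates most from the population's, weighted by subgroup mass. This random search carries no approximation guarantee and is not the paper's PTAS, which instead leverages agnostic halfspace learners for Gaussian marginals; all names are hypothetical.

```python
import numpy as np

def audit_halfspace_fp(x, y_true, y_pred, n_directions=1000, seed=0):
    """Search homogeneous halfspace subgroups {x : <w, x> >= 0} for the one
    whose false-positive rate differs most from the population's, weighted
    by subgroup mass."""
    rng = np.random.default_rng(seed)
    neg = (y_true == 0)
    base_fp = y_pred[neg].mean()
    best_w, best_gap = None, 0.0
    for _ in range(n_directions):
        w = rng.standard_normal(x.shape[1])
        w /= np.linalg.norm(w)
        in_group = x @ w >= 0
        group_neg = in_group & neg
        if not group_neg.any():
            continue  # no negatives in this subgroup, FP rate undefined
        gap = in_group.mean() * abs(y_pred[group_neg].mean() - base_fp)
        if gap > best_gap:
            best_w, best_gap = w, gap
    return best_w, best_gap
```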
- PAR ID: 10544588
- Editor(s): Rothblum, Guy N.
- Publisher / Repository: Schloss Dagstuhl – Leibniz-Zentrum für Informatik
- Volume: 295
- ISSN: 1868-8969
- ISBN: 978-3-95977-319-5
- Page Range / eLocation ID: 295-295
- Subject(s) / Keyword(s): Fairness auditing; agnostic learning; intractability; Theory of computation → Machine learning theory
- Format(s): application/pdf
- Size(s): 20 pages; 862052 bytes
- Right(s): Creative Commons Attribution 4.0 International license (open access)
- Sponsoring Org: National Science Foundation
More Like this
- We give the first statistical-query lower bounds for agnostically learning any non-polynomial activation with respect to Gaussian marginals (e.g., ReLU, sigmoid, sign). For the specific problem of ReLU regression (equivalently, agnostically learning a ReLU), we show that any statistical-query algorithm with tolerance n^(-(1/ϵ)^b) must use at least 2^(n^c)·ϵ queries for some constants b, c > 0, where n is the dimension and ϵ is the accuracy parameter. Our results rule out general (as opposed to correlational) SQ learning algorithms, which is unusual for real-valued learning problems. Our techniques involve a gradient boosting procedure for "amplifying" recent lower bounds due to Diakonikolas et al. (COLT 2020) and Goel et al. (ICML 2020) on the SQ dimension of functions computed by two-layer neural networks. The crucial new ingredient is the use of a nonstandard convex functional during the boosting procedure. This also yields a best-possible reduction between two commonly studied models of learning: agnostic learning and probabilistic concepts. (A toy sketch of the SQ oracle model appears after this list.)
- We consider an online learning problem with one-sided feedback, in which the learner is able to observe the true label only for positively predicted instances. On each round, k instances arrive and receive classification outcomes according to a randomized policy deployed by the learner, whose goal is to maximize accuracy while deploying individually fair policies. We first extend the framework of Bechavod et al. (2020), which relies on the existence of a human fairness auditor for detecting fairness violations, to instead incorporate feedback from dynamically selected panels of multiple, possibly inconsistent, auditors. We then construct an efficient reduction from our problem of online learning with one-sided feedback and a panel reporting fairness violations to the contextual combinatorial semi-bandit problem (Cesa-Bianchi & Lugosi, 2009; György et al., 2007). Finally, we show how to leverage the guarantees of two algorithms in the contextual combinatorial semi-bandit setting, Exp2 (Bubeck et al., 2012) and the oracle-efficient Context-Semi-Bandit-FTPL (Syrgkanis et al., 2016), to provide multi-criteria no-regret guarantees simultaneously for accuracy and fairness. Our results eliminate two potential sources of bias from prior work: the "hidden outcomes" that are not available to an algorithm operating in the full-information setting, and human biases that might be present in any single auditor but can be mitigated by selecting a well-chosen panel. (The one-sided-feedback protocol is sketched in code after this list.)
- A fundamental problem in robust learning is asymmetry: a learner needs to correctly classify every one of exponentially many perturbations that an adversary might make to a test example, but the attacker only needs to find one successful perturbation. Xiang et al. [2022] proposed an algorithm for patch attacks that reduces the effective number of perturbations from exponential to polynomial and learns using an ERM oracle. However, their guarantee requires the natural examples to be robustly realizable. In this work we consider the non-robustly-realizable case. Our first contribution is to give a guarantee for this setting by utilizing an approach of Feige, Mansour, and Schapire [2015]. Next, we extend our results to a multi-group setting and introduce a novel agnostic multi-robust learning problem, where the goal is to learn a predictor that achieves low robust loss on a (potentially) rich collection of subgroups. (A one-line sketch of the robust loss appears after this list.)
- Belkin, M.; Kpotufe, S. (Eds.): Langevin algorithms are gradient descent methods with additive noise. They have been used for decades in Markov Chain Monte Carlo (MCMC) sampling, optimization, and learning. Their convergence properties for unconstrained non-convex optimization and learning problems have been studied widely in the last few years. Other work has examined projected Langevin algorithms for sampling from log-concave distributions restricted to convex compact sets. For learning and optimization, log-concave distributions correspond to convex losses. In this paper, we analyze the case of non-convex losses with compact convex constraint sets and IID external data variables. We term the resulting method the projected stochastic gradient Langevin algorithm (PSGLA). We show the algorithm achieves a deviation of O(T^(-1/4) (log T)^(1/2)) from its target distribution in 1-Wasserstein distance. For optimization and learning, we show that the algorithm achieves ε-suboptimal solutions, on average, provided that it is run for a time that is polynomial in ε and slightly super-exponential in the problem dimension. (A sketch of the PSGLA update appears after this list.)
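For the statistical-query item above, here is a minimal sketch of the SQ oracle model whose call count the lower bound constrains; the uniform perturbation stands in for an adversarially chosen error and is purely illustrative, as are all names.

```python
import numpy as np

def sq_oracle(phi, samples, tau, rng):
    """Statistical-query oracle: answers E[phi(x, y)] up to an (adversarially
    chosen) error of magnitude at most tau.  SQ lower bounds count how many
    such calls any learner must make, regardless of which queries it asks."""
    estimate = float(np.mean([phi(x, y) for x, y in samples]))
    return estimate + rng.uniform(-tau, tau)
```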
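For the one-sided-feedback item, a sketch of a single round of the protocol: a randomized policy decides which instances receive positive classifications, and only those instances reveal their true labels. Names are hypothetical.

```python
import numpy as np

def one_sided_round(policy, instances, labels, rng):
    """One round with one-sided feedback: the true label y is observed only
    when the randomized policy predicts positive (e.g., a loan is granted)."""
    observations = []
    for x, y in zip(instances, labels):
        predicted_positive = rng.random() < policy(x)  # policy(x) = P(positive)
        observations.append((x, y if predicted_positive else None))
    return observations

# e.g. one_sided_round(lambda x: 0.5, xs, ys, np.random.default_rng(0))
```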
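For the robust-learning item, the asymmetry reduces to one line: the adversary wins on an example if any allowed perturbation flips the prediction. A sketch over a finite perturbation set (patch-attack defenses in the style of Xiang et al. work by shrinking this set to polynomial size before invoking an ERM oracle):

```python
def robust_01_loss(predict, x, y, perturbations):
    """Worst-case 0-1 loss on a single example: 1 if ANY perturbation in the
    (here, finite) set makes the prediction wrong, else 0."""
    return float(any(predict(p(x)) != y for p in perturbations))
```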
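Finally, for the Langevin item, a minimal sketch of the PSGLA update under the stated assumptions (stochastic gradients, a compact convex constraint set): a projected SGD step plus Gaussian noise scaled by the step size and an inverse temperature. The toy loss and all names are hypothetical.

```python
import numpy as np

def psgla(stoch_grad, project, x0, n_steps, eta, beta, rng):
    """Projected stochastic gradient Langevin algorithm (sketch):
        x <- Proj(x - eta * g(x) + sqrt(2 * eta / beta) * N(0, I)),
    where g is a stochastic gradient and Proj is Euclidean projection onto
    the compact convex constraint set."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        noise = np.sqrt(2.0 * eta / beta) * rng.standard_normal(x.shape)
        x = project(x - eta * stoch_grad(x, rng) + noise)
    return x

# Toy use: non-convex loss f(x) = (|x|^2 - 1)^2 over the unit ball.
rng = np.random.default_rng(0)
project_ball = lambda z: z / max(1.0, float(np.linalg.norm(z)))
grad = lambda z, r: 4.0 * z * (z @ z - 1.0) + 0.1 * r.standard_normal(z.shape)
x_out = psgla(grad, project_ball, np.zeros(5), 10_000, 1e-3, 5.0, rng)
```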