We study the problem of classifier derandomization in machine learning: given a stochastic binary classifier f:X→[0,1], sample a deterministic classifier f̂:X→{0,1} that approximates the output of f in aggregate over any data distribution. Recent work revealed how to efficiently derandomize a stochastic classifier with strong output approximation guarantees, but at the cost of individual fairness -- that is, if f treated similar inputs similarly, f̂ did not. In this paper, we initiate a systematic study of classifier derandomization with metric fairness guarantees. We show that the prior derandomization approach is almost maximally metric-unfair, and that a simple "random threshold" derandomization achieves optimal fairness preservation but with weaker output approximation. We then devise a derandomization procedure that provides an appealing tradeoff between these two: if f is α-metric fair according to a metric d with a locality-sensitive hash (LSH) family, then our derandomized f̂ is, with high probability, O(α)-metric fair and a close approximation of f. We also prove generic results applicable to all (fair and unfair) classifier derandomization procedures, including a bias-variance decomposition and reductions between various notions of metric fairness.
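To make the two baseline schemes in the abstract concrete, the following is a minimal Python sketch, not the paper's actual construction: it assumes the "random threshold" scheme compares every score f(x) to a single shared Uniform[0,1] draw, and that the LSH-based procedure compares f(x) to an independent Uniform[0,1] threshold drawn per hash bucket. The function names, the toy classifier, and the grid hash are hypothetical stand-ins.

```python
import random

def random_threshold_derandomize(f, seed=0):
    """Predict 1 iff f(x) >= t, for a single Uniform[0,1] draw t shared by all inputs.

    Because every input is compared against the same t, two inputs with similar
    scores f(x) receive the same prediction unless t happens to fall between
    their scores -- the intuition for why this scheme preserves metric fairness.
    """
    t = random.Random(seed).random()
    return lambda x: 1 if f(x) >= t else 0

def lsh_bucket_derandomize(f, lsh, seed=0):
    """Predict 1 iff f(x) >= t_b, where t_b is a Uniform[0,1] threshold drawn
    independently for each bucket b = lsh(x).

    Nearby inputs tend to collide under the LSH and are compared against the
    same threshold; distant inputs get (nearly) independent thresholds, which
    improves the aggregate approximation of f.
    """
    rng = random.Random(seed)
    thresholds = {}  # bucket -> lazily sampled Uniform[0,1] threshold

    def f_hat(x):
        b = lsh(x)
        if b not in thresholds:
            thresholds[b] = rng.random()
        return 1 if f(x) >= thresholds[b] else 0

    return f_hat

if __name__ == "__main__":
    f = lambda x: min(max(x, 0.0), 1.0)   # toy stochastic classifier on X = [0, 1]
    lsh = lambda x: int(x * 10)           # toy 1-D "LSH": fixed grid of width 0.1
    f_hat_rt = random_threshold_derandomize(f, seed=42)
    f_hat_lsh = lsh_bucket_derandomize(f, lsh, seed=42)
    xs = [0.10, 0.11, 0.50, 0.90]
    print([f_hat_rt(x) for x in xs])
    print([f_hat_lsh(x) for x in xs])
```

In this toy run, the two nearby points 0.10 and 0.11 almost always receive identical labels under either scheme, while the per-bucket thresholds let the LSH variant track f more closely across the whole domain.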
Derandomization vs Refutation: A Unified Framework for Characterizing Derandomization
- Award ID(s): 2127597
- PAR ID: 10579538
- Publisher / Repository: IEEE
- Date Published:
- ISBN: 979-8-3503-1894-4
- Page Range / eLocation ID: 1008 to 1047
- Format(s): Medium: X
- Location: Santa Cruz, CA, USA
- Sponsoring Org: National Science Foundation
More Like this
We present a novel approach to automatically recover information about the address space layout of remote processes in the presence of Address Space Layout Randomization (ASLR). Our system, dubbed Sleak, performs static analysis and symbolic execution of binary executable programs, and identifies program paths and input parameters leading to partial (i.e., only a few bits) or complete (i.e., the whole address) information disclosure vulnerabilities, revealing addresses of known objects of the target service or application. Sleak takes the binary executable program as input and generates a symbolic expression for each program output that leaks information about the addresses of objects, such as stack variables, heap structures, or function pointers. By comparing these expressions with the concrete output of a remote process executing the same binary program image, our system is able to recover anywhere from a few bits to whole addresses of objects of the target application or service. Discovering the address of a single object in the target application is often enough to infer the layout of entire sections of the address space, which attackers can leverage to bypass ASLR.
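To illustrate the "compare the symbolic expression with the concrete output" step, here is a deliberately simplified Python sketch. The leak expression, constants, and function names are hypothetical and are not taken from Sleak, which derives such expressions automatically from the binary via static analysis and symbolic execution; the point is only that once the expression is known, inverting it on an observed output recovers address bits.

```python
# Toy illustration (not Sleak itself): if analysis of the binary shows that an
# output value is a simple function of an object's address, observing the
# concrete output of the remote process lets us invert that function and
# recover (some bits of) the address, partially defeating ASLR.

def leaked_output(heap_ptr: int, xor_const: int = 0x5A5A) -> int:
    """Models what the target program prints: a 'handle' derived from a heap
    pointer. In practice, the symbolic expression for this output would be
    recovered from the binary; the expression here is made up."""
    return ((heap_ptr >> 4) ^ xor_const) & 0xFFFFFFFF

def recover_address_bits(observed: int, xor_const: int = 0x5A5A) -> int:
    """Inverts the leak expression on the observed concrete output, recovering
    bits 4..35 of the pointer (the low 4 bits are lost to the shift)."""
    return ((observed ^ xor_const) & 0xFFFFFFFF) << 4

if __name__ == "__main__":
    true_heap_ptr = 0x7F3A_1C20_5B40            # unknown to the remote analyst
    observed = leaked_output(true_heap_ptr)      # what the remote service returns
    recovered = recover_address_bits(observed)
    # The recovered value matches the pointer's bits 4..35; even partial bits
    # like these can be enough to pin down a randomized heap or module base.
    print(hex(observed), hex(recovered), hex(true_heap_ptr & 0x0F_FFFF_FFF0))
```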