We design an algorithm for finding counterfactuals with strong theoretical guarantees on its performance. For any monotone model f:Xd→{0,1} and instance x⋆, our algorithm makes
S(f)O(Δf(x⋆))⋅logd
{queries} to f and returns an {\sl optimal} counterfactual for x⋆: a nearest instance x′ to x⋆ for which f(x′)≠f(x⋆). Here S(f) is the sensitivity of f, a discrete analogue of the Lipschitz constant, and Δf(x⋆) is the distance from x⋆ to its nearest counterfactuals. The previous best known query complexity was dO(Δf(x⋆)), achievable by brute-force local search. We further prove a lower bound of S(f)Ω(Δf(x⋆))+Ω(logd) on the query complexity of any algorithm, thereby showing that the guarantees of our algorithm are essentially optimal.
more »
« less
Provably efficient, succinct, and precise explanations
We consider the problem of explaining the predictions of an arbitrary blackbox model f: given query access to f and an instance x, output a small set of x's features that in conjunction essentially determines f(x). We design an efficient algorithm with provable guarantees on the succinctness and precision of the explanations that it returns. Prior algorithms were either efficient but lacked such guarantees, or achieved such guarantees but were inefficient. We obtain our algorithm via a connection to the problem of {\sl implicitly} learning decision trees. The implicit nature of this learning task allows for efficient algorithms even when the complexity of~f
necessitates an intractably large surrogate decision tree. We solve the implicit learning problem by bringing together techniques from learning theory, local computation algorithms, and complexity theory. Our approach of “explaining by implicit learning” shares elements of two previously disparate methods for post-hoc explanations, global and local explanations, and we make the case that it enjoys advantages of both.
more »
« less
- Award ID(s):
- 2006664
- NSF-PAR ID:
- 10339720
- Date Published:
- Journal Name:
- Conference on Neural Information Processing Systems (NeurIPS 2021)
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
We give the first agnostic, efficient, proper learning algorithm for monotone Boolean functions. Given 2O~(n√/ε) uniformly random examples of an unknown function f:{±1}n→{±1}, our algorithm outputs a hypothesis g:{±1}n→{±1} that is monotone and (opt +ε)-close to f, where opt is the distance from f to the closest monotone function. The running time of the algorithm (and consequently the size and evaluation time of the hypothesis) is also 2O~(n√/ε), nearly matching the lower bound of [13]. We also give an algorithm for estimating up to additive error ε the distance of an unknown function f to monotone using a run-time of 2O~(n√/ε). Previously, for both of these problems, sample-efficient algorithms were known, but these algorithms were not run-time efficient. Our work thus closes this gap in our knowledge between the run-time and sample complexity.This work builds upon the improper learning algorithm of [17] and the proper semiagnostic learning algorithm of [40], which obtains a non-monotone Boolean-valued hypothesis, then “corrects” it to monotone using query-efficient local computation algorithms on graphs. This black-box correction approach can achieve no error better than 2 opt +ε information-theoretically; we bypass this barrier bya)augmenting the improper learner with a convex optimization step, andb)learning and correcting a real-valued function before rounding its values to Boolean. Our real-valued correction algorithm solves the “poset sorting” problem of [40] for functions over general posets with non-Boolean labels.more » « less
-
We study the problem of certification: given queries to a function f : {0,1}n → {0,1} with certificate complexity ≤ k and an input x⋆, output a size-k certificate for f’s value on x⋆. For monotone functions, a classic local search algorithm of Angluin accomplishes this task with n queries, which we show is optimal for local search algorithms. Our main result is a new algorithm for certifying monotone functions with O(k8 logn) queries, which comes close to matching the information-theoretic lower bound of Ω(k logn). The design and analysis of our algorithm are based on a new connection to threshold phenomena in monotone functions. We further prove exponential-in-k lower bounds when f is non-monotone, and when f is monotone but the algorithm is only given random examples of f. These lower bounds show that assumptions on the structure of f and query access to it are both necessary for the polynomial dependence on k that we achieve.more » « less
-
In the certification problem, the algorithm is given a function f with certificate complexity k and an input x^⋆, and the goal is to find a certificate of size ≤ poly(k) for f’s value at x^⋆. This problem is in NP^NP, and assuming 𝖯 ≠ NP, is not in 𝖯. Prior works, dating back to Valiant in 1984, have therefore sought to design efficient algorithms by imposing assumptions on f such as monotonicity. Our first result is a BPP^NP algorithm for the general problem. The key ingredient is a new notion of the balanced influence of variables, a natural variant of influence that corrects for the bias of the function. Balanced influences can be accurately estimated via uniform generation, and classic BPP^NP algorithms are known for the latter task. We then consider certification with stricter instance-wise guarantees: for each x^⋆, find a certificate whose size scales with that of the smallest certificate for x^⋆. In sharp contrast with our first result, we show that this problem is NP^NP-hard even to approximate. We obtain an optimal inapproximability ratio, adding to a small handful of problems in the higher levels of the polynomial hierarchy for which optimal inapproximability is known. Our proof involves the novel use of bit-fixing dispersers for gap amplification.more » « less
-
Lysyanskaya, Anna ; Handschuh, Helena (Ed.)We study the black-box function inversion problem, which is the problem of finding x[N] such that f(x)=y, given as input some challenge point y in the image of a function f:[N][N], using T oracle queries to f and preprocessed advice 01S depending on f. We prove a number of new results about this problem, as follows. 1. We show an algorithm that works for any T and S satisfying TS2maxST=(N3) . In the important setting when ST, this improves on the celebrated algorithm of Fiat and Naor [STOC, 1991], which requires TS3N3. E.g., Fiat and Naor's algorithm is only non-trivial for SN23 , while our algorithm gives a non-trivial tradeoff for any SN12 . (Our algorithm and analysis are quite simple. As a consequence of this, we also give a self-contained and simple proof of Fiat and Naor's original result, with certain optimizations left out for simplicity.) 2. We show a non-adaptive algorithm (i.e., an algorithm whose ith query xi is chosen based entirely on and y, and not on the f(x1)f(xi−1)) that works for any T and S satisfying S=(Nlog(NT)) giving the first non-trivial non-adaptive algorithm for this problem. E.g., setting T=Npolylog(N) gives S=(NloglogN). This answers a question due to Corrigan-Gibbs and Kogan [TCC, 2019], who asked whether it was possible for a non-adaptive algorithm to work with parameters T and S satisfying T+SlogNo(N) . We also observe that our non-adaptive algorithm is what we call a guess-and-check algorithm, that is, it is non-adaptive and its final output is always one of the oracle queries x1xT. For guess-and-check algorithms, we prove a matching lower bound, therefore completely characterizing the achievable parameters (ST) for this natural class of algorithms. (Corrigan-Gibbs and Kogan showed that any such lower bound for arbitrary non-adaptive algorithms would imply new circuit lower bounds.) 3. We show equivalence between function inversion and a natural decision version of the problem in both the worst case and the average case, and similarly for functions f:[N][M] with different ranges. All of the above results are most naturally described in a model with shared randomness (i.e., random coins shared between the preprocessing algorithm and the online algorithm). However, as an additional contribution, we show (using a technique from communication complexity due to Newman [IPL, 1991]) how to generically convert any algorithm that uses shared randomness into one that does not.more » « less