We design an algorithm for finding counterfactuals with strong theoretical guarantees on its performance. For any monotone model f:Xd→{0,1} and instance x⋆, our algorithm makes S(f)O(Δf(x⋆))⋅logd {queries} to f and returns an {\sl optimal} counterfactual for x⋆: a nearest instance x′ to x⋆ for which f(x′)≠f(x⋆). Here S(f) is the sensitivity of f, a discrete analogue of the Lipschitz constant, and Δf(x⋆) is the distance from x⋆ to its nearest counterfactuals. The previous best known query complexity was dO(Δf(x⋆)), achievable by brute-force local search. We further prove a lower bound of S(f)Ω(Δf(x⋆))+Ω(logd) on the query complexity of any algorithm, thereby showing that the guarantees of our algorithm are essentially optimal.
more »
« less
Provably efficient, succinct, and precise explanations
We consider the problem of explaining the predictions of an arbitrary blackbox model f: given query access to f and an instance x, output a small set of x's features that in conjunction essentially determines f(x). We design an efficient algorithm with provable guarantees on the succinctness and precision of the explanations that it returns. Prior algorithms were either efficient but lacked such guarantees, or achieved such guarantees but were inefficient. We obtain our algorithm via a connection to the problem of {\sl implicitly} learning decision trees. The implicit nature of this learning task allows for efficient algorithms even when the complexity of~f necessitates an intractably large surrogate decision tree. We solve the implicit learning problem by bringing together techniques from learning theory, local computation algorithms, and complexity theory. Our approach of “explaining by implicit learning” shares elements of two previously disparate methods for post-hoc explanations, global and local explanations, and we make the case that it enjoys advantages of both.
more »
« less
- Award ID(s):
- 2006664
- PAR ID:
- 10339720
- Date Published:
- Journal Name:
- Conference on Neural Information Processing Systems (NeurIPS 2021)
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
We give the first agnostic, efficient, proper learning algorithm for monotone Boolean functions. Given 2O~(n√/ε) uniformly random examples of an unknown function f:{±1}n→{±1}, our algorithm outputs a hypothesis g:{±1}n→{±1} that is monotone and (opt +ε)-close to f, where opt is the distance from f to the closest monotone function. The running time of the algorithm (and consequently the size and evaluation time of the hypothesis) is also 2O~(n√/ε), nearly matching the lower bound of [13]. We also give an algorithm for estimating up to additive error ε the distance of an unknown function f to monotone using a run-time of 2O~(n√/ε). Previously, for both of these problems, sample-efficient algorithms were known, but these algorithms were not run-time efficient. Our work thus closes this gap in our knowledge between the run-time and sample complexity.This work builds upon the improper learning algorithm of [17] and the proper semiagnostic learning algorithm of [40], which obtains a non-monotone Boolean-valued hypothesis, then “corrects” it to monotone using query-efficient local computation algorithms on graphs. This black-box correction approach can achieve no error better than 2 opt +ε information-theoretically; we bypass this barrier bya)augmenting the improper learner with a convex optimization step, andb)learning and correcting a real-valued function before rounding its values to Boolean. Our real-valued correction algorithm solves the “poset sorting” problem of [40] for functions over general posets with non-Boolean labels.more » « less
-
In the certification problem, the algorithm is given a function f with certificate complexity k and an input x^⋆, and the goal is to find a certificate of size ≤ poly(k) for f’s value at x^⋆. This problem is in NP^NP, and assuming 𝖯 ≠ NP, is not in 𝖯. Prior works, dating back to Valiant in 1984, have therefore sought to design efficient algorithms by imposing assumptions on f such as monotonicity. Our first result is a BPP^NP algorithm for the general problem. The key ingredient is a new notion of the balanced influence of variables, a natural variant of influence that corrects for the bias of the function. Balanced influences can be accurately estimated via uniform generation, and classic BPP^NP algorithms are known for the latter task. We then consider certification with stricter instance-wise guarantees: for each x^⋆, find a certificate whose size scales with that of the smallest certificate for x^⋆. In sharp contrast with our first result, we show that this problem is NP^NP-hard even to approximate. We obtain an optimal inapproximability ratio, adding to a small handful of problems in the higher levels of the polynomial hierarchy for which optimal inapproximability is known. Our proof involves the novel use of bit-fixing dispersers for gap amplification.more » « less
-
We study the problem of certification: given queries to a function f : {0,1}n → {0,1} with certificate complexity ≤ k and an input x⋆, output a size-k certificate for f’s value on x⋆. For monotone functions, a classic local search algorithm of Angluin accomplishes this task with n queries, which we show is optimal for local search algorithms. Our main result is a new algorithm for certifying monotone functions with O(k8 logn) queries, which comes close to matching the information-theoretic lower bound of Ω(k logn). The design and analysis of our algorithm are based on a new connection to threshold phenomena in monotone functions. We further prove exponential-in-k lower bounds when f is non-monotone, and when f is monotone but the algorithm is only given random examples of f. These lower bounds show that assumptions on the structure of f and query access to it are both necessary for the polynomial dependence on k that we achieve.more » « less
-
Tree-based machine learning models such as random forests, decision trees and gradient boosted trees are popular nonlinear predictive models, yet comparatively little attention has been paid to explaining their predictions. Here we improve the interpretability of tree-based models through three main contributions. (1) A polynomial time algorithm to compute optimal explanations based on game theory. (2) A new type of explanation that directly measures local feature interaction effects. (3) A new set of tools for understanding global model structure based on combining many local explanations of each prediction. We apply these tools to three medical machine learning problems and show how combining many high-quality local explanations allows us to represent global structure while retaining local faithfulness to the original model. These tools enable us to (1) identify high-magnitude but low-frequency nonlinear mortality risk factors in the US population, (2) highlight distinct population subgroups with shared risk characteristics, (3) identify nonlinear interaction effects among risk factors for chronic kidney disease and (4) monitor a machine learning model deployed in a hospital by identifying which features are degrading the model’s performance over time. Given the popularity of tree-based machine learning models, these improvements to their interpretability have implications across a broad set of domains.more » « less
An official website of the United States government

