We design an algorithm for finding counterfactuals with strong theoretical guarantees on its performance. For any monotone model f:Xd→{0,1} and instance x⋆, our algorithm makes
S(f)O(Δf(x⋆))⋅logd
{queries} to f and returns an {\sl optimal} counterfactual for x⋆: a nearest instance x′ to x⋆ for which f(x′)≠f(x⋆). Here S(f) is the sensitivity of f, a discrete analogue of the Lipschitz constant, and Δf(x⋆) is the distance from x⋆ to its nearest counterfactuals. The previous best known query complexity was dO(Δf(x⋆)), achievable by bruteforce local search. We further prove a lower bound of S(f)Ω(Δf(x⋆))+Ω(logd) on the query complexity of any algorithm, thereby showing that the guarantees of our algorithm are essentially optimal.
more »
« less
Provably efficient, succinct, and precise explanations
We consider the problem of explaining the predictions of an arbitrary blackbox model f: given query access to f and an instance x, output a small set of x's features that in conjunction essentially determines f(x). We design an efficient algorithm with provable guarantees on the succinctness and precision of the explanations that it returns. Prior algorithms were either efficient but lacked such guarantees, or achieved such guarantees but were inefficient. We obtain our algorithm via a connection to the problem of {\sl implicitly} learning decision trees. The implicit nature of this learning task allows for efficient algorithms even when the complexity of~f
necessitates an intractably large surrogate decision tree. We solve the implicit learning problem by bringing together techniques from learning theory, local computation algorithms, and complexity theory. Our approach of “explaining by implicit learning” shares elements of two previously disparate methods for posthoc explanations, global and local explanations, and we make the case that it enjoys advantages of both.
more »
« less
 Award ID(s):
 2006664
 NSFPAR ID:
 10339720
 Date Published:
 Journal Name:
 Conference on Neural Information Processing Systems (NeurIPS 2021)
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this


We give the first agnostic, efficient, proper learning algorithm for monotone Boolean functions. Given 2O~(n√/ε) uniformly random examples of an unknown function f:{±1}n→{±1}, our algorithm outputs a hypothesis g:{±1}n→{±1} that is monotone and (opt +ε)close to f, where opt is the distance from f to the closest monotone function. The running time of the algorithm (and consequently the size and evaluation time of the hypothesis) is also 2O~(n√/ε), nearly matching the lower bound of [13]. We also give an algorithm for estimating up to additive error ε the distance of an unknown function f to monotone using a runtime of 2O~(n√/ε). Previously, for both of these problems, sampleefficient algorithms were known, but these algorithms were not runtime efficient. Our work thus closes this gap in our knowledge between the runtime and sample complexity.This work builds upon the improper learning algorithm of [17] and the proper semiagnostic learning algorithm of [40], which obtains a nonmonotone Booleanvalued hypothesis, then “corrects” it to monotone using queryefficient local computation algorithms on graphs. This blackbox correction approach can achieve no error better than 2 opt +ε informationtheoretically; we bypass this barrier bya)augmenting the improper learner with a convex optimization step, andb)learning and correcting a realvalued function before rounding its values to Boolean. Our realvalued correction algorithm solves the “poset sorting” problem of [40] for functions over general posets with nonBoolean labels.more » « less

We study the problem of certification: given queries to a function f : {0,1}n → {0,1} with certificate complexity ≤ k and an input x⋆, output a sizek certificate for f’s value on x⋆. For monotone functions, a classic local search algorithm of Angluin accomplishes this task with n queries, which we show is optimal for local search algorithms. Our main result is a new algorithm for certifying monotone functions with O(k8 logn) queries, which comes close to matching the informationtheoretic lower bound of Ω(k logn). The design and analysis of our algorithm are based on a new connection to threshold phenomena in monotone functions. We further prove exponentialink lower bounds when f is nonmonotone, and when f is monotone but the algorithm is only given random examples of f. These lower bounds show that assumptions on the structure of f and query access to it are both necessary for the polynomial dependence on k that we achieve.more » « less

In the certification problem, the algorithm is given a function f with certificate complexity k and an input x^⋆, and the goal is to find a certificate of size ≤ poly(k) for f’s value at x^⋆. This problem is in NP^NP, and assuming 𝖯 ≠ NP, is not in 𝖯. Prior works, dating back to Valiant in 1984, have therefore sought to design efficient algorithms by imposing assumptions on f such as monotonicity. Our first result is a BPP^NP algorithm for the general problem. The key ingredient is a new notion of the balanced influence of variables, a natural variant of influence that corrects for the bias of the function. Balanced influences can be accurately estimated via uniform generation, and classic BPP^NP algorithms are known for the latter task. We then consider certification with stricter instancewise guarantees: for each x^⋆, find a certificate whose size scales with that of the smallest certificate for x^⋆. In sharp contrast with our first result, we show that this problem is NP^NPhard even to approximate. We obtain an optimal inapproximability ratio, adding to a small handful of problems in the higher levels of the polynomial hierarchy for which optimal inapproximability is known. Our proof involves the novel use of bitfixing dispersers for gap amplification.more » « less

Lysyanskaya, Anna ; Handschuh, Helena (Ed.)We study the blackbox function inversion problem, which is the problem of finding x[N] such that f(x)=y, given as input some challenge point y in the image of a function f:[N][N], using T oracle queries to f and preprocessed advice 01S depending on f. We prove a number of new results about this problem, as follows. 1. We show an algorithm that works for any T and S satisfying TS2maxST=(N3) . In the important setting when ST, this improves on the celebrated algorithm of Fiat and Naor [STOC, 1991], which requires TS3N3. E.g., Fiat and Naor's algorithm is only nontrivial for SN23 , while our algorithm gives a nontrivial tradeoff for any SN12 . (Our algorithm and analysis are quite simple. As a consequence of this, we also give a selfcontained and simple proof of Fiat and Naor's original result, with certain optimizations left out for simplicity.) 2. We show a nonadaptive algorithm (i.e., an algorithm whose ith query xi is chosen based entirely on and y, and not on the f(x1)f(xi−1)) that works for any T and S satisfying S=(Nlog(NT)) giving the first nontrivial nonadaptive algorithm for this problem. E.g., setting T=Npolylog(N) gives S=(NloglogN). This answers a question due to CorriganGibbs and Kogan [TCC, 2019], who asked whether it was possible for a nonadaptive algorithm to work with parameters T and S satisfying T+SlogNo(N) . We also observe that our nonadaptive algorithm is what we call a guessandcheck algorithm, that is, it is nonadaptive and its final output is always one of the oracle queries x1xT. For guessandcheck algorithms, we prove a matching lower bound, therefore completely characterizing the achievable parameters (ST) for this natural class of algorithms. (CorriganGibbs and Kogan showed that any such lower bound for arbitrary nonadaptive algorithms would imply new circuit lower bounds.) 3. We show equivalence between function inversion and a natural decision version of the problem in both the worst case and the average case, and similarly for functions f:[N][M] with different ranges. All of the above results are most naturally described in a model with shared randomness (i.e., random coins shared between the preprocessing algorithm and the online algorithm). However, as an additional contribution, we show (using a technique from communication complexity due to Newman [IPL, 1991]) how to generically convert any algorithm that uses shared randomness into one that does not.more » « less