Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to nonfederal websites. Their policies may differ from this site.

Free, publiclyaccessible full text available January 1, 2024

We give the first reconstruction algorithm for decision trees: given queries to a function f that is optclose to a sizes decision tree, our algorithm provides query access to a decision tree T where:  T has size S := s^O((log s)²/ε³);  dist(f,T) ≤ O(opt)+ε;  Every query to T is answered with poly((log s)/ε)⋅ log n queries to f and in poly((log s)/ε)⋅ n log n time. This yields a tolerant tester that distinguishes functions that are close to sizes decision trees from those that are far from sizeS decision trees. The polylogarithmic dependence on s in the efficiency of our tester is exponentially smaller than that of existing testers. Since decision tree complexity is well known to be related to numerous other boolean function properties, our results also provide a new algorithm for reconstructing and testing these properties.

We study the problem of certification: given queries to a function f : {0,1}n → {0,1} with certificate complexity ≤ k and an input x⋆, output a sizek certificate for f’s value on x⋆. For monotone functions, a classic local search algorithm of Angluin accomplishes this task with n queries, which we show is optimal for local search algorithms. Our main result is a new algorithm for certifying monotone functions with O(k8 logn) queries, which comes close to matching the informationtheoretic lower bound of Ω(k logn). The design and analysis of our algorithm are based on a new connection to threshold phenomena in monotone functions. We further prove exponentialink lower bounds when f is nonmonotone, and when f is monotone but the algorithm is only given random examples of f. These lower bounds show that assumptions on the structure of f and query access to it are both necessary for the polynomial dependence on k that we achieve.

Using the framework of boosting, we prove that all impuritybased decision tree learning algorithms, including the classic ID3, C4.5, and CART, are highly noise tolerant. Our guarantees hold under the strongest noise model of nasty noise, and we provide nearmatching upper and lower bounds on the allowable noise rate. We further show that these algorithms, which are simple and have long been central to everyday machine learning, enjoy provable guarantees in the noisy setting that are unmatched by existing algorithms in the theoretical literature on decision tree learning. Taken together, our results add to an ongoing line of research that seeks to place the empirical success of these practical decision tree algorithms on firm theoretical footing.

We initiate the study of a fundamental question concerning adversarial noise models in statistical problems where the algorithm receives i.i.d. draws from a distribution D. The definitions of these adversaries specify the {\sl type} of allowable corruptions (noise model) as well as {\sl when} these corruptions can be made (adaptivity); the latter differentiates between oblivious adversaries that can only corrupt the distribution D and adaptive adversaries that can have their corruptions depend on the specific sample S that is drawn from D. We investigate whether oblivious adversaries are effectively equivalent to adaptive adversaries, across all noise models studied in the literature, under a unifying framework that we introduce. Specifically, can the behavior of an algorithm A in the presence of oblivious adversaries always be wellapproximated by that of an algorithm A′ in the presence of adaptive adversaries? Our first result shows that this is indeed the case for the broad class of {\sl statistical query} algorithms, under all reasonable noise models. We then show that in the specific case of {\sl additive noise}, this equivalence holds for {\sl all} algorithms. Finally, we map out an approach towards proving this statement in its fullest generality, for all algorithms and under allmore »

We design an algorithm for finding counterfactuals with strong theoretical guarantees on its performance. For any monotone model f:Xd→{0,1} and instance x⋆, our algorithm makes S(f)O(Δf(x⋆))⋅logd {queries} to f and returns an {\sl optimal} counterfactual for x⋆: a nearest instance x′ to x⋆ for which f(x′)≠f(x⋆). Here S(f) is the sensitivity of f, a discrete analogue of the Lipschitz constant, and Δf(x⋆) is the distance from x⋆ to its nearest counterfactuals. The previous best known query complexity was dO(Δf(x⋆)), achievable by bruteforce local search. We further prove a lower bound of S(f)Ω(Δf(x⋆))+Ω(logd) on the query complexity of any algorithm, thereby showing that the guarantees of our algorithm are essentially optimal.

We consider the problem of explaining the predictions of an arbitrary blackbox model f: given query access to f and an instance x, output a small set of x's features that in conjunction essentially determines f(x). We design an efficient algorithm with provable guarantees on the succinctness and precision of the explanations that it returns. Prior algorithms were either efficient but lacked such guarantees, or achieved such guarantees but were inefficient. We obtain our algorithm via a connection to the problem of {\sl implicitly} learning decision trees. The implicit nature of this learning task allows for efficient algorithms even when the complexity of~f necessitates an intractably large surrogate decision tree. We solve the implicit learning problem by bringing together techniques from learning theory, local computation algorithms, and complexity theory. Our approach of “explaining by implicit learning” shares elements of two previously disparate methods for posthoc explanations, global and local explanations, and we make the case that it enjoys advantages of both.

We give an nO(loglogn)time membership query algorithm for properly and agnostically learning decision trees under the uniform distribution over {±1}n. Even in the realizable setting, the previous fastest runtime was nO(logn), a consequence of a classic algorithm of Ehrenfeucht and Haussler. Our algorithm shares similarities with practical heuristics for learning decision trees, which we augment with additional ideas to circumvent known lower bounds against these heuristics. To analyze our algorithm, we prove a new structural result for decision trees that strengthens a theorem of O'Donnell, Saks, Schramm, and Servedio. While the OSSS theorem says that every decision tree has an influential variable, we show how every decision tree can be “pruned” so that every variable in the resulting tree is influential.

Greedy decision tree learning heuristics are mainstays of machine learning practice, but theoretical justification for their empirical success remains elusive. In fact, it has long been known that there are simple target functions for which they fail badly (Kearns and Mansour, STOC 1996). Recent work of Brutzkus, Daniely, and Malach (COLT 2020) considered the smoothed analysis model as a possible avenue towards resolving this disconnect. Within the smoothed setting and for targets f that are kjuntas, they showed that these heuristics successfully learn f with depthk decision tree hypotheses. They conjectured that the same guarantee holds more generally for targets that are depthk decision trees. We provide a counterexample to this conjecture: we construct targets that are depthk decision trees and show that even in the smoothed setting, these heuristics build trees of depth 2^{Ω(k)} before achieving high accuracy. We also show that the guarantees of Brutzkus et al. cannot extend to the agnostic setting: there are targets that are very close to kjuntas, for which these heuristics build trees of depth 2^{Ω(k)} before achieving high accuracy.