

Title: Properly Learning Decision Trees in almost Polynomial Time
We give an n^{O(log log n)}-time membership query algorithm for properly and agnostically learning decision trees under the uniform distribution over {±1}^n. Even in the realizable setting, the previous fastest runtime was n^{O(log n)}, a consequence of a classic algorithm of Ehrenfeucht and Haussler. Our algorithm shares similarities with practical heuristics for learning decision trees, which we augment with additional ideas to circumvent known lower bounds against these heuristics. To analyze our algorithm, we prove a new structural result for decision trees that strengthens a theorem of O’Donnell, Saks, Schramm, and Servedio. While the OSSS theorem says that every decision tree has an influential variable, we show how every decision tree can be “pruned” so that every variable in the resulting tree is influential.
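For context, the OSSS theorem referenced above is the standard inequality below (notation is not defined on this record: δ_i(T) is the probability that T queries x_i on a uniformly random input, Inf_i(f) is the influence of x_i on f, and d is the depth of T):

```latex
% OSSS inequality: for any decision tree T computing f : {±1}^n -> {±1},
% with x drawn uniformly from {±1}^n,
\mathrm{Var}[f] \;\le\; \sum_{i=1}^{n} \delta_i(T)\,\mathrm{Inf}_i(f),
% and since \sum_i \delta_i(T) \le d for a depth-d tree, some variable satisfies
\max_i \mathrm{Inf}_i(f) \;\ge\; \frac{\mathrm{Var}[f]}{d}.
```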
Award ID(s): 2006664
NSF-PAR ID: 10434313
Author(s) / Creator(s):
Date Published:
Journal Name: Journal of the Association for Computing Machinery
Volume: 69
Issue: 6
ISSN: 0004-5411
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. We give the first reconstruction algorithm for decision trees: given queries to a function f that is opt-close to a size-s decision tree, our algorithm provides query access to a decision tree T where:
     - T has size S := s^{O((log s)²/ε³)};
     - dist(f,T) ≤ O(opt) + ε;
     - every query to T is answered with poly((log s)/ε) · log n queries to f and in poly((log s)/ε) · n log n time.
     This yields a tolerant tester that distinguishes functions that are close to size-s decision trees from those that are far from size-S decision trees. The polylogarithmic dependence on s in the efficiency of our tester is exponentially smaller than that of existing testers. Since decision tree complexity is well known to be related to numerous other Boolean function properties, our results also provide new algorithms for reconstructing and testing these properties.
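As a rough illustration of how such query access yields the tolerant tester mentioned above, here is a minimal sketch; the reconstructor itself is the paper's contribution and is treated as an assumed black box `t_query`, and the function name, sampling budget, and threshold parameter are illustrative, not from this record:

```python
import random
from typing import Callable, Tuple

def tolerant_test_via_reconstruction(
    f: Callable[[Tuple[int, ...]], int],
    t_query: Callable[[Tuple[int, ...]], int],  # assumed black box: query access to the reconstructed tree T
    n: int,
    threshold: float,
    num_samples: int = 1000,
    seed: int = 0,
) -> bool:
    """Estimate dist(f, T) on uniform samples from {-1,1}^n and accept iff the
    estimate is below `threshold`.  This is only the standard outer loop that
    the abstract's "this yields a tolerant tester" remark suggests; all of the
    work happens inside t_query."""
    rng = random.Random(seed)
    disagreements = 0
    for _ in range(num_samples):
        x = tuple(rng.choice((-1, 1)) for _ in range(n))
        if f(x) != t_query(x):
            disagreements += 1
    return disagreements / num_samples <= threshold
```

Functions that are close to size-s trees make the estimated distance small (at most roughly O(opt) + ε), while functions far from every size-S tree make it large, which is what the threshold separates.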
  2. Directed Steiner Tree (DST) is a central problem in combinatorial optimization and theoretical computer science: given a directed graph G = (V, E) with edge costs c ∈ ℝ_{≥ 0}^E, a root r ∈ V and k terminals K ⊆ V, we need to output a minimum-cost arborescence in G that contains an r → t path for every t ∈ K. Recently, Grandoni, Laekhanukit and Li, and independently Ghuge and Nagarajan, gave quasi-polynomial time O(log² k/log log k)-approximation algorithms for the problem, which are tight under popular complexity assumptions. In this paper, we consider the more general Degree-Bounded Directed Steiner Tree (DB-DST) problem, where we are additionally given a degree bound d_v on each vertex v ∈ V, and we require that every vertex v in the output tree has at most d_v children. We give a quasi-polynomial time (O(log n log k), O(log² n))-bicriteria approximation: the algorithm produces a solution whose cost is at most O(log n log k) times the cost of the optimum solution and which violates the degree constraints by at most a factor of O(log² n). This is the first non-trivial result for the problem. While our cost guarantee is nearly optimal, the degree violation factor of O(log² n) is an O(log n) factor away from the approximation lower bound of Ω(log n) that follows from Set Cover hardness. The hardness result holds even for the special case of the Degree-Bounded Group Steiner Tree problem on trees (DB-GST-T). With the hope of closing the gap, we study whether the degree violation factor can be made tight for this special case. We answer this question in the affirmative by giving an (O(log n log k), O(log n))-bicriteria approximation algorithm for DB-GST-T.
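In symbols, the bicriteria guarantee above says the following about the output tree T′, where OPT denotes the cost of an optimal solution that satisfies every degree bound and deg⁺ counts children:

```latex
% DB-DST bicriteria guarantee, restated from the abstract
\mathrm{cost}(T') \;\le\; O(\log n \,\log k)\cdot \mathrm{OPT},
\qquad
\deg^{+}_{T'}(v) \;\le\; O(\log^{2} n)\cdot d_v \quad \text{for all } v \in V.
```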
  3. Chakrabarti, Amit; Swamy, Chaitanya (Eds.)
    A Boolean maximum constraint satisfaction problem, Max-CSP(f), is specified by a predicate f: {-1,1}^k → {0,1}. An n-variable instance of Max-CSP(f) consists of a list of constraints, each of which applies f to k distinct literals drawn from the n variables. For k = 2, Chou, Golovnev, and Velusamy [Chou et al., 2020] obtained explicit ratios characterizing the √n-space streaming approximability of every predicate. For k ≥ 3, Chou, Golovnev, Sudan, and Velusamy [Chou et al., 2022] proved a general dichotomy theorem for √n-space sketching algorithms: For every f, there exists α(f) ∈ (0,1] such that for every ε > 0, Max-CSP(f) is (α(f)-ε)-approximable by an O(log n)-space linear sketching algorithm, but (α(f)+ε)-approximation sketching algorithms require Ω(√n) space. In this work, we give closed-form expressions for the sketching approximation ratios of multiple families of symmetric Boolean functions. Letting α'_k = 2^{-(k-1)} (1-k^{-2})^{(k-1)/2}, we show that for odd k ≥ 3, α(kAND) = α'_k, and for even k ≥ 2, α(kAND) = 2α'_{k+1}. Thus, for every k, kAND can be (2-o(1))2^{-k}-approximated by O(log n)-space sketching algorithms; we contrast this with a lower bound of Chou, Golovnev, Sudan, Velingker, and Velusamy [Chou et al., 2022] implying that streaming (2+ε)2^{-k}-approximations require Ω(n) space! We also resolve the ratio for the "at-least-(k-1)-1's" function for all even k; the "exactly-(k+1)/2-1's" function for odd k ∈ {3,…,51}; and fifteen other functions. We stress here that for general f, the dichotomy theorem in [Chou et al., 2022] only implies that α(f) can be computed to arbitrary precision in PSPACE, and thus closed-form expressions need not have existed a priori. Our analyses involve identifying and exploiting structural "saddle-point" properties of this dichotomy. Separately, for all threshold functions, we give optimal "bias-based" approximation algorithms generalizing [Chou et al., 2020] while simplifying [Chou et al., 2022]. Finally, we investigate the √n-space streaming lower bounds in [Chou et al., 2022], and show that they are incomplete for 3AND, i.e., they fail to rule out (α(3AND)-ε)-approximations in o(√n) space.
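For concreteness, plugging the smallest values of k into the closed form above (arithmetic only; both values follow directly from the stated expressions):

```latex
% Using \alpha'_k = 2^{-(k-1)} (1 - k^{-2})^{(k-1)/2}:
\alpha'_3 = 2^{-2}\Bigl(1-\tfrac{1}{9}\Bigr)^{1} = \tfrac{1}{4}\cdot\tfrac{8}{9} = \tfrac{2}{9},
\qquad
\alpha(3\mathrm{AND}) = \alpha'_3 = \tfrac{2}{9},
\qquad
\alpha(2\mathrm{AND}) = 2\alpha'_3 = \tfrac{4}{9}.
```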
  4. We show how any PAC learning algorithm that works under the uniform distribution can be transformed, in a black-box fashion, into one that works under an arbitrary and unknown distribution D. The efficiency of our transformation scales with the inherent complexity of D, running in poly(n, (md)^d) time for distributions over {±1}^n whose pmfs are computed by depth-d decision trees, where m is the sample complexity of the original algorithm. For monotone distributions our transformation uses only samples from D, and for general ones it uses subcube conditioning samples. A key technical ingredient is an algorithm which, given the aforementioned access to D, produces an optimal decision tree decomposition of D: an approximation of D as a mixture of uniform distributions over disjoint subcubes. With this decomposition in hand, we run the uniform-distribution learner on each subcube and combine the hypotheses using the decision tree. This algorithmic decomposition lemma also yields new algorithms for learning decision tree distributions with runtimes that exponentially improve on the prior state of the art, results of independent interest in distribution learning.
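To make the "run the uniform-distribution learner on each subcube and combine the hypotheses using the decision tree" step concrete, here is a minimal sketch; the decomposition tree format (`DecompNode`), the `uniform_learner` interface, and all names are assumptions for illustration, and the decomposition algorithm itself (the key ingredient above) is treated as given:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Example = Tuple[Tuple[int, ...], int]            # (x in {-1,1}^n, label)
Hypothesis = Callable[[Tuple[int, ...]], int]

@dataclass
class DecompNode:
    """Assumed output format of the decomposition step: a decision tree over
    the coordinates of x whose leaves name disjoint subcubes (the decomposition
    algorithm itself is not reproduced here)."""
    var: Optional[int] = None                    # None at a leaf
    left: Optional["DecompNode"] = None          # branch taken when x[var] == -1
    right: Optional["DecompNode"] = None         # branch taken when x[var] == +1
    leaf_id: Optional[int] = None                # set only at leaves

    def route(self, x: Tuple[int, ...]) -> int:
        node = self
        while node.var is not None:
            node = node.left if x[node.var] == -1 else node.right
        return node.leaf_id

def learn_by_decomposition(
    samples: List[Example],
    decomposition: DecompNode,
    uniform_learner: Callable[[List[Example]], Hypothesis],
    default_label: int = 1,
) -> Hypothesis:
    """Within each subcube of the decomposition, D is (approximately) uniform,
    so the uniform-distribution learner is run on the samples landing in that
    subcube; the final hypothesis routes each input through the decomposition
    tree to the hypothesis trained for its subcube."""
    buckets: Dict[int, List[Example]] = {}
    for x, y in samples:
        buckets.setdefault(decomposition.route(x), []).append((x, y))

    leaf_hypotheses = {leaf: uniform_learner(exs) for leaf, exs in buckets.items()}

    def combined(x: Tuple[int, ...]) -> int:
        h = leaf_hypotheses.get(decomposition.route(x))
        return h(x) if h is not None else default_label  # unseen subcube: fall back
    return combined
```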