Title: Properly Learning Decision Trees in almost Polynomial Time
We give an n^{O(log log n)}-time membership query algorithm for properly and agnostically learning decision trees under the uniform distribution over {±1}^n. Even in the realizable setting, the previous fastest runtime was n^{O(log n)}, a consequence of a classic algorithm of Ehrenfeucht and Haussler. Our algorithm shares similarities with practical heuristics for learning decision trees, which we augment with additional ideas to circumvent known lower bounds against these heuristics. To analyze our algorithm, we prove a new structural result for decision trees that strengthens a theorem of O'Donnell, Saks, Schramm, and Servedio. While the OSSS theorem says that every decision tree has an influential variable, we show how every decision tree can be "pruned" so that every variable in the resulting tree is influential.
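For reference, the OSSS theorem mentioned in the abstract can be stated as the following inequality (a standard formulation, not quoted from this page), where δ_i(T) is the probability that the tree T queries x_i on a uniformly random input and Inf_i(f) is the probability that flipping x_i changes f(x):

```latex
% OSSS inequality: every decision tree has an influential variable.
% For f : \{\pm 1\}^n \to \{\pm 1\} computed by a decision tree T,
\[
  \operatorname{Var}[f] \;\le\; \sum_{i=1}^{n} \delta_i(T)\,\operatorname{Inf}_i(f).
\]
% Since \sum_i \delta_i(T) is at most the depth of T, some coordinate i
% satisfies \operatorname{Inf}_i(f) \ge \operatorname{Var}[f] / \operatorname{depth}(T).
```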
Award ID(s): 2006664
PAR ID: 10434313
Author(s) / Creator(s): Blanc, Guy; Lange, Jane; Qiao, Mingda; Tan, Li-Yang
Date Published:
Journal Name: Journal of the Association for Computing Machinery
Volume: 69
Issue: 6
ISSN: 0004-5411
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. We give the first reconstruction algorithm for decision trees: given queries to a function f that is opt-close to a size-s decision tree, our algorithm provides query access to a decision tree T where:
     - T has size S := s^{O((log s)²/ε³)};
     - dist(f, T) ≤ O(opt) + ε;
     - every query to T is answered with poly((log s)/ε) ⋅ log n queries to f and in poly((log s)/ε) ⋅ n log n time.
     This yields a tolerant tester that distinguishes functions that are close to size-s decision trees from those that are far from size-S decision trees. The polylogarithmic dependence on s in the efficiency of our tester is exponentially smaller than that of existing testers. Since decision tree complexity is well known to be related to numerous other Boolean function properties, our results also provide a new algorithm for reconstructing and testing these properties.
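A small illustration of how query access of the kind described in the item above is typically used downstream (a generic sampling sketch under assumed oracles, not the paper's reconstructor): given black-box access to f and to the reconstructed tree T, one can estimate dist(f, T) = Pr_x[f(x) ≠ T(x)] empirically. The names query_f and query_T and the sample count are hypothetical.

```python
import random

def estimate_distance(query_f, query_T, n, num_samples=10_000, seed=0):
    """Estimate dist(f, T) = Pr_x[f(x) != T(x)] over uniform x in {-1, +1}^n
    by sampling. `query_f` and `query_T` are hypothetical black-box oracles
    standing in for the target function f and the reconstructed tree T."""
    rng = random.Random(seed)
    disagreements = 0
    for _ in range(num_samples):
        x = tuple(rng.choice((-1, 1)) for _ in range(n))
        disagreements += (query_f(x) != query_T(x))
    return disagreements / num_samples
```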
  2. Mulzer, Wolfgang; Phillips, Jeff M (Ed.)
    A (1+ε)-stretch tree cover of a metric space is a collection of trees, where every pair of points has a (1+ε)-stretch path in one of the trees. The celebrated Dumbbell Theorem [Arya et al. STOC'95] states that any set of n points in d-dimensional Euclidean space admits a (1+ε)-stretch tree cover with O_d(ε^{-d} ⋅ log(1/ε)) trees, where the O_d notation suppresses terms that depend solely on the dimension d. The running time of their construction is O_d(n log n ⋅ log(1/ε)/ε^d + n ⋅ ε^{-2d}). Since the same point may occur in multiple levels of the tree, the maximum degree of a point in the tree cover may be as large as Ω(log Φ), where Φ is the aspect ratio of the input point set. In this work we present a (1+ε)-stretch tree cover with O_d(ε^{-d+1} ⋅ log(1/ε)) trees, which is optimal (up to the log(1/ε) factor). Moreover, the maximum degree of points in any tree is an absolute constant for any d. As a direct corollary, we obtain an optimal routing scheme in low-dimensional Euclidean spaces. We also present a (1+ε)-stretch Steiner tree cover (that may use Steiner points) with O_d(ε^{(-d+1)/2} ⋅ log(1/ε)) trees, which too is optimal. The running time of our two constructions is linear in the number of edges in the respective tree covers, ignoring an additive O_d(n log n) term; this improves over the running time underlying the Dumbbell Theorem.
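To illustrate how a tree cover like the one in the item above is queried (a generic sketch, not the construction from that work): an approximate distance query takes the best path length over all trees in the cover. The adjacency-dict representation below is an assumption made for the example.

```python
from collections import deque

def tree_distance(adj, u, v):
    """Length of the unique u-v path in a tree given as an adjacency dict
    {node: [(neighbor, edge_weight), ...]}, found by breadth-first search."""
    if u == v:
        return 0.0
    dist = {u: 0.0}
    queue = deque([u])
    while queue:
        node = queue.popleft()
        for nbr, w in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + w
                if nbr == v:
                    return dist[nbr]
                queue.append(nbr)
    raise ValueError("u and v lie in different components")

def cover_distance(trees, u, v):
    """Answer a distance query against a tree cover: the (1+ε)-stretch
    guarantee says that for every pair (u, v) some tree in the cover
    preserves their distance up to a 1+ε factor, so the minimum over
    all trees is a (1+ε)-approximation of the true distance."""
    return min(tree_distance(adj, u, v) for adj in trees)
```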
  3. Directed Steiner Tree (DST) is a central problem in combinatorial optimization and theoretical computer science: Given a directed graph G = (V, E) with edge costs c ∈ ℝ_{≥ 0}^E, a root r ∈ V and k terminals K ⊆ V, we need to output a minimum-cost arborescence in G that contains an r → t path for every t ∈ K. Recently, Grandoni, Laekhanukit and Li, and independently Ghuge and Nagarajan, gave quasi-polynomial time O(log² k / log log k)-approximation algorithms for the problem, which are tight under popular complexity assumptions. In this paper, we consider the more general Degree-Bounded Directed Steiner Tree (DB-DST) problem, where we are additionally given a degree bound d_v on each vertex v ∈ V, and we require that every vertex v in the output tree has at most d_v children. We give a quasi-polynomial time (O(log n log k), O(log² n))-bicriteria approximation: The algorithm produces a solution with cost at most O(log n log k) times the cost of the optimum solution that violates the degree constraints by at most a factor of O(log² n). This is the first non-trivial result for the problem. While our cost guarantee is nearly optimal, the degree violation factor of O(log² n) is an O(log n)-factor away from the approximation lower bound of Ω(log n) from the Set Cover hardness. The hardness result holds even on the special case of the Degree-Bounded Group Steiner Tree problem on trees (DB-GST-T). With the hope of closing the gap, we study the question of whether the degree violation factor can be made tight for this special case. We answer the question in the affirmative by giving an (O(log n log k), O(log n))-bicriteria approximation algorithm for DB-GST-T.
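As a concrete reading of the bicriteria guarantee in the item above (an illustrative checker, not part of the cited algorithms): a candidate solution is evaluated by its total cost and by how far it exceeds the per-vertex degree bounds. The data-structure choices below are assumptions.

```python
def evaluate_solution(children, degree_bound, edge_cost):
    """Given a candidate arborescence as {vertex: [children, ...]}, per-vertex
    degree bounds {vertex: d_v}, and edge costs {(u, v): c}, return the total
    cost and the worst degree-violation factor max_v |children(v)| / d_v.
    An (alpha, beta)-bicriteria guarantee bounds the cost by alpha times the
    optimum and the violation factor by beta."""
    total_cost = 0.0
    worst_violation = 0.0
    for v, kids in children.items():
        if kids:
            worst_violation = max(worst_violation, len(kids) / degree_bound[v])
        for child in kids:
            total_cost += edge_cost[(v, child)]
    return total_cost, worst_violation
```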
  4. Greedy decision tree learning heuristics are mainstays of machine learning practice, but theoretical justification for their empirical success remains elusive. In fact, it has long been known that there are simple target functions for which they fail badly (Kearns and Mansour, STOC 1996). Recent work of Brutzkus, Daniely, and Malach (COLT 2020) considered the smoothed analysis model as a possible avenue towards resolving this disconnect. Within the smoothed setting and for targets f that are k-juntas, they showed that these heuristics successfully learn f with depth-k decision tree hypotheses. They conjectured that the same guarantee holds more generally for targets that are depth-k decision trees. We provide a counterexample to this conjecture: we construct targets that are depth-k decision trees and show that even in the smoothed setting, these heuristics build trees of depth 2^{Ω(k)} before achieving high accuracy. We also show that the guarantees of Brutzkus et al. cannot extend to the agnostic setting: there are targets that are very close to k-juntas, for which these heuristics build trees of depth 2^{Ω(k)} before achieving high accuracy.
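For readers unfamiliar with the heuristics discussed in the item above, the following sketch shows the general shape of a top-down greedy learner (in the spirit of ID3/CART): repeatedly split on the coordinate with the largest impurity gain. The Gini-style impurity measure and the data format are illustrative assumptions, not details from the paper.

```python
from collections import Counter

def impurity_gain(samples, i):
    """Reduction in Gini-style impurity from splitting on coordinate i.
    `samples` is a list of (x, label) pairs with x a tuple over {-1, +1}."""
    def impurity(group):
        if not group:
            return 0.0
        p = sum(1 for _, y in group if y == 1) / len(group)
        return p * (1 - p)
    left = [s for s in samples if s[0][i] == -1]
    right = [s for s in samples if s[0][i] == +1]
    n = len(samples)
    return impurity(samples) - (len(left) / n) * impurity(left) \
                             - (len(right) / n) * impurity(right)

def build_greedy_tree(samples, depth):
    """Top-down greedy heuristic: split on the coordinate with the largest
    impurity gain, recurse on both halves, and label leaves by majority vote.
    Returns either a leaf label or a tuple (split_coordinate, left, right)."""
    labels = [y for _, y in samples]
    majority = Counter(labels).most_common(1)[0][0] if labels else 1
    if depth == 0 or not samples or len(set(labels)) == 1:
        return majority
    num_vars = len(samples[0][0])
    i = max(range(num_vars), key=lambda j: impurity_gain(samples, j))
    left = [s for s in samples if s[0][i] == -1]
    right = [s for s in samples if s[0][i] == +1]
    return (i, build_greedy_tree(left, depth - 1),
               build_greedy_tree(right, depth - 1))
```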