Smoothed Analysis for Learning Concepts with Low Intrinsic Dimension
In the well-studied agnostic model of learning, the goal of a learner, given examples from an arbitrary joint distribution on ℝ^d × {±1}, is to output a hypothesis that is competitive (to within ϵ) with the best-fitting concept from some class. In order to escape strong hardness results for learning even simple concept classes in this model, we introduce a smoothed-analysis framework where we require a learner to compete only with the best classifier that is robust to small random Gaussian perturbation. This subtle change allows us to give a wide array of learning results for any concept that (1) depends on a low-dimensional subspace (aka multi-index model) and (2) has bounded Gaussian surface area. This class includes functions of halfspaces and (low-dimensional) convex sets, cases that are only known to be learnable in non-smoothed settings with respect to highly structured distributions such as Gaussians. Perhaps surprisingly, our analysis also yields new results for traditional non-smoothed frameworks such as learning with margin. In particular, we obtain the first algorithm for agnostically learning intersections of k-halfspaces in time k^{poly(log k / (ϵγ))}, where γ is the margin parameter. Before our work, the best-known runtime was exponential in k (Arriaga and Vempala, 1999).
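One way to make the smoothed benchmark concrete (the notation below is our own illustration, not taken verbatim from the paper): for a perturbation scale σ > 0, each concept c is scored by its error after the inputs are perturbed by Gaussian noise, and the learner only has to compete with the best such smoothed score over the class C.

```latex
% Smoothed (sigma-perturbed) 0-1 error of a concept c under distribution D.
% sigma, err_sigma, and the exact form of the guarantee are illustrative notation.
\[
  \mathrm{err}_{\sigma}(c) \;=\;
  \mathop{\mathbb{E}}_{(x,y)\sim D}\,
  \mathop{\mathbb{E}}_{z\sim \mathcal{N}(0,\,\sigma^{2} I_d)}
  \bigl[\,\mathbf{1}\{\,c(x+z)\neq y\,\}\,\bigr],
  \qquad
  \text{goal:}\quad
  \mathrm{err}(h) \;\le\; \min_{c\in\mathcal{C}} \mathrm{err}_{\sigma}(c) \;+\; \epsilon .
\]
```

Intuitively, a concept whose decision boundary passes very close to much of the mass of D is penalized by err_σ, which is the sense in which the benchmark only counts classifiers that are robust to small random perturbations.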
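For intuition only, here is a minimal Monte Carlo sketch (ours, not code from the paper) of the smoothed 0-1 error for an intersection of k halfspaces, the kind of multi-index concept highlighted in the abstract; the function names, noise scale sigma, and toy data are all illustrative assumptions.

```python
import numpy as np

def intersection_of_halfspaces(X, W, b):
    """Label +1 iff x satisfies every constraint w_i . x <= b_i.
    This is a multi-index concept: it depends on x only through the
    k-dimensional projection W @ x."""
    inside = (X @ W.T <= b).all(axis=1)
    return np.where(inside, 1, -1)

def smoothed_error(classify, X, y, sigma=0.1, n_perturb=200, seed=0):
    """Monte Carlo estimate of the 0-1 error of `classify` after each input
    is perturbed by N(0, sigma^2 I) noise (the 'smoothed' benchmark score)."""
    rng = np.random.default_rng(seed)
    errs = []
    for _ in range(n_perturb):
        Z = rng.standard_normal(X.shape)
        errs.append(np.mean(classify(X + sigma * Z) != y))
    return float(np.mean(errs))

# Toy usage: an intersection of k = 3 halfspaces in d = 10 dimensions,
# with 5% of the labels flipped so the problem is agnostic.
rng = np.random.default_rng(1)
d, k, n = 10, 3, 2000
W, b = rng.standard_normal((k, d)), rng.standard_normal(k)
X = rng.standard_normal((n, d))
y = intersection_of_halfspaces(X, W, b)
y[rng.random(n) < 0.05] *= -1

clf = lambda X_: intersection_of_halfspaces(X_, W, b)
print("plain 0-1 error   :", np.mean(clf(X) != y))
print("smoothed 0-1 error:", smoothed_error(clf, X, y, sigma=0.1))
```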
- Award ID(s):
- 1909204
- PAR ID:
- 10563571
- Publisher / Repository:
- Conference on Learning Theory 2024
- Date Published:
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- Under Review for COLT 2024: In the well-studied agnostic model of learning, the goal of a learner, given examples from an arbitrary joint distribution on ℝ^d × {±1}, is to output a hypothesis that is competitive (to within ϵ) with the best fitting concept from some class. In order to escape strong hardness results for learning even simple concept classes in this model, we introduce a smoothed analysis framework where we require a learner to compete only with the best classifier that is robust to small random Gaussian perturbation. This subtle change allows us to give a wide array of learning results for any concept that (1) depends on a low-dimensional subspace (aka multi-index model) and (2) has a bounded Gaussian surface area. This class includes functions of halfspaces and (low-dimensional) convex sets, cases that are only known to be learnable in non-smoothed settings with respect to highly structured distributions such as Gaussians. Perhaps surprisingly, our analysis also yields new results for traditional non-smoothed frameworks such as learning with margin. In particular, we obtain the first algorithm for agnostically learning intersections of k-halfspaces in time k^{poly(log k / (ϵγ))}, where γ is the margin parameter. Before our work, the best-known runtime was exponential in k (Arriaga and Vempala, 1999a).
- In traditional models of supervised learning, the goal of a learner -- given examples from an arbitrary joint distribution on ℝ^d × {±1} -- is to output a hypothesis that is competitive (to within ϵ) of the best fitting concept from some class. In order to escape strong hardness results for learning even simple concept classes, we introduce a smoothed-analysis framework that requires a learner to compete only with the best classifier that is robust to small random Gaussian perturbation. This subtle change allows us to give a wide array of learning results for any concept that (1) depends on a low-dimensional subspace (aka multi-index model) and (2) has a bounded Gaussian surface area. This class includes functions of halfspaces and (low-dimensional) convex sets, cases that are only known to be learnable in non-smoothed settings with respect to highly structured distributions such as Gaussians. Surprisingly, our analysis also yields new results for traditional non-smoothed frameworks such as learning with margin. In particular, we obtain the first algorithm for agnostically learning intersections of k-halfspaces in time k^{poly(log k / (ϵγ))} where γ is the margin parameter. Before our work, the best-known runtime was exponential in k (Arriaga and Vempala, 1999).
- We revisit the fundamental problem of learning with distribution shift, in which a learner is given labeled samples from training distribution D, unlabeled samples from test distribution D′ and is asked to output a classifier with low test error. The standard approach in this setting is to bound the loss of a classifier in terms of some notion of distance between D and D′. These distances, however, seem difficult to compute and do not lead to efficient algorithms. We depart from this paradigm and define a new model called testable learning with distribution shift, where we can obtain provably efficient algorithms for certifying the performance of a classifier on a test distribution. In this model, a learner outputs a classifier with low test error whenever samples from D and D′ pass an associated test; moreover, the test must accept if the marginal of D equals the marginal of D′. We give several positive results for learning well-studied concept classes such as halfspaces, intersections of halfspaces, and decision trees when the marginal of D is Gaussian or uniform on {±1}^d. Prior to our work, no efficient algorithms for these basic cases were known without strong assumptions on D′. For halfspaces in the realizable case (where there exists a halfspace consistent with both D and D′), we combine a moment-matching approach with ideas from active learning to simulate an efficient oracle for estimating disagreement regions. To extend to the non-realizable setting, we apply recent work from testable (agnostic) learning. More generally, we prove that any function class with low-degree L2-sandwiching polynomial approximators can be learned in our model. We apply constructions from the pseudorandomness literature to obtain the required approximators.
- We revisit the fundamental problem of learning with distribution shift, in which a learner is given labeled samples from training distribution D, unlabeled samples from test distribution D′ and is asked to output a classifier with low test error. The standard approach in this setting is to bound the loss of a classifier in terms of some notion of distance between D and D′. These distances, however, seem difficult to compute and do not lead to efficient algorithms. We depart from this paradigm and define a new model called testable learning with distribution shift, where we can obtain provably efficient algorithms for certifying the performance of a classifier on a test distribution. In this model, a learner outputs a classifier with low test error whenever samples from D and D′ pass an associated test; moreover, the test must accept (with high probability) if the marginal of D equals the marginal of D′. We give several positive results for learning well-studied concept classes such as halfspaces, intersections of halfspaces, and decision trees when the marginal of D is Gaussian or uniform on the hypercube. Prior to our work, no efficient algorithms for these basic cases were known without strong assumptions on D′. For halfspaces in the realizable case (where there exists a halfspace consistent with both D and D′), we combine a moment-matching approach with ideas from active learning to simulate an efficient oracle for estimating disagreement regions. To extend to the non-realizable setting, we apply recent work from testable (agnostic) learning. More generally, we prove that any function class with low-degree L2-sandwiching polynomial approximators can be learned in our model. Since we require L2-sandwiching (instead of the usual L1 loss), we cannot directly appeal to convex duality and instead apply constructions from the pseudorandomness literature to obtain the required approximators. We also provide lower bounds to show that the guarantees we obtain on the performance of our output hypotheses are best possible up to constant factors, as well as a separation showing that realizable learning in our model is incomparable to (ordinary) agnostic learning.
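The two related records above describe the testable-learning-with-distribution-shift model. As a rough illustration of the interface only (our sketch, not the papers' algorithm: the crude mean/covariance test, the least-squares stand-in learner, and all names and tolerances are assumptions), a learner in this model might look like the following.

```python
import numpy as np

def moments_match(X_train, X_test, tol=0.25):
    """Toy test: accept iff the empirical means and covariances of the two
    unlabeled marginals are close. When the marginal of D equals the marginal
    of D', this accepts w.h.p. for large samples; a real TDS test matches many
    more moments and carries formal guarantees."""
    mean_gap = np.linalg.norm(X_train.mean(axis=0) - X_test.mean(axis=0))
    cov_gap = np.linalg.norm(np.cov(X_train.T) - np.cov(X_test.T))
    return mean_gap <= tol and cov_gap <= tol

def tds_learner(X_train, y_train, X_test, fit, tol=0.25):
    """Return (accepted, hypothesis): only when `accepted` is True does the
    model require the hypothesis to have low error on the test distribution."""
    if not moments_match(X_train, X_test, tol):
        return False, None
    return True, fit(X_train, y_train)

# Toy usage: learn a halfspace by least squares (a stand-in learner).
rng = np.random.default_rng(2)
d, n = 5, 8000
w_true = rng.standard_normal(d)
X_tr = rng.standard_normal((n, d))
y_tr = np.sign(X_tr @ w_true)
X_te = rng.standard_normal((n, d))  # same marginal, so the test should accept

fit_halfspace = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
accepted, w_hat = tds_learner(X_tr, y_tr, X_te, fit_halfspace)
if accepted:
    test_err = np.mean(np.sign(X_te @ w_hat) != np.sign(X_te @ w_true))
    print("test accepted; error on D':", test_err)
else:
    print("test rejected; no guarantee is required")
```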