This content will become publicly available on April 30, 2026

Title: Smoothed Analysis for Learning Concepts with Low Intrinsic Dimension
Traditional models of supervised learning require a learner, given examples from an arbitrary joint distribution on ℝ^d Γ— {Β±1}, to output a hypothesis that competes (to within Ο΅) with the best-fitting concept from a class. To overcome hardness results for learning even simple concept classes, this paper introduces a smoothed-analysis framework that only requires competition with the best classifier that is robust to small random Gaussian perturbations. This subtle shift enables a wide array of learning results for any concept that (1) depends on a low-dimensional subspace (multi-index model) and (2) has bounded Gaussian surface area. This class includes functions of halfspaces and low-dimensional convex sets, which are only known to be learnable in non-smoothed settings with respect to highly structured distributions like Gaussians. The analysis also yields new results for traditional non-smoothed frameworks such as learning with margin. In particular, the authors present the first algorithm for agnostically learning intersections of k halfspaces in time k^{poly(log k / (ϡγ))}, where γ is the margin parameter. Previously, the best-known runtime was exponential in k (Arriaga and Vempala, 1999).
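As a loose illustration (not the paper's algorithm; the concept, noise scale, and all names below are assumptions made up for this sketch), the following Python snippet evaluates an intersection of k halfspaces under small Gaussian perturbations of the input, which is the kind of perturbation-robust labeling the smoothed framework asks the learner to compete with.

    # Minimal sketch: label of an intersection of k halfspaces, and its
    # "smoothed" (perturbation-robust) label under Gaussian noise of scale sigma.
    # This only illustrates the target of the smoothed-analysis guarantee.
    import numpy as np

    rng = np.random.default_rng(0)
    d, k, sigma = 20, 5, 0.1          # ambient dimension, number of halfspaces, noise scale

    # Intersection of k homogeneous halfspaces: f(x) = +1 iff w_i . x >= 0 for all i.
    W = rng.standard_normal((k, d))
    W /= np.linalg.norm(W, axis=1, keepdims=True)

    def f(x):
        return 1 if np.all(W @ x >= 0) else -1

    def smoothed_label(x, n_samples=200):
        """Majority label of f over random Gaussian perturbations x + sigma * z."""
        Z = rng.standard_normal((n_samples, d))
        votes = [f(x + sigma * z) for z in Z]
        return 1 if np.mean(votes) >= 0 else -1

    # A positive point whose margin from every halfspace is well above sigma keeps
    # its label under perturbation with high probability; points near a boundary
    # may have their smoothed label flip, and those are exactly the points the
    # smoothed benchmark is allowed to concede.
    x = rng.standard_normal(d)
    print("f(x) =", f(x), "  smoothed label =", smoothed_label(x))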
Award ID(s):
2505865
PAR ID:
10631922
Author(s) / Creator(s):
Publisher / Repository:
https://doi.org/10.48550/arXiv.2407.00966
Date Published:
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In the well-studied agnostic model of learning, the goal of a learner -- given examples from an arbitrary joint distribution -- is to output a hypothesis that is competitive (to within Ο΅) with the best fitting concept from some class. In order to escape strong hardness results for learning even simple concept classes in this model, we introduce a smoothed analysis framework where we require a learner to compete only with the best classifier that is robust to small random Gaussian perturbation. This subtle change allows us to give a wide array of learning results for any concept that (1) depends on a low-dimensional subspace (aka multi-index model) and (2) has a bounded Gaussian surface area. This class includes functions of halfspaces and (low-dimensional) convex sets, cases that are only known to be learnable in non-smoothed settings with respect to highly structured distributions such as Gaussians. Perhaps surprisingly, our analysis also yields new results for traditional non-smoothed frameworks such as learning with margin. In particular, we obtain the first algorithm for agnostically learning intersections of k-halfspaces in time k^{poly(log k / (ϡγ))}, where γ is the margin parameter. Before our work, the best-known runtime was exponential in k (Arriaga and Vempala, 1999).
  2. In traditional models of supervised learning, the goal of a learner -- given examples from an arbitrary joint distribution on ℝ^d Γ— {Β±1} -- is to output a hypothesis that is competitive (to within Ο΅) with the best fitting concept from some class. In order to escape strong hardness results for learning even simple concept classes, we introduce a smoothed-analysis framework that requires a learner to compete only with the best classifier that is robust to small random Gaussian perturbation. This subtle change allows us to give a wide array of learning results for any concept that (1) depends on a low-dimensional subspace (aka multi-index model) and (2) has a bounded Gaussian surface area. This class includes functions of halfspaces and (low-dimensional) convex sets, cases that are only known to be learnable in non-smoothed settings with respect to highly structured distributions such as Gaussians. Surprisingly, our analysis also yields new results for traditional non-smoothed frameworks such as learning with margin. In particular, we obtain the first algorithm for agnostically learning intersections of k-halfspaces in time k^{poly(log k / (ϡγ))}, where γ is the margin parameter. Before our work, the best-known runtime was exponential in k (Arriaga and Vempala, 1999).
  3. Under Review for COLT 2024
    In the well-studied agnostic model of learning, the goal of a learner -- given examples from an arbitrary joint distribution on ℝ^d Γ— {Β±1} -- is to output a hypothesis that is competitive (to within Ο΅) with the best fitting concept from some class. In order to escape strong hardness results for learning even simple concept classes in this model, we introduce a smoothed analysis framework where we require a learner to compete only with the best classifier that is robust to small random Gaussian perturbation. This subtle change allows us to give a wide array of learning results for any concept that (1) depends on a low-dimensional subspace (aka multi-index model) and (2) has a bounded Gaussian surface area. This class includes functions of halfspaces and (low-dimensional) convex sets, cases that are only known to be learnable in non-smoothed settings with respect to highly structured distributions such as Gaussians. Perhaps surprisingly, our analysis also yields new results for traditional non-smoothed frameworks such as learning with margin. In particular, we obtain the first algorithm for agnostically learning intersections of k-halfspaces in time k^{poly(log k / (ϡγ))}, where γ is the margin parameter. Before our work, the best-known runtime was exponential in k (Arriaga and Vempala, 1999).
  4. Recent work of Klivans, Stavropoulos, and Vasilyan initiated the study of testable learning with distribution shift (TDS learning), where a learner is given labeled samples from training distribution D, unlabeled samples from test distribution Dβ€², and the goal is to output a classifier with low error on Dβ€² whenever the training samples pass a corresponding test. Their model deviates from all prior work in that no assumptions are made on Dβ€². Instead, the test must accept (with high probability) when the marginals of the training and test distributions are equal. Here we focus on the fundamental case of intersections of halfspaces with respect to Gaussian training distributions and prove a variety of new upper bounds, including a 2^{(k/Ο΅)^{O(1)}} Β· poly(d)-time algorithm for TDS learning intersections of k homogeneous halfspaces to accuracy Ο΅ (prior work achieved d^{(k/Ο΅)^{O(1)}}). We work under the mild assumption that the Gaussian training distribution contains at least an Ο΅ fraction of both positive and negative examples (Ο΅-balanced). We also prove the first set of SQ lower bounds for any TDS learning problem and show (1) the Ο΅-balanced assumption is necessary for poly(d, 1/Ο΅)-time TDS learning of a single halfspace and (2) a d^{Ω̃(log 1/Ο΅)} lower bound for the intersection of two general halfspaces, even with the Ο΅-balanced assumption. Our techniques significantly expand the toolkit for TDS learning. We use dimension reduction and coverings to give efficient algorithms for computing a localized version of discrepancy distance, a key metric from the domain adaptation literature.
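    The following toy Python sketch (illustrative only; the moment-matching statistic, threshold, and function names are assumptions, whereas the paper's actual test is built on a localized discrepancy distance computed via dimension reduction and coverings) shows the TDS accept/reject structure: the hypothesis is returned only when the training and test marginals look alike.

      # Toy TDS-style wrapper: learn on labeled training data, but only output the
      # hypothesis if a statistic comparing the training and test MARGINALS is small.
      import numpy as np

      def marginal_test_statistic(X_train, X_test):
          """Crude moment-matching proxy for closeness of the two marginals."""
          mean_gap = np.linalg.norm(X_train.mean(axis=0) - X_test.mean(axis=0))
          cov_gap = np.linalg.norm(np.cov(X_train.T) - np.cov(X_test.T))
          return mean_gap + cov_gap

      def tds_learn(X_train, y_train, X_test, fit, threshold=0.5):
          if marginal_test_statistic(X_train, X_test) > threshold:
              return None                      # reject: no guarantee is promised
          return fit(X_train, y_train)         # accept: hypothesis must do well on D'

      rng = np.random.default_rng(1)
      X_tr = rng.standard_normal((5000, 10))
      X_te = rng.standard_normal((5000, 10))   # equal marginals, so the test should accept
      y_tr = np.sign(X_tr[:, 0])
      h = tds_learn(X_tr, y_tr, X_te, fit=lambda X, y: "halfspace-learner placeholder")
      print("accepted" if h is not None else "rejected")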
  5. This paper studies differentially private stochastic convex optimization (DP-SCO) in the presence of heavy-tailed gradients, where only a kth-moment bound on sample Lipschitz constants is assumed, instead of a uniform bound. The authors propose a reduction-based approach that achieves the first near-optimal error rates (up to logarithmic factors) in this setting. Specifically, under (Ο΅, Ξ΄)-approximate differential privacy, they achieve an error bound of G_2/√n + G_k Β· (√d/(nΟ΅))^{1 - 1/k}, up to a mild polylogarithmic factor in 1/Ξ΄, where G_2 and G_k are the 2nd and kth moment bounds on sample Lipschitz constants. This nearly matches the lower bound established by Lowy and Razaviyayn (2023). Beyond the basic result, the authors introduce a suite of private algorithms that further improve performance under additional assumptions: an optimal algorithm under a known Lipschitz constant, a near-linear time algorithm for smooth functions, and an optimal linear-time algorithm for smooth generalized linear models.
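    For concreteness, a small Python snippet evaluating the error bound above numerically (the polylog(1/Ξ΄) factor is dropped, and all parameter values are made up for illustration):

      # Evaluate the stated heavy-tailed DP-SCO bound:
      #   G_2 / sqrt(n) + G_k * (sqrt(d) / (n * eps))^(1 - 1/k)
      import math

      def heavy_tailed_dpsco_bound(n, d, eps, k, G2, Gk):
          return G2 / math.sqrt(n) + Gk * (math.sqrt(d) / (n * eps)) ** (1.0 - 1.0 / k)

      # Example: n = 1e6 samples in d = 100 dimensions, eps = 1, a 4th-moment bound.
      print(heavy_tailed_dpsco_bound(n=10**6, d=100, eps=1.0, k=4, G2=1.0, Gk=2.0))
      # As k grows (more moments bounded), the exponent 1 - 1/k approaches 1 and the
      # privacy term shrinks toward the familiar sqrt(d)/(n*eps) rate for uniformly
      # Lipschitz losses.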