Smoothed Analysis for Learning Concepts with Low Intrinsic Dimension

In the well-studied agnostic model of learning, the goal of a learner, given examples from an arbitrary joint distribution on ℝ^d × {±1}, is to output a hypothesis that is competitive (to within ε) with the best-fitting concept from some class. In order to escape strong hardness results for learning even simple concept classes in this model, we introduce a smoothed-analysis framework where we require a learner to compete only with the best classifier that is robust to small random Gaussian perturbation. This subtle change allows us to give a wide array of learning results for any concept that (1) depends on a low-dimensional subspace (aka multi-index model) and (2) has bounded Gaussian surface area. This class includes functions of halfspaces and (low-dimensional) convex sets, cases that are only known to be learnable in non-smoothed settings with respect to highly structured distributions such as Gaussians. Perhaps surprisingly, our analysis also yields new results for traditional non-smoothed frameworks such as learning with margin. In particular, we obtain the first algorithm for agnostically learning intersections of k halfspaces in time k^{poly(log k / (εγ))}, where γ is the margin parameter. Before our work, the best-known runtime was exponential in k (Arriaga and Vempala, 1999).
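As a rough sketch (using notation introduced here for exposition, not taken verbatim from the record), the smoothed agnostic benchmark can be written as follows, where D is the joint distribution on ℝ^d × {±1}, C is the concept class, and σ > 0 denotes the scale of the Gaussian perturbation:

\[
  \mathrm{err}_D(h) \;\le\; \min_{f \in \mathcal{C}} \; \Pr_{(x,y)\sim D,\; z\sim \mathcal{N}(0,\,\sigma^2 I_d)}\big[\, f(x+z) \neq y \,\big] \;+\; \epsilon .
\]

In other words, the learner only has to compete with concepts whose accuracy survives a small random Gaussian perturbation of the inputs; this is the relaxation of the standard agnostic benchmark that the paper exploits.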
                        
                    
    
        
    
- Award ID(s): 2505865
- PAR ID: 10631922
- Publisher / Repository: https://doi.org/10.48550/arXiv.2407.00966
- Date Published:
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Recent work of Klivans, Stavropoulos, and Vasilyan initiated the study of testable learning with distribution shift (TDS learning), where a learner is given labeled samples from a training distribution D, unlabeled samples from a test distribution D', and the goal is to output a classifier with low error on D' whenever the training samples pass a corresponding test. Their model deviates from all prior work in that no assumptions are made on D'. Instead, the test must accept (with high probability) when the marginals of the training and test distributions are equal. Here we focus on the fundamental case of intersections of halfspaces with respect to Gaussian training distributions and prove a variety of new upper bounds, including a 2^{(k/ε)^{O(1)}}·poly(d)-time algorithm for TDS learning intersections of k homogeneous halfspaces to accuracy ε (prior work achieved d^{(k/ε)^{O(1)}}). We work under the mild assumption that the Gaussian training distribution contains at least an ε fraction of both positive and negative examples (ε-balanced). We also prove the first set of SQ lower bounds for any TDS learning problem and show (1) the ε-balanced assumption is necessary for poly(d, 1/ε)-time TDS learning of a single halfspace and (2) a d^{Ω̃(log 1/ε)} lower bound for the intersection of two general halfspaces, even with the ε-balanced assumption. Our techniques significantly expand the toolkit for TDS learning. We use dimension reduction and coverings to give efficient algorithms for computing a localized version of discrepancy distance, a key metric from the domain adaptation literature (a schematic definition of the non-localized metric is sketched below).
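For context, the (non-localized) discrepancy distance is a standard quantity from the domain adaptation literature; a schematic definition for a concept class C is given below, with the caveat that the localized variant used in the paper restricts the classifiers over which the supremum is taken:

\[
  \mathrm{disc}_{\mathcal{C}}(D, D') \;=\; \sup_{f, f' \in \mathcal{C}} \Big|\, \Pr_{x\sim D}\big[f(x)\neq f'(x)\big] \;-\; \Pr_{x\sim D'}\big[f(x)\neq f'(x)\big] \,\Big| .
\]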
- This paper studies differentially private stochastic convex optimization (DP-SCO) in the presence of heavy-tailed gradients, where only a kth-moment bound on sample Lipschitz constants is assumed instead of a uniform bound. The authors propose a reduction-based approach that achieves the first near-optimal error rates (up to logarithmic factors) in this setting. Specifically, under (ε, δ)-approximate differential privacy, they achieve an error bound of G_2/√n + G_k·(√d/(nε))^{1−1/k}, up to a mild polylogarithmic factor in 1/δ, where G_2 and G_k are the 2nd and kth moment bounds on the sample Lipschitz constants. This nearly matches the lower bound established by Lowy and Razaviyayn (2023). Beyond this basic result, the authors introduce a suite of private algorithms that further improve performance under additional assumptions: an optimal algorithm under a known-Lipschitz-constant assumption, a near-linear-time algorithm for smooth functions, and an optimal linear-time algorithm for smooth generalized linear models.
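Restating the rate in display form (with F denoting the population objective, x̂ the private output, and L(s) the Lipschitz constant of the loss on sample s; these symbols are introduced here for exposition), the guarantee described above reads, up to polylogarithmic factors in 1/δ:

\[
  \mathbb{E}\big[F(\hat{x})\big] - \min_{x}\, F(x) \;\lesssim\; \frac{G_2}{\sqrt{n}} \;+\; G_k \cdot \Big(\frac{\sqrt{d}}{n\epsilon}\Big)^{1-\frac{1}{k}},
  \qquad \text{where } \; \mathbb{E}_{s}\big[L(s)^k\big]^{1/k} \le G_k .
\]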