This paper addresses the challenge of learning under arbitrary distribution shift, where models are trained on data from one distribution but evaluated on a potentially adversarially generated test distribution. The authors study two frameworks: PQ learning [Goldwasser, A. Kalai, Y. Kalai, Montasser NeurIPS 2020], which allows abstention on adversarial portions of the test distribution, and TDS learning [Klivans, Stavropoulos, Vasilyan COLT 2024], which permits abstention on the entire test set if a distribution shift is detected. Previous algorithms either relied on computationally hard primitives (even for simple classes) or abstained entirely with even small shifts. This work develops efficient algorithms for natural function classes such as intersections of halfspaces and decision trees, under standard training distributions like Gaussians. The key innovation is an improved analysis of spectral outlier-removal techniques from learning with nasty noise, enabling: (1) handling arbitrarily large fractions of outliers, crucial for arbitrary shifts, and (2) stronger bounds on polynomial moments post outlier-removal, yielding new insights into polynomial regression under distribution shift. The techniques also produce novel results for tolerant testable learning [Rubinfeld and Vasilyan STOC 2023] and learning with nasty noise. 
                        more » 
                        « less   
                    This content will become publicly available on December 10, 2025
                            
                            Tolerant Algorithms for Learning with Arbitrary Covariate Shift
                        
                    
    
            We study the problem of learning under arbitrary distribution shift, where the learner is trained on a labeled set from one distribution but evaluated on a different, potentially adversarially generated test distribution. We focus on two frameworks: PQ learning [GKKM'20], allowing abstention on adversarially generated parts of the test distribution, and TDS learning [KSV'23], permitting abstention on the entire test distribution if distribution shift is detected. All prior known algorithms either rely on learning primitives that are computationally hard even for simple function classes, or end up abstaining entirely even in the presence of a tiny amount of distribution shift. We address both these challenges for natural function classes, including intersections of halfspaces and decision trees, and standard training distributions, including Gaussians. For PQ learning, we give efficient learning algorithms, while for TDS learning, our algorithms can tolerate moderate amounts of distribution shift. At the core of our approach is an improved analysis of spectral outlier-removal techniques from learning with nasty noise. Our analysis can (1) handle arbitrarily large fraction of outliers, which is crucial for handling arbitrary distribution shifts, and (2) obtain stronger bounds on polynomial moments of the distribution after outlier removal, yielding new insights into polynomial regression under distribution shifts. Lastly, our techniques lead to novel results for tolerant testable learning [RV'23], and learning with nasty noise. 
        more » 
        « less   
        
    
                            - Award ID(s):
- 2310818
- PAR ID:
- 10577205
- Publisher / Repository:
- Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems (NeurIPS 2024)
- Date Published:
- Format(s):
- Medium: X
- Location:
- Vancouver, BC, Canada
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            
- 
            The seminal work of Linial, Mansour, and Nisan gave a quasipolynomial-time algorithm for learning constant-depth circuits (𝖠𝖢0) with respect to the uniform distribution on the hypercube. Extending their algorithm to the setting of malicious noise, where both covariates and labels can be adversarially corrupted, has remained open. Here we achieve such a result, inspired by recent work on learning with distribution shift. Our running time essentially matches their algorithm, which is known to be optimal assuming various cryptographic primitives. Our proof uses a simple outlier-removal method combined with Braverman's theorem for fooling constant-depth circuits. We attain the best possible dependence on the noise rate and succeed in the harshest possible noise model (i.e., contamination or so-called "nasty noise").more » « less
- 
            Inspired by recent work on learning with distribution shift, we give a general outlier removal algorithm called iterative polynomial filtering and show a number of striking applications for supervised learning with contamination: (1) We show that any function class that can be approximated by low-degree polynomials with respect to a hypercontractive distribution can be efficiently learned under bounded contamination (also known as nasty noise). This is a surprising resolution to a longstanding gap between the complexity of agnostic learning and learning with contamination, as it was widely believed that low-degree approximators only implied tolerance to label noise. (2) For any function class that admits the (stronger) notion of sandwiching approximators, we obtain near-optimal learning guarantees even with respect to heavy additive contamination, where far more than 1/2 of the training set may be added adversarially. Prior related work held only for regression and in a list-decodable setting. (3) We obtain the first efficient algorithms for tolerant testable learning of functions of halfspaces with respect to any fixed log-concave distribution. Even the non-tolerant case for a single halfspace in this setting had remained open. These results significantly advance our understanding of efficient supervised learning under contamination, a setting that has been much less studied than its unsupervised counterpart.more » « less
- 
            We give the first provably efficient algorithms for learning neural networks with distribution shift. We work in the Testable Learning with Distribution Shift framework (TDS learning) of Klivans et al. (2024), where the learner receives labeled examples from a training distribution and unlabeled examples from a test distribution and must either output a hypothesis with low test error or reject if distribution shift is detected. No assumptions are made on the test distribution. All prior work in TDS learning focuses on classification, while here we must handle the setting of nonconvex regression. Our results apply to real-valued networks with arbitrary Lipschitz activations and work whenever the training distribution has strictly sub-exponential tails. For training distributions that are bounded and hypercontractive, we give a fully polynomial-time algorithm for TDS learning one hidden-layer networks with sigmoid activations. We achieve this by importing classical kernel methods into the TDS framework using data-dependent feature maps and a type of kernel matrix that couples samples from both train and test distributions.more » « less
- 
            A fundamental notion of distance between train and test distributions from the field of domain adaptation is discrepancy distance. While in general hard to compute, here we provide the first set of provably efficient algorithms for testing localized discrepancy distance, where discrepancy is computed with respect to a fixed output classifier. These results imply a broad set of new, efficient learning algorithms in the recently introduced model of Testable Learning with Distribution Shift (TDS learning) due to Klivans et al. (2023).Our approach generalizes and improves all prior work on TDS learning: (1) we obtain universal learners that succeed simultaneously for large classes of test distributions, (2) achieve near-optimal error rates, and (3) give exponential improvements for constant depth circuits. Our methods further extend to semi-parametric settings and imply the first positive results for low-dimensional convex sets. Additionally, we separate learning and testing phases and obtain algorithms that run in fully polynomial time at test time.more » « less
 An official website of the United States government
An official website of the United States government 
				
			 
					 
					
