skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


This content will become publicly available on January 16, 2026

Title: Learning Noisy Halfspaces with a Margin: Massart is No Harder than Random
We study the problem of PAC learning γ-margin halfspaces with Massart noise. We propose a simple proper learning algorithm, the Perspectron, that has sample complexity O˜((ϵγ)−2) and achieves classification error at most η+ϵ where η is the Massart noise rate. Prior works [DGT19,CKMY20] came with worse sample complexity guarantees (in both ϵ and γ) or could only handle random classification noise [DDK+23,KIT+23] -- a much milder noise assumption. We also show that our results extend to the more challenging setting of learning generalized linear models with a known link function under Massart noise, achieving a similar sample complexity to the halfspace case. This significantly improves upon the prior state-of-the-art in this setting due to [CKMY20], who introduced this model.  more » « less
Award ID(s):
2505865
PAR ID:
10631082
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
NeurIPS 2024 https://arxiv.org/abs/2501.09851
Date Published:
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Motivated by estimation of quantum noise models, we study the problem of learning a Pauli channel, or more generally the Pauli error rates of an arbitrary channel. By employing a novel reduction to the "Population Recovery" problem, we give an extremely simple algorithm that learns the Pauli error rates of an n -qubit channel to precision ϵ in ℓ ∞ using just O ( 1 / ϵ 2 ) log ⁡ ( n / ϵ ) applications of the channel. This is optimal up to the logarithmic factors. Our algorithm uses only unentangled state preparation and measurements, and the post-measurement classical runtime is just an O ( 1 / ϵ ) factor larger than the measurement data size. It is also impervious to a limited model of measurement noise where heralded measurement failures occur independently with probability ≤ 1 / 4 .We then consider the case where the noise channel is close to the identity, meaning that the no-error outcome occurs with probability 1 − η . In the regime of small η we extend our algorithm to achieve multiplicative precision 1 ± ϵ (i.e., additive precision ϵ η ) using just O ( 1 ϵ 2 η ) log ⁡ ( n / ϵ ) applications of the channel. 
    more » « less
  2. In lifelong learning, tasks (or classes) to be learned arrive sequentially over time in arbitrary order. During training, knowledge from previous tasks can be captured and transferred to subsequent ones to improve sample efficiency. We consider the setting where all target tasks can be represented in the span of a small number of unknown linear or nonlinear features of the input data. We propose a lifelong learning algorithm that maintains and refines the internal feature representation. We prove that for any desired accuracy on all tasks, the dimension of the representation remains close to that of the underlying representation. The resulting sample complexity improves significantly on existing bounds. In the setting of linear features, our algorithm is provably efficient and the sample complexity for input dimension d, m tasks with k features up to error ϵ is O~(dk1.5/ϵ+km/ϵ). We also prove a matching lower bound for any lifelong learning algorithm that uses a single task learner as a black box. We complement our analysis with an empirical study, including a heuristic lifelong learning algorithm for deep neural networks. Our method performs favorably on challenging realistic image datasets compared to state-of-the-art continual learning methods. 
    more » « less
  3. null (Ed.)
    Algorithms for noiseless collaborative PAC learning have been analyzed and optimized in recent years with respect to sample complexity. In this paper, we study collaborative PAC learning with the goal of reducing communication cost at essentially no penalty to the sample complexity. We develop communication efficient collaborative PAC learning algorithms using distributed boosting. We then consider the communication cost of collaborative learning in the presence of classification noise. As an intermediate step, we show how collaborative PAC learning algorithms can be adapted to handle classification noise. With this insight, we develop communication efficient algorithms for collaborative PAC learning robust to classification noise. 
    more » « less
  4. Actor-critic style two-time-scale algorithms are one of the most popular methods in reinforcement learning, and have seen great empirical success. However, their performance is not completely understood theoretically. In this paper, we characterize the global convergence of an online natural actor-critic algorithm in the tabular setting using a single trajectory of samples. Our analysis applies to very general settings, as we only assume ergodicity of the underlying Markov decision process. In order to ensure enough exploration, we employ an ϵ-greedy sampling of the trajectory. For a fixed and small enough exploration parameter ϵ, we show that the two-time-scale natural actor-critic algorithm has a rate of convergence of O~(1/T1/4), where T is the number of samples, and this leads to a sample complexity of O~(1/δ8) samples to find a policy that is within an error of δ from the global optimum. Moreover, by carefully decreasing the exploration parameter ϵ as the iterations proceed, we present an improved sample complexity of O~(1/δ6) for convergence to the global optimum. 
    more » « less
  5. There has been significant study on the sample complexity of testing properties of distributions over large domains. For many properties, it is known that the sample complexity can be substantially smaller than the domain size. For example, over a domain of size n, distinguishing the uniform distribution from distributions that are far from uniform in ℓ1-distance uses only O(n−−√) samples. However, the picture is very different in the presence of arbitrary noise, even when the amount of noise is quite small. In this case, one must distinguish if samples are coming from a distribution that is ϵ-close to uniform from the case where the distribution is (1−ϵ)-far from uniform. The latter task requires nearly linear in n samples (Valiant, 2008; Valiant and Valiant, 2017a). In this work, we present a noise model that on one hand is more tractable for the testing problem, and on the other hand represents a rich class of noise families. In our model, the noisy distribution is a mixture of the original distribution and noise, where the latter is known to the tester either explicitly or via sample access; the form of the noise is also known \emph{a priori}. Focusing on the identity and closeness testing problems leads to the following mixture testing question: Given samples of distributions p,q1,q2, can we test if p is a mixture of q1 and q2? We consider this general question in various scenarios that differ in terms of how the tester can access the distributions, and show that indeed this problem is more tractable. Our results show that the sample complexity of our testers are exactly the same as for the classical non-mixture case. 
    more » « less