Title: Constants Matter: The Performance Gains of Active Learning
Within machine learning, active learning studies the gains in performance made possible by adaptively selecting data points to label. In this work, we show, through upper and lower bounds, that in a simple, benign setting of well-specified logistic regression on a uniform distribution over a sphere, the expected excess error of both active learning and random sampling has the same inversely proportional dependence on the number of samples. Importantly, because lower bounds carry over to more general settings, no more general setting allows a better dependence on the number of samples. Additionally, we show that a variant of uncertainty sampling can achieve a faster rate of convergence than random sampling by a factor of the Bayes error, a gain recently observed empirically in other work. Qualitatively, this work is pessimistic with respect to the asymptotic dependence on the number of samples, but optimistic with respect to finding performance gains in the constants.
Award ID(s):
1813160
PAR ID:
10343009
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Proceedings of the 39th International Conference on Machine Learning
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We show how to obtain improved active learning methods in the agnostic (adversarial noise) setting by combining marginal leverage score sampling with non-independent sampling strategies that promote spatial coverage. In particular, we propose an easily implemented method based on the pivotal sampling algorithm, which we test on problems motivated by learning-based methods for parametric PDEs and uncertainty quantification. In comparison to independent sampling, our method reduces the number of samples needed to reach a given target accuracy by up to 50%. We support our findings with two theoretical results. First, we show that any non-independent leverage score sampling method that obeys a weak one-sided ℓ∞ independence condition (which includes pivotal sampling) can actively learn d-dimensional linear functions with O(d log d) samples, matching independent sampling. This result extends recent work on matrix Chernoff bounds under ℓ∞ independence, and may be of interest for analyzing other sampling strategies beyond pivotal sampling. Second, we show that, for the important case of polynomial regression, our pivotal method obtains an improved bound of O(d) samples.
  2. We study the properties of output distributions of noisy random circuits. We obtain upper and lower bounds on the expected distance of the output distribution from the “useless” uniform distribution. These bounds are tight with respect to the dependence on circuit depth. Our proof techniques also allow us to make statements about the presence or absence of anticoncentration for both noisy and noiseless circuits. We uncover a number of interesting consequences for hardness proofs of sampling schemes that aim to show a quantum computational advantage over classical computation. Specifically, we discuss recent barrier results for depth-agnostic and/or noise-agnostic proof techniques. We show that in certain depth regimes, noise-agnostic proof techniques might still work in order to prove an often-conjectured claim in the literature on quantum computational advantage, contrary to what has been thought prior to this work. 
  3. Diffusion models have become the most popular approach to deep generative modeling of images, largely due to their empirical performance and reliability. From a theoretical standpoint, a number of recent works [chen2022, chen2022improved, benton2023linear] have studied the iteration complexity of sampling, assuming access to an accurate diffusion model. In this work, we focus on understanding the sample complexity of training such a model: how many samples are needed to learn an accurate diffusion model using a sufficiently expressive neural network? Prior work [BMR20] showed bounds polynomial in the dimension, desired total variation error, and Wasserstein error. We show an exponential improvement in the dependence on Wasserstein error and depth, along with improved dependencies on other relevant parameters.
  4. We study robust testing and estimation of discrete distributions in the strong contamination model. Our results cover both the centralized setting and the distributed setting with general local information constraints, including communication and local differential privacy (LDP) constraints. Our technique relates the strength of manipulation attacks to the earth-mover distance, using Hamming distance as the metric between messages (samples) from the users. In the centralized setting, we provide optimal error bounds for both learning and testing. Our lower bounds under local information constraints build on recent lower bound methods in distributed inference. In the communication-constrained setting, we develop novel algorithms based on random hashing and an L1-L1 isometry.
  5. The goal of active learning is to achieve the same accuracy achievable by passive learning while using far fewer labels. Exponential savings in terms of label complexity have been proved in very special cases, but fundamental lower bounds show that such improvements are impossible in general. This suggests a need to explore alternative goals for active learning. Learning with abstention is one such alternative. In this setting, the active learning algorithm may abstain from prediction and incur an error that is marginally smaller than random guessing. We develop the first computationally efficient active learning algorithm with abstention. Our algorithm provably achieves polylog(1/ε) label complexity, without any low noise conditions. Such a performance guarantee reduces the label complexity by an exponential factor, relative to passive learning and to active learning that is not allowed to abstain. Furthermore, our algorithm is guaranteed to abstain only on hard examples (where the true label distribution is close to a fair coin), a novel property we term proper abstention that also leads to a host of other desirable characteristics (e.g., recovering minimax guarantees in the standard setting, and avoiding the undesirable “noise-seeking” behavior often seen in active learning). We also provide novel extensions of our algorithm that achieve constant label complexity and deal with model misspecification.