

Title: On The Effectiveness of Active Learning by Uncertainty Sampling in Classification of High-Dimensional Gaussian Mixture Data
Active learning aims to reduce the cost of labeling through selective sampling. Despite reported empirical success over passive learning, many popular active learning heuristics such as uncertainty sampling still lack satisfying theoretical guarantees. Towards closing the gap between practical use and theoretical understanding in active learning, we propose to characterize the exact behavior of uncertainty sampling for high-dimensional Gaussian mixture data, in a modern regime of big data where the numbers of samples and features are commensurately large. Through a sharp characterization of the learning results, our analysis sheds light on the important question of when uncertainty sampling works better than passive learning. Our results show that the effectiveness of uncertainty sampling is not always ensured. In fact, it depends crucially on the choice of i) an adequate initial classifier used to start the active sampling process and ii) a proper loss function that allows an adaptive treatment of samples queried at various steps.
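The uncertainty-sampling loop studied in the abstract can be pictured with a minimal sketch: train an initial classifier on a small seed set, then repeatedly query the unlabeled point with the smallest decision margin and retrain. The sketch below uses scikit-learn logistic regression on synthetic two-component Gaussian mixture data; the dimension, pool size, seed size, query budget, and choice of logistic loss are illustrative assumptions, not the setup analyzed in the paper.

```python
# Minimal uncertainty-sampling sketch on two-component Gaussian mixture data.
# All sizes and the logistic loss are illustrative assumptions, not the paper's setup.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d, n_pool, n_seed, budget = 50, 5000, 20, 200

# Two-component Gaussian mixture: labels +/-1, class means +/-mu, identity covariance.
mu = rng.normal(size=d) / np.sqrt(d)
y = rng.choice([-1, 1], size=n_pool)
X = y[:, None] * mu[None, :] + rng.normal(size=(n_pool, d))

seed = rng.choice(n_pool, size=n_seed, replace=False)
labeled = list(seed)
labeled_set = set(labeled)
unlabeled = [i for i in range(n_pool) if i not in labeled_set]

clf = LogisticRegression(max_iter=1000)
clf.fit(X[labeled], y[labeled])              # initial classifier from the seed set

for _ in range(budget):
    # Query the pool point the current classifier is least certain about
    # (smallest absolute decision margin), then retrain on the enlarged labeled set.
    margins = np.abs(clf.decision_function(X[unlabeled]))
    picked = unlabeled.pop(int(np.argmin(margins)))
    labeled.append(picked)
    clf.fit(X[labeled], y[labeled])

print("accuracy on the full pool:", clf.score(X, y))
```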
Award ID(s):
1813877 1846369
PAR ID:
10483554
Author(s) / Creator(s):
Publisher / Repository:
IEEE
Date Published:
ISBN:
978-1-6654-0540-9
Page Range / eLocation ID:
4238 to 4242
Format(s):
Medium: X
Location:
Singapore, Singapore
Sponsoring Org:
National Science Foundation
More Like This
  1. We show how to obtain improved active learning methods in the agnostic (adversarial noise) setting by combining marginal leverage score sampling with non-independent sampling strategies that promote spatial coverage. In particular, we propose an easily implemented method based on the pivotal sampling algorithm, which we test on problems motivated by learning-based methods for parametric PDEs and uncertainty quantification. In comparison to independent sampling, our method reduces the number of samples needed to reach a given target accuracy by up to 50%. We support our findings with two theoretical results. First, we show that any non-independent leverage score sampling method that obeys a weak one-sided ℓ∞ independence condition (which includes pivotal sampling) can actively learn d-dimensional linear functions with O(d log d) samples, matching independent sampling. This result extends recent work on matrix Chernoff bounds under ℓ∞ independence, and may be of interest for analyzing other sampling strategies beyond pivotal sampling. Second, we show that, for the important case of polynomial regression, our pivotal method obtains an improved bound of O(d) samples.
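A minimal sketch of the combination this abstract describes: compute marginal leverage scores of the design matrix, convert them into inclusion probabilities that sum to the label budget, then resolve those probabilities into a concrete sample with the pivotal method. The sketch below pairs units uniformly at random; the local, spatially aware pairing that drives the coverage gains and the downstream reweighted regression are not reproduced, and all sizes are illustrative assumptions.

```python
import numpy as np

def leverage_scores(A):
    """Statistical leverage scores: squared row norms of an orthonormal basis of col(A)."""
    Q, _ = np.linalg.qr(A)
    return np.sum(Q**2, axis=1)

def pivotal_sample(pi, rng):
    """Deville-Tille pivotal method: resolve inclusion probabilities pi into a 0/1 draw.
    Pairs are chosen uniformly at random here; a local pivotal method would pair
    nearest neighbours instead to promote spatial coverage."""
    pi = np.asarray(pi, dtype=float).copy()
    active = [i for i in range(len(pi)) if 0.0 < pi[i] < 1.0]
    while len(active) > 1:
        i, j = rng.choice(active, size=2, replace=False)
        s = pi[i] + pi[j]
        if s < 1.0:
            # One unit collects the combined mass, the other is rejected.
            if rng.random() < pi[i] / s:
                pi[i], pi[j] = s, 0.0
            else:
                pi[i], pi[j] = 0.0, s
        else:
            # One unit is accepted, the other keeps the leftover probability.
            if rng.random() < (1.0 - pi[j]) / (2.0 - s):
                pi[i], pi[j] = 1.0, s - 1.0
            else:
                pi[i], pi[j] = s - 1.0, 1.0
        active = [k for k in active if 0.0 < pi[k] < 1.0]
    if active:  # at most one unresolved unit remains: resolve it with a Bernoulli draw
        pi[active[0]] = 1.0 if rng.random() < pi[active[0]] else 0.0
    return np.flatnonzero(pi == 1.0)

rng = np.random.default_rng(0)
A = rng.normal(size=(2000, 20))    # design matrix (illustrative)
budget = 100                       # target number of labels
lev = leverage_scores(A)
pi = np.minimum(1.0, budget * lev / lev.sum())
rows = pivotal_sample(pi, rng)
print(len(rows), "rows selected for labeling")
```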
  2. Within machine learning, active learning studies the gains in performance made possible by adaptively selecting data points to label. In this work, we show through upper and lower bounds that, for the simple benign setting of well-specified logistic regression on a uniform distribution over a sphere, the expected excess errors of both active learning and random sampling have the same inversely proportional dependence on the number of samples. Importantly, because a lower bound for this benign setting also applies to any setting that contains it, no more general setting allows a better dependence on the number of samples. Additionally, we show that a variant of uncertainty sampling can achieve a faster rate of convergence than random sampling by a factor of the Bayes error, matching a recent empirical observation made in other work. Qualitatively, this work is pessimistic with respect to the asymptotic dependence on the number of samples, but optimistic with respect to finding performance gains in the constants.
  3. We consider the problem of active learning for single neuron models, also sometimes called "ridge functions", in the agnostic setting (under adversarial label noise). Such models have been shown to be broadly effective in modeling physical phenomena, and for constructing surrogate data-driven models for partial differential equations. Surprisingly, we show that for a single neuron model with any Lipschitz non-linearity (such as the ReLU, sigmoid, absolute value, or low-degree polynomials, among others), strong provable approximation guarantees can be obtained using a well-known active learning strategy for fitting linear functions in the agnostic setting. Namely, we can collect samples via statistical leverage score sampling, which has been shown to be near-optimal in other active learning scenarios. We support our theoretical results with empirical simulations showing that our proposed active learning strategy based on leverage score sampling outperforms (ordinary) uniform sampling when fitting single neuron models.
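A minimal sketch of the strategy this abstract describes, under illustrative assumptions (a ReLU nonlinearity, synthetic data, scipy's L-BFGS for the fit): query labels for rows drawn in proportion to their leverage scores, then fit the single neuron by inverse-probability-weighted least squares on the queried rows.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, d, k = 5000, 30, 300      # pool size, dimension, label budget (all illustrative)

# Synthetic data: a ReLU single neuron plus label noise (a stand-in for agnostic noise).
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d) / np.sqrt(d)
y = np.maximum(X @ w_true, 0.0) + 0.1 * rng.normal(size=n)

# Leverage scores of X define the query distribution over rows.
Q, _ = np.linalg.qr(X)
lev = np.sum(Q**2, axis=1)
p = lev / lev.sum()
idx = rng.choice(n, size=k, replace=True, p=p)     # rows whose labels we "query"
w_imp = 1.0 / (k * p[idx])                         # inverse-probability weights

def loss_and_grad(w):
    """Weighted squared loss of the ReLU neuron on the queried rows, with a subgradient."""
    z = X[idx] @ w
    resid = np.maximum(z, 0.0) - y[idx]
    f = np.sum(w_imp * resid**2)
    g = 2.0 * X[idx].T @ (w_imp * resid * (z > 0))
    return f, g

# Warm start from the weighted linear least-squares fit, then refine with L-BFGS.
w0 = np.linalg.lstsq(np.sqrt(w_imp)[:, None] * X[idx],
                     np.sqrt(w_imp) * y[idx], rcond=None)[0]
w_hat = minimize(loss_and_grad, w0, jac=True, method="L-BFGS-B").x

print("population squared error:", np.mean((np.maximum(X @ w_hat, 0.0) - y)**2))
```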
  4. We study active learning methods for single index models of the form $$F({\bm x}) = f(\langle {\bm w}, {\bm x}\rangle)$$, where $$f:\mathbb{R} \to \mathbb{R}$$ and $${\bm x}, {\bm w} \in \mathbb{R}^d$$. In addition to their theoretical interest as simple examples of non-linear neural networks, single index models have received significant recent attention due to applications in scientific machine learning, such as surrogate modeling for partial differential equations (PDEs). These applications require sample-efficient active learning methods that are robust to adversarial noise, i.e., that work even in the challenging agnostic learning setting. We provide two main results on agnostic active learning of single index models. First, when $$f$$ is known and Lipschitz, we show that $$\tilde{O}(d)$$ samples collected via statistical leverage score sampling are sufficient to learn a near-optimal single index model. Leverage score sampling is simple to implement, efficient, and already widely used for actively learning linear models. Our result requires no assumptions on the data distribution, is optimal up to log factors, and improves quadratically on a recent $${O}(d^{2})$$ bound of \cite{gajjar2023active}. Second, we show that $$\tilde{O}(d)$$ samples suffice even in the more difficult setting when $$f$$ is \emph{unknown}. Our results leverage tools from high-dimensional probability, including Dudley's inequality and dual Sudakov minoration, as well as a novel, distribution-aware discretization of the class of Lipschitz functions.
  5. Active learning is a label-efficient approach to training highly effective models while interactively selecting only small subsets of unlabelled data for labelling and training. In "open world" settings, the classes of interest can make up a small fraction of the overall dataset -- most of the data may be viewed as an out-of-distribution or irrelevant class. This leads to extreme class imbalance, and our theory and methods focus on this core issue. We propose a new strategy for active learning called GALAXY (Graph-based Active Learning At the eXtrEme), which blends ideas from graph-based active learning and deep learning. GALAXY automatically and adaptively selects more class-balanced examples for labeling than most other methods for active learning. Our theory shows that GALAXY performs a refined form of uncertainty sampling that gathers a much more class-balanced dataset than vanilla uncertainty sampling. Experimentally, we demonstrate GALAXY's superiority over existing state-of-the-art deep active learning algorithms in class-imbalanced vision classification settings generated from popular datasets.
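GALAXY's published algorithm builds one linear graph per class from model margins and runs a graph-based bisection procedure over those graphs; the simplified, binary-class sketch below is meant only to convey the core idea of ordering the pool along an uncertainty-sorted linear graph and bisecting toward label transitions, which concentrates queries near the boundary and tends to keep them class-balanced. The function names, seeding, and stopping rule are assumptions for illustration, not the authors' implementation.

```python
# Simplified, binary-class illustration of the graph-based bisection idea behind GALAXY.
import numpy as np

def bisection_queries(scores, oracle, budget):
    """Order the pool along a linear 'graph' by model score, then repeatedly query the
    midpoint of the widest unlabeled run between two oppositely-labeled nodes.
    Queries concentrate near label transitions, keeping them relatively class-balanced."""
    order = np.argsort(scores)              # linear graph: pool indices sorted by score
    labels = {}                              # position along the graph -> queried label
    # Seed with the two endpoints (most confident negative / positive predictions).
    for pos in (0, len(order) - 1):
        labels[pos] = oracle(order[pos])
    queried = 2
    while queried < budget:
        known = sorted(labels)
        # Prefer the widest gap whose endpoints disagree; bisect it.
        gaps = [(b - a, a, b) for a, b in zip(known, known[1:])
                if b - a > 1 and labels[a] != labels[b]]
        if not gaps:   # no disagreeing gap left: fall back to the widest gap overall
            gaps = [(b - a, a, b) for a, b in zip(known, known[1:]) if b - a > 1]
            if not gaps:
                break
        _, a, b = max(gaps)
        mid = (a + b) // 2
        labels[mid] = oracle(order[mid])
        queried += 1
    return [order[p] for p in labels]        # pool indices selected for labeling

# Example: an imbalanced pool where only ~5% of points are positive.
rng = np.random.default_rng(0)
x = rng.normal(size=3000)
true_y = (x > 1.65).astype(int)              # ~5% positives
scores = x + 0.3 * rng.normal(size=x.size)   # noisy model scores
picked = bisection_queries(scores, oracle=lambda i: true_y[i], budget=40)
print("fraction of positives among queried points:", np.mean(true_y[picked]))
```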