We study active learning methods for single index models of the form $$F({\bm x}) = f(\langle {\bm w}, {\bm x}\rangle)$$, where $$f:\mathbb{R} \to \mathbb{R}$$ and $${\bm x}, {\bm w} \in \mathbb{R}^d$$. In addition to their theoretical interest as simple examples of non-linear neural networks, single index models have received significant recent attention due to applications in scientific machine learning, including surrogate modeling for partial differential equations (PDEs). Such applications require sample-efficient active learning methods that are robust to adversarial noise, i.e., methods that work even in the challenging agnostic learning setting. We provide two main results on agnostic active learning of single index models. First, when $$f$$ is known and Lipschitz, we show that $$\tilde{O}(d)$$ samples collected via \emph{statistical leverage score sampling} are sufficient to learn a near-optimal single index model. Leverage score sampling is simple to implement, efficient, and already widely used for actively learning linear models. Our result requires no assumptions on the data distribution, is optimal up to log factors, and improves quadratically on a recent $$O(d^2)$$ bound of \cite{gajjar2023active}. Second, we show that $$\tilde{O}(d)$$ samples suffice even in the more difficult setting when $$f$$ is \emph{unknown}. Our results leverage tools from high-dimensional probability, including Dudley's inequality and dual Sudakov minoration, as well as a novel, distribution-aware discretization of the class of Lipschitz functions.
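As a concrete illustration of the sampling primitive this abstract builds on, here is a minimal Python/NumPy sketch of statistical leverage score sampling: compute each row's leverage score, draw rows with probability proportional to those scores, and reweight so the subsampled least-squares objective stays unbiased. This is not code from the paper; the function names and the i.i.d. sampling scheme are illustrative assumptions.

```python
import numpy as np

def leverage_scores(X):
    """Leverage score of row i: tau_i = x_i^T (X^T X)^+ x_i, i.e. the
    i-th diagonal entry of the hat matrix X (X^T X)^+ X^T."""
    Q, _ = np.linalg.qr(X)          # orthonormal basis for col span of X
    return np.sum(Q**2, axis=1)     # squared row norms of Q

def leverage_sample(X, y, m, seed=None):
    """Draw m rows i.i.d. with probability proportional to leverage, and
    return the 1/sqrt(m * p_i) weights that keep least squares unbiased."""
    rng = np.random.default_rng(seed)
    p = leverage_scores(X)
    p /= p.sum()
    idx = rng.choice(len(X), size=m, p=p)
    return X[idx], y[idx], 1.0 / np.sqrt(m * p[idx])
```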
Improved Active Learning via Dependent Leverage Score Sampling
We show how to obtain improved active learning methods in the agnostic (adversarial noise) setting by combining marginal leverage score sampling with non-independent sampling strategies that promote spatial coverage. In particular, we propose an easily implemented method based on the pivotal sampling algorithm, which we test on problems motivated by learning-based methods for parametric PDEs and uncertainty quantification. In comparison to independent sampling, our method reduces the number of samples needed to reach a given target accuracy by up to 50%. We support our findings with two theoretical results. First, we show that any non-independent leverage score sampling method that obeys a weak one-sided ℓ∞ independence condition (which includes pivotal sampling) can actively learn d-dimensional linear functions with O(d log d) samples, matching independent sampling. This result extends recent work on matrix Chernoff bounds under ℓ∞ independence, and may be of interest for analyzing other sampling strategies beyond pivotal sampling. Second, we show that, for the important case of polynomial regression, our pivotal method obtains an improved bound of O(d) samples.
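Pivotal sampling, the non-independent strategy named above, originates in survey sampling (Deville and Tillé). The following is a minimal Python sketch of the sequential variant, an illustration under stated assumptions rather than the paper's implementation: at each step two unresolved units "duel", one is decided, and marginal inclusion probabilities are preserved; processing units in spatially sorted order is what yields spatial coverage.

```python
import numpy as np

def pivotal_sample(pi, seed=None):
    """Sequential (ordered) pivotal sampling.

    pi: inclusion probabilities in (0, 1) summing to an integer m.
    Returns the indices of the m selected units."""
    rng = np.random.default_rng(seed)
    pi = np.asarray(pi, dtype=float).copy()
    selected = []
    carry = 0                        # index of the unresolved unit
    for j in range(1, len(pi)):
        a, b = pi[carry], pi[j]
        if a + b <= 1.0:
            # One unit is rejected; the other inherits the combined mass.
            if rng.random() < b / (a + b):
                pi[j], pi[carry] = a + b, 0.0
                carry = j
            else:
                pi[carry], pi[j] = a + b, 0.0
        else:
            # One unit is accepted; the other keeps the leftover mass.
            if rng.random() < (1.0 - b) / (2.0 - a - b):
                selected.append(carry)
                pi[j] = a + b - 1.0
                carry = j
            else:
                selected.append(j)
                pi[carry] = a + b - 1.0
    if rng.random() < pi[carry]:     # resolve the last unit (0/1 up to
        selected.append(carry)       # floating-point error)
    return np.sort(selected)
```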
- Award ID(s): 2054306
- PAR ID: 10539458
- Publisher / Repository: The Twelfth International Conference on Learning Representations
- Sponsoring Org: National Science Foundation
More Like this
-
Spectral independence is a recently developed framework for obtaining sharp bounds on the convergence time of the classical Glauber dynamics. This new framework has yielded optimal O(n log n) sampling algorithms on bounded-degree graphs for a large class of problems throughout the so-called uniqueness regime, including, for example, the problems of sampling independent sets, matchings, and Ising-model configurations. Our main contribution is to relax the bounded-degree assumption that has so far been important in establishing and applying spectral independence. Previous methods for avoiding degree bounds rely on using L^p-norms to analyse contraction on graphs with bounded connective constant (Sinclair, Srivastava, Yin; FOCS'13). The non-linearity of L^p-norms is an obstacle to applying these results to bound spectral independence. Our solution is to capture the L^p-analysis recursively by amortising over the subtrees of the recurrence used to analyse contraction. Our method generalises previous analyses that applied only to bounded-degree graphs. As a main application of our techniques, we consider the random graph G(n,d/n), where the previously known algorithms run in time n^O(log d) or applied only to large d. We refine these algorithmic bounds significantly, and develop fast nearly linear algorithms based on Glauber dynamics that apply to all constant d, throughout the uniqueness regime.
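For context, here is a minimal Python sketch (an illustration, not code from the paper) of the classical Glauber dynamics for sampling independent sets in the hardcore model with fugacity λ: each step updates one uniformly random vertex from its correct conditional distribution. Spectral independence is the analytic tool used to bound how many such steps are needed to mix.

```python
import numpy as np

def glauber_hardcore(adj, lam, steps, seed=None):
    """Glauber dynamics for the hardcore (independent set) model.

    adj: adjacency dict {v: iterable of neighbors}; lam: fugacity.
    Each step picks a uniformly random vertex v: if no neighbor is
    occupied, v becomes occupied with probability lam / (1 + lam);
    otherwise v must be unoccupied."""
    rng = np.random.default_rng(seed)
    occupied = {v: False for v in adj}
    verts = list(adj)
    for _ in range(steps):
        v = verts[rng.integers(len(verts))]
        if any(occupied[u] for u in adj[v]):
            occupied[v] = False      # blocked: v cannot be occupied
        else:
            occupied[v] = rng.random() < lam / (1 + lam)
    return {v for v, occ in occupied.items() if occ}

# Example: a 4-cycle at fugacity 1.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(glauber_hardcore(cycle, lam=1.0, steps=1000))
```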
-
We consider the problem of active learning for single neuron models, also sometimes called "ridge functions", in the agnostic setting (under adversarial label noise). Such models have been shown to be broadly effective in modeling physical phenomena, and for constructing surrogate data-driven models for partial differential equations. Surprisingly, we show that for a single neuron model with any Lipschitz non-linearity (such as the ReLU, sigmoid, absolute value, or low-degree polynomial, among others), strong provable approximation guarantees can be obtained using a well-known active learning strategy for fitting linear functions in the agnostic setting. Namely, we can collect samples via statistical leverage score sampling, which has been shown to be near-optimal in other active learning scenarios. We support our theoretical results with empirical simulations showing that our proposed active learning strategy based on leverage score sampling outperforms (ordinary) uniform sampling when fitting single neuron models.
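Complementing the leverage score sampling sketch earlier on this page, the hedged Python fragment below shows how one might fit a single neuron model f(⟨w, x⟩) with a known Lipschitz non-linearity (here ReLU) on a reweighted subsample by gradient descent. The fitting procedure, initialization, and step size are illustrative assumptions, not the paper's exact method; pairing it with the `leverage_sample` sketch above gives an end-to-end picture of the active learning loop.

```python
import numpy as np

def fit_single_neuron(Xs, ys, wts, lr=1e-2, iters=2000, seed=None):
    """Gradient descent on the importance-weighted squared loss
        L(w) = (1/m) * sum_i wts[i]^2 * (relu(<w, x_i>) - y_i)^2,
    taking relu'(z) = 1{z > 0}."""
    rng = np.random.default_rng(seed)
    m, d = Xs.shape
    w = rng.standard_normal(d) / np.sqrt(d)   # small random init
    for _ in range(iters):
        z = Xs @ w
        resid = np.maximum(z, 0.0) - ys       # relu(<w, x_i>) - y_i
        w -= lr * (2.0 / m) * (Xs.T @ (wts**2 * resid * (z > 0)))
    return w
```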
-
Many machine learning problems optimize an objective that must be measured with noise. The primary method is first-order stochastic gradient descent using one or more Monte Carlo (MC) samples at each step. There are settings where ill-conditioning makes second-order methods such as limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) more effective. We study the use of randomized quasi-Monte Carlo (RQMC) sampling for such problems. When MC sampling has a root mean squared error (RMSE) of O(n^{-1/2}), RQMC has an RMSE of o(n^{-1/2}) that can be close to O(n^{-3/2}) in favorable settings. We prove that improved sampling accuracy translates directly to improved optimization. In our empirical investigations for variational Bayes, using RQMC with a stochastic quasi-Newton method greatly speeds up the optimization, and sometimes finds a better parameter value than MC does.
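To make the MC-versus-RQMC comparison concrete, here is a minimal sketch using SciPy's quasi-Monte Carlo module (an illustration, not the paper's experimental code): scrambled Sobol' points stand in for i.i.d. uniforms and are mapped to Gaussians through the inverse CDF, after which the objective or gradient estimate is formed exactly as in plain MC.

```python
import numpy as np
from scipy.stats import norm, qmc

def rqmc_mean(f, dim, n, seed=None):
    """Estimate E[f(Z)], Z ~ N(0, I_dim), from n scrambled Sobol' points.
    Scrambling keeps the estimator unbiased, so its error can be judged
    by independent replications; n should be a power of 2."""
    u = qmc.Sobol(d=dim, scramble=True, seed=seed).random(n)
    return f(norm.ppf(u)).mean()   # inverse-CDF map, then average

# Example: estimate E[||Z||^2] = dim; the RQMC error typically shrinks
# faster than the O(n^{-1/2}) Monte Carlo rate.
f = lambda z: (z**2).sum(axis=1)
print(rqmc_mean(f, dim=4, n=2**10))
```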