Title: Agnostic Active Learning of Single Index Models with Linear Sample Complexity
We study active learning methods for single index models of the form $$F({\bm x}) = f(\langle {\bm w}, {\bm x}\rangle)$$, where $$f:\mathbb{R} \to \mathbb{R}$$ and $${\bm x}, {\bm w} \in \mathbb{R}^d$$. In addition to their theoretical interest as simple examples of non-linear neural networks, single index models have received significant recent attention due to applications in scientific machine learning, such as surrogate modeling for partial differential equations (PDEs). Such applications require sample-efficient active learning methods that are robust to adversarial noise, i.e., methods that work even in the challenging agnostic learning setting. We provide two main results on agnostic active learning of single index models. First, when $$f$$ is known and Lipschitz, we show that $$\tilde{O}(d)$$ samples collected via statistical leverage score sampling are sufficient to learn a near-optimal single index model. Leverage score sampling is simple to implement, efficient, and already widely used for actively learning linear models. Our result requires no assumptions on the data distribution, is optimal up to log factors, and improves quadratically on a recent $${O}(d^{2})$$ bound of \cite{gajjar2023active}. Second, we show that $$\tilde{O}(d)$$ samples suffice even in the more difficult setting when $$f$$ is \emph{unknown}. Our results leverage tools from high-dimensional probability, including Dudley's inequality and dual Sudakov minoration, as well as a novel, distribution-aware discretization of the class of Lipschitz functions.
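To make the sampling step concrete, here is a minimal Python sketch of statistical leverage score sampling with the standard importance reweighting. The Gaussian data, the tanh link, and the budget of 10d labels are illustrative assumptions, not details from the paper.

```python
import numpy as np

def leverage_scores(X):
    """Statistical leverage scores of the rows of X (n x d), via a thin QR."""
    Q, _ = np.linalg.qr(X)          # columns of Q span the column space of X
    return np.sum(Q**2, axis=1)     # tau_i = ||Q[i, :]||^2, summing to rank(X)

def leverage_sample(X, y, m, rng=np.random.default_rng(0)):
    """Draw m rows i.i.d. with probability proportional to leverage, with the
    usual 1/sqrt(m * p_i) reweighting used for active regression."""
    tau = leverage_scores(X)
    p = tau / tau.sum()
    idx = rng.choice(len(X), size=m, replace=True, p=p)
    w = 1.0 / np.sqrt(m * p[idx])                 # importance weights
    return X[idx] * w[:, None], y[idx] * w, idx

# Illustrative usage: labels follow a single index model y = f(<w*, x>) + noise.
n, d = 5000, 20
rng = np.random.default_rng(1)
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
f = np.tanh                                       # a known 1-Lipschitz link
y = f(X @ w_star) + 0.1 * rng.standard_normal(n)

Xs, ys, idx = leverage_sample(X, y, m=10 * d)     # only these labels are "queried"
```

In a genuine active learning run only the m sampled labels would be queried; y is generated in full here purely so the snippet is self-contained.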
Award ID(s):
2045590
PAR ID:
10576146
Publisher / Repository:
Proceedings of Machine Learning Research
Sponsoring Org:
National Science Foundation
More Like this
  1. We consider the problem of active learning for single neuron models, also sometimes called "ridge functions", in the agnostic setting (under adversarial label noise). Such models have been shown to be broadly effective in modeling physical phenomena and for constructing surrogate data-driven models for partial differential equations. Surprisingly, we show that for a single neuron model with any Lipschitz non-linearity (such as the ReLU, sigmoid, absolute value, or low-degree polynomials), strong provable approximation guarantees can be obtained using a well-known active learning strategy for fitting linear functions in the agnostic setting. Namely, we can collect samples via statistical leverage score sampling, which has been shown to be near-optimal in other active learning scenarios. We support our theoretical results with empirical simulations showing that our proposed active learning strategy based on leverage score sampling outperforms (ordinary) uniform sampling when fitting single neuron models. (A sketch of the fitting step appears after this list.)
  2. We show how to obtain improved active learning methods in the agnostic (adversarial noise) setting by combining marginal leverage score sampling with non-independent sampling strategies that promote spatial coverage. In particular, we propose an easily implemented method based on the pivotal sampling algorithm (sketched after this list), which we test on problems motivated by learning-based methods for parametric PDEs and uncertainty quantification. In comparison to independent sampling, our method reduces the number of samples needed to reach a given target accuracy by up to 50%. We support our findings with two theoretical results. First, we show that any non-independent leverage score sampling method that obeys a weak one-sided $$\ell_\infty$$ independence condition (which includes pivotal sampling) can actively learn $$d$$-dimensional linear functions with $$O(d \log d)$$ samples, matching independent sampling. This result extends recent work on matrix Chernoff bounds under $$\ell_\infty$$ independence, and may be of interest for analyzing other sampling strategies beyond pivotal sampling. Second, we show that, for the important case of polynomial regression, our pivotal method obtains an improved bound of $$O(d)$$ samples.
  3. We give relative error coresets for training linear classifiers with a broad class of loss functions, including the logistic loss and hinge loss. Our construction achieves $$(1\pm \epsilon)$$ relative error with $$\tilde O(d \cdot \mu_y(X)^2/\epsilon^2)$$ points, where $$\mu_y(X)$$ is a natural complexity measure of the data matrix $$X \in \mathbb{R}^{n \times d}$$ and label vector $$y \in \{-1,1\}^n$$ introduced in Munteanu et al. 2018. Our result is based on subsampling data points with probabilities proportional to their $$\ell_1$$ Lewis weights (see the sketch after this list). It significantly improves on existing theoretical bounds and performs well in practice, outperforming uniform subsampling along with other importance sampling methods. Our sampling distribution does not depend on the labels, so it can be used for active learning. It also does not depend on the specific loss function, so a single coreset can be used in multiple training scenarios.
  4. Abstract: We consider the problem of computing the partition function $$\sum_x e^{f(x)}$$, where $$f: \{-1, 1\}^n \longrightarrow {\mathbb R}$$ is a quadratic or cubic polynomial on the Boolean cube $$\{-1, 1\}^n$$. In the case of a quadratic polynomial $$f$$, we show that the partition function can be approximated within relative error $$0 < \epsilon < 1$$ in quasi-polynomial $$n^{O(\ln n - \ln \epsilon)}$$ time if the Lipschitz constant of the non-linear part of $$f$$ with respect to the $$\ell^1$$ metric on the Boolean cube does not exceed $$1-\delta$$, for any $$\delta>0$$ fixed in advance. For a cubic polynomial $$f$$, we get the same result under a somewhat stronger condition. We apply the method of polynomial interpolation, for which we prove that $$\sum_x e^{\tilde{f}(x)} \ne 0$$ for complex-valued polynomials $$\tilde{f}$$ in a neighborhood of a real-valued $$f$$ satisfying the above-mentioned conditions. The bounds are asymptotically optimal. Results on the zero-free region are interpreted as the absence of a phase transition in the Lee–Yang sense in the corresponding Ising model. The novel feature of the bounds is that they control the total interaction of each vertex but not every single interaction of sets of vertices.
  5. Abstract: We investigate the validity of the "Einstein relations" in the general setting of unimodular random networks. These are equalities relating scaling exponents: $$\begin{aligned} d_{w} &= d_{f} + \tilde{\zeta}, \\ d_{s} &= 2 d_{f}/d_{w}, \end{aligned}$$ where $$d_w$$ is the walk dimension, $$d_f$$ is the fractal dimension, $$d_s$$ is the spectral dimension, and $$\tilde{\zeta}$$ is the resistance exponent. Roughly speaking, this relates the mean displacement and return probability of a random walker to the density and conductivity of the underlying medium. We show that if $$d_f$$ and $$\tilde{\zeta} \geqslant 0$$ exist, then $$d_w$$ and $$d_s$$ exist, and the aforementioned equalities hold. Moreover, our primary new estimate $$d_{w} \geqslant d_{f} + \tilde{\zeta}$$ is established for all $$\tilde{\zeta} \in \mathbb{R}$$. For the uniform infinite planar triangulation (UIPT), this yields the consequence $$d_w = 4$$ using $$d_f = 4$$ (Angel in Geom. Funct. Anal. 13(5):935–974, 2003) and $$\tilde{\zeta} = 0$$ (established here as a consequence of the Liouville Quantum Gravity theory, following Gwynne–Miller 2020 and Ding and Gwynne in Commun. Math. Phys. 374(3):1877–1934, 2020). The conclusion $$d_w = 4$$ had previously been established by Gwynne and Hutchcroft (2018) using more elaborate methods. A new consequence is that $$d_w = d_f$$ for the uniform infinite Schnyder-wood decorated triangulation, implying that the simple random walk is subdiffusive, since $$d_f > 2$$.
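Fitting step for the first related result: after leverage score sampling, the single neuron model $$f(\langle {\bm w}, {\bm x}\rangle)$$ is fit on the reweighted subsample by minimizing squared loss. The least-squares warm start, the choice of L-BFGS, and the ReLU default below are assumptions made purely for illustration; the paper's own procedure may differ.

```python
import numpy as np
from scipy.optimize import minimize

def fit_single_neuron(Xs, ys, f=lambda z: np.maximum(z, 0.0)):
    """Fit y ~ f(<w, x>) on a (reweighted) subsample by minimizing squared loss.
    The problem is non-convex; a linear least-squares warm start plus L-BFGS is
    used here only as a sketch."""
    w0, *_ = np.linalg.lstsq(Xs, ys, rcond=None)   # warm start from a linear fit

    def loss(w):
        return np.mean((f(Xs @ w) - ys) ** 2)

    res = minimize(loss, x0=w0, method="L-BFGS-B")
    return res.x

# Usage with the leverage-sampled data (Xs, ys) from the earlier sketch:
# w_hat = fit_single_neuron(Xs, ys)
```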
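Pivotal sampling, used in the second related result, can be implemented with the classical sequential scheme of Deville and Tillé: it preserves the target marginal inclusion probabilities while making neighboring units compete, which promotes spatial coverage. The ordering convention and all names below are illustrative assumptions, not details from the paper.

```python
import numpy as np

def pivotal_sample(pi, rng=np.random.default_rng(0)):
    """Sequential pivotal sampling: returns a boolean inclusion vector whose
    marginal inclusion probabilities equal pi.  Visiting units in a spatially
    sorted order makes nearby points compete, spreading the selected sample."""
    pi = np.asarray(pi, dtype=float).copy()
    carry = 0                                    # unit holding residual probability
    for j in range(1, len(pi)):
        a, b = pi[carry], pi[j]
        if a + b == 0.0:
            continue
        if a + b <= 1.0:
            if rng.random() < a / (a + b):       # carry absorbs j's probability
                pi[carry], pi[j] = a + b, 0.0
            else:                                # j absorbs and becomes the carry
                pi[carry], pi[j] = 0.0, a + b
                carry = j
        else:
            if rng.random() < (1.0 - b) / (2.0 - a - b):
                pi[carry], pi[j] = 1.0, a + b - 1.0   # carry included, j keeps residual
                carry = j
            else:
                pi[carry], pi[j] = a + b - 1.0, 1.0   # j included, carry keeps residual
    pi[carry] = float(rng.random() < pi[carry])  # resolve any leftover residual
    return pi > 0.5

# Usage sketch: set pi proportional to leverage scores, scaled so sum(pi) equals
# the target sample size, with rows pre-sorted along a spatial coordinate.
```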
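The third related result samples by $$\ell_1$$ Lewis weights. Below is a compact sketch of how these can be approximated with the standard fixed-point iteration of Cohen and Peng; the iteration count, the pseudo-inverse, and the numerical floor are pragmatic choices for illustration only.

```python
import numpy as np

def l1_lewis_weights(A, n_iter=30):
    """Approximate the l1 Lewis weights of the rows of A via the fixed-point
    iteration  w_i <- ( a_i^T (A^T W^{-1} A)^{-1} a_i )^{1/2}.
    The resulting weights sum to roughly d."""
    n, d = A.shape
    w = np.ones(n)
    for _ in range(n_iter):
        M = A.T @ (A / w[:, None])               # A^T W^{-1} A
        Minv = np.linalg.pinv(M)
        quad = np.einsum('ij,jk,ik->i', A, Minv, A)   # a_i^T Minv a_i for each row
        w = np.maximum(np.sqrt(quad), 1e-12)     # floor avoids division by zero
    return w

# Coreset sketch: sample rows with probability proportional to their Lewis
# weights and reweight; the distribution depends on neither labels nor loss.
```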