skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Active Tolerant Testing
In this work, we show that for a nontrivial hypothesis class C, we can estimate the distance of a target function f to C (estimate the error rate of the best h∈C) using substantially fewer labeled examples than would be needed to actually {\em learn} a good h∈C. Specifically, we show that for the class C of unions of d intervals on the line, in the active learning setting in which we have access to a pool of unlabeled examples drawn from an arbitrary underlying distribution D, we can estimate the error rate of the best h∈C to an additive error ϵ with a number of label requests that is {\em independent of d} and depends only on ϵ. In particular, we make O((1/ϵ^6)log(1/ϵ)) label queries to an unlabeled pool of size O((d/ϵ^2)log(1/ϵ)). This task of estimating the distance of an unknown f to a given class C is called {\em tolerant testing} or {\em distance estimation} in the testing literature, usually studied in a membership query model and with respect to the uniform distribution. Our work extends that of Balcan et al. (2012) who solved the {\em non}-tolerant testing problem for this class (distinguishing the zero-error case from the case that the best hypothesis in the class has error greater than ϵ). We also consider the related problem of estimating the performance of a given learning algorithm A in this setting. That is, given a large pool of unlabeled examples drawn from distribution D, can we, from only a few label queries, estimate how well A would perform if the entire dataset were labeled and given as training data to A? We focus on k-Nearest Neighbor style algorithms, and also show how our results can be applied to the problem of hyperparameter tuning (selecting the best value of k for the given learning problem).  more » « less
Award ID(s):
1525971
PAR ID:
10105979
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Proceedings of the 31st Conference On Learning Theory
Page Range / eLocation ID:
474-497
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Recent work of Klivans, Stavropoulos, and Vasilyan initiated the study of testable learning with distribution shift (TDS learning), where a learner is given labeled samples from training distribution D, unlabeled samples from test distribution D′, and the goal is to output a classifier with low error on D′ whenever the training samples pass a corresponding test. Their model deviates from all prior work in that no assumptions are made on D′. Instead, the test must accept (with high probability) when the marginals of the training and test distributions are equal. Here we focus on the fundamental case of intersections of halfspaces with respect to Gaussian training distributions and prove a variety of new upper bounds including a 2(k/ϵ)O(1)poly(d)-time algorithm for TDS learning intersections of k homogeneous halfspaces to accuracy ϵ (prior work achieved d(k/ϵ)O(1)). We work under the mild assumption that the Gaussian training distribution contains at least an ϵ fraction of both positive and negative examples (ϵ-balanced). We also prove the first set of SQ lower-bounds for any TDS learning problem and show (1) the ϵ-balanced assumption is necessary for poly(d,1/ϵ)-time TDS learning for a single halfspace and (2) a dΩ~(log1/ϵ) lower bound for the intersection of two general halfspaces, even with the ϵ-balanced assumption. Our techniques significantly expand the toolkit for TDS learning. We use dimension reduction and coverings to give efficient algorithms for computing a localized version of discrepancy distance, a key metric from the domain adaptation literature. 
    more » « less
  2. Recent work of Klivans, Stavropoulos, and Vasilyan initiated the study of testable learning with distribution shift (TDS learning), where a learner is given labeled samples from training distribution , unlabeled samples from test distribution ′, and the goal is to output a classifier with low error on ′ whenever the training samples pass a corresponding test. Their model deviates from all prior work in that no assumptions are made on ′. Instead, the test must accept (with high probability) when the marginals of the training and test distributions are equal. Here we focus on the fundamental case of intersections of halfspaces with respect to Gaussian training distributions and prove a variety of new upper bounds including a 2(k/ϵ)O(1)𝗉𝗈𝗅𝗒(d)-time algorithm for TDS learning intersections of k homogeneous halfspaces to accuracy ϵ (prior work achieved d(k/ϵ)O(1)). We work under the mild assumption that the Gaussian training distribution contains at least an ϵ fraction of both positive and negative examples (ϵ-balanced). We also prove the first set of SQ lower-bounds for any TDS learning problem and show (1) the ϵ-balanced assumption is necessary for 𝗉𝗈𝗅𝗒(d,1/ϵ)-time TDS learning for a single halfspace and (2) a dΩ̃ (log1/ϵ) lower bound for the intersection of two general halfspaces, even with the ϵ-balanced assumption. Our techniques significantly expand the toolkit for TDS learning. We use dimension reduction and coverings to give efficient algorithms for computing a localized version of discrepancy distance, a key metric from the domain adaptation literature. 
    more » « less
  3. We consider the problem of estimating how well a model class is capable of fitting a distribution of labeled data. We show that it is possible to accurately estimate this “learnability” even when given an amount of data that is too small to reliably learn any accurate model. Our first result applies to the setting where the data is drawn from a d-dimensional distribution with isotropic covariance, and the label of each datapoint is an arbitrary noisy function of the datapoint. In this setting, we show that with O(sqrt(d)) samples, one can accurately estimate the fraction of the variance of the label that can be explained via the best linear function of the data. For comparison, even if the labels are noiseless linear functions of the data, a sample size linear in the dimension, d, is required to learn any function correlated with the underlying model. Our estimation approach also applies to the setting where the data distribution has an (unknown) arbitrary covariance matrix, allowing these techniques to be applied to settings where the model class consists of a linear function applied to a nonlinear embedding of the data. In this setting we give a consistent estimator of the fraction of explainable variance that uses o(d) samples. Finally, our techniques also extend to the setting of binary classification, where we obtain analogous results under the logistic model, for estimating the classification accuracy of the best linear classifier. We demonstrate the practical viability of our approaches on synthetic and real data. This ability to estimate the explanatory value of a set of features (or dataset), even in the regime in which there is too little data to realize that explanatory value, may be relevant to the scientific and industrial settings for which data collection is expensive and there are many potentially relevant feature sets that could be collected. 
    more » « less
  4. Kohayakawa, Y.; Miyazawa, F.K. (Ed.)
    In this work we are interested in the problem of testing quantum entanglement. More specifically, we study the separability problem in quantum property testing, where one is given n copies of an unknown mixed quantum state ϱ on Cd⊗Cd , and one wants to test whether ϱ is separable or ϵ -far from all separable states in trace distance. We prove that n=Ω(d2/ϵ2) copies are necessary to test separability, assuming ϵ is not too small, viz. ϵ=Ω(1/d−−√) . We also study completely positive distributions on the grid [d]×[d] , as a classical analogue of separable states. We analogously prove that Ω(d/ϵ2) samples from an unknown distribution p are necessary to decide whether p is completely positive or ϵ -far from all completely positive distributions in total variation distance. 
    more » « less
  5. Label efficiency has become an increasingly important objective in deep learning applications. Active learning aims to reduce the number of labeled examples needed to train deep networks, but the empirical performance of active learning algorithms can vary dramatically across datasets and applications. It is difficult to know in advance which active learning strategy will perform well or best in a given application. To address this, we propose the first adaptive algorithm selection strategy for deep active learning. For any unlabeled dataset, our (meta) algorithm TAILOR (Thompson ActIve Learning algORithm selection) iteratively and adap- tively chooses among a set of candidate active learning algorithms. TAILOR uses novel reward functions aimed at gathering class-balanced examples. Extensive experiments in multi-class and multi-label applications demonstrate TAILOR ’s effectiveness in achieving accuracy comparable or better than that of the best of the candidate algorithms. Our implementation of TAILOR is open-sourced at https://github.com/jifanz/TAILOR. 
    more » « less