Title: Chernoff Sampling for Active Testing and Extension to Active Regression
Active learning can reduce the number of samples needed to perform a hypothesis test and to estimate the parameters of a model. In this paper, we revisit the work of Chernoff that described an asymptotically optimal algorithm for performing a hypothesis test. We obtain a novel sample complexity bound for Chernoff’s algorithm, with a non-asymptotic term that characterizes its performance at a fixed confidence level. We also develop an extension of Chernoff sampling that can be used to estimate the parameters of a wide variety of models, and we obtain a non-asymptotic bound on the estimation error. We apply our extension of Chernoff sampling to actively learn neural network models and to estimate parameters in real-data linear and non-linear regression problems, where our approach compares favorably with state-of-the-art methods.
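For readers unfamiliar with the procedure, the following is a minimal sketch of a Chernoff-style active testing loop over a finite set of hypotheses. It is an illustration under stated assumptions, not the paper's implementation: responses are assumed Gaussian with unit variance, the table `means` of mean responses per hypothesis and action is assumed known, and the names `chernoff_proportions` and `chernoff_test` are hypothetical. At each round the sketch takes the currently most likely hypothesis, solves a small linear program for the sampling proportions that maximize the smallest KL divergence to any alternative, samples an action from those proportions, and stops once the log-likelihood gap to the runner-up exceeds a threshold.

```python
import numpy as np
from scipy.optimize import linprog


def chernoff_proportions(means, k_hat):
    """Max-min sampling proportions when hypothesis k_hat is the current guess.

    means[j, a] is the (assumed known) mean response of action a under
    hypothesis j. With unit-variance Gaussian noise, the KL divergence between
    hypotheses j and k at action a is (means[j, a] - means[k, a])**2 / 2.
    """
    n_hyp, n_act = means.shape
    # KL divergence from k_hat to each alternative, one row per alternative.
    kl = 0.5 * (means[k_hat] - np.delete(means, k_hat, axis=0)) ** 2
    # LP variables: (p_1, ..., p_A, t). Maximize t subject to kl @ p >= t.
    c = np.zeros(n_act + 1)
    c[-1] = -1.0  # linprog minimizes, so minimize -t
    A_ub = np.hstack([-kl, np.ones((n_hyp - 1, 1))])
    b_ub = np.zeros(n_hyp - 1)
    A_eq = np.append(np.ones(n_act), 0.0).reshape(1, -1)
    bounds = [(0, None)] * n_act + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    p = np.clip(res.x[:n_act], 0.0, None)
    return p / p.sum()


def chernoff_test(means, true_idx, threshold, rng, max_steps=10_000):
    """Run the sampling loop on simulated data; return (guess, samples used)."""
    n_hyp, n_act = means.shape
    loglik = np.zeros(n_hyp)
    k_hat = 0
    for step in range(1, max_steps + 1):
        k_hat = int(np.argmax(loglik))            # current most likely hypothesis
        p = chernoff_proportions(means, k_hat)    # where to sample next
        a = rng.choice(n_act, p=p)
        y = means[true_idx, a] + rng.standard_normal()  # simulated response
        loglik += -0.5 * (y - means[:, a]) ** 2   # Gaussian log-likelihood update
        k_hat = int(np.argmax(loglik))
        runner_up = np.max(np.delete(loglik, k_hat))
        if loglik[k_hat] - runner_up >= threshold:
            return k_hat, step
    return k_hat, max_steps
```

For example, `chernoff_test(means, true_idx=1, threshold=np.log(1 / 0.05), rng=np.random.default_rng(0))` runs the loop with a stopping threshold of log(1/δ) for δ = 0.05; that choice of threshold is a common convention assumed here, not a value taken from the paper.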
Award ID(s):
2023239
PAR ID:
10533094
Author(s) / Creator(s):
; ;
Publisher / Repository:
International Conference on Artificial Intelligence and Statistics
Date Published:
Format(s):
Medium: X
Location:
Valencia, Spain
Sponsoring Org:
National Science Foundation
More Like this
  1. We show how to obtain improved active learning methods in the agnostic (adversarial noise) setting by combining marginal leverage score sampling with non-independent sampling strategies that promote spatial coverage. In particular, we propose an easily implemented method based on the pivotal sampling algorithm, which we test on problems motivated by learning-based methods for parametric PDEs and uncertainty quantification. In comparison to independent sampling, our method reduces the number of samples needed to reach a given target accuracy by up to 50%. We support our findings with two theoretical results. First, we show that any non-independent leverage score sampling method that obeys a weak one-sided ℓ∞ independence condition (which includes pivotal sampling) can actively learn d-dimensional linear functions with O(d log d) samples, matching independent sampling. This result extends recent work on matrix Chernoff bounds under ℓ∞ independence, and may be of interest for analyzing other sampling strategies beyond pivotal sampling. Second, we show that, for the important case of polynomial regression, our pivotal method obtains an improved bound of O(d) samples.
  2. This paper considers the problem of testing whether there exists a non‐negative solution to a possibly under‐determined system of linear equations with known coefficients. This hypothesis testing problem arises naturally in a number of settings, including random coefficient, treatment effect, and discrete choice models, as well as a class of linear programming problems. As a first contribution, we obtain a novel geometric characterization of the null hypothesis in terms of identified parameters satisfying an infinite set of inequality restrictions. Using this characterization, we devise a test that requires solving only linear programs for its implementation, and thus remains computationally feasible in the high‐dimensional applications that motivate our analysis. The asymptotic size of the proposed test is shown to equal at most the nominal level uniformly over a large class of distributions that permits the number of linear equations to grow with the sample size. 
  3. We study stochastic approximation procedures for approximately solving a $d$-dimensional linear fixed point equation based on observing a trajectory of length $n$ from an ergodic Markov chain. We first exhibit a non-asymptotic bound of the order $t_{\mathrm{mix}} \tfrac{d}{n}$ on the squared error of the last iterate of a standard scheme, where $t_{\mathrm{mix}}$ is a mixing time. We then prove a non-asymptotic instance-dependent bound on a suitably averaged sequence of iterates, with a leading term that matches the local asymptotic minimax limit, including sharp dependence on the parameters $(d, t_{\mathrm{mix}})$ in the higher order terms. We complement these upper bounds with a non-asymptotic minimax lower bound that establishes the instance-optimality of the averaged SA estimator. We derive corollaries of these results for policy evaluation with Markov noise, covering the TD($\lambda$) family of algorithms for all $\lambda \in [0, 1)$, and for linear autoregressive models. Our instance-dependent characterizations open the door to the design of fine-grained model selection procedures for hyperparameter tuning (e.g., choosing the value of $\lambda$ when running the TD($\lambda$) algorithm).
  4. Firoozi, R.; Mehr, N.; Yel, E.; Antonova, R.; Bohg, J.; Schwager, M.; Kochenderfer, M. (Ed.)
    This work considers the problem of learning the Markov parameters of a linear system from observed data. Recent non-asymptotic system identification results have characterized the sample complexity of this problem in the single and multi-rollout setting. In both instances, the number of samples required in order to obtain acceptable estimates can produce optimization problems with an intractably large number of decision variables for a second-order algorithm. We show that a randomized and distributed Newton algorithm based on Hessian-sketching can produce ε-optimal solutions and converges geometrically. Moreover, the algorithm is trivially parallelizable. Our results hold for a variety of sketching matrices and we illustrate the theory with numerical examples.
  5. We study optimal design problems in which the goal is to choose a set of linear measurements to obtain the most accurate estimate of an unknown vector. We study the [Formula: see text]-optimal design variant where the objective is to minimize the average variance of the error in the maximum likelihood estimate of the vector being measured. We introduce the proportional volume sampling algorithm to obtain nearly optimal bounds in the asymptotic regime when the number [Formula: see text] of measurements made is significantly larger than the dimension [Formula: see text] and obtain the first approximation algorithms whose approximation factor does not degrade with the number of possible measurements when [Formula: see text] is small. The algorithm also gives approximation guarantees for other optimal design objectives such as [Formula: see text]-optimality and the generalized ratio objective, matching or improving the previously best-known results. We further show that bounds similar to ours cannot be obtained for [Formula: see text]-optimal design and that [Formula: see text]-optimal design is NP-hard to approximate within a fixed constant when [Formula: see text]. 