Title: Induced Model Matching: Restricted Models Help Train Full-Featured Models
We consider scenarios where a very accurate (often small) predictive model using restricted features is available when training a full-featured (often larger) model. This restricted model may be thought of as "side information" and can come either from an auxiliary dataset or from the same dataset by forcing the restriction. How can the restricted model be useful to the full model? To answer this, we introduce a methodology called Induced Model Matching (IMM). IMM aligns the context-restricted, or induced, version of the large model with the restricted model. We relate IMM to approaches such as noising, which addresses the problem implicitly, and reverse knowledge distillation from weak teachers, which addresses it explicitly but does not exploit the fact that the weakness stems from restriction. We show that these prior methods can be thought of as approximations to IMM and can be problematic in terms of consistency. Experimentally, we first motivate IMM using logistic regression as a toy example. We then explore it in language modeling, the application that initially inspired it, and demonstrate it on both LSTM and transformer full models, using bigrams as restricted models. We lastly give a simple RL example, which shows that POMDP policies can help learn better MDP policies. The IMM principle is thus generally applicable in common scenarios where restricted data is cheaper to collect or restricted models are easier to learn.
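To make the mechanism concrete, here is a minimal sketch of what an IMM-style penalty could look like for language modeling with a bigram restricted model. All names and shapes (`full_logits`, `bigram_probs`, the weight `lam`) are assumptions for illustration, not the paper's implementation, and matching the bigram position by position is only a one-sample surrogate for matching the induced model, i.e., the full model averaged over contexts that share the same previous token.

```python
import torch
import torch.nn.functional as F

def imm_penalized_loss(full_logits, targets, prev_tokens, bigram_probs, lam=0.1):
    """Cross-entropy plus an IMM-style matching term (illustrative sketch).

    full_logits : (N, V) next-token logits from the full model.
    targets     : (N,)   ground-truth next tokens.
    prev_tokens : (N,)   the restricted context: here, just the previous token.
    bigram_probs: (V, V) row-stochastic bigram restricted model q(next | prev).
    lam         : weight on the matching term (assumed hyperparameter).
    """
    ce = F.cross_entropy(full_logits, targets)        # usual training loss
    log_p = F.log_softmax(full_logits, dim=-1)        # full-model predictions
    q = bigram_probs[prev_tokens]                     # restricted-model targets, (N, V)
    # Push the full model's predictions toward the bigram's, position by
    # position: a one-sample surrogate for the induced-model matching term.
    match = -(q * log_p).sum(dim=-1).mean()           # H(q, p), minimized when p = q
    return ce + lam * match
```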
Award ID(s):
2146334 2217023
PAR ID:
10626919
Author(s) / Creator(s):
Publisher / Repository:
Advances in Neural Information Processing Systems (NeurIPS)
Date Published:
Volume:
37
ISBN:
9798331314385
Page Range / eLocation ID:
62617-62647
Format(s):
Medium: X
Location:
Vancouver, Canada
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract—In many informative path planning scenarios involving ground robots or drones, certain types of information are significantly more valuable than others. For example, in precision agriculture, detecting plant disease outbreaks can prevent costly crop losses. Quite often, the exploration budget is limited and does not allow for a detailed investigation of every location. In this paper, we propose Learned Adaptive Inspection Paths (LAIP), a methodology for learning policies that handle such scenarios by combining uniform sampling with close inspection of areas where high-value information is likely to be found. LAIP combines Q-learning in an offline reinforcement learning setting, careful engineering of the state representation and reward system, and a training regime inspired by teacher-student curriculum learning. We found that a policy learned with LAIP outperforms traditional approaches in low-budget scenarios.
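The abstract above centers on offline Q-learning with an engineered reward; as a rough illustration only, a tabular update whose reward favors inspecting high-value locations under a budget might look like the sketch below. The names, the reward shaping, and the value signal are all assumptions, not details from the paper.

```python
import numpy as np

def q_update(Q, s, a, s_next, inspected_value, step_cost=1.0,
             alpha=0.1, gamma=0.95):
    """One tabular Q-learning step with a value-weighted inspection reward.

    Q               : (num_states, num_actions) Q-table.
    s, a, s_next    : current state, action taken, resulting state (indices).
    inspected_value : estimated information value at the visited location,
                      e.g., high where disease is likely (assumed signal).
    step_cost       : penalty per move, standing in for the exploration budget.
    """
    r = inspected_value - step_cost                   # reward favors high-value cells
    td_target = r + gamma * Q[s_next].max()           # bootstrap from best next action
    Q[s, a] += alpha * (td_target - Q[s, a])          # standard Q-learning update
    return Q
```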
  2. In this paper, we address clustering problems in scenarios where accurate direct access to the full dataset is impractical or impossible. Instead, we leverage oracle-based methods, which are particularly valuable in real-world applications where the data may be noisy, restricted due to privacy concerns, or prohibitively large. We utilize two oracles: the quadruplet oracle and the distance oracle. The quadruplet oracle is the weaker of the two, only approximately comparing the distances of two pairs of vertices; in practice, it can be implemented via crowdsourcing or by training classifiers or other predictive models. The distance oracle, on the other hand, returns the exact distance between two vertices, making it a stronger and more expensive oracle to implement. We consider two noise models for the quadruplet oracle. In the adversarial noise model, if two pairs have similar distances, the response is chosen by an adversary. In the probabilistic noise model, the pair with the smaller distance is returned with a constant probability. We consider a set V of n vertices in a metric space that supports the quadruplet and the distance oracles. For each of the k-center, k-median, and k-means clustering problems on V, we design constant-approximation algorithms that perform roughly O(nk) calls to the quadruplet oracle and O(k^2) calls to the distance oracle in both noise models. When the dataset has low intrinsic dimension, we significantly improve the approximation factors of our algorithms by performing a few additional calls to the distance oracle. We also show that for k-median and k-means clustering there is no hope of achieving any sublinear approximation using only the quadruplet oracle. Finally, we give constant-approximation algorithms for estimating the clustering cost induced by any set of k vertices, performing roughly O(nk) calls to the quadruplet oracle and O(k^2) calls to the distance oracle.
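To fix ideas, a quadruplet oracle under the probabilistic noise model described above can be simulated as follows. This is a toy sketch: the success probability `p`, the distance function, and the interface are assumptions, not the paper's construction.

```python
import math
import random

def quadruplet_oracle(u1, v1, u2, v2, dist, p=0.9):
    """Compare d(u1, v1) with d(u2, v2) under probabilistic noise.

    Returns True if the oracle reports d(u1, v1) <= d(u2, v2).
    The correct comparison is returned with probability p; otherwise flipped.
    """
    truth = dist(u1, v1) <= dist(u2, v2)
    return truth if random.random() < p else not truth

# Toy usage with points in the plane and Euclidean distance.
dist = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
print(quadruplet_oracle((0, 0), (1, 0), (0, 0), (5, 0), dist))
```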
  3. Three scenarios of "low", "medium", and "high" levels of restriction on groundwater are developed. This dataset includes likely groundwater sustainability restriction policies (GSPs) considering 2010 levels.
  4. Abstract We review some simulation‐based methods to implement optimal decisions in sequential design problems as they naturally arise in clinical trial design. As a motivating example, we use a stylized version of a dose‐ranging design in the ASTIN trial. The approach can be characterized as constrained backward induction. The constraint restricts decisions to actions that depend on the current history only through a low-dimensional summary statistic. In addition, the action set is restricted to time-invariant policies; time dependence enters only indirectly, through the change of the chosen summary statistic over time. This restriction allows computationally efficient solutions to the sequential decision problem. A further simplification is achieved by restricting optimal actions to be described by decision boundaries on the space of such summary statistics.
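As a schematic of backward induction constrained to a low-dimensional summary statistic, consider a discretized one-dimensional grid of statistic values and a fixed action set. Everything here (the grid, transition, and utility functions) is a made-up placeholder, not the ASTIN design.

```python
import numpy as np

def backward_induction(grid, actions, transition, utility, horizon):
    """Dynamic programming over a 1-D summary statistic (illustrative sketch).

    grid       : (G,) discretized values of the summary statistic.
    actions    : list of available actions (a time-invariant set).
    transition : transition(g_idx, a) -> probabilities over grid, shape (G,).
    utility    : utility(g_idx, a) -> immediate expected utility.
    Returns the optimal policy as a (horizon, G) array of action indices,
    whose changes along the grid trace out decision boundaries.
    """
    G = len(grid)
    V = np.zeros(G)                                   # terminal value
    policy = np.zeros((horizon, G), dtype=int)
    for t in reversed(range(horizon)):
        Q = np.array([[utility(g, a) + transition(g, a) @ V
                       for a in actions] for g in range(G)])
        policy[t] = Q.argmax(axis=1)                  # best action per grid point
        V = Q.max(axis=1)                             # value to propagate backward
    return policy
```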
  5. Abstract Exponential random graph models, or ERGMs, are a flexible and general class of models for dependent network data. While the early literature has shown them to be powerful in capturing many network features of interest, recent work highlights difficulties related to the models' ill behavior, such as most of the probability mass being concentrated on a very small subset of the parameter space. This behavior limits both the applicability of ERGMs as models for real data and the feasibility of inference and parameter estimation via the usual Markov chain Monte Carlo algorithms. To address this problem, we propose a new exponential family of models for random graphs that builds on the standard ERGM framework. Specifically, we solve the problem of computational intractability and "degenerate" model behavior via an interpretable support restriction. We introduce a new parameter based on the graph-theoretic notion of degeneracy, a measure of sparsity whose value is commonly low in real-world networks. The new model family is supported on the sample space of graphs with bounded degeneracy and is called degeneracy-restricted ERGMs, or DERGMs for short. Since DERGMs generalize ERGMs (the latter are obtained from the former by setting the degeneracy parameter to its maximum), they inherit good theoretical properties while placing their mass more uniformly over realistic graphs. The support restriction allows the use of new (and fast) Monte Carlo methods for inference, making the models scalable and computationally tractable. We study various theoretical properties of DERGMs and illustrate how the support restriction improves model behavior. We also present a fast Monte Carlo algorithm for parameter estimation that avoids many issues faced by Markov chain Monte Carlo algorithms used for inference in ERGMs.
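The support restriction above hinges on graph degeneracy, which can be computed by repeatedly peeling off a minimum-degree vertex. The sketch below (plain adjacency-set representation; names like `in_dergm_support` are assumed for illustration) shows how a sampler might reject graphs outside the restricted support.

```python
def degeneracy(adj):
    """Graph degeneracy: the largest minimum degree seen over the peeling order.

    adj: dict mapping each vertex to the set of its neighbors.
    """
    adj = {v: set(nbrs) for v, nbrs in adj.items()}   # work on a copy
    k = 0
    while adj:
        v = min(adj, key=lambda u: len(adj[u]))       # minimum-degree vertex
        k = max(k, len(adj[v]))                       # track the peeling maximum
        for u in adj[v]:
            adj[u].discard(v)                         # remove v from the graph
        del adj[v]
    return k

def in_dergm_support(adj, bound):
    """Accept a proposed graph only if its degeneracy respects the bound."""
    return degeneracy(adj) <= bound

# A triangle has degeneracy 2, so it lies in the support whenever bound >= 2.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
print(in_dergm_support(triangle, bound=2))  # True
```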