
This content will become publicly available on July 21, 2025

Title: Transferring Knowledge from Large Foundation Models to Small Downstream Models
Award ID(s):
2118310
PAR ID:
10539305
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
International Conference on Machine Learning
Date Published:
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Different agents need to make a prediction. They observe identical data, but have different models: they predict using different explanatory variables. We study which agent believes they have the best predictive ability—as measured by the smallest subjective posterior mean squared prediction error—and show how it depends on the sample size. With small samples, we present results suggesting it is an agent using a low-dimensional model. With large samples, it is generally an agent with a high-dimensional model, possibly including irrelevant variables, but never excluding relevant ones. We apply our results to characterize the winning model in an auction of productive assets, to argue that entrepreneurs and investors with simple models will be overrepresented in new sectors, and to understand the proliferation of “factors” that explain the cross-sectional variation of expected stock returns in the asset-pricing literature.

     
  2. Most previous work in unsupervised semantic modeling in the presence of metadata has assumed that our goal is to make latent dimensions more correlated with metadata, but in practice the exact opposite is often true. Some users want topic models that highlight differences between, for example, authors, but others seek more subtle connections across authors. We introduce three metrics for identifying topics that are highly correlated with metadata, and demonstrate that this problem affects between 30% and 50% of the topics in models trained on two real-world collections, regardless of the size of the model. We find that we can predict which words cause this phenomenon and that by selectively subsampling these words we dramatically reduce topic-metadata correlation, improve topic stability, and maintain or even improve model quality.
  3. Ansari, Ali R. (Ed.)
    Null models provide a critical baseline for the evaluation of predictive disease models. Many studies consider only the grand mean null model (i.e. R²) when evaluating the predictive ability of a model, which is insufficient to convey the predictive power of a model. We evaluated ten null models for human cases of West Nile virus (WNV), a zoonotic mosquito-borne disease introduced to the United States in 1999. The Negative Binomial, Historical (i.e. using previous cases to predict future cases) and Always Absent null models were the strongest overall, and the majority of null models significantly outperformed the grand mean. The length of the training time series increased the performance of most null models in US counties where WNV cases were frequent, but improvements were similar for most null models, so relative scores remained unchanged. We argue that a combination of null models is needed to evaluate the forecasting performance of predictive models for infectious diseases and the grand mean is the lowest bar.