Title: The Two Dimensions of Worst-case Training and Their Integrated Effect for Out-of-domain Generalization
Training with an emphasis on “hard-to-learn” components of the data has proven to be an effective way to improve the generalization of machine learning models, especially in settings where robustness (e.g., generalization across distributions) is valued. Existing literature on this “hard-to-learn” concept is mainly developed either along the dimension of the samples or the dimension of the features. In this paper, we introduce a simple view that merges these two dimensions, leading to a new, simple yet effective heuristic that trains machine learning models by emphasizing the worst cases along both the sample and the feature dimensions. We name our method W2D, following the concept of “Worst-case along Two Dimensions”. We validate the idea and demonstrate its empirical strength over standard benchmarks.
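As an illustration only, the minimal sketch below (PyTorch) shows one way the two worst-case dimensions described in the abstract could be combined in a single training step: the hardest samples in a mini-batch are kept, their most salient input features are muted, and the model is updated on the resulting view. The selection ratios and the gradient-based feature scoring are our assumptions for illustration, not details taken from the paper.

import torch
import torch.nn.functional as F

def w2d_step(model, x, y, optimizer, sample_ratio=0.5, feature_ratio=0.33):
    # 1) Sample dimension: keep the hardest examples in the batch (highest loss).
    with torch.no_grad():
        per_sample_loss = F.cross_entropy(model(x), y, reduction="none")
    k = max(1, int(sample_ratio * x.size(0)))
    hard_idx = per_sample_loss.topk(k).indices
    x_hard, y_hard = x[hard_idx], y[hard_idx]

    # 2) Feature dimension: mute the most salient input features, scored by the
    #    magnitude of the loss gradient with respect to the input.
    x_hard = x_hard.clone().requires_grad_(True)
    grad = torch.autograd.grad(F.cross_entropy(model(x_hard), y_hard), x_hard)[0]
    saliency = grad.abs().flatten(1)
    n_mask = max(1, int(feature_ratio * saliency.size(1)))
    mask = torch.ones_like(saliency)
    mask.scatter_(1, saliency.topk(n_mask, dim=1).indices, 0.0)

    # 3) Update the model on the doubly worst-case view of the batch.
    optimizer.zero_grad()
    F.cross_entropy(model((x_hard * mask.view_as(x_hard)).detach()), y_hard).backward()
    optimizer.step()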
Award ID(s):
2204808 2150012
PAR ID:
10385712
Author(s) / Creator(s):
Date Published:
Journal Name:
Conference on Computer Vision and Pattern Recognition (CVPR)
Page Range / eLocation ID:
9621 to 9631
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Developing algorithms that are able to generalize to a novel task given only a few labeled examples represents a fundamental challenge in closing the gap between machine- and human-level performance. The core of human cognition lies in the structured, reusable concepts that help us to rapidly adapt to new tasks and provide reasoning behind our decisions. However, existing meta-learning methods learn complex representations across prior labeled tasks without imposing any structure on the learned representations. Here we propose COMET, a meta-learning method that improves generalization ability by learning to learn along human-interpretable concept dimensions. Instead of learning a joint unstructured metric space, COMET learns mappings of high-level concepts into semi-structured metric spaces, and effectively combines the outputs of independent concept learners. We evaluate our model on few-shot tasks from diverse domains, including fine-grained image classification, document categorization and cell type annotation on a novel dataset from a biological domain developed in our work. COMET significantly outperforms strong meta-learning baselines, achieving 6–15% relative improvement on the most challenging 1-shot learning tasks, while unlike existing methods providing interpretations behind the model’s predictions. 
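The concept-wise scoring described in the item above could look roughly like the following sketch (PyTorch): one prototypical-network-style learner per concept, using per-concept feature masks and a simple sum of negative distances across concepts. The masking scheme and the unweighted sum are our assumptions for illustration, not COMET's exact formulation.

import torch
import torch.nn as nn

class ConceptProtoScorer(nn.Module):
    def __init__(self, encoder, concept_masks):
        super().__init__()
        self.encoder = encoder                        # maps inputs to feature vectors
        self.register_buffer("masks", concept_masks)  # (n_concepts, feat_dim) 0/1 masks

    def forward(self, support_x, support_y, query_x, n_classes):
        z_s = self.encoder(support_x)                 # (n_support, feat_dim)
        z_q = self.encoder(query_x)                   # (n_query, feat_dim)
        scores = torch.zeros(z_q.size(0), n_classes, device=z_q.device)
        for m in self.masks:                          # one independent learner per concept
            protos = torch.stack([(z_s * m)[support_y == c].mean(0) for c in range(n_classes)])
            scores = scores - torch.cdist(z_q * m, protos)   # closer prototype -> higher score
        return scores                                 # (n_query, n_classes)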
  2. General Chair: Marc Moreno; Chair: Lihong Zhi (Eds.)
    We introduce a new type of reduction in a free difference module over a difference field that uses a generalization of the concept of effective order of a difference polynomial. We then define the concept of a generalized characteristic set of such a module, establish some properties of these characteristic sets, and use them to prove the existence of, outline a method of computation for, and find invariants of a dimension polynomial in two variables associated with a finitely generated difference module. As a consequence of these results, we obtain a new type of bivariate dimension polynomial for finitely generated difference field extensions. We also explain the relationship between these dimension polynomials and the concept of Einstein’s strength of a system of difference equations.
  3. We present a class of models of elastic phase transitions with incompatible energy wells in an arbitrary space dimension, where in a hard device an abundance of Lipschitz global minimizers coexists with a complete lack of strong local minimizers. The analysis is based on the proof that every strong local minimizer in a hard device is also a global minimizer, a result that applies well beyond the chosen class of models. Along the way we show that a new demonstration of sufficiency for a subclass of affine boundary conditions can be built around a novel nonlinear generalization of the classical Clapeyron theorem.
  4.
    We study overparameterization in generative adversarial networks (GANs) that can interpolate the training data. We show that overparameterization can improve generalization performance and accelerate the training process. We study the generalization error as a function of latent space dimension and identify two main behaviors, depending on the learning setting. First, we show that overparameterized generative models that learn distributions by minimizing a metric or f-divergence do not exhibit double descent in generalization errors; specifically, all the interpolating solutions achieve the same generalization error. Second, we develop a new pseudo-supervised learning approach for GANs in which training uses pairs of fabricated (noise) inputs together with real output samples. Our pseudo-supervised setting exhibits double descent (and in some cases, triple descent) of generalization errors. We combine pseudo-supervision with overparameterization (i.e., an overly large latent space dimension) to accelerate training while matching, or improving on, the generalization performance obtained without pseudo-supervision. While our analysis focuses mostly on linear GANs, we also apply these insights to improve the generalization of nonlinear, multilayer GANs.
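The pseudo-supervised idea mentioned in the item above could be sketched roughly as follows (PyTorch): each real sample is paired with a fixed fabricated noise vector, and the generator receives an extra regression term pulling its output on that noise toward the paired real sample, in addition to the usual adversarial term. The specific losses and the weight lam are illustrative assumptions, not the paper's formulation.

import torch
import torch.nn.functional as F

def pseudo_supervised_gan_step(G, D, x_real, z_paired, opt_g, opt_d, lam=1.0):
    ones = torch.ones(x_real.size(0), 1, device=x_real.device)
    zeros = torch.zeros(x_real.size(0), 1, device=x_real.device)

    # Discriminator update: real samples labeled 1, generated samples labeled 0.
    opt_d.zero_grad()
    d_loss = (F.binary_cross_entropy_with_logits(D(x_real), ones)
              + F.binary_cross_entropy_with_logits(D(G(z_paired).detach()), zeros))
    d_loss.backward()
    opt_d.step()

    # Generator update: adversarial term plus the pseudo-supervised pairing term
    # that ties each fabricated noise input z_i to its paired real sample x_i.
    opt_g.zero_grad()
    fake = G(z_paired)
    g_loss = F.binary_cross_entropy_with_logits(D(fake), ones) + lam * ((fake - x_real) ** 2).mean()
    g_loss.backward()
    opt_g.step()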