Title: The Two Dimensions of Worst-case Training and Their Integrated Effect for Out-of-domain Generalization
Training with an emphasis on the “hard-to-learn” components of the data has proven to be an effective way to improve the generalization of machine learning models, especially in settings where robustness (e.g., generalization across distributions) is valued. The existing literature on this “hard-to-learn” concept expands it mainly along either the dimension of the samples or the dimension of the features. In this paper, we introduce a simple view that merges these two dimensions, leading to a new, simple yet effective, heuristic that trains machine learning models by emphasizing the worst cases along both the sample and the feature dimensions. We name our method W2D after the concept of “Worst-case along Two Dimensions”. We validate the idea and demonstrate its empirical strength over standard benchmarks.
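The abstract does not spell out the training procedure, but a minimal sketch of the heuristic it describes might look as follows, assuming a PyTorch setup in which "worst-case samples" are the highest-loss examples in a mini-batch and "worst-case features" are approximated by muting the most predictive feature activations. The ratios and the gradient-based feature scoring below are illustrative assumptions, not the paper's exact algorithm.

```python
# Hedged sketch of a W2D-style training step: emphasize the hardest
# samples in the batch, then mute the most predictive features of
# those samples so the model must rely on the remaining ones.
import torch
import torch.nn.functional as F

def w2d_step(feature_extractor, classifier, x, y,
             sample_ratio=0.5, mute_ratio=0.3):
    feats = feature_extractor(x)                  # (B, D) feature vectors
    logits = classifier(feats)
    per_sample_loss = F.cross_entropy(logits, y, reduction="none")

    # Sample dimension: keep only the hardest (highest-loss) fraction.
    k = max(1, int(sample_ratio * x.size(0)))
    hard_idx = per_sample_loss.topk(k).indices
    hard_y = y[hard_idx]

    # Feature dimension: score features by gradient magnitude on a
    # detached copy, then zero out the most predictive ones.
    hard_feats = feats[hard_idx].detach().requires_grad_(True)
    score_loss = F.cross_entropy(classifier(hard_feats), hard_y)
    scores = torch.autograd.grad(score_loss, hard_feats)[0].abs()
    n_mute = int(mute_ratio * scores.size(1))
    mute_idx = scores.topk(n_mute, dim=1).indices
    mask = torch.ones_like(hard_feats).scatter_(1, mute_idx, 0.0)

    # Worst-case loss: hard samples with their dominant features muted.
    return F.cross_entropy(classifier(feats[hard_idx] * mask), hard_y)
```

This loss would replace the standard batch loss in an otherwise ordinary training loop; gradients still flow into the feature extractor through the masked features.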
Award ID(s):
2204808 2150012
PAR ID:
10385712
Journal Name:
Conference on Computer Vision and Pattern Recognition (CVPR)
Page Range / eLocation ID:
9621 to 9631
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like This
  1.
    Developing algorithms that are able to generalize to a novel task given only a few labeled examples represents a fundamental challenge in closing the gap between machine- and human-level performance. The core of human cognition lies in the structured, reusable concepts that help us rapidly adapt to new tasks and provide reasoning behind our decisions. However, existing meta-learning methods learn complex representations across prior labeled tasks without imposing any structure on the learned representations. Here we propose COMET, a meta-learning method that improves generalization ability by learning to learn along human-interpretable concept dimensions. Instead of learning a joint unstructured metric space, COMET learns mappings of high-level concepts into semi-structured metric spaces and effectively combines the outputs of independent concept learners. We evaluate our model on few-shot tasks from diverse domains, including fine-grained image classification, document categorization, and cell type annotation on a novel dataset from a biological domain developed in our work. COMET significantly outperforms strong meta-learning baselines, achieving a 6–15% relative improvement on the most challenging 1-shot learning tasks while, unlike existing methods, providing interpretations behind the model's predictions.
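    A minimal sketch of the concept-wise prediction rule described above, assuming prototype-style concept learners; the tensor shapes and the weighted-sum aggregation are illustrative assumptions rather than COMET's exact formulation:

    ```python
    # Hedged sketch: each concept learner embeds queries into its own
    # metric space, per-class prototype distances are scored per concept,
    # and the independent concept scores are combined by a weighted sum.
    import torch

    def comet_logits(concept_embeddings, prototypes, concept_weights):
        # concept_embeddings: list of C tensors, each (B, D_c): query
        #   embeddings produced by concept learner c
        # prototypes: list of C tensors, each (K, D_c): one prototype per
        #   class in each concept's metric space
        # concept_weights: tensor of shape (C,), learnable concept importances
        scores = []
        for z, p in zip(concept_embeddings, prototypes):
            scores.append(-torch.cdist(z, p))   # (B, K); closer is better
        scores = torch.stack(scores, dim=0)     # (C, B, K)
        # Combine the independent concept learners' outputs.
        return (concept_weights.view(-1, 1, 1) * scores).sum(dim=0)  # (B, K)
    ```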
  2. Marc Moreno; Lihong Zhi (Eds.)
    We introduce a new type of reduction in a free difference module over a difference field that uses a generalization of the concept of the effective order of a difference polynomial. We then define the concept of a generalized characteristic set of such a module, establish some properties of these characteristic sets, and use them to prove the existence of, outline a method of computing, and find invariants of a dimension polynomial in two variables associated with a finitely generated difference module. As a consequence of these results, we obtain a new type of bivariate dimension polynomial of finitely generated difference field extensions. We also explain the relationship between these dimension polynomials and the concept of Einstein's strength of a system of difference equations.
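    For orientation, a hedged illustration of the closed form such a bivariate dimension polynomial typically takes (a Kolchin-type expression; the degrees m, n and the integer coefficients, which carry the invariants mentioned above, are illustrative symbols, not the paper's notation):

    ```latex
    % Illustrative standard form of a dimension polynomial in two variables;
    % the coefficients a_{ij} are integer invariants of the module.
    \phi(t_1, t_2) \;=\; \sum_{i=0}^{m}\sum_{j=0}^{n}
        a_{ij}\,\binom{t_1 + i}{i}\binom{t_2 + j}{j},
    \qquad a_{ij} \in \mathbb{Z}.
    ```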
  3. We present a class of models of elastic phase transitions with incompatible energy wells in an arbitrary space dimension, where, in a hard device, an abundance of Lipschitz global minimizers coexists with a complete lack of strong local minimizers. The analysis is based on the proof that every strong local minimizer in a hard device is also a global minimizer, which applies well beyond the chosen class of models. Along the way, we show that a new demonstration of sufficiency for a subclass of affine boundary conditions can be built around a novel nonlinear generalization of the classical Clapeyron theorem.
  4. Yiming Ying (Ed.)
    Optimization and generalization are two essential aspects of statistical machine learning. In this paper, we propose a framework that connects optimization with generalization by analyzing the generalization error based on the optimization trajectory under the gradient flow algorithm. The key ingredient of this framework is the Uniform-LGI, a property that is generally satisfied when training machine learning models. Leveraging the Uniform-LGI, we first derive convergence rates for the gradient flow algorithm, and then give generalization bounds for a large class of machine learning models. We further apply our framework to three distinct machine learning models: linear regression, kernel regression, and two-layer neural networks. Through our approach, we obtain generalization estimates that match or extend previous results.
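    As a hedged illustration of the setting, the gradient flow dynamics together with a Lojasiewicz-type gradient inequality of the kind the name "Uniform-LGI" suggests (the paper's exact definition may differ; the constants c, mu and the optimal value L* are illustrative symbols):

    ```latex
    % Gradient flow dynamics and an illustrative Lojasiewicz-type
    % gradient inequality along the trajectory.
    \dot{\theta}(t) = -\nabla L\bigl(\theta(t)\bigr), \qquad
    \bigl\|\nabla L(\theta)\bigr\| \;\ge\; c\,\bigl(L(\theta) - L^\ast\bigr)^{\mu},
    \quad c > 0,\ \mu \in (0, 1].
    ```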