Title: Subspace Fitting Meets Regression: The Effects of Supervision and Orthonormality Constraints on Double Descent of Generalization Errors
We study the linear subspace fitting problem in the overparameterized setting, where the estimated subspace can perfectly interpolate the training examples. Our scope includes the least-squares solutions to subspace fitting tasks with varying levels of supervision in the training data (i.e., the proportion of input-output examples of the desired low-dimensional mapping) and orthonormality of the vectors defining the learned operator. This flexible family of problems connects standard, unsupervised subspace fitting that enforces strict orthonormality with a corresponding regression task that is fully supervised and does not constrain the linear operator structure. This class of problems is defined over a supervision-orthonormality plane, where each coordinate induces a problem instance with a unique pair of supervision level and softness of orthonormality constraints. We explore this plane and show that the generalization errors of the corresponding subspace fitting problems follow double descent trends as the settings become more supervised and less orthonormally constrained.
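As a rough numerical illustration of the plane's two extreme corners, the sketch below (synthetic data, with all dimensions and names chosen for illustration rather than taken from the paper's experiments) contrasts the unsupervised, strictly orthonormal corner (PCA) with the fully supervised, unconstrained corner (minimum-norm least squares) in an overparameterized regime where the regression solution interpolates the training examples.

    # Minimal sketch of the plane's two corners; illustrative assumptions only.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d, k = 20, 100, 5                 # n examples, ambient dim d > n, subspace dim k

    # Training data drawn near a planted k-dimensional subspace.
    U_true, _ = np.linalg.qr(rng.standard_normal((d, k)))
    Z = rng.standard_normal((n, k))
    X = Z @ U_true.T + 0.1 * rng.standard_normal((n, d))
    Y = Z @ U_true.T                     # targets for the supervised corner

    # Corner 1: unsupervised fitting under strict orthonormality (PCA via SVD).
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    U_pca = Vt[:k].T                     # orthonormal basis spanning the fitted subspace

    # Corner 2: fully supervised, unconstrained linear operator: the minimum-norm
    # least-squares solution, which interpolates the training set whenever n <= d.
    W = np.linalg.pinv(X) @ Y
    print("training residual at the regression corner:", np.linalg.norm(X @ W - Y))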
Award ID(s): 1911094, 1838177, 1730574
NSF-PAR ID: 10205627
Journal Name: Proceedings of Machine Learning Research
ISSN: 2640-3498
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Self-training is a standard approach to semi-supervised learning in which the learner's own predictions on unlabeled data are used as supervision during training. In this paper, we reinterpret this label assignment process as an optimal transportation problem between examples and classes, wherein the cost of assigning an example to a class is mediated by the current predictions of the classifier. This formulation facilitates a practical annealing strategy for label assignment and allows for the inclusion of prior knowledge on class proportions via flexible upper-bound constraints. The solutions to these assignment problems can be efficiently approximated using Sinkhorn iteration, thus enabling their use in the inner loop of standard stochastic optimization algorithms. We demonstrate the effectiveness of our algorithm on the CIFAR-10, CIFAR-100, and SVHN datasets in comparison with FixMatch, a state-of-the-art self-training algorithm. Our code is publicly available on GitHub.
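    A minimal sketch of the Sinkhorn step described above (not the authors' released code; for simplicity it uses equality marginals rather than the paper's upper-bound constraints, and all names are illustrative): the cost of assigning an example to a class is the negative log-probability under the current classifier, and the column marginals encode the assumed class proportions.

        import numpy as np

        def sinkhorn_assign(log_probs, class_props, eps=0.05, n_iters=50):
            """log_probs: (N, K) classifier log-predictions; class_props: (K,) summing to 1."""
            N, K = log_probs.shape
            M = np.exp(log_probs / eps)        # Gibbs kernel for cost = -log_probs
            r = np.full(N, 1.0 / N)            # each example carries equal mass
            c = np.asarray(class_props)        # target class proportions
            u = np.ones(N)
            for _ in range(n_iters):           # alternating marginal projections
                v = c / (M.T @ u)
                u = r / (M @ v)
            P = u[:, None] * M * v[None, :]    # transport plan between examples and classes
            return P / P.sum(axis=1, keepdims=True)  # rows renormalized into soft labels

        logits = np.random.default_rng(0).standard_normal((6, 3))
        log_probs = logits - np.logaddexp.reduce(logits, axis=1, keepdims=True)
        soft_labels = sinkhorn_assign(log_probs, np.ones(3) / 3)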
  2. Many real-world applications require automated data annotation, such as identifying tissue origins based on gene expression and classifying images into semantic categories. Annotation classes are often numerous and subject to change over time, and annotating examples has become the major bottleneck for supervised learning methods. In science and other high-value domains, large repositories of data samples are often available, together with two sources of organic supervision: a lexicon for the annotation classes, and text descriptions that accompany some data samples. Distant supervision has emerged as a promising paradigm for exploiting such indirect supervision by automatically annotating examples whose text description contains a class mention from the lexicon. However, due to linguistic variations and ambiguities, such training data is inherently noisy, which limits the accuracy of this approach. In this paper, we introduce an auxiliary natural language processing system for the text modality and incorporate co-training to reduce noise and augment signal in distant supervision. Without using any manually labeled data, our EZLearn system learned to accurately annotate data samples in functional genomics and scientific figure comprehension, substantially outperforming state-of-the-art supervised methods trained on tens of thousands of annotated examples.
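    A hedged sketch of the distant-supervision step described above (this omits EZLearn's co-training loop and auxiliary NLP system, and the lexicon and function names are illustrative): a sample receives a noisy label whenever its accompanying text mentions exactly one class's lexicon terms.

        def distant_labels(descriptions, lexicon):
            """descriptions: free-text strings accompanying data samples.
            lexicon: dict mapping class name -> set of surface forms (synonyms)."""
            labels = []
            for text in descriptions:
                text_lc = text.lower()
                hits = {cls for cls, terms in lexicon.items()
                        if any(term in text_lc for term in terms)}
                # Keep only unambiguous matches; ambiguous or unmatched samples stay unlabeled.
                labels.append(hits.pop() if len(hits) == 1 else None)
            return labels

        lexicon = {"liver": {"liver", "hepatic"}, "brain": {"brain", "cortex"}}
        print(distant_labels(["Hepatic tissue, donor 12", "unknown sample"], lexicon))
        # -> ['liver', None]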
  3. Few-shot node classification aims to provide accurate predictions for nodes from novel classes given only a few representative labeled nodes. This problem has drawn tremendous attention for its relevance to prevailing real-world applications, such as product categorization for newly added commodity categories on an e-commerce platform with scarce records, or diagnosis of rare diseases on a patient similarity graph. To tackle such challenging label scarcity in the non-Euclidean graph domain, meta-learning has become a successful and predominant paradigm. More recently, inspired by the development of graph self-supervised learning, transferring pretrained node embeddings for few-shot node classification has emerged as a promising alternative to meta-learning, but it remains underexplored. In this work, we empirically demonstrate the potential of an alternative framework, Transductive Linear Probing, which transfers node embeddings learned by graph contrastive learning methods. We further extend the setting of few-shot node classification from the standard fully supervised regime to a more realistic self-supervised setting, where meta-learning methods cannot be easily deployed due to the shortage of supervision from training classes. Surprisingly, even without any ground-truth labels, transductive linear probing with self-supervised graph contrastive pretraining can outperform state-of-the-art fully supervised meta-learning methods under the same protocol. We hope this work sheds new light on few-shot node classification and fosters future research on learning from scarcely labeled instances on graphs.
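    A minimal sketch of transductive linear probing under stated assumptions: the frozen node embeddings below are random stand-ins for the output of a graph contrastive pretraining stage, and the probe is an off-the-shelf logistic regression fit only on the few labeled support nodes.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        emb = rng.standard_normal((1000, 64))    # frozen embeddings for all nodes (stand-in)
        support_idx = rng.choice(1000, size=15, replace=False)  # a 3-way 5-shot support set
        support_y = np.repeat(np.arange(3), 5)   # labels of the few representative nodes

        probe = LogisticRegression(max_iter=1000)
        probe.fit(emb[support_idx], support_y)   # train only the linear head
        query_pred = probe.predict(emb)          # transductively classify every node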
  4. Traditional linear subspace-based reduced order models (LS-ROMs) can significantly accelerate simulations in which the solution space of the discretized system has a small dimension (i.e., a fast-decaying Kolmogorov n-width). However, LS-ROMs struggle to achieve speed-ups on problems whose solution space has a large dimension, such as highly nonlinear problems whose solutions have large gradients. This issue can be alleviated by combining nonlinear model reduction with operator learning. Over the past decade, many nonlinear manifold-based reduced order models (NM-ROMs) have been proposed; in particular, NM-ROMs based on deep neural networks (DNNs) have received increasing interest. This work takes inspiration from adaptive basis methods and specifically focuses on developing an NM-ROM based on convolutional neural network autoencoders (CNNAEs) with iteration-dependent trainable kernels. Additionally, we investigate DNN-based and quadratic operator inference strategies between latent spaces. A strategy to perform vectorized implicit time integration is also proposed. We demonstrate that the proposed CNN-based NM-ROM, combined with DNN-based operator inference, generally outperforms commonly employed strategies (in terms of prediction accuracy) on a benchmark advection-dominated problem. The method also yields substantial gains in training speed per epoch, with a training time roughly one order of magnitude smaller than that of a state-of-the-art technique achieving the same level of accuracy.
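    The quadratic operator-inference idea between latent spaces can be sketched as follows (a hedged illustration, not the paper's CNN-autoencoder pipeline: the latent trajectory Z is assumed to come from an already-trained encoder): fit latent dynamics of the form z' ~ A z + H (z kron z) by linear least squares on encoded snapshots.

        import numpy as np

        rng = np.random.default_rng(0)
        r, m = 4, 200                     # latent dimension, number of snapshots
        Z = rng.standard_normal((r, m))   # latent trajectory (stand-in for encoder output)
        Zdot = np.gradient(Z, axis=1)     # finite-difference time derivatives

        Z2 = np.einsum('im,jm->ijm', Z, Z).reshape(r * r, m)  # columnwise kron(z, z)
        D = np.vstack([Z, Z2])            # regression features, shape (r + r^2, m)
        O = Zdot @ np.linalg.pinv(D)      # [A  H] via minimum-norm least squares
        A, H = O[:, :r], O[:, r:]         # linear and quadratic latent operators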
  5. Koopman decomposition is a nonlinear generalization of eigendecomposition and is being increasingly utilized in the analysis of spatio-temporal dynamics. Well-known techniques such as dynamic mode decomposition (DMD) and its linear variants provide approximations to the Koopman operator and have been applied extensively to many fluid dynamics problems. Despite being endowed with a richer dictionary of nonlinear observables, nonlinear variants of DMD, such as extended/kernel dynamic mode decomposition (EDMD/KDMD), are seldom applied to large-scale problems, primarily because of the difficulty of discerning the Koopman-invariant subspace among thousands of resulting Koopman eigenmodes. To address this issue, we propose a framework based on multi-task feature learning that extracts the most informative Koopman-invariant subspace by removing redundant and spurious Koopman triplets. In particular, we develop a pruning procedure that penalizes departure from linear evolution. These algorithms can be viewed as sparsity-promoting extensions of EDMD/KDMD. Furthermore, we extend KDMD to a continuous-time setting and show a relationship between the present algorithm, sparsity-promoting DMD, and an empirical criterion from the viewpoint of non-convex optimization. The effectiveness of our algorithm is demonstrated on examples ranging from simple dynamical systems to two-dimensional cylinder wake flows at different Reynolds numbers and a three-dimensional turbulent ship-airwake flow. The latter two problems are designed such that very strong nonlinear transients are present, thus requiring an accurate approximation of the Koopman operator. Underlying physical mechanisms are analysed, with an emphasis on characterizing transient dynamics. The results are compared with existing theoretical expositions and numerical approximations.
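    For reference, the baseline linear approximation this passage builds on, standard exact DMD, can be sketched in a few lines (illustrative only; this is not the proposed sparsity-promoting EDMD/KDMD framework):

        import numpy as np

        def dmd(X, Xp, r):
            """X, Xp: (n, m) snapshot pairs, Xp[:, j] one step ahead of X[:, j]; r: rank."""
            U, s, Vt = np.linalg.svd(X, full_matrices=False)
            U, s, V = U[:, :r], s[:r], Vt[:r].T
            A_tilde = U.T @ Xp @ V / s            # rank-r linear (Koopman) operator
            eigvals, W = np.linalg.eig(A_tilde)   # DMD eigenvalues and eigenvectors
            modes = (Xp @ V / s) @ W              # exact DMD modes
            return eigvals, modes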