skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: On the Generalization Effects of Linear Transformations in Data Augmentation
Data augmentation is a powerful technique to improve performance in applications such as image and text classification tasks. Yet, there is little rigorous understanding of why and how various augmentations work. In this work, we consider a family of linear transformations and study their effects on the ridge estimator in an over-parametrized linear regression setting. First, we show that trans-formations which preserve the labels of the data can improve estimation by enlarging the span of the training data. Second, we show that transformations which mix data can improve estimation by playing a regularization effect. Finally, we validate our theoretical insights on MNIST. Based on the insights, we propose an augmentation scheme that searches over the space of transformations by how uncertain the model is about the transformed data. We validate our proposed scheme on image and text datasets. For example, our method outperforms RandAugment by 1.24% on CIFAR-100 using Wide-ResNet-28-10. Furthermore, we achieve comparable accuracy to the SoTA Adversarial AutoAugment on CIFAR-10, CIFAR-100, SVHN, and ImageNet datasets.  more » « less
Award ID(s):
1813049 1704417
PAR ID:
10195154
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
International Conference on Machine Learning
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Data augmentation is widely used for training a neural network given little labeled data. A common practice of augmentation training is applying a composition of multiple transformations sequentially to the data. Existing augmentation methods such as RandAugment randomly sample from a list of pre-selected transformations, while methods such as AutoAugment apply advanced search to optimize over an augmentation set of size $k^d$, which is the number of transformation sequences of length $$d$$, given a list of $$k$$ transformations. In this paper, we design efficient algorithms whose running time complexity is much faster than the worst-case complexity of $O(k^d)$, provably. We propose a new algorithm to search for a binary tree-structured composition of $$k$$ transformations, where each tree node corresponds to one transformation. The binary tree generalizes sequential augmentations, such as the SimCLR augmentation scheme for contrastive learning. Using a top-down, recursive search procedure, our algorithm achieves a runtime complexity of $O(2^d k)$, which is much faster than $O(k^d)$ as $$k$$ increases above $$2$$. We apply our algorithm to tackle data distributions with heterogeneous subpopulations by searching for one tree in each subpopulation and then learning a weighted combination, resulting in a \emph{forest} of trees. We validate our proposed algorithms on numerous graph and image datasets, including a multi-label graph classification dataset we collected. The dataset exhibits significant variations in the sizes of graphs and their average degrees, making it ideal for studying data augmentation. We show that our approach can reduce the computation cost by 43% over existing search methods while improving performance by 4.3%. The tree structures can be used to interpret the relative importance of each transformation, such as identifying the important transformations on small vs. large graphs. 
    more » « less
  2. Freezing layers in deep neural networks has been shown to enhance generalization and accelerate training, yet the underlying mechanisms remain unclear. This paper investigates the impact of frozen layers from the perspective of linear separability, examining how untrained, randomly initialized layers influence feature representations and model performance. Using multilayer perceptrons trained on MNIST, CIFAR-10, and CIFAR-100, we systematically analyze the effects freezing layers and network architecture. While prior work attributes the benefits of frozen layers to Cover’s theorem, which suggests that nonlinear transformations improve linear separability, we find that this explanation is insufficient. Instead, our results indicate that the observed improvements in generalization and convergence stem from other mechanisms. We hypothesize that freezing may have similar effects to other regularization techniques and that it may smooth the loss landscape to facilitate training. Furthermore, we identify key architectural factors---such as network overparameterization and use of skip connections---that modulate the effectiveness of frozen layers. These findings offer new insights into the conditions under which freezing layers can optimize deep learning performance, informing future work on neural architecture search. 
    more » « less
  3. Imperfect labels are ubiquitous in real-world datasets. Several recent successful methods for training deep neural networks (DNNs) robust to label noise have used two primary techniques: filtering samples based on loss during a warm-up phase to curate an initial set of cleanly labeled samples, and using the output of a network as a pseudo-label for subsequent loss calculations. In this paper, we evaluate different augmentation strategies for algorithms tackling the "learning with noisy labels" problem. We propose and examine multiple augmentation strategies and evaluate them using synthetic datasets based on CIFAR-10 and CIFAR-100, as well as on the real-world dataset Clothing1M. Due to several commonalities in these algorithms, we find that using one set of augmentations for loss modeling tasks and another set for learning is the most effective, improving results on the state-of-the-art and other previous methods. Furthermore, we find that applying augmentation during the warm-up period can negatively impact the loss convergence behavior of correctly versus incorrectly labeled samples. We introduce this augmentation strategy to the state-of-the-art technique and demonstrate that we can improve performance across all evaluated noise levels. In particular, we improve accuracy on the CIFAR-10 benchmark at 90% symmetric noise by more than 15% in absolute accuracy, and we also improve performance on the Clothing1M dataset. 
    more » « less
  4. The advent of large pre-trained models has brought about a paradigm shift in both visual representation learning and natural language processing. However, clustering unlabeled images, as a fundamental and classic machine learning problem, still lacks an effective solution, particularly for large-scale datasets. In this paper, we propose a novel image clustering pipeline that leverages the powerful feature representation of large pre-trained models such as CLIP and cluster images effectively and efficiently at scale. We first developed a novel algorithm to estimate the number of clusters in a given dataset. We then show that the pre-trained features are significantly more structured by further optimizing the rate reduction objective. The resulting features may significantly improve the clustering accuracy, e.g., from 57\% to 66\% on ImageNet-1k. Furthermore, by leveraging CLIP's multimodality bridge between image and text, we develop a simple yet effective self-labeling algorithm that produces meaningful captions for the clusters. Through extensive experiments, we show that our pipeline works well on standard datasets such as CIFAR-10, CIFAR-100, and ImageNet-1k. It also extends to datasets that are not curated for clustering, such as LAION-Aesthetics and WikiArts. We released the code in https://github.com/LeslieTrue/CPP 
    more » « less
  5. Model confidence or uncertainty is critical in autonomous systems as they directly tie to the safety and trustworthiness of the system. The quantification of uncertainty in the output decisions of deep neural networks (DNNs) is a challenging problem. The Bayesian framework enables the estimation of the predictive uncertainty by introducing probability distributions over the (unknown) network weights; however, the propagation of these high-dimensional distributions through multiple layers and non-linear transformations is mathematically intractable. In this work, we propose an extended variational inference (eVI) framework for convolutional neural network (CNN) based on tensor Normal distributions (TNDs) defined over convolutional kernels. Our proposed eVI framework propagates the first two moments (mean and covariance) of these TNDs through all layers of the CNN. We employ first-order Taylor series linearization to approximate the mean and covariances passing through the non-linear activations. The uncertainty in the output decision is given by the propagated covariance of the predictive distribution. Furthermore, we show, through extensive simulations on the MNIST and CIFAR-10 datasets, that the CNN becomes more robust to Gaussian noise and adversarial attacks. 
    more » « less