On the Training Dynamics of Contrastive Learning with Imbalanced Feature Distributions: A Theoretical Study of Feature Learning
Contrastive learning has served as a powerful framework in the early development of vision–language models (VLMs), demonstrating remarkable effectiveness in learning generalizable representations and establishing itself as the foundation for many state-of-the-art systems. However, despite these advances, its theoretical understanding remains limited, particularly under the imbalanced data distributions that are prevalent in real-world settings. Such imbalance can degrade representation quality and induce biased model behavior, yet a rigorous characterization of these effects is still lacking. In this work, we develop a theoretical framework to analyze the training dynamics of contrastive learning with Transformer-based encoders under imbalanced data. Our results reveal that neuron weights evolve differently across three stages of training, with distinct dynamics for majority features, minority features, and noise. We further show that minority features diminish neurons' representational capacity, increase the need for more complex architectures, and impair the separation of ground-truth features from noise. These findings offer new theoretical insights into how data imbalance shapes learning in contrastive frameworks and serve as an early step towards principled modifications for developing more robust and unbiased representations.
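The contrastive objective the paper analyzes belongs to the InfoNCE family used by CLIP-style VLMs: paired image and text embeddings are pulled together while mismatched pairs in the batch are pushed apart. A minimal numpy sketch of the symmetric loss (a generic illustration only — it does not model the paper's Transformer encoders or its three-stage dynamics; the function name and `temperature` default are my own choices):

```python
import numpy as np

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    img_emb, txt_emb: (batch, dim) arrays; row i of each forms a positive pair.
    """
    # L2-normalize so dot products are cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (batch, batch) similarity matrix

    def xent(l):
        # Cross-entropy with the diagonal (the true pair) as the positive class.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (xent(logits) + xent(logits.T))
```

Perfectly aligned pairs drive the loss towards zero; shuffling the text embeddings against the images raises it, which is the signal the training-dynamics analysis tracks.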
- Award ID(s): 2349879
- PAR ID: 10663179
- Publisher / Repository: 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: UniReps: 3rd Edition of the Workshop on Unifying Representations in Neural Models
- Date Published:
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Contrastive learning is a powerful framework for learning discriminative representations from image-text pairs. Despite its success, its theoretical foundations, especially when the image-text pair exhibits misalignment, remain underexplored. This paper provides the first theoretical analysis of contrastive learning under data misalignment, proving how the ground-truth modality-paired features are amplified while spurious features are suppressed through the training dynamics analysis. Specifically, we study two nonlinear encoders trained jointly with a contrastive loss and demonstrate that noisy (or misaligned) data pairs result in mixed representations and degrade the model's generalization ability. In contrast, recaptioning and filtering improve the data alignment, which in turn purifies the features learned by neurons and subsequently enhances generalization. Our analysis identifies feature purity as a key factor in the success of contrastive learning and offers insights into how data quality and training procedures impact representation learning and downstream generalization. Theoretical insights are supported by experiments on standard benchmarks.
- In this paper, we propose a discriminative variational autoencoder (DVAE) to assist deep learning from data with imbalanced class distributions. DVAE is designed to alleviate the class imbalance by explicitly learning class boundaries between training samples, and uses learned class boundaries to guide the feature learning and sample generation. To learn class boundaries, DVAE learns a latent two-component mixture distribution, conditioned on the class labels, so the latent features can help differentiate minority-class vs. majority-class samples. In order to balance the training data for deep learning to emphasize the minority class, we combine DVAE and generative adversarial networks (GAN) to form a unified model, DVAAN, which generates synthetic instances close to the class boundaries as training data to learn latent features and update the model. Experiments and comparisons confirm that DVAAN significantly alleviates the class imbalance and delivers accurate models for deep learning from imbalanced data.
- In many real-world classification applications such as fake news detection, the training data can be extremely imbalanced, which brings challenges to existing classifiers as the majority classes dominate the loss functions of classifiers. Oversampling techniques such as SMOTE are effective approaches to tackle the class imbalance problem by producing more synthetic minority samples. Despite their success, the majority of existing oversampling methods only consider local data distributions when generating minority samples, which can result in noisy minority samples that do not fit global data distributions or interleave with majority classes. Hence, in this paper, we study the class imbalance problem by simultaneously exploring local and global data information since: (i) the local data distribution could give detailed information for generating minority samples; and (ii) the global data distribution could provide guidance to avoid generating outliers or samples that interleave with majority classes. Specifically, we propose a novel framework GL-GAN, which leverages the SMOTE method to explore local distribution in a learned latent space and employs GAN to capture the global information, so that synthetic minority samples can be generated under even extremely imbalanced scenarios. Experimental results on diverse real data sets demonstrate the effectiveness of our GL-GAN framework in producing realistic and discriminative minority samples for improving the classification performance of various classifiers on imbalanced training data. Our code is available at https://github.com/wentao-repo/GL-GAN.
- This paper proposes a novel oversampling approach that strives to balance the class priors with a considerably imbalanced data distribution of high dimensionality. The crux of our approach lies in learning interpretable latent representations that can model the synthetic mechanism of the minority samples by using a generative adversarial network (GAN). A Bayesian regularizer is imposed to guide the GAN to extract a set of salient features that are either disentangled or intentionally entangled, with their interplay controlled by a prescribed structure, defined with human-in-the-loop. As such, our GAN enjoys an improved sample complexity, being able to synthesize high-quality minority samples even if the sizes of minority classes are extremely small during training. Empirical studies substantiate that our approach can empower simple classifiers to achieve superior imbalanced classification performance over the state-of-the-art competitors and is robust across various imbalance settings. Code is released at github.com/fudonglin/IMSIC.
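Several of the related works above build on SMOTE-style oversampling: each synthetic minority sample is an interpolation between a minority point and one of its nearest minority neighbors. A short numpy sketch of that core step (a generic illustration, not any one paper's implementation; the function name and parameters are my own):

```python
import numpy as np

def smote_oversample(minority, n_new, k=5, rng=None):
    """Generate n_new synthetic samples by SMOTE-style interpolation.

    minority: (n, dim) array of minority-class samples, with n > k.
    Each synthetic point lies on the segment between a minority sample
    and one of its k nearest minority-class neighbors.
    """
    rng = np.random.default_rng(rng)
    n = len(minority)

    # Pairwise distances within the minority class only (local information).
    d = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)               # exclude self-matches
    neighbors = np.argsort(d, axis=1)[:, :k]  # k nearest neighbor indices per sample

    base = rng.integers(0, n, size=n_new)     # random seed sample for each new point
    nbr = neighbors[base, rng.integers(0, k, size=n_new)]  # one random neighbor each
    gap = rng.random((n_new, 1))              # interpolation factor in [0, 1)
    return minority[base] + gap * (minority[nbr] - minority[base])
```

Because each output is a convex combination of two minority points, synthetic samples stay inside the minority class's local geometry — which is exactly the limitation (purely local structure) that GL-GAN's global GAN component is designed to complement.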