skip to main content


The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 11:00 PM ET on Friday, July 12 until 2:00 AM ET on Saturday, July 13 due to maintenance. We apologize for the inconvenience.

This content will become publicly available on December 10, 2024

Title: Slimmed Asymmetrical Contrastive Learning and Cross Distillation for Lightweight Model Training
Contrastive learning (CL) has been widely investigated with various learning mech- anisms and achieves strong capability in learning representations of data in a self-supervised manner using unlabeled data. A common fashion of contrastive learning on this line is employing large-sized encoders to achieve comparable performance as the supervised learning counterpart. Despite the success of the labelless training, current contrastive learning algorithms failed to achieve good performance with lightweight (compact) models, e.g., MobileNet, while the re- quirements of the heavy encoders impede the energy-efficient computation, espe- cially for resource-constrained AI applications. Motivated by this, we propose a new self-supervised CL scheme, named SACL-XD, consisting of two technical components, Slimmed Asymmetrical Contrastive Learning (SACL) and Cross- Distillation (XD), which collectively enable efficient CL with compact models. While relevant prior works employed a strong pre-trained model as the teacher of unsupervised knowledge distillation to a lightweight encoder, our proposed method trains CL models from scratch and outperforms them even without such an expensive requirement. Compared to the SoTA lightweight CL training (dis- tillation) algorithms, SACL-XD achieves 1.79% ImageNet-1K accuracy improve- ment on MobileNet-V3 with 64⇥ training FLOPs reduction. Code is available at  more » « less
Award ID(s):
2144751 2314591 2328803 2342726
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
NeurIPS 2023
Date Published:
Journal Name:
Thirty-seventh Conference on Neural Information Processing Systems
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Collaborative learning enables distributed clients to learn a shared model for prediction while keeping the training data local on each client. However, existing collaborative learning methods require fully-labeled data for training, which is inconvenient or sometimes infeasible to obtain due to the high labeling cost and the requirement of expertise. The lack of labels makes collaborative learning impractical in many realistic settings. Self-supervised learning can address this challenge by learning from unlabeled data. Contrastive learning (CL), a self-supervised learning approach, can effectively learn visual representations from unlabeled image data. However, the distributed data collected on clients are usually not independent and identically distributed (non-IID) among clients, and each client may only have few classes of data, which degrades the performance of CL and learned representations. To tackle this problem, we propose a collaborative contrastive learning framework consisting of two approaches: feature fusion and neighborhood matching, by which a unified feature space among clients is learned for better data representations. Feature fusion provides remote features as accurate contrastive information to each client for better local learning. Neighborhood matching further aligns each client’s local features to the remote features such that well-clustered features among clients can be learned. Extensive experiments show the effectiveness of the proposed framework. It outperforms other methods by 11% on IID data and matches the performance of centralized learning.

    more » « less
  2. Avidan, S. (Ed.)
    Despite the success of fully-supervised human skeleton sequence modeling, utilizing self-supervised pre-training for skeleton sequence representation learning has been an active field because acquiring task-specific skeleton annotations at large scales is difficult. Recent studies focus on learning video-level temporal and discriminative information using contrastive learning, but overlook the hierarchical spatial-temporal nature of human skeletons. Different from such superficial supervision at the video level, we propose a self-supervised hierarchical pre-training scheme incorporated into a hierarchical Transformer-based skeleton sequence encoder (Hi-TRS), to explicitly capture spatial, short-term, and long-term temporal dependencies at frame, clip, and video levels, respectively. To evaluate the proposed self-supervised pre-training scheme with Hi-TRS, we conduct extensive experiments covering three skeleton-based downstream tasks including action recognition, action detection, and motion prediction. Under both supervised and semi-supervised evaluation protocols, our method achieves the state-of-the-art performance. Additionally, we demonstrate that the prior knowledge learned by our model in the pre-training stage has strong transfer capability for different downstream tasks. 
    more » « less
  3. Contrastive learning (CL), a self-supervised learning approach, can effectively learn visual representations from unlabeled data. Given the CL training data, generative models can be trained to generate synthetic data to supplement the real data. Using both synthetic and real data for CL training has the potential to improve the quality of learned representations. However, synthetic data usually has lower quality than real data, and using synthetic data may not improve CL compared with using real data. To tackle this problem, we propose a data generation framework with two methods to improve CL training by joint sample generation and contrastive learning. The first approach generates hard samples for the main model. The generator is jointly learned with the main model to dynamically customize hard samples based on the training state of the main model. Besides, a pair of data generators are proposed to generate similar but distinct samples as positive pairs. In joint learning, the hardness of a positive pair is progressively increased by decreasing their similarity. Experimental results on multiple datasets show superior accuracy and data efficiency of the proposed data generation methods applied to CL. For example, about 4.0%, 3.5%, and 2.6% accuracy improvements for linear classification are observed on ImageNet-100, CIFAR-100, and CIFAR-10, respectively. Besides, up to 2× data efficiency for linear classification and up to 5× data efficiency for transfer learning are achieved. 
    more » « less
  4. Deep Learning (DL) models to analyze source code have shown immense promise during the past few years. More recently, self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks, such as clone and bug detection. While previous work successfully learned from different code abstractions (e.g., token, AST, graph), we argue that it is also essential to factor in how developers code day-to-day for general-purpose representation learning. On the one hand, human developers tend to write repetitive programs referencing existing code snippets from the current codebase or online resources (e.g., Stack Overflow website) rather than implementing functions from scratch; such behaviors result in a vast number of code clones. In contrast, a deviant clone by mistake might trigger malicious program behaviors. Thus, as a proxy to incorporate developers' coding behavior into the pre-training scheme, we propose to include code clones and their deviants. In particular, we propose CONCORD, a self-supervised, contrastive learning strategy to place benign clones closer in the representation space while moving deviants further apart. We show that CONCORD's clone-aware contrastive learning drastically reduces the need for expensive pre-training resources while improving the performance of downstream SE tasks. We also empirically demonstrate that CONCORD can improve existing pre-trained models to learn better representations that consequently become more efficient in both identifying semantically equivalent programs and differentiating buggy from non-buggy code. 
    more » « less
  5. Contrastive learning is a self-supervised representation learning method that achieves milestone performance in various classification tasks. However, due to its unsupervised fashion, it suffers from the false negative sample problem: randomly drawn negative samples that are assumed to have a different label but actually have the same label as the anchor. This deteriorates the performance of contrastive learning as it contradicts the motivation of contrasting semantically similar and dissimilar pairs. This raised the attention and the importance of finding legitimate negative samples, which should be addressed by distinguishing between 1) true vs. false negatives; 2) easy vs. hard negatives. However, previous works were limited to the statistical approach to handle false negative and hard negative samples with hyperparameters tuning. In this paper, we go beyond the statistical approach and explore the connection between hard negative samples and data bias. We introduce a novel debiased contrastive learning method to explore hard negatives by relative difficulty referencing the bias-amplifying counterpart. We propose triplet loss for training a biased encoder that focuses more on easy negative samples. We theoretically show that the triplet loss amplifies the bias in self-supervised representation learning. Finally, we empirically show the proposed method improves downstream classification performance. 
    more » « less