Title: Intermediate Layers Matter in Momentum Contrastive Self Supervised Learning
We show that bringing intermediate layers' representations of two augmented versions of an image closer together in self-supervised learning helps to improve the momentum contrastive (MoCo) method. To this end, in addition to the contrastive loss, we minimize the mean squared error between the intermediate layer representations, or make their cross-correlation matrix closer to an identity matrix. Both loss objectives either outperform standard MoCo or achieve similar performance on three diverse medical imaging datasets: NIH Chest X-rays, Breast Cancer Histopathology, and Diabetic Retinopathy. The gains of the improved MoCo are especially large in the low labeled-data regime (e.g., 1% labeled data), with an average gain of 5% across the three datasets. We analyze the models trained with our approach via feature similarity analysis and layer-wise probing. Our analysis reveals that models trained via our approach have higher feature reuse than standard MoCo and learn informative features earlier in the network. Finally, by comparing the output probability distributions of models fine-tuned on small versus large labeled data, we conclude that our proposed pre-training method leads to a lower Kolmogorov-Smirnov distance than standard MoCo. This provides additional evidence that our method learns more informative features in the pre-training phase, which can be leveraged in the low labeled-data regime.
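The two auxiliary objectives above are simple to state in code. Below is a minimal PyTorch-style sketch, assuming `z1` and `z2` are flattened intermediate-layer features of the two augmented views with shape (batch, dim); the weights `alpha` and `lambd` are illustrative hyperparameters, not values taken from the paper.

```python
import torch
import torch.nn.functional as F

def mse_alignment_loss(z1, z2):
    """Pull intermediate representations of the two augmented views together."""
    return F.mse_loss(z1, z2)

def cross_correlation_loss(z1, z2, lambd=5e-3):
    """Drive the cross-correlation matrix of the two views toward identity
    (Barlow Twins-style). z1, z2: (batch, dim) intermediate features."""
    n, _ = z1.shape
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)  # standardize each dimension
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
    c = (z1.T @ z2) / n                          # (dim, dim) cross-correlation
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return on_diag + lambd * off_diag

# Total objective (sketch): the usual MoCo InfoNCE loss plus one auxiliary term,
# e.g. loss = moco_loss + alpha * mse_alignment_loss(z1, z2)
# or   loss = moco_loss + alpha * cross_correlation_loss(z1, z2)
```

Either auxiliary term is computed on intermediate activations rather than the final projection head, which is what drives the higher feature reuse reported above.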
Award ID(s): 1922658
NSF-PAR ID: 10342927
Author(s) / Creator(s): ; ;
Date Published:
Journal Name: Advances in Neural Information Processing Systems
ISSN: 1049-5258
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Collecting large-scale medical datasets with fully annotated samples for training deep networks is prohibitively expensive, especially for 3D volume data. Recent breakthroughs in self-supervised learning (SSL) offer the ability to overcome the lack of labeled training samples by learning feature representations from unlabeled data. However, most current SSL techniques in the medical field are designed for either 2D images or 3D volumes. In practice, this restricts the ability to fully leverage unlabeled data from numerous sources, which may include both 2D and 3D data. Additionally, the use of these pre-trained networks is constrained to downstream tasks with compatible data dimensions. In this paper, we propose a novel framework for unsupervised joint learning on 2D and 3D data modalities. Given a set of 2D images or 2D slices extracted from 3D volumes, we construct an SSL task based on a 2D contrastive clustering problem for distinct classes. The 3D volumes are exploited by computing a vectored embedding at each slice and then assembling a holistic feature through deformable self-attention mechanisms in a Transformer, allowing the model to incorporate long-range dependencies between slices inside 3D volumes. These holistic features are further utilized to define a novel 3D clustering-agreement-based SSL task and a masked embedding prediction task inspired by pre-trained language models. Experiments on downstream tasks, such as 3D brain segmentation, lung nodule detection, 3D heart structure segmentation, and abnormal chest X-ray detection, demonstrate the effectiveness of our joint 2D and 3D SSL approach. We improve plain 2D DeepClusterV2 and SwAV by a significant margin and also surpass various modern 2D and 3D SSL approaches. (A sketch of the slice-to-volume aggregation appears after this list.)
  2. Abstract

    State-of-the-art quantum machine learning (QML) algorithms fail to offer practical advantages over their notoriously powerful classical counterparts, due to the limited learning capabilities of QML algorithms, the constrained computational resources available on today's noisy intermediate-scale quantum (NISQ) devices, and the empirically designed circuit ansatz for QML models. In this work, we address these challenges by proposing a hybrid model built on classical neural networks (CaNNs), which we call QCLIP, for Quantum Contrastive Language-Image Pre-Training. Rather than training a supervised QML model to predict human annotations, QCLIP focuses on more practical transferable visual representation learning, where the developed model can generalize to unseen downstream datasets. QCLIP is implemented by using CaNNs to generate low-dimensional data feature embeddings, followed by quantum neural networks that adapt and generalize the learned representation in the quantum Hilbert space. Experimental results show that the hybrid QCLIP model can be efficiently trained for representation learning. We evaluate the representation transfer capability of QCLIP against the classical Contrastive Language-Image Pre-Training (CLIP) model on various datasets. Simulation results and real-device results on the NISQ IBM_Auckland quantum computer both show that the proposed QCLIP model outperforms the classical CLIP model in all test cases. As the field of QML on NISQ devices continues to evolve, we anticipate that this work will serve as a valuable foundation for future research and advancements in this promising area. (A sketch of the classical-to-quantum pipeline appears after this list.)
  3. Despite the substantial success of deep learning for modulation classification, models trained on a specific transmitter configuration and channel model often fail to generalize to other scenarios with different transmitter configurations, wireless fading channels, or receiver impairments such as clock offset. This paper proposes Contrastive Learning with Self-Reconstruction (CLSR-AMC) to learn representations of signals that are resilient to channel changes. While the contrastive loss focuses on the differences between individual modulations, the reconstruction loss captures representative features of the signal. Additionally, we develop three data augmentation operators to emulate the impact of channel and hardware impairments without exhaustive modeling of different channel profiles. We perform extensive experiments with commonly used datasets and show that CLSR-AMC outperforms its counterpart based on contrastive learning alone for the same amount of labeled data, with significant average accuracy gains of 24.29%, 17.01%, and 15.97% in Additive White Gaussian Noise (AWGN), Rayleigh+AWGN, and Rician+AWGN channels, respectively. (A sketch of the combined loss appears after this list.)
  4. Abstract

    Blind source separation (BSS) is commonly used in functional magnetic resonance imaging (fMRI) data analysis. Recently, BSS models based on the restricted Boltzmann machine (RBM), one of the building blocks of deep learning models, have been shown to improve brain network identification compared to conventional single-matrix-factorization models such as independent component analysis (ICA). These models, however, trained RBMs on fMRI volumes and are hence challenged by model complexity and limited training data. In this article, we propose to apply RBMs to fMRI time courses instead of volumes for BSS. The proposed method not only interprets fMRI time courses explicitly, taking advantage of deep learning models' strength in latent feature learning, but also substantially reduces model complexity and increases the size of the training set, improving training efficiency. Our experimental results on Human Connectome Project (HCP) datasets demonstrate the superiority of the proposed method over ICA and over the approach that applies RBMs to fMRI volumes in identifying task-related components, resulting in more accurate and specific representations of task-related activations. Moreover, our method separates out components representing intermixed effects between task events, which could reflect inherent interactions among functionally connected brain regions. Our study demonstrates the value of RBMs in mining complex structures embedded in large-scale fMRI data and their potential as building blocks for deeper models in fMRI data analysis. (A CD-1 training sketch appears after this list.)
  5. Contrastive learning (CL) has been widely investigated with various learning mechanisms and achieves strong capability in learning representations of data in a self-supervised manner using unlabeled data. A common practice along this line of contrastive learning is to employ large encoders to achieve performance comparable to that of the supervised learning counterpart. Despite the success of label-free training, current contrastive learning algorithms fail to achieve good performance with lightweight (compact) models, e.g., MobileNet, while the requirement of heavy encoders impedes energy-efficient computation, especially for resource-constrained AI applications. Motivated by this, we propose a new self-supervised CL scheme, named SACL-XD, consisting of two technical components, Slimmed Asymmetrical Contrastive Learning (SACL) and Cross-Distillation (XD), which collectively enable efficient CL with compact models. While relevant prior works employed a strong pre-trained model as the teacher for unsupervised knowledge distillation to a lightweight encoder, our proposed method trains CL models from scratch and outperforms them even without this expensive requirement. Compared to state-of-the-art lightweight CL training (distillation) algorithms, SACL-XD achieves a 1.79% ImageNet-1K accuracy improvement on MobileNet-V3 with a 64× reduction in training FLOPs. Code is available at https://github.com/mengjian0502/SACL-XD. (A cross-distillation sketch appears after this list.)
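For record 1 above, a minimal sketch of the slice-to-volume aggregation it describes: each 2D slice is embedded, then a Transformer attends across slices to form a holistic volume feature. Standard self-attention stands in for the paper's deformable attention, and all names and sizes here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class VolumeEncoder(nn.Module):
    """Encode each 2D slice, then aggregate slices with self-attention.
    Note: uses standard nn.TransformerEncoder, not deformable attention."""
    def __init__(self, slice_encoder: nn.Module, dim=256, heads=8, depth=4):
        super().__init__()
        self.slice_encoder = slice_encoder  # any 2D backbone: (N, C, H, W) -> (N, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.aggregator = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, volume):              # volume: (B, S, C, H, W) with S slices
        b, s = volume.shape[:2]
        emb = self.slice_encoder(volume.flatten(0, 1)).view(b, s, -1)  # (B, S, dim)
        ctx = self.aggregator(emb)          # long-range dependencies across slices
        return ctx.mean(dim=1)              # holistic (B, dim) volume feature
```

The holistic feature would then feed the 3D clustering-agreement and masked-embedding-prediction objectives described in the abstract.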
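For record 2, a rough sketch of the hybrid pattern described: a classical network compresses inputs to a low-dimensional embedding, which a small variational quantum circuit then re-embeds. Written with PennyLane purely for illustration; the ansatz, qubit count, and layer sizes are assumptions, not QCLIP's actual design.

```python
import torch.nn as nn
import pennylane as qml

n_qubits, n_layers = 4, 2
dev = qml.device("default.qubit", wires=n_qubits)  # simulator stand-in for NISQ hardware

@qml.qnode(dev, interface="torch")
def circuit(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(n_qubits))         # encode classical features
    qml.BasicEntanglerLayers(weights, wires=range(n_qubits))  # trainable entangling layers
    return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]

quantum_head = qml.qnn.TorchLayer(circuit, {"weights": (n_layers, n_qubits)})

hybrid = nn.Sequential(
    nn.Linear(512, n_qubits),  # classical projection to a low-dimensional embedding
    nn.Tanh(),                 # bound features before angle encoding
    quantum_head,              # quantum re-embedding in Hilbert space
)
```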
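For record 3, a compact sketch of pairing a contrastive term with a self-reconstruction term; the NT-Xent form, the MSE reconstruction, and the weight `beta` are assumptions about one plausible instantiation, not CLSR-AMC's exact formulation.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.5):
    """Normalized-temperature contrastive loss over two augmented views."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2]), dim=1)   # (2N, dim)
    sim = z @ z.T / tau
    sim.fill_diagonal_(float("-inf"))             # exclude self-similarity
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)

def clsr_loss(z1, z2, x, x_rec, beta=1.0):
    """Contrastive term plus reconstruction of the input signal x."""
    return nt_xent(z1, z2) + beta * F.mse_loss(x_rec, x)
```

Here `z1` and `z2` would be encoder outputs for two augmented copies of a signal (e.g., with simulated channel or hardware impairments) and `x_rec` the decoder's reconstruction.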
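For record 4, a minimal contrastive-divergence (CD-1) sketch of a Bernoulli RBM applied to time courses scaled to [0, 1]; the paper's exact visible-unit model and dimensions are not given here, so treat this as a generic illustration.

```python
import torch

class RBM:
    """Bernoulli RBM trained with one step of contrastive divergence."""
    def __init__(self, n_visible, n_hidden, lr=1e-3):
        self.W = 0.01 * torch.randn(n_hidden, n_visible)
        self.vb = torch.zeros(n_visible)  # visible biases
        self.hb = torch.zeros(n_hidden)   # hidden biases
        self.lr = lr

    def _h_given_v(self, v):
        return torch.sigmoid(v @ self.W.T + self.hb)

    def _v_given_h(self, h):
        return torch.sigmoid(h @ self.W + self.vb)

    def cd1_step(self, v0):
        """One CD-1 update; v0: (batch, n_visible) time courses in [0, 1]."""
        ph0 = self._h_given_v(v0)
        v1 = self._v_given_h(torch.bernoulli(ph0))  # one Gibbs step (mean-field visibles)
        ph1 = self._h_given_v(v1)
        n = v0.size(0)
        self.W += self.lr * (ph0.T @ v0 - ph1.T @ v1) / n
        self.vb += self.lr * (v0 - v1).mean(0)
        self.hb += self.lr * (ph0 - ph1).mean(0)
```

After training, the learned weights serve as the latent components used for source separation, playing a role loosely analogous to the mixing matrix recovered by ICA.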
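For record 5, one plausible reading of the cross-distillation (XD) idea: two compact branches trained from scratch distill soft targets into each other symmetrically. The KL form, temperature, and stop-gradient are assumptions; the precise SACL-XD losses are in the linked repository.

```python
import torch.nn.functional as F

def cross_distill_loss(p_a, p_b, tau=4.0):
    """Symmetric soft-target distillation between two compact branches.
    p_a, p_b: (batch, dim) projection outputs of encoders A and B."""
    log_a = F.log_softmax(p_a / tau, dim=1)
    log_b = F.log_softmax(p_b / tau, dim=1)
    soft_a = F.softmax(p_a / tau, dim=1).detach()  # stop-gradient on targets
    soft_b = F.softmax(p_b / tau, dim=1).detach()
    kl_ab = F.kl_div(log_a, soft_b, reduction="batchmean")
    kl_ba = F.kl_div(log_b, soft_a, reduction="batchmean")
    return 0.5 * (tau ** 2) * (kl_ab + kl_ba)
```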