Title: Transfer Learning Pre-training Dataset Effect Analysis for Breast Cancer Imaging
Compared with the natural imaging datasets commonly used in transfer learning, the effects of medical pre-training datasets are underexplored. In this study, we analyze the effect of the pre-training dataset in transfer learning for breast cancer imaging by evaluating three popular deep neural networks and one patch-based convolutional neural network on three target datasets under different fine-tuning configurations. Through a series of comparisons, we conclude that the pre-training dataset, DDSM, is effective on two other mammogram datasets but ineffective on an ultrasound dataset. Moreover, fine-tuning may mask the inefficacy of a pre-training dataset. The efficacy or inefficacy of DDSM on the target datasets is further corroborated by a representational analysis. Finally, we show that hybrid transfer learning cannot mitigate the masking effect of fine-tuning.
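To make "fine-tuning configurations" concrete, the sketch below shows two common configurations in PyTorch: freezing a DDSM-pretrained backbone and training only a new head (feature extraction) versus updating all layers (full fine-tuning). This is an illustrative assumption, not the paper's code; the ResNet-50 backbone, checkpoint path, and class count are placeholders.

```python
# Minimal sketch (not the paper's code) of two fine-tuning configurations applied
# to a backbone assumed to be pre-trained on DDSM. Checkpoint path, class count,
# and backbone choice are placeholders.
from typing import Optional

import torch
import torch.nn as nn
from torchvision import models

def build_model(num_classes: int, freeze_backbone: bool,
                checkpoint: Optional[str] = None) -> nn.Module:
    model = models.resnet50()
    if checkpoint is not None:
        # Load DDSM-pretrained weights (hypothetical file, e.g. "ddsm_resnet50.pth").
        model.load_state_dict(torch.load(checkpoint, map_location="cpu"), strict=False)
    if freeze_backbone:
        # Configuration A: feature extraction -- keep every pre-trained filter fixed.
        for param in model.parameters():
            param.requires_grad = False
    # New classification head for the target mammogram/ultrasound dataset (always trainable).
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

# Configuration A: train only the new head on the target dataset.
feature_extractor = build_model(num_classes=2, freeze_backbone=True)
# Configuration B: full fine-tuning -- all layers are updated on the target dataset.
full_finetune = build_model(num_classes=2, freeze_backbone=False)

# The optimizer only receives parameters left trainable by the chosen configuration.
optimizer = torch.optim.SGD(
    [p for p in feature_extractor.parameters() if p.requires_grad], lr=1e-3, momentum=0.9
)
```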
Award ID(s):
1946202
PAR ID:
10341853
Author(s) / Creator(s):
;
Date Published:
Journal Name:
EPiC Series in Computing
Volume:
83
ISSN:
2398-7340
Page Range / eLocation ID:
99 to 108
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. There is an increasing number of pre-trained deep neural network models. However, it is still unclear how to effectively use these models for a new task. Transfer learning, which aims to transfer knowledge from source tasks to a target task, is an effective solution to this problem. Fine-tuning is a popular transfer learning technique for deep neural networks where a few rounds of training are applied to the parameters of a pre-trained model to adapt them to a new task. Despite its popularity, in this paper we show that fine-tuning suffers from several drawbacks. We propose an adaptive fine-tuning approach, called AdaFilter, which selects only a part of the convolutional filters in the pre-trained model to optimize on a per-example basis. We use a recurrent gated network to selectively fine-tune convolutional filters based on the activations of the previous layer. We experiment with 7 public image classification datasets and the results show that AdaFilter can reduce the average classification error of the standard fine-tuning by 2.54%. 
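The gist of per-example gating can be sketched as follows. This is a heavily simplified illustration, not the authors' AdaFilter implementation (which uses a recurrent gated network across layers); the single scalar gate and the layer shapes are assumptions.

```python
# Illustrative simplification of per-example filter gating (not the authors' code).
# A tiny gate, driven by the previous layer's activations, decides for each example
# how much to rely on the frozen pre-trained convolution versus its fine-tuned copy.
import copy

import torch
import torch.nn as nn

class GatedConvBlock(nn.Module):
    def __init__(self, pretrained_conv: nn.Conv2d):
        super().__init__()
        self.frozen = copy.deepcopy(pretrained_conv)
        for p in self.frozen.parameters():
            p.requires_grad = False               # keep the pre-trained filters intact
        self.tuned = copy.deepcopy(pretrained_conv)  # this copy is fine-tuned
        self.gate = nn.Sequential(                # per-example scalar gate in [0, 1]
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(pretrained_conv.in_channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = self.gate(x).view(-1, 1, 1, 1)        # one gate value per example
        return g * self.tuned(x) + (1.0 - g) * self.frozen(x)

block = GatedConvBlock(nn.Conv2d(64, 64, kernel_size=3, padding=1))
out = block(torch.randn(8, 64, 56, 56))           # gating decided independently per example
```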
  2. A common approach to transfer learning under distribution shift is to fine-tune the last few layers of a pre-trained model, preserving learned features while also adapting to the new task. This paper shows that in such settings, selectively fine-tuning a subset of layers (which we term surgical fine-tuning) matches or outperforms commonly used fine-tuning approaches. Moreover, the type of distribution shift influences which subset is more effective to tune: for example, for image corruptions, fine-tuning only the first few layers works best. We validate our findings systematically across seven real-world data tasks spanning three types of distribution shifts. Theoretically, we prove that for two-layer neural networks in an idealized setting, first-layer tuning can outperform fine-tuning all layers. Intuitively, fine-tuning more parameters on a small target dataset can cause information learned during pre-training to be forgotten, and the relevant information depends on the type of shift. 
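A minimal sketch of what such selective (surgical) fine-tuning can look like in PyTorch is shown below, assuming a ResNet backbone; the mapping from shift type to layer group follows the intuition stated above and is illustrative, not the paper's exact procedure.

```python
# Minimal sketch of surgical fine-tuning (assumed interpretation, not the paper's code):
# unfreeze only the layer group matched to the suspected type of distribution shift.
import torch
from torchvision import models

def surgical_finetune(shift_type: str) -> torch.nn.Module:
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    for p in model.parameters():
        p.requires_grad = False                   # start with everything frozen

    # Rough heuristic from the abstract: input-level corruptions -> early layers;
    # output-level (label) shift -> late layers / classifier head.
    groups = {
        "input_corruption": [model.conv1, model.bn1, model.layer1],
        "label_shift": [model.layer4, model.fc],
    }
    for module in groups[shift_type]:
        for p in module.parameters():
            p.requires_grad = True                # fine-tune only this subset
    return model

model = surgical_finetune("input_corruption")
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4)
```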
  3. Deep learning algorithms have been moderately successful in diagnosing diseases from medical images, especially in neuroimaging, which is rich in annotated data. Transfer learning methods have demonstrated strong performance in tackling limited annotated data: they transfer knowledge learned from a source domain to a target domain even when the target dataset is small. There are multiple approaches to transfer learning, which yield a range of performance estimates for the diagnosis, detection, and classification of clinical problems. In this paper, we therefore review transfer learning approaches, their design attributes, and their applications to neuroimaging problems. We searched two main literature databases and included the most relevant studies using predefined inclusion criteria. Among the 50 reviewed studies, more than half concern transfer learning for Alzheimer's disease; brain mapping and brain tumor detection were the second and third most discussed research problems, respectively. The most common source dataset for transfer learning was ImageNet, which is not a neuroimaging dataset, suggesting that most studies preferred pre-trained models over training their own model on a neuroimaging dataset. Although about one third of the studies designed their own architecture, most used existing Convolutional Neural Network architectures, and Magnetic Resonance Imaging was the most common imaging modality. In almost all studies, transfer learning led to better performance in the diagnosis, classification, and segmentation of neuroimaging diseases and problems than methods without transfer learning. Among the transfer learning approaches, fine-tuning all convolutional and fully-connected layers, and freezing the convolutional layers while fine-tuning only the fully-connected layers, demonstrated superior accuracy. These recent transfer learning approaches not only perform well but also require less computational resources and time.
  4. Recent advances in unsupervised learning have shown that unsupervised pre-training, followed by fine-tuning, can improve model generalization. However, a rigorous understanding of how the representation function learned on an unlabeled dataset affects the generalization of the fine-tuned model is lacking. Existing theoretical research does not adequately account for the heterogeneity of the distributions and tasks across the pre-training and fine-tuning stages. To bridge this gap, this paper introduces a novel theoretical framework that illuminates the critical factor influencing the transferability of knowledge acquired during unsupervised pre-training to the subsequent fine-tuning phase, and ultimately the generalization of the fine-tuned model on downstream tasks. We apply this framework to analyze generalization bounds in two distinct scenarios: Context Encoder pre-training with deep neural networks and Masked Autoencoder pre-training with deep transformers, each followed by fine-tuning on a binary classification task. Finally, inspired by our findings, we propose a novel regularization method used during pre-training to further enhance the generalization of the fine-tuned model. Overall, our results contribute to a better understanding of the unsupervised pre-training and fine-tuning paradigm and can inform the design of more effective pre-training algorithms.
  5. Deep neural networks have become increasingly popular in radar micro-Doppler classification; yet, a key challenge, which has limited potential gains, is the lack of large amounts of measured data that can facilitate the design of deeper networks with greater robustness and performance. Several approaches have been proposed in the literature to address this problem, such as unsupervised pre-training and transfer learning from optical imagery or synthetic RF data. This work investigates an alternative approach to training that exploits "datasets of opportunity": micro-Doppler datasets collected using other RF sensors, which may differ in frequency, bandwidth, or waveform, for the purposes of training. Specifically, this work compares in detail the cross-frequency training degradation incurred for several different training approaches and deep neural network (DNN) architectures. Results show a 70% drop in classification accuracy when the RF sensors for pre-training, fine-tuning, and testing are all different, and a 15% degradation when only the pre-training data is different but the fine-tuning and test data come from the same sensor. A large amount of synthetic data is also generated for pre-training using generative adversarial networks (GANs). Results show that cross-frequency performance degradation is reduced by 50% when kinematically-sifted GAN-synthesized signatures are used in pre-training.
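A schematic of the two-stage recipe described above, pre-training on GAN-synthesized signatures and then fine-tuning on measured data from the target sensor, is sketched below; the classifier, data loaders, and class count are placeholders rather than the authors' setup.

```python
# Schematic two-stage pipeline (assumptions, not the authors' code): pre-train on
# GAN-synthesized micro-Doppler spectrograms, then fine-tune on measured data
# from the target RF sensor. The loaders and the small CNN are placeholders.
import torch
import torch.nn as nn

def run_epoch(model, loader, optimizer, device="cpu"):
    criterion = nn.CrossEntropyLoss()
    for spectrograms, labels in loader:           # (batch, 1, H, W) spectrogram tensors
        optimizer.zero_grad()
        loss = criterion(model(spectrograms.to(device)), labels.to(device))
        loss.backward()
        optimizer.step()

classifier = nn.Sequential(                       # small CNN stand-in for the paper's DNNs
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(32, 7),               # 7 activity classes, purely illustrative
)

# Stage 1: pre-training on synthetic (GAN-generated) signatures.
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)
# run_epoch(classifier, synthetic_loader, optimizer)   # synthetic_loader: hypothetical

# Stage 2: fine-tuning on measured signatures from the target-frequency sensor,
# typically with a smaller learning rate.
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-4)
# run_epoch(classifier, measured_loader, optimizer)    # measured_loader: hypothetical
```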