Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution

Kumar, Ananya; Raghunathan, Aditi; Jones, Robbie; Ma, Tengyu; Liang, Percy

Citation Details

When transferring a pretrained model to a downstream task, two popular methods are full fine-tuning (updating all the model parameters) and linear probing (updating only the last linear layer—the “head”). It is well known that fine-tuning leads to better accuracy in-distribution (ID). However, in this paper, we find that fine-tuning can achieve worse accuracy than linear probing out-of-distribution (OOD) when the pretrained features are good and the distribution shift is large. On 10 distribution shift datasets (Breeds-Living17, Breeds-Entity30, DomainNet, CIFAR → STL, CIFAR10.1, FMoW, ImageNetV2, ImageNet-R, ImageNet-A, ImageNet-Sketch), fine-tuning obtains on average 2% higher accuracy ID but 7% lower accuracy OOD than linear probing. We show theoretically that this tradeoff between ID and OOD accuracy arises even in a simple setting: fine-tuning overparameterized two-layer linear networks. We prove that the OOD error of fine-tuning is high when we initialize with a fixed or random head—this is because while fine-tuning learns the head, the lower layers of the neural network change simultaneously and distort the pretrained features. Our analysis suggests that the easy two-step strategy of linear probing then full fine-tuning (LP-FT), sometimes used as a fine-tuning heuristic, combines the benefits of both fine-tuning and linear probing. Empirically, LP-FT outperforms both fine-tuning and linear probing on the above datasets (1% better ID, 10% better OOD than full fine-tuning). more »

Award ID(s):: 2343611

PAR ID:: 10472125

Author(s) / Creator(s):: Kumar, Ananya; Raghunathan, Aditi; Jones, Robbie; Ma, Tengyu; Liang, Percy

Publisher / Repository:: ICLR (Oral) 2022 arXiv:2202.10054

Date Published:: 2022-02-21

Subject(s) / Keyword(s):: Machine Learning (cs.LG) Computer Vision and Pattern Recognition (cs.CV)

Format(s):: Medium: X

Location:: ICLR (Oral) 2022 arXiv:2202.10054

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Conference Paper:
The DOI is not currently available.

More Like this