TMI! Finetuned Models Leak Private Information from their Pretraining Data

Abascal, John; Wu, Stanley; Oprea, Alina; Ullman, Jonathan

doi:10.56553/popets-2024-0075

Transfer learning has become an increasingly popular technique in machine learning as a way to leverage a pretrained model trained for one task to assist with building a finetuned model for a related task. This paradigm has been especially popular for privacy in machine learning, where the pretrained model is considered public, and only the data for finetuning is considered sensitive. However, there are reasons to believe that the data used for pretraining is still sensitive, making it essential to understand how much information the finetuned model leaks about the pretraining data. In this work we propose a new membership-inference threat model where the adversary only has access to the finetuned model and would like to infer the membership of the pretraining data. To realize this threat model, we implement a novel metaclassifier-based attack, TMI, that leverages the influence of memorized pretraining samples on predictions in the downstream task. We evaluate TMI on both vision and natural language tasks across multiple transfer learning settings, including finetuning with differential privacy. Through our evaluation, we find that TMI can successfully infer membership of pretraining examples using query access to the finetuned model.
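The general idea of a metaclassifier-based membership-inference attack can be illustrated with a toy sketch. This is not the authors' TMI implementation: the shadow-model outputs are simulated with synthetic data, the assumed "memorization" effect (a shift on one logit) is invented for illustration, and the names `simulate_shadow_outputs`, `train_metaclassifier`, and `membership_score` are hypothetical. The sketch only shows the shape of the pipeline: collect prediction vectors on a challenge point from models trained with and without it, fit a binary classifier on those vectors, then score the target model's prediction vector.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def simulate_shadow_outputs(rng, n, num_classes, member):
    # Hypothetical stand-in for querying n finetuned shadow models on one
    # challenge point. ASSUMPTION: membership in pretraining shifts one
    # downstream logit; a real attack would measure this, not assume it.
    logits = rng.normal(0.0, 1.0, size=(n, num_classes))
    if member:
        logits[:, 0] += 2.0
    return softmax(logits)

def train_metaclassifier(X, y, lr=0.5, steps=2000):
    # Plain logistic regression fit by gradient descent; any binary
    # classifier over prediction vectors could play this role.
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def membership_score(w, b, x):
    # Estimated probability that the challenge point was a pretraining member.
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

rng = np.random.default_rng(0)
X_in = simulate_shadow_outputs(rng, 200, 5, member=True)
X_out = simulate_shadow_outputs(rng, 200, 5, member=False)
X = np.vstack([X_in, X_out])
y = np.concatenate([np.ones(200), np.zeros(200)])

w, b = train_metaclassifier(X, y)
train_acc = float(((membership_score(w, b, X) > 0.5) == y).mean())

# Score a single simulated query to the "target" finetuned model.
target_vec = simulate_shadow_outputs(rng, 1, 5, member=True)[0]
score = float(membership_score(w, b, target_vec))
print(f"metaclassifier train accuracy: {train_acc:.2f}, target score: {score:.2f}")
```

In this sketch the adversary needs only prediction vectors from the finetuned model, which matches the query-access setting described in the abstract; everything about how the shadow models are built is left abstract here.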

Award ID(s): 2247484

PAR ID: 10539940

Author(s) / Creator(s): Abascal, John; Wu, Stanley; Oprea, Alina; Ullman, Jonathan

Publisher / Repository: PETS

Date Published: 2024-07-01

Journal Name: Proceedings on Privacy Enhancing Technologies

Volume: 2024

Issue: 3

ISSN: 2299-0984

Page Range / eLocation ID: 202 to 223

Sponsoring Org: National Science Foundation

