Remove Model Backdoors via Importance Driven Cloning

Qiuling Xu; Guanhong Tao; Jean Honorio; Yingqi Liu; Shengwei An; Guangyu Shen; Siyuan Cheng; Xiangyu Zhang

Citation Details

We develop a novel method to remove injected backdoors from deep learning models. It works by cloning the benign behaviors of a trojaned model into a new model of the same structure. The clone is trained from scratch on a very small subset of samples to minimize a cloning loss that measures the differences between the activations of important neurons in the two models. The set of important neurons varies for each input, depending on the magnitude of their activations and their impact on the classification result. We show theoretically that our method can better recover the benign functions of the backdoored model, and we prove that it can be more effective at removing backdoors than fine-tuning. Our experiments show that our technique effectively removes nine different types of backdoors with minor degradation in benign accuracy, outperforming state-of-the-art backdoor removal techniques based on fine-tuning, knowledge distillation, and neuron pruning.
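The idea of a per-input cloning loss over important neurons can be illustrated with a minimal sketch. This is not the paper's formulation: the importance score |activation × impact|, the top-k selection, the mean squared difference, and the names `important_neurons`, `cloning_loss`, `impacts`, and `k` are all assumptions chosen only to make the abstract's description concrete for a single layer and a single input.

```python
from typing import List, Sequence


def important_neurons(teacher_acts: Sequence[float],
                      impacts: Sequence[float],
                      k: int) -> List[int]:
    """Return the indices of the k highest-scoring neurons for one input.

    ASSUMPTION: the score |activation * impact| is a stand-in for
    "magnitude of activation and impact on the classification result";
    the paper's actual criterion may differ.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    scores = [abs(a * g) for a, g in zip(teacher_acts, impacts)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def cloning_loss(teacher_acts: Sequence[float],
                 clone_acts: Sequence[float],
                 impacts: Sequence[float],
                 k: int) -> float:
    """Mean squared activation difference over the selected neurons only.

    ASSUMPTION: a squared-error penalty is used here for illustration.
    Neurons outside the selected set do not contribute to the loss.
    """
    idx = important_neurons(teacher_acts, impacts, k)
    return sum((teacher_acts[i] - clone_acts[i]) ** 2 for i in idx) / len(idx)


# Hypothetical activations of one layer for one input.
teacher = [2.0, 0.1, -1.5, 0.0]   # trojaned (source) model
clone = [1.0, 5.0, -1.5, 9.0]     # clone model being trained
impacts = [0.5, 3.0, 1.0, 2.0]    # e.g. sensitivity of the prediction

selected = important_neurons(teacher, impacts, k=2)
loss = cloning_loss(teacher, clone, impacts, k=2)
```

In this toy input, neurons 1 and 3 differ greatly between the two models but are not selected, so they are not constrained by the loss; only the mismatch on the selected neurons is penalized.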

Award ID(s): 2134209

PAR ID: 10419658

Author(s) / Creator(s): Qiuling Xu; Guanhong Tao; Jean Honorio; Yingqi Liu; Shengwei An; Guangyu Shen; Siyuan Cheng; Xiangyu Zhang

Date Published: 2023-06-01

Journal Name: IEEE Conference on Computer Vision and Pattern Recognition

ISSN: 2163-6648

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper: The DOI is not currently available.