BBCaL: Black-box Backdoor Detection under the Causality Lens

Hu, Mengxuan; Guan, Zihan; Guo, Junfeng; Zhou, Zhongliang; Zhang, Jielu; Li, Sheng

This content will become publicly available on December 24, 2025

Deep Neural Networks (DNNs) are known to be vulnerable to backdoor attacks, in which attackers inject hidden backdoors during the training stage. This poses a serious threat to the Model-as-a-Service setting, where downstream users directly use third-party models (e.g., from the HuggingFace Hub, or ChatGPT). To address this threat, we study the inference-stage black-box backdoor detection problem, where defenders aim to build a firewall that filters out backdoor inputs at inference time with access only to input samples and their predicted labels. Existing approaches to this problem either rely on strong assumptions about the types of triggers and attacks or suffer from poor efficiency. To build a more general and efficient method, we first provide a novel causality-based lens for analyzing the heterogeneous prediction behaviors of clean and backdoored samples at inference time, covering both sample-specific and sample-agnostic backdoor attacks. Motivated by this causal analysis and by do-calculus in causal inference, we introduce Black-box Backdoor detection under the Causality Lens (BBCaL), which distinguishes backdoor samples from clean ones by analyzing prediction consistency as counterfactual samples are progressively constructed. Our theoretical analysis also sheds light on why BBCaL is effective. Extensive experiments on three benchmark datasets validate the effectiveness and efficiency of our method.
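The abstract does not specify how BBCaL builds its counterfactual samples, so the following is only a minimal sketch of a generic label-only prediction-consistency test, not the authors' method. Everything in it is an assumption for illustration: counterfactuals are made by linearly blending the input with clean reference samples at increasing weights `alphas`, the names `consistency_score`, `is_suspicious` and `toy_predict` are hypothetical, and the 0.9 threshold is arbitrary. It also assumes that an input whose label is causally driven by a trigger keeps that label across interventions more often than a clean input does.

```python
import numpy as np
from typing import Callable, Sequence

# Hypothetical black-box interface: maps a batch of inputs (n, d) to predicted labels (n,).
Predictor = Callable[[np.ndarray], np.ndarray]


def consistency_score(
    x: np.ndarray,
    predict: Predictor,
    references: np.ndarray,
    alphas: Sequence[float] = (0.2, 0.4, 0.6, 0.8),
) -> float:
    """Fraction of counterfactual inputs that keep the label predicted for x.

    Assumed construction (not the paper's): blend x with clean reference
    samples, increasing the reference weight alpha step by step.
    """
    base_label = predict(x[None, :])[0]
    agree, total = 0, 0
    for alpha in alphas:
        # One counterfactual per reference sample at this intervention strength.
        counterfactuals = (1.0 - alpha) * x[None, :] + alpha * references
        labels = predict(counterfactuals)
        agree += int(np.sum(labels == base_label))
        total += labels.shape[0]
    return agree / total


def is_suspicious(score: float, threshold: float = 0.9) -> bool:
    # Hypothetical rule: a label that survives almost every intervention is flagged.
    return score >= threshold


def toy_predict(batch: np.ndarray) -> np.ndarray:
    """Toy backdoored model: feature 0 acts as the trigger, target class 7."""
    clean_label = (batch[:, 1:].mean(axis=1) > 0.5).astype(int)
    return np.where(batch[:, 0] > 0.1, 7, clean_label)


references = np.full((4, 5), 0.2)
references[:, 0] = 0.0  # clean references carry no trigger
clean_x = np.array([0.0, 0.9, 0.9, 0.9, 0.9])
triggered_x = np.array([1.0, 0.9, 0.9, 0.9, 0.9])

clean_score = consistency_score(clean_x, toy_predict, references)
triggered_score = consistency_score(triggered_x, toy_predict, references)
print(clean_score, is_suspicious(clean_score))          # 0.5 False
print(triggered_score, is_suspicious(triggered_score))  # 1.0 True
```

The toy model is deliberately built so that its trigger survives blending; a real trigger could be diluted by simple interpolation, which is one reason the choice of counterfactual construction matters in practice.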

Award ID(s): 2330215, 2316306

PAR ID: 10615525

Publisher / Repository: TMLR

Date Published: 2024-12-24

Journal Name: Transactions on Machine Learning Research

ISSN: 2835-8856

Sponsoring Org: National Science Foundation

The DOI is not currently available.