Attacking Bayes: Are Bayesian Neural Networks Inherently Robust?

Feng, Yunzhen; Rudner, Tim G; Tsilivis, Nikolaos; Kempe, Julia


Adversarial examples have been shown to cause neural networks to fail on a wide range of vision and language tasks, but recent work has claimed that Bayesian neural networks (BNNs) are inherently robust to adversarial perturbations. In this work, we examine this claim. To study the adversarial robustness of BNNs, we investigate whether it is possible to successfully break state-of-the-art BNN inference methods and prediction pipelines using even relatively unsophisticated attacks for three tasks: (1) label prediction under the posterior predictive mean, (2) adversarial example detection with Bayesian predictive uncertainty, and (3) semantic shift detection. We find that BNNs trained with state-of-the-art approximate inference methods, and even BNNs trained with Hamiltonian Monte Carlo, are highly susceptible to adversarial attacks. We also identify various conceptual and experimental errors in previous works that claimed inherent adversarial robustness of BNNs and conclusively demonstrate that BNNs and uncertainty-aware Bayesian prediction pipelines are not inherently robust against adversarial attacks.

Award ID(s): 1922658

PAR ID: 10534739

Author(s) / Creator(s): Feng, Yunzhen; Rudner, Tim G; Tsilivis, Nikolaos; Kempe, Julia

Publisher / Repository: Transactions on Machine Learning Research

Date Published: 2024-07-01

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text: Accepted Manuscript 1.0
The DOI is not currently available.
