Fooling Network Interpretation in Image Classification

Akshayvarun Subramanya, Vipin Pillai

Citation Details

Deep neural networks have been shown to be fooled rather easily using adversarial attack algorithms. Practical methods such as adversarial patches have been shown to be extremely effective in causing misclassification. However, these patches are highlighted using standard network interpretation algorithms, thus revealing the identity of the adversary. We show that it is possible to create adversarial patches which not only fool the prediction, but also change what we interpret regarding the cause of the prediction. Moreover, we introduce our attack as a controlled setting to measure the accuracy of interpretation algorithms. We show this using extensive experiments for Grad-CAM interpretation that transfers to occluding patch interpretation as well. We believe our algorithms can facilitate developing more robust network interpretation tools that truly explain the network’s underlying decision making process. more »

Award ID(s):: 1845216 2230693

PAR ID:: 10188567

Author(s) / Creator(s):: Akshayvarun Subramanya, Vipin Pillai

Date Published:: 2019-01-01

Journal Name:: International Conference on Computer Vision (ICCV) 2019

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Conference Paper:
The DOI is not currently available.

More Like this