Explaining Deep Neural Network Models with Adversarial Gradient Integration

Pan, Deng; Li, Xin; Zhu, Dongxiao

doi:10.24963/ijcai.2021/396

Deep neural networks (DNNs) have become one of the most high-performing tools in a broad range of machine learning areas. However, the multilayer non-linearity of the network architectures prevents us from gaining a better understanding of the models’ predictions. Gradient-based attribution methods (e.g., Integrated Gradient (IG)) that decipher input features’ contribution to the prediction task have been shown to be highly effective, yet they require a reference input as the anchor for explaining the model’s output. The performance of DNN model interpretation can be quite inconsistent with regard to the choice of references. Here we propose an Adversarial Gradient Integration (AGI) method that integrates the gradients from adversarial examples to the target example along the curve of steepest ascent to calculate the resulting contributions from all input features. Our method does not rely on the choice of references, and hence can avoid the ambiguity and inconsistency sourced from the reference selection. We demonstrate the performance of our AGI method and compare it with competing methods in explaining image classification results. Code is available from https://github.com/pd90506/AGI.
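
The idea in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation (see the repository linked above for that); it is one simplified reading of the abstract, under stated assumptions: the model is a toy softmax-linear classifier with hypothetical parameters `W` and `b`, only a single adversarial class is used, and the path is a normalized steepest-ascent path on that class's probability, started at the target example. The function names `class_prob_and_grad` and `agi_sketch` are illustrative and do not come from the paper.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def class_prob_and_grad(W, b, x, k):
    """Probability of class k and its gradient w.r.t. x for the toy
    model p = softmax(W @ x + b). (Assumed stand-in for a DNN.)"""
    p = softmax(W @ x + b)
    grad = p[k] * (W[k] - p @ W)  # d p_k / d x
    return p[k], grad

def agi_sketch(W, b, x, true_cls, adv_cls, step=1e-3, n_steps=2000):
    """Walk from x along the normalized steepest-ascent direction of the
    adversarial class probability and accumulate the true-class gradient
    times each displacement (midpoint rule). The sign is flipped so the
    result attributes the drop in true-class probability along the path
    to individual input features."""
    x_cur = np.asarray(x, dtype=float).copy()
    attribution = np.zeros_like(x_cur)
    for _ in range(n_steps):
        _, g_adv = class_prob_and_grad(W, b, x_cur, adv_cls)
        norm = np.linalg.norm(g_adv)
        if norm < 1e-12:  # gradient vanished; stop early
            break
        delta = step * g_adv / norm
        _, g_true = class_prob_and_grad(W, b, x_cur + 0.5 * delta, true_cls)
        attribution -= g_true * delta
        x_cur += delta
    return attribution, x_cur

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W, b, x = rng.normal(size=(3, 5)), rng.normal(size=3), rng.normal(size=5)
    p = softmax(W @ x + b)
    t, a = int(np.argmax(p)), int(np.argmin(p))
    attr, x_adv = agi_sketch(W, b, x, t, a)
    print("per-feature attribution:", attr)
    print("sum of attributions:", attr.sum())
```

Because the attribution is a discretized line integral of the true-class gradient, its sum should approximately equal the true-class probability at the target example minus that at the end of the path, and no hand-picked reference input is needed: the end point is found by the ascent itself. How the paper combines several adversarial classes and chooses step sizes is not stated in the abstract and is not modeled here.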

Award ID(s): 1724227

NSF-PAR ID: 10288214

Author(s) / Creator(s): Pan, Deng; Li, Xin; Zhu, Dongxiao

Date Published: 2021-08-01

Journal Name: Thirtieth International Joint Conference on Artificial Intelligence (IJCAI)

Page Range / eLocation ID: 2876 to 2883

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.24963/ijcai.2021/396
