

Title: Explaining Deep Neural Network Models with Adversarial Gradient Integration
Deep neural networks (DNNs) have become one of the highest-performing tools in a broad range of machine learning areas. However, the multilayer non-linearity of the network architectures prevents us from gaining a better understanding of the models' predictions. Gradient-based attribution methods (e.g., Integrated Gradients (IG)) that decipher input features' contributions to the prediction task have been shown to be highly effective, yet they require a reference input as the anchor for explaining the model's output. The performance of DNN model interpretation can be quite inconsistent with regard to the choice of references. Here we propose an Adversarial Gradient Integration (AGI) method that integrates the gradients from adversarial examples to the target example along the curve of steepest ascent to calculate the resulting contributions from all input features. Our method does not rely on the choice of references, and hence avoids the ambiguity and inconsistency arising from reference selection. We demonstrate the performance of our AGI method and compare it with competing methods in explaining image classification results. Code is available from https://github.com/pd90506/AGI.
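To make the idea concrete, the sketch below illustrates one possible reading of adversarial gradient integration in PyTorch: starting from the input, take steepest-ascent steps toward an adversarial example for a chosen false class and accumulate the true-class gradient along that path as the attribution. The function name, step schedule, and sign convention are illustrative assumptions; the authors' reference implementation is in the repository linked above.

```python
# Minimal sketch of the adversarial-gradient-integration idea (illustrative
# only; not the authors' reference implementation).
import torch

def agi_attribution(model, x, true_class, false_class, steps=20, step_size=0.05):
    """Accumulate true-class gradients along a steepest-ascent path from the
    input x toward an adversarial example for `false_class`."""
    model.eval()
    x_adv = x.clone().detach()
    attribution = torch.zeros_like(x)

    for _ in range(steps):
        x_adv.requires_grad_(True)
        logits = model(x_adv)
        # Gradient of the true-class logit: the quantity whose path integral
        # is accumulated as the feature attribution.
        true_grad = torch.autograd.grad(logits[0, true_class], x_adv,
                                        retain_graph=True)[0]
        # Gradient of the false-class logit drives the steepest-ascent step
        # toward an adversarial example for that class.
        false_grad = torch.autograd.grad(logits[0, false_class], x_adv)[0]
        delta = step_size * false_grad.sign()      # adversarial step
        attribution = attribution - true_grad * delta  # path-integral term
        x_adv = (x_adv + delta).detach()           # move along the curve

    return attribution
```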
Award ID(s):
1724227
PAR ID:
10288214
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Thirtieth International Joint Conference on Artificial Intelligence (IJCAI)
Page Range / eLocation ID:
2876 to 2883
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Modern image classification systems are often built on deep neural networks, which suffer from adversarial examples—images with deliberately crafted, imperceptible noise to mislead the network's classification. To defend against adversarial examples, a plausible idea is to obfuscate the network's gradient with respect to the input image. This general idea has inspired a long line of defense methods. Yet, almost all of them have proven vulnerable. We revisit this seemingly flawed idea from a radically different perspective. We embrace the omnipresence of adversarial examples and the numerical procedure of crafting them, and turn this harmful attacking process into a useful defense mechanism. Our defense method is conceptually simple: before feeding an input image for classification, transform it by finding an adversarial example on a pre-trained external model. We evaluate our method against a wide range of possible attacks. On both CIFAR-10 and Tiny ImageNet datasets, our method is significantly more robust than state-of-the-art methods. Particularly, in comparison to adversarial training, our method offers lower training cost as well as stronger robustness.
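A rough sketch of this preprocessing defense, assuming a standard untargeted PGD attack on a pre-trained external model (the attack budget, step count, and helper names here are illustrative, not the paper's exact configuration):

```python
# Sketch of the "attack as defense" preprocessing: perturb the input against
# an external model, then classify the transformed image.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, alpha, steps):
    """Standard untargeted PGD within an L-infinity ball of radius eps."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)  # project back into the ball
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()

def defended_predict(classifier, external_model, x,
                     eps=8 / 255, alpha=2 / 255, steps=10):
    """Transform the input by attacking the external model, then classify."""
    # Use the external model's own prediction as the label to attack.
    with torch.no_grad():
        y_ext = external_model(x).argmax(dim=1)
    x_transformed = pgd_attack(external_model, x, y_ext, eps, alpha, steps)
    return classifier(x_transformed).argmax(dim=1)
```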
  2. Deep learning (DL) models have demonstrated state-of-the-art performance in the classification of diagnostic imaging in oncology. However, DL models for medical images can be compromised by adversarial images, where pixel values of input images are manipulated to deceive the DL model. To address this limitation, our study investigates the detectability of adversarial images in oncology using multiple detection schemes. Experiments were conducted on thoracic computed tomography (CT) scans, mammography, and brain magnetic resonance imaging (MRI). For each dataset we trained a convolutional neural network to classify the presence or absence of malignancy. We trained five DL and machine learning (ML)-based detection models and tested their performance in detecting adversarial images. Adversarial images generated using projected gradient descent (PGD) with a perturbation size of 0.004 were detected by the ResNet detection model with an accuracy of 100% for CT, 100% for mammogram, and 90.0% for MRI. Overall, adversarial images were detected with high accuracy in settings where adversarial perturbation was above set thresholds. Adversarial detection should be considered alongside adversarial training as a defense technique to protect DL models for cancer imaging classification from the threat of adversarial images. 
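As an illustration of this detection setup, the sketch below trains a binary detector to separate clean images from adversarially perturbed ones; the detector architecture and the `make_pgd` helper (e.g., PGD with a 0.004 budget) are placeholders rather than the study's exact pipeline.

```python
# Toy sketch of adversarial-image detection: label clean images 0 and
# PGD-perturbed images 1, and train a binary classifier to tell them apart.
import torch
import torch.nn as nn

def train_detector(detector, victim_model, loader, make_pgd, epochs=5, lr=1e-4):
    """make_pgd(victim_model, x, y) -> adversarial copies of x (assumed helper)."""
    opt = torch.optim.Adam(detector.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for x, y in loader:
            x_adv = make_pgd(victim_model, x, y).detach()
            inputs = torch.cat([x, x_adv], dim=0)
            labels = torch.cat([torch.zeros(len(x)), torch.ones(len(x_adv))])
            opt.zero_grad()
            # Detector is assumed to output one logit per image.
            loss = loss_fn(detector(inputs).squeeze(1), labels)
            loss.backward()
            opt.step()
    return detector
```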
  3. As deep neural networks (DNNs) achieve extraordinary performance in a wide range of tasks, testing their robustness under adversarial attacks becomes paramount. Adversarial attacks, also known as adversarial examples, are used to measure the robustness of DNNs and are generated by incorporating imperceptible perturbations into the input data with the intention of altering a DNN's classification. In prior work in this area, most of the proposed optimization-based methods employ gradient descent to find adversarial examples. In this paper, we present an innovative method which generates adversarial examples via convex programming. Our experimental results demonstrate that we can generate adversarial examples with lower distortion and higher transferability than the C&W attack, which is the current state-of-the-art adversarial attack method for DNNs. We achieve a 100% attack success rate on both the original undefended models and the adversarially trained models. Our distortions of the L∞ attack are respectively 31% and 18% lower than the C&W attack for the best case and average case on the CIFAR-10 data set.
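The paper's specific formulation is not reproduced here, but one generic way to pose adversarial-example search as a convex program is to linearize the network's logits around the input and minimize the perturbation's L∞ norm subject to a misclassification constraint, as sketched below with CVXPY (the linearization, margin, and helper names are assumptions for illustration only):

```python
# Illustrative-only convex program: minimize the L-infinity norm of a
# perturbation subject to a misclassification constraint on logits that have
# been linearized around the input. Not the paper's exact formulation.
import cvxpy as cp
import numpy as np

def linearized_linf_attack(logits, jacobian, true_class, target_class, margin=0.05):
    """logits: (C,) network outputs at x; jacobian: (C, D) d(logits)/d(x),
    both precomputed (e.g., with autograd) and passed in as NumPy arrays."""
    d = jacobian.shape[1]
    delta = cp.Variable(d)
    # Linearized logits: f(x + delta) ~= logits + jacobian @ delta
    target_score = logits[target_class] + jacobian[target_class] @ delta
    true_score = logits[true_class] + jacobian[true_class] @ delta
    problem = cp.Problem(cp.Minimize(cp.norm(delta, "inf")),
                         [target_score >= true_score + margin])
    problem.solve()
    return delta.value  # None if the linearized problem is infeasible
```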
  4. We propose a simple change to existing neural network structures for better defending against gradient-based adversarial attacks. Instead of using popular activation functions (such as ReLU), we advocate the use of k-Winners-Take-All (k-WTA) activation, a C0 discontinuous function that purposely invalidates the neural network model's gradient at densely distributed input data points. The proposed k-WTA activation can be readily used in nearly all existing networks and training methods with no significant overhead. Our proposal is theoretically rationalized. We analyze why the discontinuities in k-WTA networks can largely prevent gradient-based search of adversarial examples and why they at the same time remain innocuous to the network training. This understanding is also empirically backed. We test k-WTA activation on various network structures optimized by a training method, be it adversarial training or not. In all cases, the robustness of k-WTA networks outperforms that of traditional networks under white-box attacks. 
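For concreteness, a minimal PyTorch sketch of a k-Winners-Take-All activation that can stand in for ReLU is shown below; the sparsity ratio and per-sample flattening scheme are illustrative choices, not necessarily those used in the paper.

```python
# Sketch of a k-Winners-Take-All activation: keep the k largest activations
# per sample and zero out the rest.
import torch
import torch.nn as nn

class KWinnersTakeAll(nn.Module):
    def __init__(self, sparsity_ratio=0.1):
        super().__init__()
        self.sparsity_ratio = sparsity_ratio

    def forward(self, x):
        flat = x.flatten(start_dim=1)                  # (batch, features)
        k = max(1, int(self.sparsity_ratio * flat.shape[1]))
        # Threshold each sample at its k-th largest activation value.
        topk_vals, _ = flat.topk(k, dim=1)
        threshold = topk_vals[:, -1, None]
        mask = (flat >= threshold).to(flat.dtype)
        return (flat * mask).view_as(x)

# Drop-in replacement for ReLU, e.g.:
# model = nn.Sequential(nn.Conv2d(3, 32, 3), KWinnersTakeAll(0.1), nn.Flatten(), nn.Linear(32 * 30 * 30, 10))
```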
  5. Neural networks (NNs) enable precise modeling of complicated geophysical phenomena but can be sensitive to small input changes. In this work, we present a new method for analyzing this instability in NNs. We focus our analysis on adversarial examples, test-time inputs with carefully crafted human-imperceptible perturbations that expose the worst-case instability in a model's predictions. Our stability analysis is based on a low-rank expansion of NNs on a fixed input, and we apply our analysis to an NN model for tsunami early warning which takes geodetic measurements as the input and forecasts tsunami waveforms. The result is an improved description of local stability that explains adversarial examples generated by a standard gradient-based algorithm, and allows the generation of other comparable examples. Our analysis can predict whether noise in the geodetic input will produce an unstable output, and identifies a potential approach to filtering the input that enables more robust forecasting.
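A hedged sketch of this kind of local stability analysis: linearize the network at a fixed input via its Jacobian and examine the leading singular directions, which bound how strongly small input perturbations (including adversarial ones) can be amplified. The function below is an assumption-laden illustration, not the paper's exact procedure.

```python
# Local, low-rank stability sketch: SVD of the network's Jacobian at a fixed
# input. Leading singular values/vectors describe worst-case amplification.
import torch
from torch.autograd.functional import jacobian

def local_stability(model, x, top_k=3):
    """Return the top-k singular values and input directions of the Jacobian at x."""
    x = x.detach()
    # Jacobian of the flattened output with respect to the input.
    jac = jacobian(lambda inp: model(inp).flatten(), x)
    jac = jac.reshape(jac.shape[0], -1)               # (out_dim, in_dim)
    u, s, vh = torch.linalg.svd(jac, full_matrices=False)
    # s[0] bounds the local amplification of small input perturbations;
    # vh[0] is the most-amplified input direction (an adversarial direction).
    return s[:top_k], vh[:top_k]
```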