skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


This content will become publicly available on December 15, 2025

Title: Explanation-based Adversarial Detection with Noise Reduction
Deep Neural Networks (DNNs) have achieved tremendous success in various tasks. However, DNNs exhibit uncertainty and unreliability when faced with well-designed adversarial examples, leading to misclassification. To address this, a variety of methods have been proposed to improve the robustness of DNNs by detecting adversarial attacks. In this paper, we combine model explanation techniques with adversarial models to enhance adversarial detection in real-world scenarios. Specifically, we develop a novel adversary-resistant detection framework called EXPLAINER, which utilizes explanation results extracted from explainable learning models. The explanation model in EXPLAINER generates an explanation map that identifies the relevance of input variables to the model’s classification result. Consequently, adversarial examples can be effectively detected by comparing the explanation results of a given sample with its denoised version, without relying on any prior knowledge of attacks. The proposed framework is thoroughly evaluated against different adversarial attacks, and experimental results demonstrate that our approach achieves promising results in white-box attack scenarios.  more » « less
Award ID(s):
2238700
PAR ID:
10608608
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
IEEE
Date Published:
ISBN:
979-8-3503-6248-0
Page Range / eLocation ID:
6374 to 6378
Subject(s) / Keyword(s):
Adversarial detection Model explanation Noise reduction.
Format(s):
Medium: X
Location:
Washington, DC, USA
Sponsoring Org:
National Science Foundation
More Like this
  1. Deep Neural Networks (DNNs) have shown phenomenal success in a wide range of real-world applications. However, a concerning weakness of DNNs is that they are vulnerable to adversarial attacks. Although there exist methods to detect adversarial attacks, they often suffer constraints on specific attack types and provide limited information to downstream systems. We specifically note that existing adversarial detectors are often binary classifiers, which differentiate clean or adversarial examples. However, detection of adversarial examples is much more complicated than such a scenario. Our key insight is that the confidence probability of detecting an input sample as an adversarial example will be more useful for the system to properly take action to resist potential attacks. In this work, we propose an innovative method for fast confidence detection of adversarial attacks based on integrity of sensor pattern noise embedded in input examples. Experimental results show that our proposed method is capable of providing a confidence distribution model of most of popular adversarial attacks. Furthermore, our presented method can provide early attack warning with even the attack types based on different properties of the confidence distribution models. Since fast confidence detection is a computationally heavy task, we propose an FPGA-Based hardware architecture based on a series of optimization techniques, such as incremental multi-level quantization and etc. We realize our proposed method on an FPGA platform and achieve a high efficiency of 29.740 IPS/W with a power consumption of only 0.7626W. 
    more » « less
  2. null (Ed.)
    As deep neural networks (DNNs) achieve extraordi- nary performance in a wide range of tasks, testing their robust- ness under adversarial attacks becomes paramount. Adversarial attacks, also known as adversarial examples, are used to measure the robustness of DNNs and are generated by incorporating imperceptible perturbations into the input data with the intention of altering a DNN’s classification. In prior work in this area, most of the proposed optimization based methods employ gradient descent to find adversarial examples. In this paper, we present an innovative method which generates adversarial examples via convex programming. Our experiment results demonstrate that we can generate adversarial examples with lower distortion and higher transferability than the C&W attack, which is the current state-of-the-art adversarial attack method for DNNs. We achieve 100% attack success rate on both the original undefended models and the adversarially-trained models. Our distortions of the L∞ attack are respectively 31% and 18% lower than the C&W attack for the best case and average case on the CIFAR-10 data set. 
    more » « less
  3. With rich visual data, such as images, becoming readily associated with items, visually-aware recommendation systems (VARS) have been widely used in different applications. Recent studies have shown that VARS are vulnerable to item-image adversarial attacks, which add human-imperceptible perturbations to the clean images associated with those items. Attacks on VARS pose new security challenges to a wide range of applications, such as e-commerce and social media, where VARS are widely used. How to secure VARS from such adversarial attacks becomes a critical problem. Currently, there is still a lack of systematic studies on how to design defense strategies against visual attacks on VARS. In this article, we attempt to fill this gap by proposing anadversarial image denoising and detectionframework to secure VARS. Our proposed method can simultaneously (1) secure VARS from adversarial attacks characterized bylocalperturbations by image denoising based onglobalvision transformers; and (2) accurately detect adversarial examples using a novel contrastive learning approach. Meanwhile, our framework is designed to be used as both a filter and a detector so that they can bejointlytrained to improve the flexibility of our defense strategy to a variety of attacks and VARS models. Our approach is uniquely tailored for VARS, addressing the distinct challenges in scenarios where adversarial attacks can differ across industries, for instance, causing misclassification in e-commerce or misrepresentation in real estate. We have conducted extensive experimental studies with two popular attack methods (FGSM and PGD). Our experimental results on two real-world datasets show that our defense strategy against visual attacks is effective and outperforms existing methods on different attacks. Moreover, our method demonstrates high accuracy in detecting adversarial examples, complementing its robustness across various types of adversarial attacks. 
    more » « less
  4. null (Ed.)
    Transfer learning using pre-trained deep neural networks (DNNs) has been widely used for plant disease identification recently. However, pre-trained DNNs are susceptible to adversarial attacks which generate adversarial samples causing DNN models to make wrong predictions. Successful adversarial attacks on deep learning (DL)-based plant disease identification systems could result in a significant delay of treatments and huge economic losses. This paper is the first attempt to study adversarial attacks and detection on DL-based plant disease identification. Our results show that adversarial attacks with a small number of perturbations can dramatically degrade the performance of DNN models for plant disease identification. We also find that adversarial attacks can be effectively defended by using adversarial sample detection with an appropriate choice of features. Our work will serve as a basis for developing more robust DNN models for plant disease identification and guiding the defense against adversarial attacks. 
    more » « less
  5. Deep neural networks (DNNs) are vulnerable to adversarial examples—maliciously crafted inputs that cause DNNs to make incorrect predictions. Recent work has shown that these attacks generalize to the physical domain, to create perturbations on physical objects that fool image classifiers under a variety of real-world conditions. Such attacks pose a risk to deep learning models used in safety-critical cyber-physical systems. In this work, we extend physical attacks to more challenging object detection models, a broader class of deep learning algorithms widely used to detect and label multiple objects within a scene. Improving upon a previous physical attack on image classifiers, we create perturbed physical objects that are either ignored or mislabeled by object detection models. We implement a Disappearance Attack, in which we cause a Stop sign to “disappear” according to the detector—either by covering the sign with an adversarial Stop sign poster, or by adding adversarial stickers onto the sign. In a video recorded in a controlled lab environment, the state-of-the-art YOLO v2 detector failed to recognize these adversarial Stop signs in over 85% of the video frames. In an outdoor experiment, YOLO was fooled by the poster and sticker attacks in 72.5% and 63.5% of the video frames respectively. We also use Faster R-CNN, a different object detection model, to demonstrate the transferability of our adversarial perturbations. The created poster perturbation is able to fool Faster R-CNN in 85.9% of the video frames in a controlled lab environment, and 40.2% of the video frames in an outdoor environment. Finally, we present preliminary results with a new Creation Attack, wherein innocuous physical stickers fool a model into detecting nonexistent objects. 
    more » « less