
This content will become publicly available on April 6, 2026

Title: Explainable Adversarial Attacks on Coarse-to-Fine Classifiers
Traditional adversarial attacks typically aim to alter the predicted labels of input images by generating perturbations that are imperceptible to the human eye. However, these approaches often lack explainability. Moreover, most existing work on adversarial attacks focuses on single-stage classifiers, while attacks on multi-stage classifiers remain largely unexplored. In this paper, we introduce instance-based adversarial attacks for multi-stage classifiers, leveraging Layer-wise Relevance Propagation (LRP), which assigns relevance scores to pixels based on their influence on classification outcomes. Our approach generates explainable adversarial perturbations by using LRP to identify and target key features critical for both coarse and fine-grained classification. Unlike conventional attacks, our method not only induces misclassification but also enhances the interpretability of the model's behavior across classification stages, as demonstrated by experimental results.
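For a linear model, LRP relevance reduces exactly to the per-pixel contribution w_i * x_i, which makes the relevance-guided idea easy to sketch without a deep network. The weights, input, budget `eps`, and pixel count `k` below are all hypothetical; this is a minimal illustration of perturbing only the most relevant pixels, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "classifier": score = w . x.  For a linear model, LRP
# relevance reduces exactly to the per-pixel contribution r_i = w_i * x_i.
d = 16
w = rng.normal(size=d)   # hypothetical model weights
x = rng.normal(size=d)   # hypothetical flattened input image

def relevance(w, x):
    """Per-pixel relevance score (exact LRP for a linear model)."""
    return w * x

def lrp_guided_attack(w, x, eps=0.5, k=4):
    """Perturb only the k most relevant pixels, pushing the score down."""
    r = relevance(w, x)
    top = np.argsort(-np.abs(r))[:k]     # indices of the most relevant pixels
    delta = np.zeros_like(x)
    delta[top] = -eps * np.sign(w[top])  # move each pixel against the score
    return x + delta

x_adv = lrp_guided_attack(w, x)
print(np.count_nonzero(x_adv != x))  # only the k targeted pixels change
print(w @ x_adv < w @ x)             # score pushed toward misclassification
```

Restricting the perturbation to the top-relevance pixels is what makes the attack explainable: the changed pixels coincide with the features the relevance map says drive the decision.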
Award ID(s):
2304489
PAR ID:
10632631
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
IEEE International Conference on Acoustics, Speech and Signal Processing
Date Published:
ISBN:
979-8-3503-6874-1
Page Range / eLocation ID:
1 to 5
Format(s):
Medium: X
Location:
Hyderabad, India
Sponsoring Org:
National Science Foundation
More Like this
  1. The Controller Area Network (CAN) is a ubiquitous bus protocol present in the Electrical/Electronic (E/E) systems of almost all vehicles. It is vulnerable to a range of attacks once the attacker gains access to the bus through the vehicle's attack surface. We address the problem of intrusion detection on the CAN bus and present a series of methods based on two classifiers trained with an Auxiliary Classifier Generative Adversarial Network (ACGAN) to detect and assign fine-grained labels to Known Attacks and also detect the Unknown Attack class in a dataset containing a mixture of (Normal + Known Attacks + Unknown Attack) messages. The most effective method is a cascaded two-stage classification architecture, with the multi-class Auxiliary Classifier in the first stage for classification of Normal and Known Attacks, passing Out-of-Distribution (OOD) samples to the binary Real-Fake Classifier in the second stage for detection of the Unknown Attack class. Performance evaluation demonstrates that our method achieves both high classification accuracy and low runtime overhead, making it suitable for deployment in the resource-constrained in-vehicle environment.
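The cascaded two-stage architecture can be sketched as a confidence-thresholded router: stage 1 labels in-distribution traffic, and low-confidence (OOD) samples fall through to a stage-2 real/fake check. The threshold `tau`, the logits, and the boolean stage-2 verdict below are hypothetical stand-ins for the trained ACGAN components.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def cascade_detect(stage1_logits, stage2_says_fake, tau=0.9):
    """Stage 1 labels Normal/Known-Attack classes; samples whose top
    softmax confidence falls below tau are treated as OOD and routed to
    the stage-2 Real-Fake check, which flags the Unknown Attack class."""
    p = softmax(stage1_logits)
    if p.max() >= tau:                 # in-distribution: trust stage 1
        return int(np.argmax(p))       # 0 = Normal, 1..K = Known Attacks
    return -1 if stage2_says_fake else int(np.argmax(p))  # -1 = Unknown

# Confident stage-1 logits are classified directly.
print(cascade_detect(np.array([5.0, 0.1, 0.2]), stage2_says_fake=False))  # 0
# Ambiguous logits flagged as fake in stage 2 become the Unknown Attack class.
print(cascade_detect(np.array([0.4, 0.5, 0.45]), stage2_says_fake=True))  # -1
```

Routing only the low-confidence residue to stage 2 is what keeps the runtime overhead low: most messages terminate at the cheap first stage.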
  2. Adversarial attacks against machine learning models have threatened various real-world applications such as spam filtering and sentiment analysis. In this paper, we propose a novel framework, learning to discriminate perturbations (DISP), to identify and adjust malicious perturbations, thereby blocking adversarial attacks on text classification models. To identify adversarial attacks, a perturbation discriminator estimates how likely each token in the text is to have been perturbed and provides a set of potential perturbations. For each potential perturbation, an embedding estimator learns to restore the embedding of the original word based on the context, and a replacement token is chosen via approximate kNN search. DISP can block adversarial attacks for any NLP model without modifying the model structure or training procedure. Extensive experiments on two benchmark datasets demonstrate that DISP significantly outperforms baseline methods in blocking adversarial attacks on text classification. In addition, in-depth analysis shows the robustness of DISP across different situations.
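The kNN replacement step can be illustrated with a toy vocabulary: a suspicious token is mapped to its nearest clean word in embedding space. The vocabulary and embeddings below are random stand-ins, the planted token "g00d" mimics a character-level perturbation, and the token's own vector stands in for the embedding estimator's context-based prediction.

```python
import numpy as np

rng = np.random.default_rng(2)

# Random stand-in embeddings for a toy vocabulary; "g00d" is planted
# close to "good" to mimic a character-level adversarial perturbation.
clean_vocab = ["good", "great", "bad", "terrible", "movie"]
emb = {w: rng.normal(size=8) for w in clean_vocab}
emb["g00d"] = emb["good"] + 0.1 * rng.normal(size=8)

def restore_token(token):
    """k=1 nearest-neighbor search over clean embeddings -- the
    replacement step of DISP (here using the token's own vector in
    place of the estimator's context-predicted embedding)."""
    t = emb[token]
    return min(clean_vocab, key=lambda w: np.linalg.norm(emb[w] - t))

print(restore_token("g00d"))  # "good"
```

In the full framework an approximate kNN index replaces the exhaustive `min` scan so the search scales to a real vocabulary.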
  3. With machine learning techniques widely used to automate Android malware detection, it is important to investigate the robustness of these methods against evasion attacks. A recent work proposed a novel problem-space attack on Android malware classifiers, where adversarial examples are generated by transforming Android malware samples while satisfying practical constraints. To address its limitations, we propose a new attack called EAGLE (Evasion Attacks Guided by Local Explanations), whose key idea is to leverage local explanations to guide the search for adversarial examples. We present a generic algorithmic framework for EAGLE attacks, which can be customized with specific feature increase and decrease operations to evade Android malware classifiers trained on different types of count features. We overcome practical challenges in implementing these operations for four different types of Android malware classifiers. Using two Android malware datasets, our results show that EAGLE attacks can be highly effective at finding functional adversarial examples. We study the attack transferability of malware variants created by EAGLE attacks across classifiers built with different classification models or trained on different types of count features. Our research further demonstrates that ensemble classifiers trained on multiple types of count features are not immune to EAGLE attacks. We also discuss possible defense mechanisms against EAGLE attacks.
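Explanation-guided search over count features can be sketched greedily, under the simplifying assumption of a linear scorer whose weights serve as an exact local explanation (in place of the learned explainers an EAGLE-style attack would query). The weights, counts, and sign-based benign threshold below are all hypothetical.

```python
import numpy as np

# Toy linear malware scorer over count features; for a linear model the
# weights double as an exact local explanation of the score.
w = np.array([2.0, -1.0, 0.5, -3.0])   # hypothetical feature weights
x = np.array([3.0, 1.0, 2.0, 0.0])     # hypothetical count features

def explanation_guided_search(w, x, max_steps=50):
    """Greedily apply the single feature increase/decrease that the
    explanation ranks as most score-lowering, until the sample scores
    benign (score < 0).  Counts are kept non-negative."""
    x = x.copy()
    for _ in range(max_steps):
        if w @ x < 0:
            return x                               # classified benign
        # score drop from +1 on negative-weight features, or -1 on
        # positive-weight features (only if the count can go down)
        gains = np.where(w < 0, -w, np.where(x > 0, w, 0.0))
        i = int(np.argmax(gains))
        if gains[i] <= 0:
            break                                  # no feasible edit helps
        x[i] += 1 if w[i] < 0 else -1
    return x

x_adv = explanation_guided_search(w, x)
print(w @ x_adv < 0)  # the variant now evades the toy scorer
```

The problem-space constraint survives in miniature here as the non-negativity of counts; the real attack must additionally preserve the malware's functionality.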
  4. Recent publications have shown that neural-network-based classifiers are vulnerable to adversarial inputs that are virtually indistinguishable from normal data, constructed explicitly to force misclassification. In this paper, we present several defenses to counter these threats. First, we observe that most adversarial attacks succeed by mounting gradient ascent on the confidence returned by the model, which allows the adversary to map the classification boundary. Our defenses are based on denying access to the precise classification boundary. Our first defense adds controlled random noise to the output confidence levels, which prevents an adversary from converging in their numerical-approximation attack. Our next defense is based on the observation that by varying the order of training, we often arrive at models that offer the same classification accuracy yet differ numerically. An ensemble of such models allows us to switch randomly between these equivalent models at query time, which further blurs the classification boundary. We demonstrate our defenses via an adversarial input generator that defeats previously published defenses but cannot breach the proposed defenses due to their non-static nature.
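The noisy-confidence defense can be sketched as post-hoc noise on the reported probability vector: benign users with a clear top-class margin still see the right label, while repeated attacker queries observe a shifting response. The noise scale `sigma` is a hypothetical choice.

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_confidence(probs, sigma=0.02):
    """Defense sketch: add controlled random noise to the reported
    confidences and renormalize.  When the top-class margin is much
    larger than sigma, the argmax label is preserved for benign use,
    but gradient-ascent queries see an unstable boundary."""
    noisy = np.clip(probs + rng.normal(scale=sigma, size=probs.shape), 1e-9, None)
    return noisy / noisy.sum()

p = np.array([0.7, 0.2, 0.1])
q1, q2 = noisy_confidence(p), noisy_confidence(p)
print(np.argmax(q1) == np.argmax(p))  # True -- label preserved
print(np.allclose(q1, q2))            # False -- responses vary per query
```

The query-to-query variation is the point of the defense: a numerical estimate of the confidence gradient no longer converges, because each probe samples a differently perturbed boundary.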
  5. We investigate the role of representations and architectures for classifying 3D shapes in terms of their computational efficiency, generalization, and robustness to adversarial transformations. By varying the number of training examples and employing cross-modal transfer learning we study the role of initialization of existing deep architectures for 3D shape classification. Our analysis shows that multiview methods continue to offer the best generalization even without pretraining on large labeled image datasets, and even when trained on simplified inputs such as binary silhouettes. Furthermore, the performance of voxel-based 3D convolutional networks and point-based architectures can be improved via cross-modal transfer from image representations. Finally, we analyze the robustness of 3D shape classifiers to adversarial transformations and present a novel approach for generating adversarial perturbations of a 3D shape for multiview classifiers using a differentiable renderer. We find that point-based networks are more robust to point position perturbations while voxel-based and multiview networks are easily fooled with the addition of imperceptible noise to the input. 
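The point-position perturbations probed in the robustness analysis can be sketched as a norm-bounded jitter on each 3D point; the budget `eps` and the random cloud below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

def perturb_points(points, eps=0.01):
    """Norm-bounded jitter on 3D point positions: each point moves by
    exactly eps in a random direction, keeping the change imperceptible
    at typical shape scales."""
    delta = rng.normal(size=points.shape)
    delta *= eps / np.maximum(np.linalg.norm(delta, axis=1, keepdims=True), 1e-12)
    return points + delta

cloud = rng.uniform(-1, 1, size=(1024, 3))  # hypothetical unit-cube shape
adv = perturb_points(cloud)
print(np.linalg.norm(adv - cloud, axis=1).max() <= 0.01 + 1e-9)  # True
```

Under the paper's findings, a point-based network should tolerate such jitter, whereas voxel-based and multiview classifiers can be flipped by comparably small input noise.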