Title: Popular Imperceptibility Measures in Visual Adversarial Attacks are Far from Human Perception
Adversarial attacks on image classification aim to make visually imperceptible changes to induce misclassification. Popular computational definitions of imperceptibility are largely based on mathematical convenience such as pixel p-norms. We perform a behavioral study that allows us to quantitatively demonstrate the mismatch between human perception and popular imperceptibility measures such as pixel p-norms, earth mover’s distance, structural similarity index, and deep net embedding. Our results call for a reassessment of current adversarial attack formulation.
Award ID(s): 2023239
PAR ID: 10533101
Publisher / Repository: Decision and Game Theory for Security. GameSec 2020. Lecture Notes in Computer Science
Format(s): Medium: X
Sponsoring Org: National Science Foundation
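The abstract above compares several computational imperceptibility measures against human judgments. The sketch below is a minimal illustration of how three of them (pixel p-norms, a one-dimensional earth mover's distance over pixel intensities, and the structural similarity index) can be computed for a clean/adversarial image pair, assuming NumPy, SciPy, and scikit-image are available; the full 2-D transport version of EMD and the deep-net-embedding distance used in the study are not reproduced here.

    import numpy as np
    from scipy.stats import wasserstein_distance
    from skimage.metrics import structural_similarity

    def imperceptibility_measures(clean, adv):
        """Distance measures between a clean image and its adversarial version.

        clean, adv: grayscale float arrays in [0, 1] with identical shape (H, W).
        """
        diff = (adv - clean).ravel()
        return {
            # Pixel p-norms of the perturbation.
            "l1": np.linalg.norm(diff, ord=1),
            "l2": np.linalg.norm(diff, ord=2),
            "linf": np.linalg.norm(diff, ord=np.inf),
            # 1-D earth mover's distance between intensity distributions
            # (a simplification of the full 2-D transport problem).
            "emd": wasserstein_distance(clean.ravel(), adv.ravel()),
            # Structural similarity index (higher means more similar).
            "ssim": structural_similarity(clean, adv, data_range=1.0),
        }

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        clean = rng.random((28, 28))
        adv = np.clip(clean + rng.normal(scale=0.03, size=clean.shape), 0.0, 1.0)
        for name, value in imperceptibility_measures(clean, adv).items():
            print(f"{name}: {value:.4f}")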
More Like this
  1. In the last couple of years, several adversarial attack methods based on different threat models have been proposed for the image classification problem. Most existing defenses consider additive threat models, in which sample perturbations have bounded L_p norms. These defenses, however, can be vulnerable to adversarial attacks under non-additive threat models. An example of an attack method based on a non-additive threat model is the Wasserstein adversarial attack proposed by Wong et al. (2019), where the distance between an image and its adversarial example is determined by the Wasserstein metric ("earth-mover distance") between their normalized pixel intensities. Until now, there has been no certifiable defense against this type of attack. In this work, we propose the first defense with certified robustness against Wasserstein adversarial attacks, using randomized smoothing. We develop this certificate by considering the space of possible flows between images, and representing this space such that the Wasserstein distance between images is upper-bounded by the L_1 distance in this flow space. We can then apply existing randomized smoothing certificates for the L_1 metric. On the MNIST and CIFAR-10 datasets, we find that our proposed defense is also practically effective, demonstrating significantly improved accuracy under Wasserstein adversarial attack compared to unprotected models.
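    The non-additive threat model described above measures perturbation size by the Wasserstein ("earth-mover") distance between normalized pixel intensities. The sketch below is a simplified illustration of that distance for small grayscale images, assuming the POT optimal-transport package (imported as ot); the attack itself and the paper's flow-space certificate are not reproduced here.

        import numpy as np
        import ot  # POT: Python Optimal Transport

        def wasserstein_image_distance(x, x_adv):
            """Exact earth-mover distance between two small grayscale images,
            treating normalized pixel intensities as mass on the 2-D pixel grid."""
            h, w = x.shape
            # Coordinates of every pixel on the grid.
            coords = np.array([(i, j) for i in range(h) for j in range(w)], dtype=float)
            # Ground cost: Euclidean distance between pixel locations.
            cost = ot.dist(coords, coords, metric="euclidean")
            # Normalize intensities so each image is a probability distribution.
            a = x.ravel() / x.sum()
            b = x_adv.ravel() / x_adv.sum()
            return ot.emd2(a, b, cost)  # optimal transport cost

        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            x = rng.random((14, 14))
            # Small intensity changes mimic a mass-moving perturbation.
            x_adv = np.clip(x + rng.normal(scale=0.02, size=x.shape), 1e-6, None)
            print("Wasserstein distance:", wasserstein_image_distance(x, x_adv))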
  2. Universal Adversarial Perturbations (UAPs) are imperceptible, image-agnostic vectors that cause deep neural networks (DNNs) to misclassify inputs with high probability. In practical attack scenarios, adversarial perturbations may undergo transformations such as changes in pixel intensity, scaling, etc. before being added to DNN inputs. Existing methods do not create UAPs robust to these real-world transformations, thereby limiting their applicability in practical attack scenarios. In this work, we introduce and formulate UAPs robust against real-world transformations. We build an iterative algorithm using probabilistic robustness bounds and construct UAPs robust to transformations generated by composing arbitrary sub-differentiable transformation functions. We perform an extensive evaluation on the popular CIFAR-10 and ILSVRC 2012 datasets, measuring our UAPs' robustness under a wide range of common, real-world transformations such as rotation, contrast changes, etc. We further show that, by using a set of primitive transformations, our method can generalize well to unseen transformations such as fog, JPEG compression, etc. Our results show that our method can generate UAPs up to 23% more robust than state-of-the-art baselines.
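    The iterative construction described above can be sketched as an expectation-over-transformations style update of a single shared perturbation. The code below is a simplified PyTorch illustration under assumed names (model, loader) and an assumed ImageNet-sized input shape; it is not the paper's exact algorithm with probabilistic robustness bounds.

        import random
        import torch
        import torchvision.transforms.functional as TF

        def robust_uap(model, loader, eps=10 / 255, alpha=1 / 255, epochs=5,
                       shape=(1, 3, 224, 224), device="cpu"):
            """Iteratively build a universal perturbation that survives random
            rotations and contrast changes (a simplified, EOT-style sketch)."""
            model.eval()
            delta = torch.zeros(*shape, device=device, requires_grad=True)
            loss_fn = torch.nn.CrossEntropyLoss()
            for _ in range(epochs):
                for images, labels in loader:
                    images, labels = images.to(device), labels.to(device)
                    # Sample a random sub-differentiable transformation.
                    angle = random.uniform(-15.0, 15.0)
                    contrast = random.uniform(0.8, 1.2)
                    perturbed = torch.clamp(images + delta, 0.0, 1.0)
                    perturbed = TF.adjust_contrast(TF.rotate(perturbed, angle), contrast)
                    # Maximize the classification loss w.r.t. the shared perturbation.
                    loss = loss_fn(model(perturbed), labels)
                    loss.backward()
                    with torch.no_grad():
                        delta += alpha * delta.grad.sign()
                        delta.clamp_(-eps, eps)  # project back into the L_inf ball
                    delta.grad.zero_()
            return delta.detach()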
  3. Adversarial training is a popular defense strategy against attack threat models with bounded L_p norms. However, it often degrades model performance on normal images, and the defense does not generalize well to novel attacks. Given the success of deep generative models such as GANs and VAEs in characterizing the underlying manifold of images, we investigate whether or not the aforementioned problems can be remedied by exploiting the underlying manifold information. To this end, we construct an "On-Manifold ImageNet" (OM-ImageNet) dataset by projecting the ImageNet samples onto the manifold learned by StyleGAN. For this dataset, the underlying manifold information is exact. Using OM-ImageNet, we first show that adversarial training in the latent space of images improves both standard accuracy and robustness to on-manifold attacks. However, since no out-of-manifold perturbations are realized, the defense can be broken by L_p adversarial attacks. We further propose Dual Manifold Adversarial Training (DMAT), where adversarial perturbations in both the latent and image spaces are used to robustify the model. Our DMAT improves performance on normal images and achieves robustness comparable to standard adversarial training against L_p attacks. In addition, we observe that models defended by DMAT achieve improved robustness against novel attacks which manipulate images by global color shifts or various types of image filtering. Interestingly, similar improvements are also achieved when the defended models are tested on out-of-manifold natural images. These results demonstrate the potential benefits of using manifold information in enhancing the robustness of deep learning models against various types of novel adversarial attacks.
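    A single training step of the dual-manifold idea above can be sketched as two PGD-style attacks, one through a fixed generator in latent space and one directly in pixel space, followed by a standard cross-entropy update on both adversarial views. The names G (latent-to-image generator), f (classifier), and the step sizes below are assumptions for illustration, not the authors' implementation.

        import torch
        import torch.nn.functional as F

        def dmat_step(f, G, z, labels, optimizer,
                      eps_img=8 / 255, eps_lat=0.02, steps=5):
            """One dual-manifold adversarial training step (illustrative sketch).

            f: classifier; G: fixed generator mapping latent codes z to images in [0, 1].
            """
            f.train()
            images = G(z).detach()

            # On-manifold attack: perturb the latent code and decode through G.
            dz = torch.zeros_like(z, requires_grad=True)
            for _ in range(steps):
                loss = F.cross_entropy(f(torch.clamp(G(z + dz), 0.0, 1.0)), labels)
                grad, = torch.autograd.grad(loss, dz)
                dz = (dz + (eps_lat / steps) * grad.sign()).clamp(-eps_lat, eps_lat)
                dz = dz.detach().requires_grad_(True)
            x_on = torch.clamp(G(z + dz), 0.0, 1.0).detach()

            # Off-manifold attack: perturb pixels directly.
            dx = torch.zeros_like(images, requires_grad=True)
            for _ in range(steps):
                loss = F.cross_entropy(f(torch.clamp(images + dx, 0.0, 1.0)), labels)
                grad, = torch.autograd.grad(loss, dx)
                dx = (dx + (eps_img / steps) * grad.sign()).clamp(-eps_img, eps_img)
                dx = dx.detach().requires_grad_(True)
            x_off = torch.clamp(images + dx, 0.0, 1.0).detach()

            # Train the classifier on both adversarial views.
            optimizer.zero_grad()
            loss = F.cross_entropy(f(x_on), labels) + F.cross_entropy(f(x_off), labels)
            loss.backward()
            optimizer.step()
            return loss.item()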
  4. Deep learning models have strong potential for automating breast ultrasound (BUS) image classification to support early cancer detection. However, their vulnerability to small input perturbations poses a challenge for clinical reliability. This study examines how minimal pixel-level changes affect classification performance and predictive uncertainty, using the BUSI dataset and a ResNet-50 classifier. Two perturbation types are evaluated: (1) adversarial perturbations via the One Pixel Attack and (2) non-adversarial, device-related noise simulated by setting a single pixel to black. Robustness is assessed alongside uncertainty estimation using Monte Carlo Dropout, with metrics including Expected Kullback–Leibler divergence (EKL), Predictive Variance (PV), and Mutual Information (MI) for epistemic uncertainty, and Maximum Class Probability (MP) for aleatoric uncertainty. Both perturbations reduced accuracy, producing 17 and 29 “fooled” test samples, defined as cases classified correctly before but incorrectly after perturbation, for the adversarial and non-adversarial settings, respectively. Samples that remained correct are referred to as “unfooled.” Across all metrics, uncertainty increased after perturbation for both groups, and fooled samples had higher uncertainty than unfooled samples even before perturbation. We also identify spatially localized “uncertainty-decreasing” regions, where individual single-pixel blackouts both flipped predictions and reduced uncertainty, creating overconfident errors. These regions represent high-risk vulnerabilities that could be exploited in adversarial attacks or addressed through targeted robustness training and uncertainty-aware safeguards. Overall, combining perturbation analysis with uncertainty quantification provides valuable insights into model weaknesses and can inform the design of safer, more reliable AI systems for BUS diagnosis. 
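    The uncertainty metrics used above (EKL, PV, MI, and MP) can all be derived from the class-probability samples produced by Monte Carlo Dropout. The sketch below computes them from an array of softmax outputs, assuming T stochastic forward passes have already been collected; it is not tied to the BUSI dataset or the ResNet-50 classifier.

        import numpy as np

        def mc_dropout_uncertainty(probs, eps=1e-12):
            """Uncertainty metrics from Monte Carlo Dropout samples.

            probs: array of shape (T, C) holding softmax outputs from T stochastic passes.
            Returns EKL, PV, MI (epistemic) and MP (aleatoric) summaries.
            """
            mean_p = probs.mean(axis=0)  # predictive distribution
            # Expected KL divergence between each sample and the mean prediction.
            ekl = np.mean(np.sum(probs * (np.log(probs + eps) - np.log(mean_p + eps)), axis=1))
            # Predictive variance, averaged over classes.
            pv = probs.var(axis=0).mean()
            # Mutual information = predictive entropy - expected entropy of the samples.
            pred_entropy = -np.sum(mean_p * np.log(mean_p + eps))
            exp_entropy = -np.mean(np.sum(probs * np.log(probs + eps), axis=1))
            mi = pred_entropy - exp_entropy
            # Maximum class probability of the mean prediction.
            mp = mean_p.max()
            return {"EKL": ekl, "PV": pv, "MI": mi, "MP": mp}

        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            logits = rng.normal(size=(30, 3))  # T=30 passes, 3 classes
            probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
            print(mc_dropout_uncertainty(probs))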
  5. The emergence of large language models has significantly expanded the use of natural language processing (NLP), even as it has heightened exposure to adversarial threats. We present an overview of adversarial NLP with an emphasis on challenges, policy implications, emerging areas, and future directions. First, we review attack methods and evaluate the vulnerabilities of popular NLP models. Then, we review defense strategies that include adversarial training. We describe major policy implications, identify key trends, and suggest future directions, such as the use of Bayesian methods to improve the security and robustness of NLP systems.