We study the problem of defending deep neural network approaches for image classification from physically realizable attacks. First, we demonstrate that the two most scalable and effective methods for learning robust models, adversarial training with PGD attacks and randomized smoothing, exhibit very limited effectiveness against three of the highest profile physical attacks. Next, we propose a new abstract adversarial model, rectangular occlusion attacks, in which an adversary places a small adversarially crafted rectangle in an image, and develop two approaches for efficiently computing the resulting adversarial examples. Finally, we demonstrate that adversarial training using our new attack yields image classification models that exhibit high robustness against the physically realizable attacks we study, offering the first effective generic defense against such attacks.
more »
« less
Defending Against Physically Realizable Attacks on Image Classification
We study the problem of defending deep neural network approaches for image classification from physically realizable attacks. First, we demonstrate that the two most scalable and effective methods for learning robust models, adversarial training with PGD attacks and randomized smoothing, exhibit very limited effectiveness against three of the highest profile physical attacks. Next, we propose a new abstract adversarial model, rectangular occlusion attacks, in which an adversary places a small adversarially crafted rectangle in an image, and develop two approaches for efficiently computing the resulting adversarial examples. Finally, we demonstrate that adversarial training using our new attack yields image classification models that exhibit high robustness against the physically realizable attacks we study, offering the first effective generic defense against such attacks.
more »
« less
- PAR ID:
- 10174064
- Date Published:
- Journal Name:
- International Conference on Learning Representations
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification. To defend against such attacks, an effective and popular approach, known as adversarial training (AT), has been shown to mitigate the negative impact of adversarial attacks by virtue of a min-max robust training method. While effective, it remains unclear whether it can successfully be adapted to the distributed learning context. The power of distributed optimization over multiple machines enables us to scale up robust training over large models and datasets. Spurred by that, we propose distributed adversarial training (DAT), a large-batch adversarial training framework implemented over multiple machines. We show that DAT is general, which supports training over labeled and unlabeled data, multiple types of attack generation methods, and gradient compression operations favored for distributed optimization. Theoretically, we provide, under standard conditions in the optimization theory, the convergence rate of DAT to the first-order stationary points in general non-convex settings. Empirically, we demonstrate that DAT either matches or outperforms state-of-the-art robust accuracies and achieves a graceful training speedup (e.g., on ResNet–50 under ImageNet).more » « less
-
null (Ed.)Adversarial training is a popular defense strategy against attack threat models with bounded Lp norms. However, it often degrades the model performance on normal images and the defense does not generalize well to novel attacks. Given the success of deep generative models such as GANs and VAEs in characterizing the underlying manifold of images, we investigate whether or not the aforementioned problems can be remedied by exploiting the underlying manifold information. To this end, we construct an "On-Manifold ImageNet" (OM-ImageNet) dataset by projecting the ImageNet samples onto the manifold learned by StyleGSN. For this dataset, the underlying manifold information is exact. Using OM-ImageNet, we first show that adversarial training in the latent space of images improves both standard accuracy and robustness to on-manifold attacks. However, since no out-of-manifold perturbations are realized, the defense can be broken by Lp adversarial attacks. We further propose Dual Manifold Adversarial Training (DMAT) where adversarial perturbations in both latent and image spaces are used in robustifying the model. Our DMAT improves performance on normal images, and achieves comparable robustness to the standard adversarial training against Lp attacks. In addition, we observe that models defended by DMAT achieve improved robustness against novel attacks which manipulate images by global color shifts or various types of image filtering. Interestingly, similar improvements are also achieved when the defended models are tested on out-of-manifold natural images. These results demonstrate the potential benefits of using manifold information in enhancing robustness of deep learning models against various types of novel adversarial attacks.more » « less
-
Deep learning models have been used in creating various effective image classification applications. However, they are vulnerable to adversarial attacks that seek to misguide the models into predicting incorrect classes. Our study of major adversarial attack models shows that they all specifically target and exploit the neural networking structures in their designs. This understanding led us to develop a hypothesis that most classical machine learning models, such as random forest (RF), are immune to adversarial attack models because they do not rely on neural network design at all. Our experimental study of classical machine learning models against popular adversarial attacks supports this hypothesis. Based on this hypothesis, we propose a new adversarial-aware deep learning system by using a classical machine learning model as the secondary verification system to complement the primary deep learning model in image classification. Although the secondary classical machine learning model has less accurate output, it is only used for verification purposes, which does not impact the output accuracy of the primary deep learning model, and, at the same time, can effectively detect an adversarial attack when a clear mismatch occurs. Our experiments based on the CIFAR-100 dataset show that our proposed approach outperforms current state-of-the-art adversarial defense systems.more » « less
-
Models produced by machine learning, particularly deep neural networks, are state-of-the-art for many machine learning tasks and demonstrate very high prediction accuracy. Unfortunately, these models are also very brittle and vulnerable to specially crafted adversarial examples. Recent results have shown that accuracy of these models can be reduced from close to hundred percent to below 5\% using adversarial examples. This brittleness of deep neural networks makes it challenging to deploy these learning models in security-critical areas where adversarial activity is expected, and cannot be ignored. A number of methods have been recently proposed to craft more effective and generalizable attacks on neural networks along with competing efforts to improve robustness of these learning models. But the current approaches to make machine learning techniques more resilient fall short of their goal. Further, the succession of new adversarial attacks against proposed methods to increase neural network robustness raises doubts about a foolproof approach to robustify machine learning models against all possible adversarial attacks. In this paper, we consider the problem of detecting adversarial examples. This would help identify when the learning models cannot be trusted without attempting to repair the models or make them robust to adversarial attacks. This goal of finding limitations of the learning model presents a more tractable approach to protecting against adversarial attacks. Our approach is based on identifying a low dimensional manifold in which the training samples lie, and then using the distance of a new observation from this manifold to identify whether this data point is adversarial or not. Our empirical study demonstrates that adversarial examples not only lie farther away from the data manifold, but this distance from manifold of the adversarial examples increases with the attack confidence. Thus, adversarial examples that are likely to result into incorrect prediction by the machine learning model is also easier to detect by our approach. This is a first step towards formulating a novel approach based on computational geometry that can identify the limiting boundaries of a machine learning model, and detect adversarial attacks.more » « less
An official website of the United States government

