In the evasion attacks against deep neural networks (DNN), the attacker generates adversarial instances that are visually indistinguishable from benign samples and sends them to the target DNN to trigger misclassifications. In this paper, we propose a novel multi-view adversarial image detector, namely Argos, based on a novel observation. That is, there exist two “souls” in an adversarial instance, i.e., the visually unchanged content, which corresponds to the true label, and the added invisible perturbation, which corresponds to the misclassified label. Such inconsistencies could be further amplified through an autoregressive generative approach that generates images with seed pixels selected from the original image, a selected label, and pixel distributions learned from the training data. The generated images (i.e., the “views”) will deviate significantly from the original one if the label is adversarial, demonstrating inconsistencies that Argos expects to detect. To this end, Argos first amplifies the discrepancies between the visual content of an image and its misclassified label induced by the attack using a set of regeneration mechanisms and then identifies an image as adversarial if the reproduced views deviate to a preset degree. Our experimental results show that Argos significantly outperforms two representative adversarial detectors in both detection accuracy and robustness against six well-known adversarial attacks. The code is available at: https://github.com/sohaib730/Argos-Adversarial_Detection. 
                        more » 
                        « less   
                    
                            
                            Detection Mechanisms of One-Pixel Attack
                        
                    
    
            In recent years, a series of researches have revealed that the Deep Neural Network (DNN) is vulnerable to adversarial attack, and a number of attack methods have been proposed. Among those methods, an extremely sly type of attack named the one-pixel attack can mislead DNNs to misclassify an image via only modifying one pixel of the image, leading to severe security threats to DNN-based information systems. Currently, no method can really detect the one-pixel attack, for which the blank will be filled by this paper. This paper proposes two detection methods, including trigger detection and candidate detection. The trigger detection method analyzes the vulnerability of DNN models and gives the most suspected pixel that is modified by the one-pixel attack. The candidate detection method identifies a set of most suspected pixels using a differential evolution-based heuristic algorithm. The real-data experiments show that the trigger detection method has a detection success rate of 9.1%, and the candidate detection method achieves a detection success rate of 30.1%, which can validate the effectiveness of our methods. 
        more » 
        « less   
        
    
                            - Award ID(s):
- 1829674
- PAR ID:
- 10274510
- Editor(s):
- Li, Wenzhong
- Date Published:
- Journal Name:
- Wireless Communications and Mobile Computing
- Volume:
- 2021
- ISSN:
- 1530-8669
- Page Range / eLocation ID:
- 1 to 8
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            
- 
            Traditional Deep Neural Network (DNN) security is mostly related to the well-known adversarial input example attack.Recently, another dimension of adversarial attack, namely, attack on DNN weight parameters, has been shown to be very powerful. Asa representative one, the Bit-Flip based adversarial weight Attack (BFA) injects an extremely small amount of faults into weight parameters to hijack the executing DNN function. Prior works of BFA focus on un-targeted attacks that can hack all inputs into a random output class by flipping a very small number of weight bits stored in computer memory. This paper proposes the first work oftargetedBFA based (T-BFA) adversarial weight attack on DNNs, which can intentionally mislead selected inputs to a target output class. The objective is achieved by identifying the weight bits that are highly associated with classification of a targeted output through a class-dependent weight bit searching algorithm. Our proposed T-BFA performance is successfully demonstrated on multiple DNN architectures for image classification tasks. For example, by merely flipping 27 out of 88 million weight bits of ResNet-18, our T-BFA can misclassify all the images from Hen class into Goose class (i.e., 100% attack success rate) in ImageNet dataset, while maintaining 59.35% validation accuracy.more » « less
- 
            The globalized semiconductor supply chain significantly increases the risk of exposing System-on-Chip (SoC) designs to malicious implants, popularly known as hardware Trojans. Traditional simulation-based validation is unsuitable for detection of carefully-crafted hardware Trojans with extremely rare trigger conditions. While machine learning (ML) based Trojan detection approaches are promising due to their scalability as well as detection accuracy, ML-based methods themselves are vulnerable from Trojan attacks. In this paper, we propose a robust backdoor attack on ML-based Trojan detection algorithms to demonstrate this serious vulnerability. The proposed framework is able to design an AI Trojan and implant it inside the ML model that can be triggered by specific inputs. Experimental results demonstrate that the proposed AI Trojans can bypass state-of-the-art defense algorithms. Moreover, our approach provides a fast and cost-effective solution in achieving 100% attack success rate that significantly outperforms state-of-the art approaches based on adversarial attacks.more » « less
- 
            Remote sensing datasets usually have a wide range of spatial and spectral resolutions. They provide unique advantages in surveillance systems, and many government organizations use remote sensing multispectral imagery to monitor security-critical infrastructures or targets. Artificial Intelligence (AI) has advanced rapidly in recent years and has been widely applied to remote image analysis, achieving state-of-the-art (SOTA) performance. However, AI models are vulnerable and can be easily deceived or poisoned. A malicious user may poison an AI model by creating a stealthy backdoor. A backdoored AI model performs well on clean data but behaves abnormally when a planted trigger appears in the data. Backdoor attacks have been extensively studied in machine learning-based computer vision applications with natural images. However, much less research has been conducted on remote sensing imagery, which typically consists of many more bands in addition to the red, green, and blue bands found in natural images. In this paper, we first extensively studied a popular backdoor attack, BadNets, applied to a remote sensing dataset, where the trigger was planted in all of the bands in the data. Our results showed that SOTA defense mechanisms, including Neural Cleanse, TABOR, Activation Clustering, Fine-Pruning, GangSweep, Strip, DeepInspect, and Pixel Backdoor, had difficulties detecting and mitigating the backdoor attack. We then proposed an explainable AI-guided backdoor attack specifically for remote sensing imagery by placing triggers in the image sub-bands. Our proposed attack model even poses stronger challenges to these SOTA defense mechanisms, and no method was able to defend it. These results send an alarming message about the catastrophic effects the backdoor attacks may have on satellite imagery.more » « less
- 
            null (Ed.)In the last couple of years, several adversarial attack methods based on different threat models have been proposed for the image classification problem. Most existing defenses consider additive threat models in which sample perturbations have bounded L_p norms. These defenses, however, can be vulnerable against adversarial attacks under non-additive threat models. An example of an attack method based on a non-additive threat model is the Wasserstein adversarial attack proposed by Wong et al. (2019), where the distance between an image and its adversarial example is determined by the Wasserstein metric ("earth-mover distance") between their normalized pixel intensities. Until now, there has been no certifiable defense against this type of attack. In this work, we propose the first defense with certified robustness against Wasserstein Adversarial attacks using randomized smoothing. We develop this certificate by considering the space of possible flows between images, and representing this space such that Wasserstein distance between images is upper-bounded by L_1 distance in this flow-space. We can then apply existing randomized smoothing certificates for the L_1 metric. In MNIST and CIFAR-10 datasets, we find that our proposed defense is also practically effective, demonstrating significantly improved accuracy under Wasserstein adversarial attack compared to unprotected models.more » « less
 An official website of the United States government
An official website of the United States government 
				
			 
					 
					
 
                                    