skip to main content

Title: NATTACK: Learning the Distributions of Adversarial Examples for an Improved Black-Box Attack on Deep Neural Networks
Powerful adversarial attack methods are vital for understanding how to construct robust deep neural networks (DNNs) and thoroughly testing defense techniques. In this project, we propose a black-box adversarial attack algorithm that can defeat both vanilla DNNs and those generated by various defense techniques developed recently. Instead of searching for an “optimal” adversarial example for a benign input to a targeted DNN, our algorithm finds a probability density distribution over a small region centered around the input, such that a sample drawn from this distribution is likely an adversarial example, without the need of accessing the DNN’s internal layers or weights. Our approach is universal as it can successfully attack different neural networks by a single algorithm. It is also strong; according to the testing against 2 vanilla DNNs and 13 defended ones, it outperforms state-of-the-art black-box or white-box attack methods for most test cases. Additionally, our results reveal that adversarial training remains one of the best defense techniques, and the adversarial examples are not as transferable across defended DNNs as them across vanilla DNNs.  more » « less
Award ID(s):
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Proceedings of Machine Learning Research
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Deep Neural Networks (DNN) are vulnerable to adversarial perturbations — small changes crafted deliberately on the input to mislead the model for wrong predictions. Adversarial attacks have disastrous consequences for deep learning empowered critical applications. Existing defense and detection techniques both require extensive knowledge of the model, testing inputs and even execution details. They are not viable for general deep learning implementations where the model internal is unknown, a common ‘black-box’ scenario for model users. Inspired by the fact that electromagnetic (EM) emanations of a model inference are dependent on both operations and data and may contain footprints of different input classes, we propose a framework, EMShepherd, to capture EM traces of model execution, perform processing on traces and exploit them for adversarial detection. Only benign samples and their EM traces are used to train the adversarial detector: a set of EM classifiers and class-specific unsupervised anomaly detectors. When the victim model system is under attack by an adversarial example, the model execution will be different from executions for the known classes, and the EM trace will be different. We demonstrate that our air-gapped EMShepherd can effectively detect different adversarial attacks on a commonly used FPGA deep learning accelerator for both Fashion MNIST and CIFAR-10 datasets. It achieves a detection rate on most types of adversarial samples, which is comparable to the state-of-the-art ‘white-box’ software-based detectors. 
    more » « less
  2. FPGA virtualization has garnered significant industry and academic interests as it aims to enable multi-tenant cloud systems that can accommodate multiple users' circuits on a single FPGA. Although this approach greatly enhances the efficiency of hardware resource utilization, it also introduces new security concerns. As a representative study, one state-of-the-art (SOTA) adversarial fault injection attack, named Deep-Dup, exemplifies the vulnerabilities of off-chip data communication within the multi-tenant cloud-FPGA system. Deep-Dup attacks successfully demonstrate the complete failure of a wide range of Deep Neural Networks (DNNs) in a black-box setup, by only injecting fault to extremely small amounts of sensitive weight data transmissions, which are identified through a powerful differential evolution searching algorithm. Such emerging adversarial fault injection attack reveals the urgency of effective defense methodology to protect DNN applications on the multi-tenant cloud-FPGA system. This paper, for the first time, presents a novel moving-target-defense (MTD) oriented defense framework DeepShuffle, which could effectively protect DNNs on multi-tenant cloud-FPGA against the SOTA Deep-Dup attack, through a novel lightweight model parameter shuffling methodology. DeepShuffle effectively counters the Deep-Dup attack by altering the weight transmission sequence, which effectively prevents adversaries from identifying security-critical model parameters from the repeatability of weight transmission during each inference round. Importantly, DeepShuffle represents a training-free DNN defense methodology, which makes constructive use of the typologies of DNN architectures to achieve being lightweight. Moreover, the deployment of DeepShuffle neither requires any hardware modification nor suffers from any performance degradation. We evaluate DeepShuffle on the SOTA open-source FPGA-DNN accelerator, Vertical Tensor Accelerator (VTA), which represents the practice of real-world FPGA-DNN system developers. We then evaluate the performance overhead of DeepShuffle and find it only consumes an additional ~3% of the inference time compared to the unprotected baseline. DeepShuffle improves the robustness of various SOTA DNN architectures like VGG, ResNet, etc. against Deep-Dup by orders. It effectively reduces the efficacy of evolution searching-based adversarial fault injection attack close to random fault injection attack, e.g., on VGG-11, even after increasing the attacker's effort by 2.3x, our defense shows a ~93% improvement in accuracy, compared to the unprotected baseline. 
    more » « less
  3. Deep Neural Networks (DNNs) have shown phenomenal success in a wide range of real-world applications. However, a concerning weakness of DNNs is that they are vulnerable to adversarial attacks. Although there exist methods to detect adversarial attacks, they often suffer constraints on specific attack types and provide limited information to downstream systems. We specifically note that existing adversarial detectors are often binary classifiers, which differentiate clean or adversarial examples. However, detection of adversarial examples is much more complicated than such a scenario. Our key insight is that the confidence probability of detecting an input sample as an adversarial example will be more useful for the system to properly take action to resist potential attacks. In this work, we propose an innovative method for fast confidence detection of adversarial attacks based on integrity of sensor pattern noise embedded in input examples. Experimental results show that our proposed method is capable of providing a confidence distribution model of most of popular adversarial attacks. Furthermore, our presented method can provide early attack warning with even the attack types based on different properties of the confidence distribution models. Since fast confidence detection is a computationally heavy task, we propose an FPGA-Based hardware architecture based on a series of optimization techniques, such as incremental multi-level quantization and etc. We realize our proposed method on an FPGA platform and achieve a high efficiency of 29.740 IPS/W with a power consumption of only 0.7626W. 
    more » « less
  4. In recent years, neuromorphic computing systems (NCS) have gained popularity in accelerating neural network computation because of their high energy efficiency. The known vulnerability of neural networks to adversarial attack, however, raises a severe security concern of NCS. In addition, there are certain application scenarios in which users have limited access to the NCS. In such scenarios, defense technologies that require changing the training methods of the NCS, e.g., adversarial training become impracticable. In this work, we propose AdverQuil – an efficient adversarial detection and alleviation technique for black-box NCS. AdverQuil can identify the adversarial strength of input examples and select the best strategy for NCS to respond to the attack, without changing structure/parameter of the original neural network or its training method. Experimental results show that on MNIST and CIFAR-10 datasets, AdverQuil achieves a high efficiency of 79.5 - 167K image/sec/watt. AdverQuil introduces less than 25% of hardware overhead, and can be combined with various adversarial alleviation techniques to provide a flexible trade-off between hardware cost, energy efficiency and classification accuracy. 
    more » « less
  5. Recent advancements in Deep Neural Networks (DNNs) have enabled widespread deployment in multiple security-sensitive domains. The need for resource-intensive training and the use of valuable domain-specific training data have made these models the top intellectual property (IP) for model owners. One of the major threats to DNN privacy is model extraction attacks where adversaries attempt to steal sensitive information in DNN models. In this work, we propose an advanced model extraction framework DeepSteal that steals DNN weights remotely for the first time with the aid of a memory side-channel attack. Our proposed DeepSteal comprises two key stages. Firstly, we develop a new weight bit information extraction method, called HammerLeak, through adopting the rowhammer-based fault technique as the information leakage vector. HammerLeak leverages several novel system-level techniques tailored for DNN applications to enable fast and efficient weight stealing. Secondly, we propose a novel substitute model training algorithm with Mean Clustering weight penalty, which leverages the partial leaked bit information effectively and generates a substitute prototype of the target victim model. We evaluate the proposed model extraction framework on three popular image datasets (e.g., CIFAR-10/100/GTSRB) and four DNN architectures (e.g., ResNet-18/34/Wide-ResNetNGG-11). The extracted substitute model has successfully achieved more than 90% test accuracy on deep residual networks for the CIFAR-10 dataset. Moreover, our extracted substitute model could also generate effective adversarial input samples to fool the victim model. Notably, it achieves similar performance (i.e., ~1-2% test accuracy under attack) as white-box adversarial input attack (e.g., PGD/Trades). 
    more » « less