Security of modern Deep Neural Networks (DNNs) is under severe scrutiny as the deployment of these models become widespread in many intelligence-based applications. Most recently, DNNs are attacked through Trojan which can effectively infect the model during the training phase and get activated only through specific input patterns (i.e, trigger) during inference. In this work, for the first time, we propose a novel Targeted Bit Trojan(TBT) method, which can insert a targeted neural Trojan into a DNN through bit-flip attack. Our algorithm efficiently generates a trigger specifically designed to locate certain vulnerable bits of DNN weights stored in main memory (i.e., DRAM). The objective is that once the attacker flips these vulnerable bits, the network still operates with normal inference accuracy with benign input. However, when the attacker activates the trigger by embedding it with any input, the network is forced to classify all inputs to a certain target class. We demonstrate that flipping only several vulnerable bits identified by our method, using available bit-flip techniques (i.e, row-hammer), can transform a fully functional DNN model into a Trojan-infected model. We perform extensive experiments of CIFAR-10, SVHN and ImageNet datasets on both VGG-16 and Resnet-18 architectures. Our proposed TBT could classifymore »
T-BFA: Targeted Bit-Flip Adversarial Weight Attack
Traditional Deep Neural Network (DNN) security is mostly related to the well-known adversarial input example attack.Recently, another dimension of adversarial attack, namely, attack on DNN weight parameters, has been shown to be very powerful. Asa representative one, the Bit-Flip based adversarial weight Attack (BFA) injects an extremely small amount of faults into weight parameters to hijack the executing DNN function. Prior works of BFA focus on un-targeted attacks that can hack all inputs into a random output class by flipping a very small number of weight bits stored in computer memory. This paper proposes the first work oftargetedBFA based (T-BFA) adversarial weight attack on DNNs, which can intentionally mislead selected inputs to a target output class. The objective is achieved by identifying the weight bits that are highly associated with classification of a targeted output through a class-dependent weight bit searching algorithm. Our proposed T-BFA performance is successfully demonstrated on multiple DNN architectures for image classification tasks. For example, by merely flipping 27 out of 88 million weight bits of ResNet-18, our T-BFA can misclassify all the images from Hen class into Goose class (i.e., 100% attack success rate) in ImageNet dataset, while maintaining 59.35% validation accuracy.
- Publication Date:
- NSF-PAR ID:
- Journal Name:
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- Page Range or eLocation-ID:
- 1 to 1
- Sponsoring Org:
- National Science Foundation
More Like this
Security of machine learning is increasingly becoming a major concern due to the ubiquitous deployment of deep learning in many security-sensitive domains. Many prior studies have shown external attacks such as adversarial examples that tamper the integrity of DNNs using maliciously crafted inputs. However, the security implication of internal threats (i.e., hardware vulnerabilities) to DNN models has not yet been well understood. In this paper, we demonstrate the first hardware-based attack on quantized deep neural networks–DeepHammer–that deterministically induces bit flips in model weights to compromise DNN inference by exploiting the rowhammer vulnerability. DeepHammer performs an aggressive bit search in the DNN model to identify the most vulnerable weight bits that are flippable under system constraints. To trigger deterministic bit flips across multiple pages within a reasonable amount of time, we develop novel system-level techniques that enable fast deployment of victim pages, memory-efficient rowhammering and precise flipping of targeted bits. DeepHammer can deliberately degrade the inference accuracy of the victim DNN system to a level that is only as good as random guess, thus completely depleting the intelligence of targeted DNN systems. We systematically demonstrate our attacks on real systems against 11 DNN architectures with 4 datasets corresponding to different applicationmore »
Recently, a new paradigm of the adversarial attack on the quantized neural network weights has attracted great attention, namely, the Bit-Flip based adversarial weight attack, aka. Bit-Flip Attack (BFA). BFA has shown extraordinary attacking ability, where the adversary can malfunction a quantized Deep Neural Network (DNN) as a random guess, through malicious bit-flips on a small set of vulnerable weight bits (e.g., 13 out of 93 millions bits of 8-bit quantized ResNet-18). However, there are no effective defensive methods to enhance the fault-tolerance capability of DNN against such BFA. In this work, we conduct comprehensive investigations on BFA and propose to leverage binarization-aware training and its relaxation - piece-wise clustering as simple and effective countermeasures to BFA. The experiments show that, for BFA to achieve the identical prediction accuracy degradation (e.g., below 11% on CIFAR-10), it requires 19.3× and 480.1× more effective malicious bit-flips on ResNet-20 and VGG-11 respectively, compared to defend-free counterparts.
Deep Neural Networks (DNN) are vulnerable to adversarial perturbations — small changes crafted deliberately on the input to mislead the model for wrong predictions. Adversarial attacks have disastrous consequences for deep learning empowered critical applications. Existing defense and detection techniques both require extensive knowledge of the model, testing inputs and even execution details. They are not viable for general deep learning implementations where the model internal is unknown, a common ‘black-box’ scenario for model users. Inspired by the fact that electromagnetic (EM) emanations of a model inference are dependent on both operations and data and may contain footprints of different input classes, we propose a framework, EMShepherd, to capture EM traces of model execution, perform processing on traces and exploit them for adversarial detection. Only benign samples and their EM traces are used to train the adversarial detector: a set of EM classifiers and class-specific unsupervised anomaly detectors. When the victim model system is under attack by an adversarial example, the model execution will be different from executions for the known classes, and the EM trace will be different. We demonstrate that our air-gapped EMShepherd can effectively detect different adversarial attacks on a commonly used FPGA deep learning accelerator formore »
We propose AccHashtag, the first framework for high-accuracy detection of fault-injection attacks on Deep Neural Networks (DNNs) with provable bounds on detection performance. Recent literature in fault-injection attacks shows the severe DNN accuracy degradation caused by bit flips. In this scenario, the attacker changes a few DNN weight bits during execution by injecting faults to the dynamic random-access memory (DRAM). To detect bit flips, AccHashtag extracts a unique signature from the benign DNN prior to deployment. The signature is used to validate the model’s integrity and verify the inference output on the fly. We propose a novel sensitivity analysis that identifies the most vulnerable DNN layers to the fault-injection attack. The DNN signature is constructed by encoding the weights in vulnerable layers using a low-collision hash function. During DNN inference, new hashes are extracted from the target layers and compared against the ground-truth signatures. AccHashtag incorporates a lightweight methodology that allows for real-time fault detection on embedded platforms. We devise a specialized compute core for AccHashtag on field-programmable gate arrays (FPGAs) to facilitate online hash generation in parallel to DNN execution. Extensive evaluations with the state-of-the-art bit-flip attack on various DNNs demonstrate the competitive advantage of AccHashtag in terms ofmore »