skip to main content


Title: Secure machine learning against adversarial samples at test time
Abstract

Deep neural networks (DNNs) are widely used to handle many difficult tasks, such as image classification and malware detection, and achieve outstanding performance. However, recent studies on adversarial examples, which have maliciously undetectable perturbations added to their original samples that are indistinguishable by human eyes but mislead the machine learning approaches, show that machine learning models are vulnerable to security attacks. Though various adversarial retraining techniques have been developed in the past few years, none of them is scalable. In this paper, we propose a new iterative adversarial retraining approach to robustify the model and to reduce the effectiveness of adversarial inputs on DNN models. The proposed method retrains the model with both Gaussian noise augmentation and adversarial generation techniques for better generalization. Furthermore, the ensemble model is utilized during the testing phase in order to increase the robust test accuracy. The results from our extensive experiments demonstrate that the proposed approach increases the robustness of the DNN model against various adversarial attacks, specifically, fast gradient sign attack, Carlini and Wagner (C&W) attack, Projected Gradient Descent (PGD) attack, and DeepFool attack. To be precise, the robust classifier obtained by our proposed approach can maintain a performance accuracy of 99% on average on the standard test set. Moreover, we empirically evaluate the runtime of two of the most effective adversarial attacks, i.e., C&W attack and BIM attack, to find that the C&W attack can utilize GPU for faster adversarial example generation than the BIM attack can. For this reason, we further develop a parallel implementation of the proposed approach. This parallel implementation makes the proposed approach scalable for large datasets and complex models.

 
more » « less
NSF-PAR ID:
10361610
Author(s) / Creator(s):
; ;
Publisher / Repository:
Springer Science + Business Media
Date Published:
Journal Name:
EURASIP Journal on Information Security
Volume:
2022
Issue:
1
ISSN:
2510-523X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Deep Neural Network (DNN) trained by the gradient descent method is known to be vulnerable to maliciously perturbed adversarial input, aka. adversarial attack. As one of the countermeasures against adversarial attacks, increasing the model capacity for DNN robustness enhancement was discussed and reported as an effective approach by many recent works. In this work, we show that shrinking the model size through proper weight pruning can even be helpful to improve the DNN robustness under adversarial attack. For obtaining a simultaneously robust and compact DNN model, we propose a multi-objective training method called Robust Sparse Regularization (RSR), through the fusion of various regularization techniques, including channel-wise noise injection, lasso weight penalty, and adversarial training. We conduct extensive experiments to show the effectiveness of RSR against popular white-box (i.e., PGD and FGSM) and black-box attacks. Thanks to RSR, 85 % weight connections of ResNet-18 can be pruned while still achieving 0.68 % and 8.72 % improvement in clean- and perturbed-data accuracy respectively on CIFAR-10 dataset, in comparison to its PGD adversarial training baseline. 
    more » « less
  2. Models produced by machine learning, particularly deep neural networks, are state-of-the-art for many machine learning tasks and demonstrate very high prediction accuracy. Unfortunately, these models are also very brittle and vulnerable to specially crafted adversarial examples. Recent results have shown that accuracy of these models can be reduced from close to hundred percent to below 5\% using adversarial examples. This brittleness of deep neural networks makes it challenging to deploy these learning models in security-critical areas where adversarial activity is expected, and cannot be ignored. A number of methods have been recently proposed to craft more effective and generalizable attacks on neural networks along with competing efforts to improve robustness of these learning models. But the current approaches to make machine learning techniques more resilient fall short of their goal. Further, the succession of new adversarial attacks against proposed methods to increase neural network robustness raises doubts about a foolproof approach to robustify machine learning models against all possible adversarial attacks. In this paper, we consider the problem of detecting adversarial examples. This would help identify when the learning models cannot be trusted without attempting to repair the models or make them robust to adversarial attacks. This goal of finding limitations of the learning model presents a more tractable approach to protecting against adversarial attacks. Our approach is based on identifying a low dimensional manifold in which the training samples lie, and then using the distance of a new observation from this manifold to identify whether this data point is adversarial or not. Our empirical study demonstrates that adversarial examples not only lie farther away from the data manifold, but this distance from manifold of the adversarial examples increases with the attack confidence. Thus, adversarial examples that are likely to result into incorrect prediction by the machine learning model is also easier to detect by our approach. This is a first step towards formulating a novel approach based on computational geometry that can identify the limiting boundaries of a machine learning model, and detect adversarial attacks. 
    more » « less
  3. Recent advancements in Deep Neural Networks (DNNs) have enabled widespread deployment in multiple security-sensitive domains. The need for resource-intensive training and the use of valuable domain-specific training data have made these models the top intellectual property (IP) for model owners. One of the major threats to DNN privacy is model extraction attacks where adversaries attempt to steal sensitive information in DNN models. In this work, we propose an advanced model extraction framework DeepSteal that steals DNN weights remotely for the first time with the aid of a memory side-channel attack. Our proposed DeepSteal comprises two key stages. Firstly, we develop a new weight bit information extraction method, called HammerLeak, through adopting the rowhammer-based fault technique as the information leakage vector. HammerLeak leverages several novel system-level techniques tailored for DNN applications to enable fast and efficient weight stealing. Secondly, we propose a novel substitute model training algorithm with Mean Clustering weight penalty, which leverages the partial leaked bit information effectively and generates a substitute prototype of the target victim model. We evaluate the proposed model extraction framework on three popular image datasets (e.g., CIFAR-10/100/GTSRB) and four DNN architectures (e.g., ResNet-18/34/Wide-ResNetNGG-11). The extracted substitute model has successfully achieved more than 90% test accuracy on deep residual networks for the CIFAR-10 dataset. Moreover, our extracted substitute model could also generate effective adversarial input samples to fool the victim model. Notably, it achieves similar performance (i.e., ~1-2% test accuracy under attack) as white-box adversarial input attack (e.g., PGD/Trades). 
    more » « less
  4. The reliability of emerging neuromorphic compute fabrics is of great concern due to their widespread use in critical data-intensive applications. Ensuring such reliability is difficult due to the intensity of underlying computations (billions of parameters), errors induced by low power operation and the complex relationship between errors in computations and their effect on network performance accuracy. We study the problem of designing error-resilient neuromorphic systems where errors can stem from: (a) soft errors in computation of matrix-vector multiplications and neuron activations, (b) malicious trojan and adversarial security attacks and (c) effects of manufacturing process variations on analog crossbar arrays that can affect DNN accuracy. The core principle of error detection relies on embedded predictive neuron checks using invariants derived from the statistics of nominal neuron activation patterns of hidden layers of a neural network. Algorithmic encodings of hidden neuron function are also used to derive invariants for checking. A key contribution is designing checks that are robust to the inherent nonlinearity of neuron computations with minimal impact on error detection coverage. Once errors are detected, they are corrected using probabilistic methods due to the difficulties involved in exact error diagnosis in such complex systems. The technique is scalable across soft errors as well as a range of security attacks. The effects of manufacturing process variations are handled through the use of compact tests from which DNN performance can be assessed using learning techniques. Experimental results on a variety of neuromorphic test systems: DNNs, spiking networks and hyperdimensional computing are presented. 
    more » « less
  5. Deep learning models are vulnerable to adversarial examples. Most of current adversarial attacks add pixel-wise perturbations restricted to some L^p-norm, and defense models are evaluated also on adversarial examples restricted inside L^p-norm balls. However, we wish to explore adversarial examples exist beyond L^p-norm balls and their implications for attacks and defenses. In this paper, we focus on adversarial images generated by transformations. We start with color transformation and propose two gradient-based attacks. Since L^p-norm is inappropriate for measuring image quality in the transformation space, we use the similarity between transformations and the Structural Similarity Index. Next, we explore a larger transformation space consisting of combinations of color and affine transformations. We evaluate our transformation attacks on three data sets --- CIFAR10, SVHN, and ImageNet --- and their corresponding models. Finally, we perform retraining defenses to evaluate the strength of our attacks. The results show that transformation attacks are powerful. They find high-quality adversarial images that have higher transferability and misclassification rates than C&W's L^p attacks, especially at high confidence levels. They are also significantly harder to defend against by retraining than C&W's L^p attacks. More importantly, exploring different attack spaces makes it more challenging to train a universally robust model. 
    more » « less