skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Arms Race in Adversarial Malware Detection: A Survey
Malicious software (malware) is a major cyber threat that has to be tackled with Machine Learning (ML) techniques because millions of new malware examples are injected into cyberspace on a daily basis. However, ML is vulnerable to attacks known as adversarial examples. In this article, we survey and systematize the field of Adversarial Malware Detection (AMD) through the lens of a unified conceptual framework of assumptions, attacks, defenses, and security properties. This not only leads us to map attacks and defenses to partial order structures, but also allows us to clearly describe the attack-defense arms race in the AMD context. We draw a number of insights, including: knowing the defender’s feature set is critical to the success of transfer attacks; the effectiveness of practical evasion attacks largely depends on the attacker’s freedom in conducting manipulations in the problem space; knowing the attacker’s manipulation set is critical to the defender’s success; and the effectiveness of adversarial training depends on the defender’s capability in identifying the most powerful attack. We also discuss a number of future research directions.  more » « less
Award ID(s):
2203261 2115134 2122631 2209814 2218762
PAR ID:
10319615
Author(s) / Creator(s):
; ;  ;
Date Published:
Journal Name:
ACM Computing Surveys
Volume:
55
Issue:
1
ISSN:
0360-0300
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Machine learning (ML) models have shown promise in classifying raw executable files (binaries) as malicious or benign with high accuracy. This has led to the increasing influence of ML-based classification methods in academic and real-world malware detection, a critical tool in cybersecurity. However, previous work provoked caution by creating variants of malicious binaries, referred to as adversarial examples, that are transformed in a functionality-preserving way to evade detection. In this work, we investigate the effectiveness of using adversarial training methods to create malware-classification models that are more robust to some state-of-the-art attacks. To train our most robust models, we significantly increase the efficiency and scale of creating adversarial examples to make adversarial training practical, which has not been done before in raw-binary malware detectors. We then analyze the effects of varying the length of adversarial training, as well as analyze the effects of training with various types of attacks. We find that data augmentation does not deter state-of-the-art attacks, but that using a generic gradient-guided method, used in other discrete domains, does improve robustness. We also show that in most cases, models can be made more robust to malware-domain attacks by adversarially training them with lower-effort versions of the same attack. In the best case, we reduce one state-of-the-art attack’s success rate from 90% to 5%. We also find that training with some types of attacks can increase robustness to other types of attacks. Finally, we discuss insights gained from our results, and how they can be used to more effectively train robust malware detectors. 
    more » « less
  2. Machine learning (ML) models have shown promise in classifying raw executable files (binaries) as malicious or benign with high accuracy. This has led to the increasing influence of ML-based classification methods in academic and real-world malware detection, a critical tool in cybersecurity. However, previous work provoked caution by creating variants of malicious binaries, referred to as adversarial examples, that are transformed in a functionality-preserving way to evade detection. In this work, we investigate the effectiveness of using adversarial training methods to create malware-classification models that are more robust to some state-of-the-art attacks. To train our most robust models, we significantly increase the efficiency and scale of creating adversarial examples to make adversarial training practical, which has not been done before in raw-binary malware detectors. We then analyze the effects of varying the length of adversarial training, as well as analyze the effects of training with various types of attacks. We find that data augmentation does not deter state-of-the-art attacks, but that using a generic gradient-guided method, used in other discrete domains, does improve robustness. We also show that in most cases, models can be made more robust to malware-domain attacks by adversarially training them with lower-effort versions of the same attack. In the best case, we reduce one state-of-the-art attack’s success rate from 90% to 5%. We also find that training with some types of attacks can increase robustness to other types of attacks. Finally, we discuss insights gained from our results, and how they can be used to more effectively train robust malware detectors. 
    more » « less
  3. null (Ed.)
    As machine learning is deployed in more settings, including in security-sensitive applications such as malware detection, the risks posed by adversarial examples that fool machine-learning classifiers have become magnified. Black-box attacks are especially dangerous, as they only require the attacker to have the ability to query the target model and observe the labels it returns, without knowing anything else about the model. Current black-box attacks either have low success rates, require a high number of queries, produce adversarial images that are easily distinguishable from their sources, or are not flexible in controlling the outcome of the attack. In this paper, we present AdversarialPSO, (Code available: https://github.com/rhm6501/AdversarialPSOImages) a black-box attack that uses few queries to create adversarial examples with high success rates. AdversarialPSO is based on Particle Swarm Optimization, a gradient-free evolutionary search algorithm, with special adaptations to make it effective for the black-box setting. It is flexible in balancing the number of queries submitted to the target against the quality of the adversarial examples. We evaluated AdversarialPSO on CIFAR-10, MNIST, and Imagenet, achieving success rates of 94.9%, 98.5%, and 96.9%, respectively, while submitting numbers of queries comparable to prior work. Our results show that black-box attacks can be adapted to favor fewer queries or higher quality adversarial images, while still maintaining high success rates. 
    more » « less
  4. The pervasiveness of neural networks (NNs) in critical computer vision and image processing applications makes them very attractive for adversarial manipulation. A large body of existing research thoroughly investigates two broad categories of attacks targeting the integrity of NN models. The first category of attacks, commonly called Adversarial Examples, perturbs the model's inference by carefully adding noise into input examples. In the second category of attacks, adversaries try to manipulate the model during the training process by implanting Trojan backdoors. Researchers show that such attacks pose severe threats to the growing applications of NNs and propose several defenses against each attack type individually. However, such one-sided defense approaches leave potentially unknown risks in real-world scenarios when an adversary can unify different attacks to create new and more lethal ones bypassing existing defenses. In this work, we show how to jointly exploit adversarial perturbation and model poisoning vulnerabilities to practically launch a new stealthy attack, dubbed AdvTrojan. AdvTrojan is stealthy because it can be activated only when: 1) a carefully crafted adversarial perturbation is injected into the input examples during inference, and 2) a Trojan backdoor is implanted during the training process of the model. We leverage adversarial noise in the input space to move Trojan-infected examples across the model decision boundary, making it difficult to detect. The stealthiness behavior of AdvTrojan fools the users into accidentally trusting the infected model as a robust classifier against adversarial examples. AdvTrojan can be implemented by only poisoning the training data similar to conventional Trojan backdoor attacks. Our thorough analysis and extensive experiments on several benchmark datasets show that AdvTrojan can bypass existing defenses with a success rate close to 100% in most of our experimental scenarios and can be extended to attack federated learning as well as high-resolution images. 
    more » « less
  5. With machine learning techniques widely used to automate Android malware detection, it is important to investigate the robustness of these methods against evasion attacks. A recent work has proposed a novel problem-space attack on Android malware classifiers, where adversarial examples are generated by transforming Android malware samples while satisfying practical constraints. Aimed to address its limitations, we propose a new attack called EAGLE (Evasion Attacks Guided by Local Explanations), whose key idea is to leverage local explanations to guide the search for adversarial examples. We present a generic algorithmic framework for EAGLE attacks, which can be customized with specific feature increase and decrease operations to evade Android malware classifiers trained on different types of count features. We overcome practical challenges in implementing these operations for four different types of Android malware classifiers. Using two Android malware datasets, our results show that EAGLE attacks can be highly effective at finding functionable adversarial examples. We study the attack transferrability of malware variants created by EAGLE attacks across classifiers built with different classification models or trained on different types of count features. Our research further demonstrates that ensemble classifiers trained from multiple types of count features are not immune to EAGLE attacks. We also discuss possible defense mechanisms against EAGLE attacks. 
    more » « less