
Title: Impeding Behavior-based Malware Analysis via Replacement Attacks to Malware Specifications
As the underground market of malware flourishes, there is an exponential increase in the number and diversity of malware. A crucial question in malware analysis research is how to define malware specifications or signatures that faithfully describe similar malicious intent and also clearly stand out from other programs. Although traditional malware specifications based on syntactic signatures are efficient, they can be easily defeated by various obfuscation techniques. Since malicious behavior is often stable across similar malware instances, behavior-based specifications, which capture real malicious characteristics at run time, have become more prevalent in anti-malware tasks such as malware detection and malware clustering. This kind of specification is typically extracted from the dependence graph of the system calls a malware sample invokes. In this paper, we present replacement attacks to camouflage similar behaviors by poisoning behavior-based specifications. The key method of our attacks is to replace a system call dependence graph with semantically equivalent variants so that similar malware samples within one family appear to be different. As a result, malware analysts have to put more effort into reexamining similar samples that may have been investigated before. We distill general attacking strategies by mining the behavior specifications of more than 5,200 malware samples and implement a compiler-level prototype to automate replacement attacks. Experiments on 960 real malware samples demonstrate the effectiveness of our approach in impeding various behavior-based malware analysis tasks, such as similarity comparison and malware clustering. Finally, we discuss possible countermeasures to strengthen existing malware defenses.
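To make the core idea concrete, here is a minimal Python sketch of a replacement attack on a toy system-call dependence graph. The graph encoding, the syscall names, the equivalence table, and the edge-set similarity measure are illustrative assumptions rather than the paper's actual specification format or matching algorithm; the sketch only shows how swapping one node for a behaviorally equivalent chain can lower graph similarity between two variants of the same sample.

```python
# Hypothetical sketch: a node in a system-call dependence graph is swapped
# for a chain of calls assumed to preserve behavior, so graph-based
# similarity between the two variants drops.

from typing import Dict, List, Set, Tuple

# Dependence graph as an adjacency map: syscall -> set of dependent syscalls.
SCDG = Dict[str, Set[str]]

# Illustrative "equivalence" table (an assumption, not the paper's catalog).
EQUIVALENT: Dict[str, List[str]] = {
    "CreateFile": ["CreateFileMapping", "MapViewOfFile"],
}

def replace_node(graph: SCDG, target: str) -> SCDG:
    """Rewrite `target` into its equivalent chain, preserving dependences."""
    if target not in graph or target not in EQUIVALENT:
        return graph
    chain = EQUIVALENT[target]
    new_graph: SCDG = {}
    for node, succs in graph.items():
        if node == target:
            continue
        # Edges that pointed at `target` now point at the head of the chain.
        new_graph[node] = {chain[0] if s == target else s for s in succs}
    # Wire the chain together; its tail inherits target's successors.
    for a, b in zip(chain, chain[1:]):
        new_graph.setdefault(a, set()).add(b)
    new_graph.setdefault(chain[-1], set()).update(graph[target])
    return new_graph

def jaccard_edges(g1: SCDG, g2: SCDG) -> float:
    """Crude edge-set similarity, standing in for a real graph matcher."""
    e1: Set[Tuple[str, str]] = {(u, v) for u in g1 for v in g1[u]}
    e2: Set[Tuple[str, str]] = {(u, v) for u in g2 for v in g2[u]}
    return len(e1 & e2) / len(e1 | e2) if e1 | e2 else 1.0

if __name__ == "__main__":
    original: SCDG = {
        "OpenProcess": {"CreateFile"},
        "CreateFile": {"WriteFile"},
        "WriteFile": set(),
    }
    variant = replace_node(original, "CreateFile")
    print("similarity:", jaccard_edges(original, variant))  # drops below 1.0
```

A real attack would, as the abstract notes, be applied at the compiler level so the replacement appears in the malware's run-time behavior; the point of the sketch is only that a dependence-preserving rewrite changes the extracted specification enough to defeat naive graph similarity.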
Award ID(s):
1223710
PAR ID:
10066921
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
Journal of Computer Virology and Hacking Techniques
ISSN:
2263-8733
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Malicious software, popularly known as malware, is a serious threat to modern computing systems. A comprehensive cybercrime study by the Ponemon Institute highlights that malware is the most expensive attack for organizations, with an average revenue loss of $2.6 million per organization in 2018 (an 11% increase compared to 2017). Recent high-profile malware attacks coupled with serious economic implications have dramatically changed our perception of the threat from malware. Software-based solutions, such as anti-virus programs, are not effective since they rely on matching patterns (signatures) that can be easily fooled by carefully crafted malware with obfuscation or other deviation capabilities. Moreover, software-based solutions are not fast enough for real-time malware detection in safety-critical systems. In this paper, we investigate promising approaches for hardware-assisted malware detection using machine learning. Specifically, we explore how machine learning can be effective for malware detection utilizing hardware performance counters, the embedded trace buffer, as well as on-chip network traffic analysis.
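As a rough illustration of the hardware-assisted direction described above, the sketch below trains a standard classifier on synthetic hardware-performance-counter feature vectors. The counter names, the value distributions, and the choice of a random forest are assumptions made for the sake of a runnable example; they are not the paper's feature set, dataset, or model.

```python
# Minimal sketch: ML classification over (synthetic) HPC feature vectors.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)

# Hypothetical HPC features sampled per execution window:
# [instructions, branch_misses, cache_misses, LLC_loads]
benign  = rng.normal(loc=[1e6, 2e3, 5e3, 8e4], scale=[1e5, 3e2, 8e2, 9e3], size=(500, 4))
malware = rng.normal(loc=[9e5, 6e3, 2e4, 6e4], scale=[1e5, 9e2, 3e3, 9e3], size=(500, 4))

X = np.vstack([benign, malware])
y = np.array([0] * 500 + [1] * 500)  # 0 = benign, 1 = malicious

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te), target_names=["benign", "malware"]))
```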
  2. Machine learning (ML) models have shown promise in classifying raw executable files (binaries) as malicious or benign with high accuracy. This has led to the increasing influence of ML-based classification methods in academic and real-world malware detection, a critical tool in cybersecurity. However, previous work has urged caution by creating variants of malicious binaries, referred to as adversarial examples, that are transformed in a functionality-preserving way to evade detection. In this work, we investigate the effectiveness of using adversarial training methods to create malware-classification models that are more robust to some state-of-the-art attacks. To train our most robust models, we significantly increase the efficiency and scale of creating adversarial examples to make adversarial training practical, which has not been done before for raw-binary malware detectors. We then analyze the effects of varying the length of adversarial training, as well as of training with various types of attacks. We find that data augmentation does not deter state-of-the-art attacks, but that using a generic gradient-guided method, used in other discrete domains, does improve robustness. We also show that, in most cases, models can be made more robust to malware-domain attacks by adversarially training them with lower-effort versions of the same attack. In the best case, we reduce one state-of-the-art attack’s success rate from 90% to 5%. We also find that training with some types of attacks can increase robustness to other types of attacks. Finally, we discuss insights gained from our results and how they can be used to more effectively train robust malware detectors.
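The following sketch shows the general shape of adversarial training for a raw-byte classifier under the assumptions noted in the comments: a tiny embedding-plus-convolution model, synthetic "binaries", and a low-effort append-bytes attack that only perturbs a reserved padding region. None of this is the paper's actual detector or attack; it is only meant to show clean and adversarial examples being mixed into each training step.

```python
# Minimal sketch of adversarial training against a byte-append attack.

import torch
import torch.nn as nn

torch.manual_seed(0)

SEQ_LEN, PAD_LEN, VOCAB = 256, 32, 257  # 256 byte values + a padding token

class TinyByteClassifier(nn.Module):
    """A toy stand-in for a raw-binary detector (not the paper's model)."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, 8)
        self.conv = nn.Conv1d(8, 16, kernel_size=8, stride=4)
        self.head = nn.Linear(16, 2)

    def forward(self, x):                      # x: (batch, SEQ_LEN) int64
        h = self.emb(x).transpose(1, 2)        # (batch, 8, SEQ_LEN)
        h = torch.relu(self.conv(h)).mean(dim=2)
        return self.head(h)

def append_attack(model, x, y, loss_fn, tries=8):
    """Low-effort attack: try random byte suffixes, keep the most damaging one."""
    with torch.no_grad():
        best, best_loss = x.clone(), loss_fn(model(x), y)
        for _ in range(tries):
            cand = x.clone()
            cand[:, -PAD_LEN:] = torch.randint(0, 256, (x.size(0), PAD_LEN))
            loss = loss_fn(model(cand), y)
            if loss > best_loss:
                best, best_loss = cand, loss
    return best

def synthetic_batch(n=64):
    """Fake 'binaries': class 1 over-represents high byte values."""
    y = torch.randint(0, 2, (n,))
    x = torch.randint(0, 128, (n, SEQ_LEN))
    x[y == 1] += 120
    x = x.clamp(0, 255)
    x[:, -PAD_LEN:] = 256                      # reserved padding region
    return x, y

model = TinyByteClassifier()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):
    x, y = synthetic_batch()
    x_adv = append_attack(model, x, y, loss_fn)    # adversarial examples
    logits = model(torch.cat([x, x_adv]))          # train on clean + adversarial
    loss = loss_fn(logits, torch.cat([y, y]))
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 50 == 0:
        print(f"step {step}: loss {loss.item():.3f}")
```

The abstract's observation that lower-effort attacks can confer robustness to stronger ones corresponds to replacing `append_attack` with a cheaper or more expensive variant at training time while keeping the same training loop.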
  3. Software vulnerabilities in emerging systems, such as the Internet of Things (IoT), allow for multiple attack vectors that are exploited by adversaries for malicious intent. One such vector is malware, yet limited effort has been dedicated to IoT malware analysis, characterization, and understanding. In this paper, we analyze recent IoT malware through the lens of static analysis. Toward this, we reverse-engineer and perform a detailed analysis of almost 2,900 IoT malware samples of eight different architectures across multiple analysis directions. We conduct string analysis, unveiling their operation, unique textual characteristics, and network dependencies. Through control flow graph analysis, we unveil unique graph-theoretic features. Through function analysis, we address obfuscation by function approximation. We then pursue two applications based on our analysis: 1) combining various analysis aspects, we reconstruct the infection lifecycle of various prominent malware families, and 2) using multiple classes of features obtained from our static analysis, we design a machine learning-based detection model with robust features that achieves an average detection rate of 99.8%.
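As a toy illustration of combining graph-theoretic control-flow-graph features with a learned detector, the sketch below extracts a few generic features from randomly generated stand-in CFGs and cross-validates a classifier on them. The synthetic graphs, the feature list, and the classifier are assumptions, not the paper's pipeline or dataset.

```python
# Minimal sketch: graph-theoretic CFG features fed to a classifier.

import networkx as nx
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

def cfg_features(g: nx.DiGraph) -> list:
    """A few generic graph-theoretic features of a control-flow graph."""
    n, m = g.number_of_nodes(), g.number_of_edges()
    return [
        n,
        m,
        nx.density(g),
        m / max(n, 1),                                          # average out-degree
        nx.number_strongly_connected_components(g) / max(n, 1),
    ]

def synthetic_cfg(label: int) -> nx.DiGraph:
    """Random stand-in CFGs; 'malware' graphs are made denser on average."""
    n = int(rng.integers(30, 120))
    m = int(n * (2.5 if label else 1.5) + rng.integers(0, n))
    return nx.gnm_random_graph(n, min(m, n * (n - 1)), directed=True,
                               seed=int(rng.integers(1_000_000)))

labels = [0] * 200 + [1] * 200
X = np.array([cfg_features(synthetic_cfg(y)) for y in labels])
y = np.array(labels)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
print("cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```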
  4. Security has become a serious problem for the Android system as the number of Android malware samples increases rapidly. A great amount of effort has been devoted to protecting Android devices against the threat of malware. The majority of existing work uses two-class classification methods, which suffer from overfitting due to the lack of malicious samples; this results in poor performance at detecting zero-day malware attacks. In this paper, we evaluate the performance of various one-class feature selection and classification methods for zero-day Android malware detection. Unlike two-class methods, one-class methods use only benign samples to build the detection model, which overcomes the overfitting problem. Our results demonstrate the capability of one-class methods over two-class methods in detecting zero-day Android malware attacks.
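The one-class setup described above can be made concrete with a short sketch: a model is fitted on benign feature vectors only, and anything it scores as an outlier is flagged as potential (zero-day) malware. The synthetic per-app features and the OneClassSVM hyperparameters are assumptions chosen for a runnable example, not the paper's evaluated methods.

```python
# Minimal sketch: one-class detection trained on benign samples only.

import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Hypothetical per-app features (e.g., counts of permission / API-call groups).
benign_train = rng.normal(loc=[5, 2, 1, 0.5], scale=0.8, size=(400, 4))
benign_test  = rng.normal(loc=[5, 2, 1, 0.5], scale=0.8, size=(100, 4))
malware_test = rng.normal(loc=[9, 6, 4, 3.0], scale=1.2, size=(100, 4))  # unseen family

# Train on benign apps only; no malicious samples are required.
model = make_pipeline(StandardScaler(), OneClassSVM(kernel="rbf", nu=0.05, gamma="scale"))
model.fit(benign_train)

# predict() returns +1 for inliers (benign-like) and -1 for outliers.
print("benign flagged as malware:", (model.predict(benign_test) == -1).mean())
print("malware flagged as malware:", (model.predict(malware_test) == -1).mean())
```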