skip to main content

Attention:

The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 11:00 PM ET on Thursday, October 10 until 2:00 AM ET on Friday, October 11 due to maintenance. We apologize for the inconvenience.


This content will become publicly available on April 14, 2025

Title: Discovering Malicious Signatures in Software from Structural Interactions
Malware represents a significant security concern in today’s digital landscape, as it can destroy or disable operating systems, steal sensitive user information, and occupy valuable disk space. However, current malware detection methods, such as static-based and dynamic-based approaches, struggle to identify newly developed ("zero-day") malware and are limited by customized virtual machine (VM) environments. To overcome these limitations, we propose a novel malware detection approach that leverages deep learning, mathematical techniques, and network science. Our approach focuses on static and dynamic analysis and utilizes the Low-Level Virtual Machine (LLVM) to profile applications within a complex network. The generated network topologies are input into the GraphSAGE architecture to efficiently distinguish between benign and malicious software applications, with the operation names denoted as node features. Importantly, the GraphSAGE models analyze the network’s topological geometry to make predictions, enabling them to detect state-of-the-art malware and prevent potential damage during execution in a VM. To evaluate our approach, we conduct a study on a dataset comprising source code from 24,376 applications, specifically written in C/C++, sourced directly from widely-recognized malware and various types of benign software. The results show a high detection performance with an Area Under the Receiver Operating Characteristic Curve (AUROC) of 99.85%. Our approach marks a substantial improvement in malware detection, providing a notably more accurate and efficient solution when compared to current state-of-the-art malware detection methods. The code is released at https://github.com/HantangZhang/MGN.  more » « less
Award ID(s):
1936775
NSF-PAR ID:
10528295
Author(s) / Creator(s):
; ; ; ; ; ;
Editor(s):
Ko, Hanseok
Publisher / Repository:
IEEE
Date Published:
Edition / Version:
1
Volume:
2024
Issue:
1
ISSN:
0736-7791
ISBN:
979-8-3503-4485-1
Page Range / eLocation ID:
4845 to 4849
Subject(s) / Keyword(s):
Security, malware detection, Graph neural network, Complex network
Format(s):
Medium: X Size: 1.4 Other: pdf
Size(s):
1.4
Location:
Seoul, Korea, Republic of
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Machine learning-based malware detection systems are often vulnerable to evasion attacks, in which a malware developer manipulates their malicious software such that it is misclassified as benign. Such software hides some properties of the real class or adopts some properties of a different class by applying small perturbations. A special case of evasive malware hides by repackaging a bonafide benign mobile app to contain malware in addition to the original functionality of the app, thus retaining most of the benign properties of the original app. We present a novel malware detection system based on metamorphic testing principles that can detect such benign-seeming malware apps. We apply metamorphic testing to the feature representation of the mobile app, rather than to the app itself. That is, the source input is the original feature vector for the app and the derived input is that vector with selected features removed. If the app was originally classified benign, and is indeed benign, the output for the source and derived inputs should be the same class, i.e., benign, but if they differ, then the app is exposed as (likely) malware. Malware apps originally classified as malware should retain that classification, since only features prevalent in benign apps are removed. This approach enables the machine learning model to classify repackaged malware with reasonably few false negatives and false positives. Our training pipeline is simpler than many existing ML-based malware detection methods, as the network is trained end-to-end to jointly learn appropriate features and to perform classification. We pre-trained our classifier model on 3 million apps collected from the widely-used AndroZoo dataset. 1 We perform an extensive study on other publicly available datasets to show our approach’s effectiveness in detecting repackaged malware with more than 94% accuracy, 0.98 precision, 0.95 recall, and 0.96 F1 score. 
    more » « less
  2. null (Ed.)
    Traditionally, Android malware is analyzed using static or dynamic analysis. Although static techniques are often fast; however, they cannot be applied to classify obfuscated samples or malware with a dynamic payload. In comparison, the dynamic approach can examine obfuscated variants but often incurs significant runtime overhead when collecting every important malware behavioral data. This paper conducts an exploratory analysis of memory forensics as an alternative technique for extracting feature vectors for an Android malware classifier. We utilized the reconstructed per-process object allocation network to identify distinguishable patterns in malware and benign application. Our evaluation results indicate the network structural features in the malware category are unique compared to the benign dataset, and thus features extracted from the remnant of in-memory allocated objects can be utilized for robust Android malware classification algorithm. 
    more » « less
  3. With the rapid technological advancement, security has become a major issue due to the increase in malware activity that poses a serious threat to the security and safety of both computer systems and stakeholders. To maintain stakeholder’s, particularly, end user’s security, protecting the data from fraudulent efforts is one of the most pressing concerns. A set of malicious programming code, scripts, active content, or intrusive software that is designed to destroy intended computer systems and programs or mobile and web applications is referred to as malware. According to a study, naive users are unable to distinguish between malicious and benign applications. Thus, computer systems and mobile applications should be designed to detect malicious activities towards protecting the stakeholders. A number of algorithms are available to detect malware activities by utilizing novel concepts including Artificial Intelligence, Machine Learning, and Deep Learning. In this study, we emphasize Artificial Intelligence (AI) based techniques for detecting and preventing malware activity. We present a detailed review of current malware detection technologies, their shortcomings, and ways to improve efficiency. Our study shows that adopting futuristic approaches for the development of malware detection applications shall provide significant advantages. The comprehension of this synthesis shall help researchers for further research on malware detection and prevention using AI. 
    more » « less
  4. null (Ed.)
    This tutorial provides a review of the state-of-the-art research and the applications of Artificial Intelligence and Machine Learning for malware analysis. We will provide an overview, background and results with respect to the three main malware analysis approaches: static malware analysis, dynamic malware analysis and online malware analysis. Further, we will provide a simplified hands-on tutorial of applying ML algorithm for dynamic malware analysis in cloud IaaS. 
    more » « less
  5. Abstract—Virtual Network Functions (VNFs) are software implementation of middleboxes (MBs) (e.g., firewalls and proxy servers) that provide performance and security guarantees for virtual machine (VM) cloud applications. In this paper, we study a new VM flow migration problem for dynamic VNF-enabled cloud data centers (VDCs). The goal is to migrate the VM flows in the dynamic VDCs to minimize the total network traffic while load-balancing VNFs with limited processing capabilities. We refer to the problem as FMDV: flow migration in dynamic VDCs. We propose an optimal and efficient minimum cost flow-based flow migration algorithm and two benefit-based efficient heuristic algorithms to solve the FMDV. Via extensive simulations, we show that our algorithms are effective in mitigating dynamic cloud traffic while achieving load balance among VNFs. In particular, all our algorithms reduce dynamic network traffic in all cases and our optimal algorithm always achieves the best traffic-mitigation effect, reducing the network traffic by up to 28% compared to the case without flow migration. 
    more » « less