skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Statically Dissecting Internet of Things Malware: Analysis, Characterization, and Detection
Software vulnerabilities in emerging systems, such as the Internet of Things (IoT), allow for multiple attack vectors that are exploited by adversaries for malicious intents. One of such vectors is malware, where limited efforts have been dedicated to IoT malware analysis, characterization, and understanding. In this paper, we analyze recent IoT malware through the lenses of static analysis. Towards this, we reverse-engineer and perform a detailed analysis of almost 2,900 IoT malware samples of eight different architectures across multiple analysis directions. We conduct string analysis, unveiling operation, unique textual characteristics, and network dependencies. Through the control flow graph analysis, we unveil unique graph-theoretic features. Through the function analysis, we address obfuscation by function approximation. We then pursue two applications based on our analysis: 1) Combining various analysis aspects, we reconstruct the infection lifecycle of various prominent malware families, and 2) using multiple classes of features obtained from our static analysis, we design a machine learning-based detection model with features that are robust and an average detection rate of 99.8%.  more » « less
Award ID(s):
1841520
PAR ID:
10318797
Author(s) / Creator(s):
Date Published:
Journal Name:
Lecture notes in computer science
ISSN:
1611-3349
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Traditionally, Android malware is analyzed using static or dynamic analysis. Although static techniques are often fast; however, they cannot be applied to classify obfuscated samples or malware with a dynamic payload. In comparison, the dynamic approach can examine obfuscated variants but often incurs significant runtime overhead when collecting every important malware behavioral data. This paper conducts an exploratory analysis of memory forensics as an alternative technique for extracting feature vectors for an Android malware classifier. We utilized the reconstructed per-process object allocation network to identify distinguishable patterns in malware and benign application. Our evaluation results indicate the network structural features in the malware category are unique compared to the benign dataset, and thus features extracted from the remnant of in-memory allocated objects can be utilized for robust Android malware classification algorithm. 
    more » « less
  2. null (Ed.)
    Android applications rely heavily on strings for sensitive operations like reflection, access to system resources, URL connections, database access, among others. Thus, insight into application behavior can be gained through not only an analysis of what strings an application creates but also the structure of the computation used to create theses strings, and in what manner are these strings used. In this paper we introduce a static analysis of Android applications to discover strings, how they are created, and their usage. The output of our static analysis contains all of this information in the form of a graph which we call a string computation. We leverage the results to classify individual application behavior with respect to malicious or benign intent. Unlike previous work that has focused only on extraction of string values, our approach leverages the structure of the computation used to generate string values as features to perform classification of Android applications. That is, we use none of the static analysis computed string values, rather using only the graph structures of created strings to do classification of an arbitrary Android application as malware or benign. Our results show that leveraging string computation structures as features can yield precision and recall rates as high as 97% on modern malware. We also provide baseline results against other malware detection tools and techniques to classify the same corpus of applications. 
    more » « less
  3. Ko, Hanseok (Ed.)
    Malware represents a significant security concern in today’s digital landscape, as it can destroy or disable operating systems, steal sensitive user information, and occupy valuable disk space. However, current malware detection methods, such as static-based and dynamic-based approaches, struggle to identify newly developed ("zero-day") malware and are limited by customized virtual machine (VM) environments. To overcome these limitations, we propose a novel malware detection approach that leverages deep learning, mathematical techniques, and network science. Our approach focuses on static and dynamic analysis and utilizes the Low-Level Virtual Machine (LLVM) to profile applications within a complex network. The generated network topologies are input into the GraphSAGE architecture to efficiently distinguish between benign and malicious software applications, with the operation names denoted as node features. Importantly, the GraphSAGE models analyze the network’s topological geometry to make predictions, enabling them to detect state-of-the-art malware and prevent potential damage during execution in a VM. To evaluate our approach, we conduct a study on a dataset comprising source code from 24,376 applications, specifically written in C/C++, sourced directly from widely-recognized malware and various types of benign software. The results show a high detection performance with an Area Under the Receiver Operating Characteristic Curve (AUROC) of 99.85%. Our approach marks a substantial improvement in malware detection, providing a notably more accurate and efficient solution when compared to current state-of-the-art malware detection methods. The code is released at https://github.com/HantangZhang/MGN. 
    more » « less
  4. As the underground market of malware flourishes, there is an exponential increase in the number and diversity of malware. A crucial question in malware analysis research is how to define malware specifications or signatures that faithfully describe similar malicious intent and also clearly stand out from other programs. Although the traditional malware specifications based on syntactic signatures are efficient, they can be easily defeated by various obfuscation techniques. Since the malicious behavior is often stable across similar malware instances, behavior-based specifications which capture real malicious characteristics during run time, have become more prevalent in anti-malware tasks, such as malware detection and malware clustering. This kind of specification is typically extracted from the system call dependence graph that a malware sample invokes. In this paper, we present replacement attacks to cam- ouflage similar behaviors by poisoning behavior-based specifications. The key method of our attacks is to replace a system call dependence graph to its semantically equivalent variants so that the similar malware samples within one family turn out to be different. As a result, malware analysts have to put more efforts into reexamining the similar samples which may have been investigated before. We distil general attacking strategies by mining more than 5, 200 malware samples’ behavior specifications and implement a compiler-level prototype to automate replacement attacks. Experiments on 960 real malware samples demonstrate the effectiveness of our approach to impede various behavior-based mal- ware analysis tasks, such as similarity comparison and malware clustering. In the end, we also discuss possible countermeasures in order to strengthen existing mal- ware defense. 
    more » « less
  5. In applying deep learning for malware classifica- tion, it is crucial to account for the prevalence of malware evolution, which can cause trained classifiers to fail on drifted malware. Existing solutions to address concept drift use active learning. They select new samples for analysts to label and then retrain the classifier with the new labels. Our key finding is that the current retraining techniques do not achieve optimal results. These techniques overlook that updating the model with scarce drifted samples requires learning features that remain consistent across pre-drift and post-drift data. The model should thus be able to disregard specific features that, while beneficial for the classification of pre-drift data, are absent in post-drift data, thereby preventing prediction degradation. In this paper, we propose a new technique for detecting and classifying drifted malware that learns drift-invariant features in malware control flow graphs by leveraging graph neural networks with adversarial domain adaptation. We compare it with existing model retraining methods in active learning-based malware detection systems and other domain adaptation techniques from the vision domain. Our approach significantly improves drifted malware detection on publicly available benchmarks and real-world malware databases reported daily by security companies in 2024. We also tested our approach in predicting multiple malware families drifted over time. A thorough evaluation shows that our approach outperforms the state-of-the-art approaches. 
    more » « less