skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Representing string computations as graphs for classifying malware
Android applications rely heavily on strings for sensitive operations like reflection, access to system resources, URL connections, database access, among others. Thus, insight into application behavior can be gained through not only an analysis of what strings an application creates but also the structure of the computation used to create theses strings, and in what manner are these strings used. In this paper we introduce a static analysis of Android applications to discover strings, how they are created, and their usage. The output of our static analysis contains all of this information in the form of a graph which we call a string computation. We leverage the results to classify individual application behavior with respect to malicious or benign intent. Unlike previous work that has focused only on extraction of string values, our approach leverages the structure of the computation used to generate string values as features to perform classification of Android applications. That is, we use none of the static analysis computed string values, rather using only the graph structures of created strings to do classification of an arbitrary Android application as malware or benign. Our results show that leveraging string computation structures as features can yield precision and recall rates as high as 97% on modern malware. We also provide baseline results against other malware detection tools and techniques to classify the same corpus of applications.  more » « less
Award ID(s):
1749539
PAR ID:
10209446
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
MOBILESoft '20: Proceedings of the IEEE/ACM 7th International Conference on Mobile Software Engineering and Systems
Page Range / eLocation ID:
120 to 131
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Traditionally, Android malware is analyzed using static or dynamic analysis. Although static techniques are often fast; however, they cannot be applied to classify obfuscated samples or malware with a dynamic payload. In comparison, the dynamic approach can examine obfuscated variants but often incurs significant runtime overhead when collecting every important malware behavioral data. This paper conducts an exploratory analysis of memory forensics as an alternative technique for extracting feature vectors for an Android malware classifier. We utilized the reconstructed per-process object allocation network to identify distinguishable patterns in malware and benign application. Our evaluation results indicate the network structural features in the malware category are unique compared to the benign dataset, and thus features extracted from the remnant of in-memory allocated objects can be utilized for robust Android malware classification algorithm. 
    more » « less
  2. null (Ed.)
    Certain Android applications, such as but not limited to malware, conceal their presence from the user, exhibiting a self-hiding behavior. Consequently, these apps put the user's security and privacy at risk by performing tasks without the user's awareness. Static analysis has been used to analyze apps for self-hiding behavior, but this approach is prone to false positives and suffers from code obfuscation. This research proposes a set of three tools utilizing a dynamic analysis method of detecting self-hiding behavior of an app in the home, installed, and running application lists on an Android emulator. Our approach proves both highly accurate and efficient, providing tools usable by the Android marketplace for enhanced security screening. 
    more » « less
  3. Security has become a serious problem for Android system as the number of Android malware increases rapidly. A great amount of effort has been devoted to protect Android devices against the threats of malware. Majority of the existing work use two-class classification methods which suffer the overfitting problem due to the lack of malicious samples. This will result in poor performance of detecting zero-day malware attacks. In this paper, we evaluated the performance of various one-class feature selection and classification methods for zero-day Android malware detection. Unlike two-class methods, one-class methods only use benign samples to build the detection model which overcomes the overfitting problem. Our results demonstrate the capability of the one-class methods over the two-class methods in detecting zero-day Android malware attacks. 
    more » « less
  4. Machine learning techniques are widely used in addition to signatures and heuristics to increase the detection rate of anti-malware software, as they automate the creation of detection models, making it possible to handle an ever-increasing number of new malware samples. In order to foil the analysis of anti-malware systems and evade detection, malware uses packing and other forms of obfuscation. However, few realize that benign applications use packing and obfuscation as well, to protect intellectual property and prevent license abuse. In this paper, we study how machine learning based on static analysis features operates on packed samples. Malware researchers have often assumed that packing would prevent machine learning techniques from building effective classifiers. However, both industry and academia have published results that show that machine-learning-based classifiers can achieve good detection rates, leading many experts to think that classifiers are simply detecting the fact that a sample is packed, as packing is more prevalent in malicious samples. We show that, different from what is commonly assumed, packers do preserve some information when packing programs that is “useful” for malware classification. However, this information does not necessarily capture the sample’s behavior. We demonstrate that the signals extracted from packed executables are not rich enough for machine-learning-based models to (1) generalize their knowledge to operate on unseen packers, and (2) be robust against adversarial examples. We also show that a na¨ıve application of machine learning techniques results in a substantial number of false positives, which, in turn, might have resulted 
    more » « less
  5. Many popular vetting tools for Android applications use static code analysis techniques. In particular, Inter-procedural Data-Flow Graph (IDFG) construction is the computation at the core of Android static data-flow analysis and consumes most of the analysis time. Many analysis tools use a worklist algorithm, an iterative fixed-point approach, to construct the IDFG. In this paper, we observe that a straightforward GPU parallelization of the worklist algorithm leads to significant underutilization of the GPU resources. We identify four performance bottlenecks, namely, frequent dynamic memory allocations, high branch divergence, workload imbalance, and irregular memory access patterns. Accordingly, we propose GDroid, a GPU-based worklist algorithm implementation with multiple fine-grained optimizations tailored to common characteristics of Android applications. The optimizations considered are: matrix-based data structure, memory access-based node grouping, and worklist merging. Our experimental evaluation, performed on 1000 Android applications, shows that the proposed optimizations are beneficial to performance, and GDroid can achieve up to 128X speedups against a plain GPU implementation. 
    more » « less