Title: Metamorphic Detection of Repackaged Malware
Machine learning-based malware detection systems are often vulnerable to evasion attacks, in which a malware developer manipulates their malicious software so that it is misclassified as benign. Such software hides some properties of its real class, or adopts some properties of a different class, by applying small perturbations. A special case of evasive malware hides by repackaging a bona fide benign mobile app to contain malware in addition to the app's original functionality, thus retaining most of the benign properties of the original app. We present a novel malware detection system based on metamorphic testing principles that can detect such benign-seeming malware apps. We apply metamorphic testing to the feature representation of the mobile app rather than to the app itself. That is, the source input is the app's original feature vector, and the derived input is that vector with selected features removed. If the app was originally classified as benign and is indeed benign, the outputs for the source and derived inputs should be the same class, i.e., benign; if they differ, the app is exposed as (likely) malware. Malware apps originally classified as malware should retain that classification, since only features prevalent in benign apps are removed. This approach enables the machine learning model to classify repackaged malware with reasonably few false negatives and false positives. Our training pipeline is simpler than many existing ML-based malware detection methods, as the network is trained end-to-end to jointly learn appropriate features and perform classification. We pre-trained our classifier model on 3 million apps collected from the widely used AndroZoo dataset. We perform an extensive study on other publicly available datasets showing that our approach detects repackaged malware with more than 94% accuracy, 0.98 precision, 0.95 recall, and a 0.96 F1 score.
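The metamorphic relation in this abstract can be illustrated with a short sketch. It is a minimal, hypothetical illustration rather than the authors' system: the classifier is a fixed linear stand-in (`toy_classify`, `WEIGHTS`) for their end-to-end trained network, features are assumed to be binary presence indicators, "removing" a feature is modeled as setting it to 0, and the indices of benign-prevalent features (`BENIGN_IDX`) are assumed to be known in advance.

```python
from typing import Callable, List, Sequence

Label = str  # "benign" or "malware"


def derive_input(features: Sequence[float], benign_idx: Sequence[int]) -> List[float]:
    """Build the derived input: the source vector with the selected
    benign-prevalent features removed (modeled here as set to 0)."""
    drop = set(benign_idx)
    return [0.0 if i in drop else float(x) for i, x in enumerate(features)]


def metamorphic_verdict(
    classify: Callable[[Sequence[float]], Label],
    features: Sequence[float],
    benign_idx: Sequence[int],
) -> Label:
    """Apply the relation: a truly benign app should stay benign after
    benign-prevalent features are removed."""
    source_label = classify(features)
    if source_label == "malware":
        # Malware keeps its label; removing benign features should not change it.
        return "malware"
    derived_label = classify(derive_input(features, benign_idx))
    # Benign source but non-benign derived output violates the relation,
    # so the app is exposed as (likely) repackaged malware.
    return "benign" if derived_label == "benign" else "malware"


# Hypothetical stand-in classifier: a fixed linear scorer over 4 binary
# features. Features 0-1 are common in benign apps (negative weights);
# features 2-3 are suspicious (positive weights).
WEIGHTS = [-2.0, -2.0, 1.5, 1.5]


def toy_classify(x: Sequence[float]) -> Label:
    score = sum(w * xi for w, xi in zip(WEIGHTS, x))
    return "malware" if score > 0 else "benign"


BENIGN_IDX = [0, 1]
print(metamorphic_verdict(toy_classify, [1, 1, 0, 0], BENIGN_IDX))  # benign
print(metamorphic_verdict(toy_classify, [1, 1, 1, 1], BENIGN_IDX))  # malware
```

The second call shows the case the abstract targets: the repackaged vector is misclassified as benign on its own (score -1.0), but once the benign-prevalent features are removed the score becomes 3.0, the label flips to malware, and the app is flagged.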
Award ID(s):
1563555 1815494
PAR ID:
10281249
Author(s) / Creator(s):
;
Date Published:
Journal Name:
6th International Workshop on Metamorphic Testing (MET)
Page Range / eLocation ID:
9 - 16
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Android is the most targeted mobile OS. Studies have found that repackaging is one of the most common techniques adversaries use to distribute malware, and detecting such malware can be difficult because these apps share large parts of their code with benign apps. Other studies have highlighted the privacy implications of zero-permission sensors. In this work, we investigate whether repackaged malicious apps use more sensors than their benign counterparts for malicious purposes. We analyzed 15,297 app pairs for sensor usage. We provide evidence that zero-permission sensors are indeed used by malicious apps to perform various activities. We use this information to train a robust classifier to detect repackaged malware in the wild.
  2. The COVID-19 pandemic was a catalyst for many different trends in daily life worldwide. While there has been an overall rise in cybercrime during this time, relatively little research has been done on malicious COVID-19-themed AndroidOS applications. With the rise in reports of users falling victim to such applications, there is a need to study malware detection for pandemic-themed mobile apps. In this project, we extracted the permission requests from 1959 APK files in a dataset containing benign and malware COVID-19-themed apps. We then created and compared eight models built from four classifier types (support vector machine, neural network, decision trees, and categorical naive Bayes) to determine their ability to identify potentially malicious APK files based on the permissions they request. Because malware samples were scarce compared with non-malware APKs, the classifiers were trained on data balanced with the Synthetic Minority Oversampling Technique (SMOTE). Finally, we evaluated the models using k-fold cross-validation and found the decision tree classifier to perform best.
  3. Security has become a serious problem for the Android system as the number of Android malware samples increases rapidly. A great deal of effort has been devoted to protecting Android devices against malware threats. The majority of existing work uses two-class classification methods, which suffer from overfitting due to the lack of malicious samples. This results in poor performance when detecting zero-day malware attacks. In this paper, we evaluated the performance of various one-class feature selection and classification methods for zero-day Android malware detection. Unlike two-class methods, one-class methods use only benign samples to build the detection model, which overcomes the overfitting problem. Our results demonstrate the advantage of one-class methods over two-class methods in detecting zero-day Android malware attacks.
  4. A promising avenue for improving the effectiveness of behavior-based malware detectors is to leverage two-phase detection mechanisms. An existing problem in two-phase detection is that, after the first phase produces a borderline decision, suspicious behaviors are not well contained before the second phase completes. This paper improves CHAMELEON, a framework that realizes an uncertain environment. CHAMELEON offers two environments: a standard environment for software identified as benign by the first phase, and an uncertain environment for software that received a borderline classification from the first phase. The uncertain environment adds obstacles to software execution through random perturbations applied probabilistically. We introduce a dynamic perturbation threshold that can target malware disproportionately more than benign software. We analyzed the effects of the uncertain environment by manually studying 113 benign programs and 100 malware samples, and found that 92% of the malware and 10% of the benign software were disrupted during execution. The results were then corroborated on a newer system with an extended dataset (5,679 Linux malware samples). Finally, a careful inspection of the benign software crashes revealed some software bugs, highlighting CHAMELEON's potential as a practical complementary anti-malware solution.
  5. The more than 6 billion smartphones available worldwide can enable governments and public health organizations to develop apps to manage global pandemics. However, hackers can take advantage of this opportunity to target the public in nefarious ways through malware disguised as pandemic-related apps. A recent analysis conducted during the COVID-19 pandemic showed that the public installed several variants of COVID-19-related malware from untrusted sources. We propose using app permissions plus one extra feature (the total number of permissions) to build a static detector with machine learning (ML) models, enabling fast detection of pandemic-related Android malware at installation time. Using a dataset of more than 2000 COVID-19-related apps and evaluating ML models built with decision trees and Naive Bayes, our results show that pandemic-related malware apps can be detected with an accuracy above 90% using decision tree models with app permissions and the proposed feature.
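The COVID-19 permissions study (item 2) balances its training data with SMOTE. In its original form for continuous features, SMOTE creates each synthetic minority sample by picking a minority sample, picking one of its k nearest minority-class neighbours, and interpolating a random fraction of the way between the two. The following is a minimal stdlib sketch of that idea; the function `smote` and its parameters are hypothetical names, and this is not the implementation used in that study, which the abstract does not describe. Interpolating binary permission features yields fractional values, so that study may have used a nominal variant or rounding; the abstract does not say.

```python
import math
import random
from typing import List, Optional, Sequence

Vector = List[float]


def smote(
    minority: Sequence[Sequence[float]],
    n_synthetic: int,
    k: int = 5,
    rng: Optional[random.Random] = None,
) -> List[Vector]:
    """Generate n_synthetic minority-class samples by interpolating between
    a random minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    if rng is None:
        rng = random.Random()
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base` by Euclidean distance,
        # excluding `base` itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1)
        # The new point lies on the segment from `base` toward `other`.
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Usage: oversample a toy three-point minority class.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote(minority, n_synthetic=4, k=2, rng=random.Random(0))
```

Passing a seeded `random.Random` makes the oversampling reproducible when comparing classifiers. Under k-fold cross-validation, oversampling is normally applied only to the training folds so that synthetic points derived from test samples do not leak into evaluation.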