More than 6 billion smartphones available worldwide can enable governments and public health organizations to develop apps to manage global pandemics. However, hackers can take advantage of this opportunity to target the public in nefarious ways through malware disguised as pandemics-related apps. A recent analysis conducted during the COVID-19 pandemic showed that several variants of COVID-19 related malware were installed by the public from non-trusted sources. We propose the use of app permissions and an extra feature (the total number of permissions) to develop a static detector using machine learning (ML) models to enable the fast-detection of pandemics-related Android malware at installation time. Using a dataset of more than 2000 COVID-19 related apps and by evaluating ML models created using decision trees and Naive Bayes, our results show that pandemics-related malware apps can be detected with an accuracy above 90% using decision tree models with app permissions and the proposed feature.
more »
« less
Comparing Classifiers: A Look at Machine-Learning and the Detection of Mobile Malware in COVID-19 Android Mobile Applications
The COVID-19 pandemic was a catalyst for many different trends in our daily life worldwide. While there has been an overall rise in cybercrime during this time, there has been relatively little research done about malicious COVID-19 themed AndroidOS applications. With the rise in reports of users falling victim to malicious COVID-19 themed AndroidOS applications, there is a need to learn about the detection of malware for pandemics-themed mobile apps.. In this project, we extracted the permissions requests from 1959 APK files from a dataset containing benign and malware COVID-19 themed apps. We then created and compared eight unique models of four varying classifiers to determine their ability to identify potentially malicious APK files based on the permissions the APK file requests: support vector machine, neural network, decision trees, and categorical naive bayes. These classifiers were then trained using Synthetic Minority Oversampling Technique (SMOTE) to balance the dataset due to the lack of samples of malware compared to non-malware APKs. Finally, we evaluated the models using K-Fold Cross-Validation and found the decision tree classifier to be the best performing classifier.
more »
« less
- Award ID(s):
- 2308741
- PAR ID:
- 10484381
- Publisher / Repository:
- ACM
- Date Published:
- Journal Name:
- MobiHoc '23: Proceedings of the Twenty-fourth International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing
- ISBN:
- 9781450399265
- Page Range / eLocation ID:
- 498 to 503
- Format(s):
- Medium: X
- Location:
- Washington DC USA
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Today’s mobile apps employ third-party advertising and tracking (A&T) libraries, which may pose a threat to privacy. State-of-the-art detects and blocks outgoing A&T HTTP/S requests by using manually curated filter lists (e.g. EasyList), and recently, using machine learning approaches. The major bottleneck of both filter lists and classifiers is that they rely on experts and the community to inspect traffic and manually create filter list rules that can then be used to block traffic or label ground truth datasets. We propose NoMoATS – a system that removes this bottleneck by reducing the daunting task of manually creating filter rules, to the much easier and scalable task of labeling A&T libraries. Our system leverages stack trace analysis to automatically label which network requests are generated by A&T libraries. Using NoMoATS, we collect and label a new mobile traffic dataset. We use this dataset to train decision tree classifiers, which can be applied in real-time on the mobile device and achieve an average F-score of 93%. We show that both our automatic labeling and our classifiers discover thousands of requests destined to hundreds of different hosts, previously undetected by popular filter lists. To the best of our knowledge, our system is the first to (1) automatically label which mobile network requests are engaged in A&T, while requiring to only manually label libraries to their purpose and (2) apply on-device machine learning classifiers that operate at the granularity of URLs, can inspect connections across all apps, and detect not only ads, but also tracking.more » « less
-
The phenomenal growth in use of android devices in the recent years has also been accompanied by the rise of android malware. This reality warrants development of tools and techniques to analyze android apps in large scale for security vetting. Most of the state-of-the-art vetting tools are either based on static analysis or on dynamic analysis. Static analysis has limited success if the malware app utilizes sophisticated evading tricks. Dynamic analysis on the other hand may not find all the code execution paths, which let some malware apps remain undetected. Moreover, the existing static and dynamic analysis vetting techniques require extensive human interaction. To ad- dress the above issues, we design a deep learning based hybrid analysis technique, which combines the complementary strengths of each analysis paradigm to attain better accuracy. Moreover, automated feature engineering capability of the deep learning framework addresses the human interaction issue. In particular, using lightweight static and dynamic analysis procedure, we obtain multiple artifacts, and with these artifacts we train the deep learner to create independent models, and then combine them to build a hybrid classifier to obtain the final vetting decision (malicious apps vs. benign apps). The experiments show that our best deep learning model with hybrid analysis achieves an area under the precision-recall curve (AUC) of 0.9998. In this paper, we also present a comparative study of performance measures of the various variants of the deep learning framework. Additional experiments indicate that our vetting system is fairly robust against imbalanced data and is scalable.more » « less
-
The detection of zero-day attacks and vulnerabilities is a challenging problem. It is of utmost importance for network administrators to identify them with high accuracy. The higher the accuracy is, the more robust the defense mechanism will be. In an ideal scenario (i.e., 100% accuracy) the system can detect zero-day malware without being concerned about mistakenly tagging benign files as malware or enabling disruptive malicious code running as none-malicious ones. This paper investigates different machine learning algorithms to find out how well they can detect zero-day malware. Through the examination of 34 machine/deep learning classifiers, we found that the random forest classifier offered the best accuracy. The paper poses several research questions regarding the performance of machine and deep learning algorithms when detecting zero-day malware with zero rates for false positive and false negative.more » « less
-
As cyberattacks caused by malware have proliferated during the pandemic, building an automatic system to detect COVID-19 themed malware in social coding platforms is in urgent need. The existing methods mainly rely on file content analysis while ignoring structured information among entities in social coding platforms. Additionally, they usually require sufficient data for model training, impairing their performances over cases with limited data which is common in reality. To address these challenges, we develop Meta-AHIN, a novel model for COVID-19 themed malicious repository detection in GitHub. In Meta-AHIN, we first construct an attributed heterogeneous information network (AHIN) to model the code content and social coding properties in GitHub; and then we exploit attention-based graph convolutional neural network (AGCN) to learn repository embeddings and present a meta-learning framework for model optimization. To utilize unlabeled information in AHIN and to consider task influence of different types of repositories, we further incorporate node attribute-based self-supervised module and task-aware attention weight into AGCN and meta-learning respectively. Extensive experiments on the collected data from GitHub demonstrate that Meta-AHIN outperforms state-of-the-art methods.more » « less