skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Can Machine/Deep Learning Classifiers Detect Zero-Day Malware with High Accuracy?
The detection of zero-day attacks and vulnerabilities is a challenging problem. It is of utmost importance for network administrators to identify them with high accuracy. The higher the accuracy is, the more robust the defense mechanism will be. In an ideal scenario (i.e., 100% accuracy) the system can detect zero-day malware without being concerned about mistakenly tagging benign files as malware or enabling disruptive malicious code running as none-malicious ones. This paper investigates different machine learning algorithms to find out how well they can detect zero-day malware. Through the examination of 34 machine/deep learning classifiers, we found that the random forest classifier offered the best accuracy. The paper poses several research questions regarding the performance of machine and deep learning algorithms when detecting zero-day malware with zero rates for false positive and false negative.  more » « less
Award ID(s):
1821560 1723765
PAR ID:
10186605
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
2019 IEEE International Conference on Big Data (Big Data)
Page Range / eLocation ID:
3252 to 3259
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The ubiquitous deployment of robots across diverse domains, from industrial automation to personal care, underscores their critical role in modern society. However, this growing dependence has also revealed security vulnerabilities. An attack vector involves the deployment of malicious software (malware) on robots, which can cause harm to robots themselves, users, and even the surrounding environment. Machine learning approaches, particularly supervised ones, have shown promise in malware detection by building intricate models to identify known malicious code patterns. However, these methods are inherently limited in detecting unseen or zero-day malware variants as they require regularly updated massive datasets that might be unavailable to robots. To address this challenge, we introduce ROBOGUARDZ, a novel malware detection framework based on zero-shot learning for robots. This approach allows ROBOGUARDZ to identify unseen malware by establishing relationships between known malicious code and benign behaviors, allowing detection even before the code executes on the robot. To ensure practical deployment in resource-constrained robotic hardware, we employ a unique parallel structured pruning and quantization strategy that compresses the ROBOGUARDZ detection model by 37.4% while maintaining its accuracy. This strategy reduces the size of the model and computational demands, making it suitable for real-world robotic systems. We evaluated ROBOGUARDZ on a recent dataset containing real-world binary executables from multi-sensor autonomous car controllers. The framework was deployed on two popular robot embedded hardware platforms. Our results demonstrate an average detection accuracy of 94.25% and a low false negative rate of 5.8% with a minimal latency of 20 ms, which demonstrates its effectiveness and practicality. 
    more » « less
  2. Android, the most dominant Operating System (OS), experiences immense popularity for smart devices for the last few years. Due to its' popularity and open characteristics, Android OS is becoming the tempting target of malicious apps which can cause serious security threat to financial institutions, businesses, and individuals. Traditional anti-malware systems do not suffice to combat newly created sophisticated malware. Hence, there is an increasing need for automatic malware detection solutions to reduce the risks of malicious activities. In recent years, machine learning algorithms have been showing promising results in classifying malware where most of the methods are shallow learners like Logistic Regression (LR). In this paper, we propose a deep learning framework, called Droid-NNet, for malware classification. However, our proposed method Droid-NNet is a deep learner that outperforms existing cutting-edge machine learning methods. We performed all the experiments on two datasets (Malgenome-215 & Drebin-215) of Android apps to evaluate Droid-NNet. The experimental result shows the robustness and effectiveness of Droid-NNet. 
    more » « less
  3. With the rapid technological advancement, security has become a major issue due to the increase in malware activity that poses a serious threat to the security and safety of both computer systems and stakeholders. To maintain stakeholder’s, particularly, end user’s security, protecting the data from fraudulent efforts is one of the most pressing concerns. A set of malicious programming code, scripts, active content, or intrusive software that is designed to destroy intended computer systems and programs or mobile and web applications is referred to as malware. According to a study, naive users are unable to distinguish between malicious and benign applications. Thus, computer systems and mobile applications should be designed to detect malicious activities towards protecting the stakeholders. A number of algorithms are available to detect malware activities by utilizing novel concepts including Artificial Intelligence, Machine Learning, and Deep Learning. In this study, we emphasize Artificial Intelligence (AI) based techniques for detecting and preventing malware activity. We present a detailed review of current malware detection technologies, their shortcomings, and ways to improve efficiency. Our study shows that adopting futuristic approaches for the development of malware detection applications shall provide significant advantages. The comprehension of this synthesis shall help researchers for further research on malware detection and prevention using AI. 
    more » « less
  4. Ko, Hanseok (Ed.)
    Malware represents a significant security concern in today’s digital landscape, as it can destroy or disable operating systems, steal sensitive user information, and occupy valuable disk space. However, current malware detection methods, such as static-based and dynamic-based approaches, struggle to identify newly developed ("zero-day") malware and are limited by customized virtual machine (VM) environments. To overcome these limitations, we propose a novel malware detection approach that leverages deep learning, mathematical techniques, and network science. Our approach focuses on static and dynamic analysis and utilizes the Low-Level Virtual Machine (LLVM) to profile applications within a complex network. The generated network topologies are input into the GraphSAGE architecture to efficiently distinguish between benign and malicious software applications, with the operation names denoted as node features. Importantly, the GraphSAGE models analyze the network’s topological geometry to make predictions, enabling them to detect state-of-the-art malware and prevent potential damage during execution in a VM. To evaluate our approach, we conduct a study on a dataset comprising source code from 24,376 applications, specifically written in C/C++, sourced directly from widely-recognized malware and various types of benign software. The results show a high detection performance with an Area Under the Receiver Operating Characteristic Curve (AUROC) of 99.85%. Our approach marks a substantial improvement in malware detection, providing a notably more accurate and efficient solution when compared to current state-of-the-art malware detection methods. The code is released at https://github.com/HantangZhang/MGN. 
    more » « less
  5. null (Ed.)
    Machine learning algorithms used to detect attacks are limited by the fact that they cannot incorporate the back-ground knowledge that an analyst has. This limits their suitability in detecting new attacks. Reinforcement learning is different from traditional machine learning algorithms used in the cybersecurity domain. Compared to traditional ML algorithms, reinforcement learning does not need a mapping of the input-output space or a specific user-defined metric to compare data points. This is important for the cybersecurity domain, especially for malware detection and mitigation, as not all problems have a single, known, correct answer. Often, security researchers have to resort to guided trial and error to understand the presence of a malware and mitigate it.In this paper, we incorporate prior knowledge, represented as Cybersecurity Knowledge Graphs (CKGs), to guide the exploration of an RL algorithm to detect malware. CKGs capture semantic relationships between cyber-entities, including that mined from open source. Instead of trying out random guesses and observing the change in the environment, we aim to take the help of verified knowledge about cyber-attack to guide our reinforcement learning algorithm to effectively identify ways to detect the presence of malicious filenames so that they can be deleted to mitigate a cyber-attack. We show that such a guided system outperforms a base RL system in detecting malware. 
    more » « less