skip to main content


This content will become publicly available on December 1, 2024

Title: Machine learning based fileless malware traffic classification using image visualization
Abstract

In today’s interconnected world, network traffic is replete with adversarial attacks. As technology evolves, these attacks are also becoming increasingly sophisticated, making them even harder to detect. Fortunately, artificial intelligence (AI) and, specifically machine learning (ML), have shown great success in fast and accurate detection, classification, and even analysis of such threats. Accordingly, there is a growing body of literature addressing how subfields of AI/ML (e.g., natural language processing (NLP)) are getting leveraged to accurately detect evasive malicious patterns in network traffic. In this paper, we delve into the current advancements in ML-based network traffic classification using image visualization. Through a rigorous experimental methodology, we first explore the process of network traffic to image conversion. Subsequently, we investigate how machine learning techniques can effectively leverage image visualization to accurately classify evasive malicious traces within network traffic. Through the utilization of production-level tools and utilities in realistic experiments, our proposed solution achieves an impressive accuracy rate of 99.48% in detecting fileless malware, which is widely regarded as one of the most elusive classes of malicious software.

 
more » « less
Award ID(s):
2200538 2113945 2114936
NSF-PAR ID:
10540422
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
Springer Nature
Date Published:
Journal Name:
Cybersecurity
Volume:
6
Issue:
1
ISSN:
2523-3246
Page Range / eLocation ID:
32
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    The Internet has become a vital part of our daily lives, serving as a hub for global connectivity and a facilitator for seamless communication and information exchange. However, the rise of malicious domains presents a serious challenge, undermining the reliability of the Internet and posing risks to user safety. These malicious activities exploit the Domain Name System (DNS) to deceive users, leading to harmful activities such as spreading drive-by-download malware, operating botnets, creating phishing sites, and sending spam. In response to this growing threat, the application of Machine Learning (ML) techniques has proven to be highly effective. These methods excel in quickly and accurately detecting, classifying, and analyzing such threats. This paper explores the latest developments in using transfer learning for the classification of malicious domains, with a focus on image visualization as a key methodological approach. Our proposed solution has achieved a remarkable testing accuracy rate of 98.67%, demonstrating its effectiveness in detecting and classifying malicious domains.

     
    more » « less
  2. null (Ed.)
    Recent self-propagating malware (SPM) campaigns compromised hundred of thousands of victim machines on the Internet. It is challenging to detect these attacks in their early stages, as adversaries utilize common network services, use novel techniques, and can evade existing detection mechanisms. We propose PORTFILER (PORT-Level Network Traffic ProFILER), a new machine learning system applied to network traffic for detecting SPM attacks. PORTFILER extracts port-level features from the Zeek connection logs collected at a border of a monitored network, applies anomaly detection techniques to identify suspicious events, and ranks the alerts across ports for investigation by the Security Operations Center (SOC). We propose a novel ensemble methodology for aggregating individual models in PORTFILER that increases resilience against several evasion strategies compared to standard ML baselines. We extensively evaluate PORTFILER on traffic collected from two university networks, and show that it can detect SPM attacks with different patterns, such as WannaCry and Mirai, and performs well under evasion. Ranking across ports achieves precision over 0.94 and false positive rates below 8 × 10−4 in the top 100 highly ranked alerts. When deployed on the university networks, PORTFILER detected anomalous SPM-like activity on one of the campus networks, confirmed by the university SOC as malicious. PORTFILER also detected a Mirai attack recreated on the two university networks with higher precision and recall than deep learning based autoencoder methods. 
    more » « less
  3. Wysocki, Bryant T. ; Holt, James ; Blowers, Misty (Ed.)
    Ever since human society entered the age of social media, every user has had a considerable amount of visual content stored online and shared in variant virtual communities. As an efficient information circulation measure, disastrous consequences are possible if the contents of images are tampered with by malicious actors. Specifically, we are witnessing the rapid development of machine learning (ML) based tools like DeepFake apps. They are capable of exploiting images on social media platforms to mimic a potential victim without their knowledge or consent. These content manipulation attacks can lead to the rapid spread of misinformation that may not only mislead friends or family members but also has the potential to cause chaos in public domains. Therefore, robust image authentication is critical to detect and filter off manipulated images. In this paper, we introduce a system that accurately AUthenticates SOcial MEdia images (AUSOME) uploaded to online platforms leveraging spectral analysis and ML. Images from DALL-E 2 are compared with genuine images from the Stanford image dataset. Discrete Fourier Transform (DFT) and Discrete Cosine Transform (DCT) are used to perform a spectral comparison. Additionally, based on the differences in their frequency response, an ML model is proposed to classify social media images as genuine or AI-generated. Using real-world scenarios, the AUSOME system is evaluated on its detection accuracy. The experimental results are encouraging and they verified the potential of the AUSOME scheme in social media image authentications. 
    more » « less
  4. The detection of zero-day attacks and vulnerabilities is a challenging problem. It is of utmost importance for network administrators to identify them with high accuracy. The higher the accuracy is, the more robust the defense mechanism will be. In an ideal scenario (i.e., 100% accuracy) the system can detect zero-day malware without being concerned about mistakenly tagging benign files as malware or enabling disruptive malicious code running as none-malicious ones. This paper investigates different machine learning algorithms to find out how well they can detect zero-day malware. Through the examination of 34 machine/deep learning classifiers, we found that the random forest classifier offered the best accuracy. The paper poses several research questions regarding the performance of machine and deep learning algorithms when detecting zero-day malware with zero rates for false positive and false negative. 
    more » « less
  5. Website fingerprinting attacks, which use statistical analysis on network traffic to compromise user privacy, have been shown to be effective even if the traffic is sent over anonymity-preserving networks such as Tor. The classical attack model used to evaluate website fingerprinting attacks assumes an on-path adversary, who can observe all traffic traveling between the user’s computer and the secure network. In this work we investigate these attacks under a different attack model, in which the adversary is capable of sending a small amount of malicious JavaScript code to the target user’s computer. The malicious code mounts a cache side-channel attack, which exploits the effects of contention on the CPU’s cache, to identify other websites being browsed. The effectiveness of this attack scenario has never been systematically analyzed, especially in the open-world model which assumes that the user is visiting a mix of both sensitive and non-sensitive sites. We show that cache website fingerprinting attacks in JavaScript are highly feasible. Specifically, we use machine learning techniques to classify traces of cache activity. Unlike prior works, which try to identify cache conflicts, our work measures the overall occupancy of the last-level cache. We show that our approach achieves high classification accuracy in both the open-world and the closed-world models. We further show that our attack is more resistant than network-based fingerprinting to the effects of response caching, and that our techniques are resilient both to network-based defenses and to side-channel countermeasures introduced to modern browsers as a response to the Spectre attack. To protect against cache-based website fingerprinting, new defense mechanisms must be introduced to privacy-sensitive browsers and websites. We investigate one such mechanism, and show that generating artificial cache activity reduces the effectiveness of the attack and completely eliminates it when used in the Tor Browser 
    more » « less