Thanks to the numerous machine learning based malware detection (MLMD) research in recent years and the readily available online malware scanning system (e.g., VirusTotal), it becomes relatively easy to build a seemingly successful MLMD system using the following standard procedure: first prepare a set of ground truth data by checking with VirusTotal, then extract features from training dataset and build a machine learning detection model, and finally evaluate the model with a disjoint testing dataset. We argue that such evaluation methods do not expose the real utility of ML based malware detection in practice since the ML model is both built and tested on malware that are known at the time of training. The user could simply run them through VirusTotal just as how the researchers obtained the ground truth, instead of using the more sophisticated ML approach. However, ML based malware detection has the potential of identifying malware that has not been known at the time of training, which is the real value ML brings to this problem. We present experimentation study on how well a machine learning based malware detection system can achieve this. Our experiments showed that MLMD can consistently generate previously unknown malware knowledge, e.g., malware that is not detectable by existing malware detection systems at MLMD’s training time. Our research illustrates an ideal usage scenario for MLMD systems and demonstrates that such systems can benefit malware detection in practice. For example, by utilizing the new signals provided by the MLMD system and the detection capability of existing malware detection systems, we can more quickly uncover new malware variants or families.
more »
« less
Malrec: Compact Full-Trace Malware Recording for Retrospective Deep Analysis
Malware sandbox systems have become a critical part of the Internet’s defensive infrastructure. These systems allow malware researchers to quickly understand a sample’s behavior and effect on a system. However, current systems face two limitations: first, for performance reasons, the amount of data they can collect is limited (typically to system call traces and memory snapshots). Second, they lack the ability to perform retrospective analysis—that is, to later extract features of the malware’s execution that were not considered relevant when the sample was originally executed. In this paper, we introduce a new malware sandbox system, Malrec, which uses whole-system deterministic record and replay to capture high-fidelity, whole-system traces of malware executions with low time and space overheads. We demonstrate the usefulness of this system by presenting a new dataset of 66,301 malware recordings collected over a two-year period, along with two preliminary analyses that would not be possible without full traces: an analysis of kernel mode malware and exploits, and a fine-grained malware family classification based on textual memory access contents. The Malrec system and dataset can help provide a standardized benchmark for evaluating the performance of future dynamic analyses.
more »
« less
- Award ID(s):
- 1657199
- NSF-PAR ID:
- 10084747
- Date Published:
- Journal Name:
- Detection of Intrusions and Malware, and Vulnerability Assessment
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
null (Ed.)Traditionally, Android malware is analyzed using static or dynamic analysis. Although static techniques are often fast; however, they cannot be applied to classify obfuscated samples or malware with a dynamic payload. In comparison, the dynamic approach can examine obfuscated variants but often incurs significant runtime overhead when collecting every important malware behavioral data. This paper conducts an exploratory analysis of memory forensics as an alternative technique for extracting feature vectors for an Android malware classifier. We utilized the reconstructed per-process object allocation network to identify distinguishable patterns in malware and benign application. Our evaluation results indicate the network structural features in the malware category are unique compared to the benign dataset, and thus features extracted from the remnant of in-memory allocated objects can be utilized for robust Android malware classification algorithm.more » « less
-
Today, larger memory capacity and higher memory bandwidth are required for better performance and energy efficiency for many important client and datacenter applications. Hardware memory compression provides a promising direction to achieve this without increasing system cost. Unfortunately, current memory compression solutions face two significant challenges. First, keeping memory compressed requires additional memory accesses, sometimes on the critical path, which can cause performance overheads. Second, they require changing the operating system to take advantage of the increased capacity, and to handle incompressible data, which delays deployment. We propose Compresso, a hardware memory compression architecture that minimizes memory overheads due to compression, with no changes to the OS. We identify new data-movement trade-offs and propose optimizations that reduce additional memory movement to improve system efficiency. We propose a holistic evaluation for compressed systems. Our results show that Compresso achieves a 1.85x compression for main memory on average, with a 24% speedup over a competitive hardware compressed system for single-core systems and 27% for multi-core systems. As compared to competitive compressed systems, Compresso not only reduces performance overhead of compression, but also increases performance gain from higher memory capacity.more » « less
-
Combating the OS-level malware is a very challenging problem as this type of malware can compromise the operating system, obtaining the kernel privilege and subverting almost all the existing anti-malware tools. This work aims to address this problem in the context of mobile devices. As real-world malware is very heterogeneous, we narrow down the scope of our work by especially focusing on a special type of OS-level malware that always corrupts user data. We have designed mobiDOM, the first framework that can combat the OS-level data corruption malware for mobile computing devices. Our mobiDOM contains two components, a malware detector and a data repairer. The malware detector can securely and timely detect the presence of OS-level malware by fully utilizing the existing hardware features of a mobile device, namely, flash memory and Arm TrustZone. Specifically, we integrate the malware detection into the flash translation layer (FTL), a firmware layer embedded into the flash storage hardware, which is inaccessible to the OS; in addition, we run a trusted application in the Arm TrustZone secure world, which acts as a user-level manager of the malware detector. The FTL-based malware detection and the TrustZone-based manager can communicate with each other stealthily via steganography. The data repairer can allow restoring the external storage to a healthy historical state by taking advantage of the out-of-place-update feature of flash memory and our malware-aware garbage collection in the FTL. Security analysis and experimental evaluation on a real-world testbed confirm the effectiveness of mobiDOM.more » « less
-
null (Ed.)Malicious software, popularly known as malware, is a serious threat to modern computing systems. A comprehensive cybercrime study by Ponemon Institute highlights that malware is the most expensive attack for organizations, with an average revenue loss of $2.6 million per organization in 2018 (11% increase compared to 2017). Recent high-profile malware attacks coupled with serious economic implications have dramatically changed our perception of threat from malware. Software-based solutions, such as anti-virus programs, are not effective since they rely on matching patterns (signatures) that can be easily fooled by carefully crafted malware with obfuscation or other deviation capabilities. Moreover, software-based solutions are not fast enough for real-time malware detection in safety-critical systems. In this paper, we investigate promising approaches for hardware-assisted malware detection using machine learning. Specifically, we explore how machine learning can be effective for malware detection utilizing hardware performance counters, embedded trace buffer as well as on-chip network traffic analysis.more » « less