NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

DEEPCAPA: Identifying Malicious Capabilities in Windows Malware

https://doi.org/10.1109/ACSAC63791.2024.00072

Vasan, Saastha; Aghakhani, Hojjat; Ortolani, Stefano; Vasilenko, Roman; Grishchenko, Ilya; Kruegel, Christopher; Vigna, Giovanni (December 2024, IEEE)

Malware detection and classification has been the focus of extensive research over many years. However, less effort has been devoted to developing post-detection systems that identify specific malicious capabilities (or behaviors) in malware. Such systems play a critical part in identifying and mitigating the damage caused by malware attacks. Unfortunately, current methods for identifying malware capabilities involve substantial manual reverse engineering efforts and context switching between multiple tools, which slows down an investigation and gives attackers an advantage. In this paper, we propose DEEPCAPA, an automated postdetection system that uses machine learning to identify potentially malicious capabilities in malware in the form of MITRE ATT&CK techniques. Our system operates on sequences of API calls, extracted from the memory snapshots taken at key points during the (sandboxed) execution of malware. Our results demonstrate that DEEPCAPA can accurately identify malicious capabilities, achieving a precision of 95.80% and a recall of 93.76% across 29 different techniques.
more » « less
Free, publicly-accessible full text available December 9, 2025
SYMBEXCEL: Automated Analysis and Understanding of Malicious Excel 4.0 Macros

https://doi.org/10.1109/SP46214.2022.9833765

Ruaro, Nicola; Pagani, Fabio; Ortolani, Stefano; Kruegel, Christopher; Vigna, Giovanni (May 2022, 2022 IEEE Symposium on Security and Privacy (SP))

Malicious software (malware) poses a significant threat to the security of our networks and users. In the ever-evolving malware landscape, Excel 4.0 Office macros (XL4) have recently become an important attack vector. These macros are often hidden within apparently legitimate documents and under several layers of obfuscation. As such, they are difficult to analyze using static analysis techniques. Moreover, the analysis in a dynamic analysis environment (a sandbox) is challenging because the macros execute correctly only under specific environmental conditions that are not always easy to create. This paper presents SYMBEXCEL, a novel solution that leverages symbolic execution to deobfuscate and analyze Excel 4.0 macros automatically. Our approach proceeds in three stages: (1) The malicious document is parsed and loaded in memory; (2) Our symbolic execution engine executes the XL4 formulas; and (3) Our Engine concretizes any symbolic values encountered during the symbolic exploration, therefore evaluating the execution of each macro under a broad range of (meaningful) environment configurations. SYMBEXCEL significantly outperforms existing deobfuscation tools, allowing us to reliably extract Indicators of Compromise (IoCs) and other critical forensics information. Our experiments demonstrate the effectiveness of our approach, especially in deobfuscating novel malicious documents that make heavy use of environment variables and are often not identified by commercial anti-virus software.
more » « less
Full Text Available
One Size Does Not Fit All: A Longitudinal Analysis of Brazilian Financial Malware

Botacin, Marcus; Aghakhani, Hojjat; Ortolani, Stefano; Kruegel, Christopher; Vigna, Giovanni; Geus, Paulo de; Oliveira, Daniela; Grégio, André (January 2021, ACM transactions on privacy and security)
Prof. Ninghui Li Editor in Chief, ACM Transactions (Ed.)
Malware analysis is an essential task to understand infection campaigns, the behavior of malicious codes, and possible ways to mitigate threats. Malware analysis also allows better assessment of attacker’s capabilities, techniques, and processes. Although a substantial amount of previous work provided a comprehensive analysis of the international malware ecosystem, research on regionalized, country, and population-specific malware campaigns have been scarce. Moving towards addressing this gap, we conducted a longitudinal (2012-2020) and comprehensive (encompassing an entire population of online banking users) study of MS Windows desktop malware that actually infected Brazilian bank’s users. We found that the Brazilian financial desktop malware has been evolving quickly: it started to make use of a variety of file formats instead of typical PE binaries, relied on native system resources, and abused obfuscation technique to bypass detection mechanisms. Our study on the threats targeting a significant population on the ecosystem of the largest and most populous country in Latin America can provide invaluable insights that may be applied to other countries’ user populations, especially those in the developing world that might face cultural peculiarities similar to Brazil’s. With this evaluation, we expect to motivate the security community/industry to seriously considering a deeper level of customization during the development of next generation anti-malware solutions, as well as to raise awareness towards regionalized and targeted Internet threats.
more » « less
Full Text Available
When Malware is Packin' Heat; Limits of Machine Learning Classifiers Based on Static Analysis Features

https://doi.org/10.14722/ndss.2020.24310

Aghakhani, Hojjat; Gritti, Fabio; Mecca, Francesco; Lindorfer, Martina; Ortolani, Stefano; Balzarotti, Davide; Vigna, Giovanni; Kruegel, Christopher (January 2020, Network and Distributed Systems Security (NDSS) Symposium 2020)

Machine learning techniques are widely used in addition to signatures and heuristics to increase the detection rate of anti-malware software, as they automate the creation of detection models, making it possible to handle an ever-increasing number of new malware samples. In order to foil the analysis of anti-malware systems and evade detection, malware uses packing and other forms of obfuscation. However, few realize that benign applications use packing and obfuscation as well, to protect intellectual property and prevent license abuse. In this paper, we study how machine learning based on static analysis features operates on packed samples. Malware researchers have often assumed that packing would prevent machine learning techniques from building effective classifiers. However, both industry and academia have published results that show that machine-learning-based classifiers can achieve good detection rates, leading many experts to think that classifiers are simply detecting the fact that a sample is packed, as packing is more prevalent in malicious samples. We show that, different from what is commonly assumed, packers do preserve some information when packing programs that is “useful” for malware classification. However, this information does not necessarily capture the sample’s behavior. We demonstrate that the signals extracted from packed executables are not rich enough for machine-learning-based models to (1) generalize their knowledge to operate on unseen packers, and (2) be robust against adversarial examples. We also show that a na¨ıve application of machine learning techniques results in a substantial number of false positives, which, in turn, might have resulted
more » « less
Full Text Available

Search for: All records