Title: Sometimes, You Aren’t What You Do: Mimicry Attacks against Provenance Graph Host Intrusion Detection Systems
Reliable methods for host-layer intrusion detection remain an open problem within computer security. Recent research has recast intrusion detection as a provenance graph anomaly detection problem, thanks to concurrent advancements in machine learning and causal graph auditing. While these approaches show promise, their robustness against an adaptive adversary has yet to be proven. In particular, it is unclear whether mimicry attacks, which plagued past approaches to host intrusion detection, have a similar effect on modern graph-based methods. In this work, we reveal that systematic design choices have allowed mimicry attacks to continue to abound in provenance graph host intrusion detection systems (Prov-HIDS). Against a corpus of exemplar Prov-HIDS, we develop evasion tactics that allow attackers to hide within benign process behaviors. Evaluating against public datasets, we demonstrate that an attacker can consistently evade detection (100% success rate) without modifying the underlying attack behaviors. We go on to show that our approach is feasible in live attack scenarios and that it outperforms domain-general adversarial sample techniques. By open-sourcing our code and datasets, this work serves as a benchmark for the evaluation of future Prov-HIDS.
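The abstract's core idea, hiding unmodified attack behavior inside benign process activity, can be illustrated with a toy sketch. Everything below is invented for illustration and is not one of the paper's evaluated systems or tactics: a stand-in "Prov-HIDS" embeds a process's provenance neighborhood as an edge-type histogram and flags outliers by distance to a benign centroid, and appending benign-looking events ("mimicry padding") drives the anomaly score under the threshold without removing any attack events.

```python
import numpy as np

# Toy event vocabulary; real provenance edge types are far richer.
EDGE_TYPES = ["read", "write", "exec", "connect", "clone"]

def embed(events):
    """Embed a process's provenance neighborhood as a normalized edge-type histogram."""
    h = np.array([events.count(t) for t in EDGE_TYPES], dtype=float)
    return h / h.sum()

# "Train" the toy detector: centroid of benign embeddings plus a distance threshold.
benign_runs = [
    ["read", "read", "write", "clone"],
    ["read", "write", "write", "clone", "read"],
]
centroid = np.mean([embed(r) for r in benign_runs], axis=0)
threshold = 0.35  # hypothetical decision boundary

def anomaly_score(events):
    return float(np.linalg.norm(embed(events) - centroid))

attack = ["connect", "exec", "connect", "write"]  # attack behavior, left unmodified
# Mimicry padding: interleave benign-looking events around the same attack events.
padded = attack + ["read"] * 6 + ["write"] * 4 + ["clone"] * 2

print(anomaly_score(attack) > threshold)   # True: raw attack is flagged
print(anomaly_score(padded) > threshold)   # False: padded attack slips under threshold
```

The padding dilutes the histogram toward the benign centroid, which is the structural weakness the paper's evasion tactics exploit in far more sophisticated form.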
Award ID(s):
2055127, 1750024
PAR ID:
10412012
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
30th Network and Distributed System Security Symposium
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Endpoint threat detection research hinges on the availability of worthwhile evaluation benchmarks, but experimenters' understanding of the contents of benchmark datasets is often limited. Typically, attention is only paid to the realism of attack behaviors, which comprises only a small percentage of the audit logs in the dataset, while other characteristics of the data are inscrutable and unknown. We propose a new set of questions for what to talk about when we talk about logs (i.e., datasets): What activities are in the dataset? We introduce a novel visualization that succinctly represents the totality of 100+ GB datasets by plotting the occurrence of provenance graph neighborhoods in a time series. How synthetic is the background activity? We perform autocorrelation analysis of provenance neighborhoods in the training split to identify process behaviors that occur at predictable intervals in the test split. Finally, How conspicuous is the malicious activity? We quantify the proportion of attack behaviors that are observed as benign neighborhoods in the training split as compared to previously-unseen attack neighborhoods. We then validate these questions by profiling the classification performance of state-of-the-art intrusion detection systems (R-CAID, FLASH, KAIROS, GNN) against a battery of public benchmark datasets (DARPA Transparent Computing and OpTC, ATLAS, ATLASv2). We demonstrate that synthetic background activities dramatically inflate True Negative Rates, while conspicuous malicious activities artificially boost True Positive Rates. Further, by explicitly controlling for these factors, we provide a more holistic picture of classifier performance. This work will elevate the dialogue surrounding threat detection datasets and will increase the rigor of threat detection experiments. 
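The autocorrelation check described above can be sketched as follows. The occurrence series, period, and thresholds are fabricated for illustration, not drawn from the benchmark datasets: a scripted background generator that emits the same provenance-graph neighborhood every few time bins produces a sharp autocorrelation peak at its period, flagging the activity as synthetic.

```python
import numpy as np

def autocorrelation(series, lag):
    """Normalized autocorrelation of a 1-D occurrence series at a given lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = float(np.dot(x, x))
    return float(np.dot(x[:-lag], x[lag:]) / denom)

# Toy occurrence counts of one neighborhood per time bin: a scripted benign
# generator fires every 5 bins, on top of light Poisson noise.
rng = np.random.default_rng(0)
period, bins = 5, 200
series = rng.poisson(0.2, size=bins)
series[::period] += 10  # synthetic background activity at a fixed interval

peak = autocorrelation(series, lag=period)       # high: periodic, likely synthetic
off_peak = autocorrelation(series, lag=period - 1)  # low: no alignment off-period
print(peak > 0.8, off_peak < 0.2)
```

Neighborhoods whose occurrence series show such peaks in the training split are strong candidates for synthetic background activity that inflates True Negative Rates.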
  2. The deployment of deep learning-based malware detection systems has transformed cybersecurity, offering sophisticated pattern recognition capabilities that surpass traditional signature-based approaches. However, these systems introduce new vulnerabilities requiring systematic investigation. This chapter examines adversarial attacks against graph neural network-based malware detection systems, focusing on semantics-preserving methodologies that evade detection while maintaining program functionality. We introduce a reinforcement learning (RL) framework that formulates the attack as a sequential decision-making problem, optimizing the insertion of no-operation (NOP) instructions to manipulate graph structure without altering program behavior. Comparative analysis includes three baseline methods: random insertion, hill-climbing, and gradient-approximation attacks. Our experimental evaluation on real-world malware datasets reveals significant differences in effectiveness, with the reinforcement learning approach achieving perfect evasion rates against both Graph Convolutional Network and Deep Graph Convolutional Neural Network architectures while requiring minimal program modifications. Our findings reveal three critical research gaps: transitioning from abstract Control Flow Graph representations to executable binary manipulation, developing universal vulnerability discovery across different architectures, and systematically translating adversarial insights into defensive enhancements. This work contributes to understanding adversarial vulnerabilities in graph-based security systems while establishing frameworks for evaluating machine learning-based malware detection robustness.
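A minimal sketch of the hill-climbing baseline (not the RL framework, and not the actual GNN classifiers) may help: NOP-only basic blocks are inserted on control-flow edges, which preserves execution semantics while changing the graph's structure. The "detector", its malicious profile, and the threshold below are all hypothetical stand-ins.

```python
import random

def nop_insert(cfg, edge_idx):
    """Split one control-flow edge by inserting a NOP-only basic block
    (semantics-preserving: execution order through the edge is unchanged)."""
    nodes, edges = cfg
    u, v = edges[edge_idx]
    w = max(nodes) + 1
    new_edges = edges[:edge_idx] + [(u, w), (w, v)] + edges[edge_idx + 1:]
    return (nodes + [w], new_edges)

def malice_score(cfg):
    """Toy black-box detector: similarity of (node count, edge count, density)
    to a known-malicious structural profile -- a stand-in for a trained GNN."""
    nodes, edges = cfg
    n, m = len(nodes), len(edges)
    density = m / (n * (n - 1))
    profile = (6, 9, 0.3)  # hypothetical malicious prototype
    return 1.0 / (1.0 + abs(n - profile[0]) + abs(m - profile[1])
                  + 10 * abs(density - profile[2]))

def hill_climb_evade(cfg, threshold=0.2, budget=20, seed=0):
    """Greedy baseline: try NOP insertions, keep any that lowers the score."""
    rng = random.Random(seed)
    score = malice_score(cfg)
    for _ in range(budget):
        if score < threshold:
            break  # evasion achieved with minimal modifications
        cand = nop_insert(cfg, rng.randrange(len(cfg[1])))
        if malice_score(cand) < score:
            cfg, score = cand, malice_score(cand)
    return cfg, score

malicious_cfg = (list(range(6)),
                 [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 1), (2, 4), (0, 3), (1, 4)])
evaded, final = hill_climb_evade(malicious_cfg)
print(final < malice_score(malicious_cfg))  # True: score drops after insertions
```

The RL formulation replaces this greedy loop with a learned insertion policy, which is what achieves the perfect evasion rates reported above.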
  3. Auditing, a central pillar of operating system security, has only recently come into its own as an active area of public research. This resurgent interest is due in large part to the notion of data provenance, a technique that iteratively parses audit log entries into a dependency graph that explains the history of system execution. Provenance facilitates precise threat detection and investigation through causal analysis of sophisticated intrusion behaviors. However, the absence of a foundational audit literature, combined with the rapid publication of recent findings, makes it difficult to gain a holistic picture of advancements and open challenges in the area. In this work, we survey and categorize the provenance-based system auditing literature, distilling contributions into a layered taxonomy based on the audit log capture and analysis pipeline. Recognizing that the Reduction Layer remains a key obstacle to the further proliferation of causal analysis technologies, we examine this issue further by conducting an ambitious independent evaluation of 8 exemplar reduction techniques against the recently-released DARPA Transparent Computing datasets. Our experiments uncover that past approaches frequently prune an overlapping set of activities from audit logs, reducing the synergistic benefits of applying them in tandem; further, we observe an inverse relation between storage efficiency and anomaly detection performance. However, we also observe that log reduction techniques are able to synergize effectively with data compression, potentially reducing log retention costs by multiple orders of magnitude. We conclude by discussing promising future directions for the field.
  4. Graph signal processing (GSP) has emerged as a powerful tool for practical network applications, including power system monitoring. Recent research has focused on developing GSP-based methods for state estimation, attack detection, and topology identification using the representation of the power system voltages as smooth graph signals. Within this framework, efficient methods have been developed for detecting false data injection (FDI) attacks, which until now were perceived as nonsmooth with respect to the graph Laplacian matrix. Consequently, these methods may not be effective against smooth FDI attacks. In this paper, we propose a graph FDI (GFDI) attack that minimizes the Laplacian-based graph total variation (TV) under practical constraints. We present the GFDI attack as the solution to a non-convex constrained optimization problem, which we obtain by approximating the problem with an ℓ1 relaxation and solving a series of convex quadratic programming problems. We then propose a protection scheme that identifies the minimal set of measurements necessary to constrain the GFDI output to a high graph TV, thereby enabling its detection by existing GSP-based detectors. Our numerical simulations on the IEEE-57 and IEEE-118 bus test cases reveal the potential threat posed by well-designed GSP-based FDI attacks. Moreover, we demonstrate that integrating the proposed protection design with GSP-based detection can lead to significant hardware cost savings compared to previous designs of protection methods against FDI attacks.
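In the notation commonly used for Laplacian-based graph TV (our notation, not necessarily the paper's), the attack design can be summarized as minimizing the TV of the injected perturbation c under sparsity and magnitude budgets, with the non-convex ℓ0 sparsity constraint relaxed to ℓ1:

```latex
\mathrm{TV}(\mathbf{c}) \triangleq \mathbf{c}^{T}\mathbf{L}\mathbf{c},
\qquad
\min_{\mathbf{c}\neq\mathbf{0}} \; \mathbf{c}^{T}\mathbf{L}\mathbf{c}
\quad \text{s.t.} \quad
\|\mathbf{c}\|_{1} \le K, \;\;
\|\mathbf{c}\|_{\infty} \le \epsilon ,
```

where L is the grid's Laplacian matrix, K is the (relaxed) sparsity budget, and ε bounds the per-measurement injection; the exact values and additional practical constraints are assumptions here. A low-TV c looks smooth to Laplacian-based detectors, and the protection scheme then secures a minimal measurement set so that any remaining feasible c necessarily has high TV.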
  5. Machine learning-based security detection models have become prevalent in modern malware and intrusion detection systems. However, previous studies show that such models are susceptible to adversarial evasion attacks. In this type of attack, inputs (i.e., adversarial examples) are specially crafted by intelligent malicious adversaries, with the aim of being misclassified by existing state-of-the-art models (e.g., deep neural networks). Once the attackers can fool a classifier into thinking that a malicious input is actually benign, they can render a machine learning-based malware or intrusion detection system ineffective. Objective: To help security practitioners and researchers build a more robust model against non-adaptive, white-box and non-targeted adversarial evasion attacks through the idea of ensemble models. Method: We propose an approach called Omni, the main idea of which is to explore methods that create an ensemble of "unexpected models"; i.e., models whose control hyperparameters have a large distance to the hyperparameters of an adversary's target model, with which we then make an optimized weighted ensemble prediction. Results: In studies with five types of adversarial evasion attacks (FGSM, BIM, JSMA, DeepFool and Carlini-Wagner) on five security datasets (NSL-KDD, CIC-IDS-2017, CSE-CIC-IDS2018, CICAndMal2017 and the Contagio PDF dataset), we show Omni is a promising approach as a defense strategy against adversarial attacks when compared with other baseline treatments. Conclusions: When employing ensemble defense against adversarial evasion attacks, we suggest creating ensembles of unexpected models that are distant from the attacker's expected model (i.e., target model) through methods such as hyperparameter optimization.
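The "unexpected models" idea can be sketched in a few lines. The hyperparameter encoding, candidate models, weights, and distance metric below are all hypothetical stand-ins, not Omni's actual implementation: candidates are ranked by hyperparameter distance from the adversary's presumed target model, the farthest are kept, and their predictions are combined with a weighted vote (the weights standing in for Omni's optimized ensemble weights).

```python
import numpy as np

def hp_distance(a, b):
    """Euclidean distance between (normalized) hyperparameter vectors."""
    return float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))

def pick_unexpected(candidates, target_hp, k=2):
    """Keep the k candidate models whose hyperparameters are farthest
    from the adversary's assumed target model."""
    ranked = sorted(candidates,
                    key=lambda m: hp_distance(m["hp"], target_hp),
                    reverse=True)
    return ranked[:k]

def weighted_vote(models, x):
    """Weighted ensemble prediction over binary classifiers."""
    score = sum(m["weight"] * m["predict"](x) for m in models)
    total = sum(m["weight"] for m in models)
    return 1 if score / total >= 0.5 else 0

# Hypothetical models: hp = (depth, learning-rate exponent), rescaled to [0, 1].
candidates = [
    {"hp": (0.9, 0.1), "weight": 0.6, "predict": lambda x: 1},
    {"hp": (0.5, 0.5), "weight": 0.5, "predict": lambda x: 0},
    {"hp": (0.1, 0.9), "weight": 0.7, "predict": lambda x: 1},
]
target_hp = (0.5, 0.5)  # the adversary's presumed target configuration
ensemble = pick_unexpected(candidates, target_hp)
print(weighted_vote(ensemble, x=None))
```

Because the attacker's gradients are computed against the target configuration, models far from it in hyperparameter space are less likely to share its decision boundary, which is the intuition the paper evaluates empirically.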