NSF PAR Search | NSF Public Access Repository

R-CAID: Embedding Root Cause Analysis within Provenance-based Intrusion Detection

Goyal, Akul; Wang, Gang; Bates, Adam (May 2024, Proceedings of The 45th IEEE Symposium on Security and Privacy (IEEE SP))

Full Text Available

What We Talk About When We Talk About Logs: Understanding the Effects of Dataset Quality on Endpoint Threat Detection Research

https://doi.org/10.1109/SP61157.2025.00112

Liu, Jason; Inam, Muhammad Adil; Goyal, Akul; Riddle, Andy; Westfall, Kim; Bates, Adam (May 2025, IEEE Symposium on Security and Privacy)

Endpoint threat detection research hinges on the availability of worthwhile evaluation benchmarks, but experimenters' understanding of the contents of benchmark datasets is often limited. Typically, attention is only paid to the realism of attack behaviors, which comprise only a small percentage of the audit logs in the dataset, while other characteristics of the data are inscrutable and unknown. We propose a new set of questions for what to talk about when we talk about logs (i.e., datasets): What activities are in the dataset? We introduce a novel visualization that succinctly represents the totality of 100+ GB datasets by plotting the occurrence of provenance graph neighborhoods in a time series. How synthetic is the background activity? We perform autocorrelation analysis of provenance neighborhoods in the training split to identify process behaviors that occur at predictable intervals in the test split. Finally, How conspicuous is the malicious activity? We quantify the proportion of attack behaviors that are observed as benign neighborhoods in the training split as compared to previously-unseen attack neighborhoods. We then validate these questions by profiling the classification performance of state-of-the-art intrusion detection systems (R-CAID, FLASH, KAIROS, GNN) against a battery of public benchmark datasets (DARPA Transparent Computing and OpTC, ATLAS, ATLASv2). We demonstrate that synthetic background activities dramatically inflate True Negative Rates, while conspicuous malicious activities artificially boost True Positive Rates. Further, by explicitly controlling for these factors, we provide a more holistic picture of classifier performance. This work will elevate the dialogue surrounding threat detection datasets and will increase the rigor of threat detection experiments.
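The autocorrelation step in this abstract can be illustrated with a minimal sketch. It assumes each provenance neighborhood has already been reduced to a list of occurrence counts per time bucket; the function names, the bucket data, and the choice of the standard sample autocorrelation estimator are all assumptions made for illustration, not the authors' implementation.

```python
from statistics import fmean

def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive integer `lag`."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(series)")
    mean = fmean(series)
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        return 0.0  # constant series: no periodic structure to measure
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance / variance

def dominant_period(series, max_lag):
    """Return (lag, score) for the lag in 1..max_lag with the highest score."""
    scores = {lag: autocorrelation(series, lag) for lag in range(1, max_lag + 1)}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical data: one neighborhood's occurrence count per time bucket,
# firing in every fourth bucket (for example, a scripted background task).
counts = [5, 0, 0, 0] * 6
lag, score = dominant_period(counts, max_lag=8)
print(f"strongest repetition at lag {lag} (autocorrelation {score:.2f})")
```

A high score at some lag suggests the behavior recurs on a fixed schedule, which is the kind of predictability the abstract associates with synthetic background activity.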
Free, publicly-accessible full text available May 12, 2026

Sometimes, You Aren’t What You Do: Mimicry Attacks against Provenance Graph Host Intrusion Detection Systems

Goyal, Akul; Han, Xueyuan; Wang, Gang; Bates, Adam (February 2023, 30th Network and Distributed System Security Symposium)

Reliable methods for host-layer intrusion detection remain an open problem within computer security. Recent research has recast intrusion detection as a provenance graph anomaly detection problem thanks to concurrent advancements in machine learning and causal graph auditing. While these approaches show promise, their robustness against an adaptive adversary has yet to be proven. In particular, it is unclear if mimicry attacks, which plagued past approaches to host intrusion detection, have a similar effect on modern graph-based methods. In this work, we reveal that systematic design choices have allowed mimicry attacks to continue to abound in provenance graph host intrusion detection systems (Prov-HIDS). Against a corpus of exemplar Prov-HIDS, we develop evasion tactics that allow attackers to hide within benign process behaviors. Evaluating against public datasets, we demonstrate that an attacker can consistently evade detection (100% success rate) without modifying the underlying attack behaviors. We go on to show that our approach is feasible in live attack scenarios and outperforms domain-general adversarial sample techniques. Through open sourcing our code and datasets, this work will serve as a benchmark for the evaluation of future Prov-HIDS.
Full Text Available

Sometimes, You Aren't What You Do: Mimicry Attacks against Provenance Graph Host Intrusion Detection Systems

https://doi.org/10.14722/ndss.2023.24207

Goyal, Akul; Han, Xueyuan; Wang, Gang; Bates, Adam (January 2023, Internet Society)

SoK: History is a Vast Early Warning System: Auditing the Provenance of System Intrusions

https://doi.org/10.1109/SP46215.2023.10179405

Inam, Muhammad Adil; Chen, Yinfang; Goyal, Akul; Liu, Jason; Mink, Jaron; Michael, Noor; Gaur, Sneha; Bates, Adam; Hassan, Wajih Ul (May 2023, 2023 IEEE Symposium on Security and Privacy (SP))

Auditing, a central pillar of operating system security, has only recently come into its own as an active area of public research. This resurgent interest is due in large part to the notion of data provenance, a technique that iteratively parses audit log entries into a dependency graph that explains the history of system execution. Provenance facilitates precise threat detection and investigation through causal analysis of sophisticated intrusion behaviors. However, the absence of a foundational audit literature, combined with the rapid publication of recent findings, makes it difficult to gain a holistic picture of advancements and open challenges in the area. In this work, we survey and categorize the provenance-based system auditing literature, distilling contributions into a layered taxonomy based on the audit log capture and analysis pipeline. Recognizing that the Reduction Layer remains a key obstacle to the further proliferation of causal analysis technologies, we delve further into this issue by conducting an ambitious independent evaluation of 8 exemplar reduction techniques against the recently-released DARPA Transparent Computing datasets. Our experiments uncover that past approaches frequently prune an overlapping set of activities from audit logs, reducing the synergistic benefits from applying them in tandem; further, we observe an inverse relation between storage efficiency and anomaly detection performance. However, we also observe that log reduction techniques are able to synergize effectively with data compression, potentially reducing log retention costs by multiple orders of magnitude. We conclude by discussing promising future directions for the field.
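The idea of parsing audit log entries into a dependency graph can be sketched in a few lines. The record layout, the three operation names, and both function names below are hypothetical simplifications chosen for illustration; real audit logs carry far more detail, and this trace ignores event ordering, so it over-approximates the true causal history.

```python
from collections import defaultdict

# Hypothetical, simplified audit records: (subject, operation, object).
events = [
    ("bash", "fork", "curl"),
    ("curl", "write", "/tmp/payload"),
    ("bash", "fork", "python"),
    ("python", "read", "/tmp/payload"),
    ("python", "write", "/etc/passwd"),
]

def build_provenance_graph(events):
    """Map each node to the set of nodes it directly depends on."""
    parents = defaultdict(set)
    for subject, op, obj in events:
        if op == "read":
            parents[subject].add(obj)   # data flows from the object to the process
        else:
            parents[obj].add(subject)   # write/fork: the process influences the object
    return parents

def backward_trace(parents, target):
    """Return every node that `target` transitively depends on."""
    ancestors = set()
    stack = [target]
    while stack:
        node = stack.pop()
        for source in parents.get(node, ()):
            if source not in ancestors:
                ancestors.add(source)
                stack.append(source)
    return ancestors

graph = build_provenance_graph(events)
print(sorted(backward_trace(graph, "/etc/passwd")))
```

Tracing backward from a suspicious object in this way is the kind of causal analysis the abstract refers to: it surfaces the processes and files that could have contributed to the object's current state.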
Full Text Available

FAuST: Striking a Bargain between Forensic Auditing’s Security and Throughput

https://doi.org/10.1145/3564625.3567990

Inam, Muhammad Adil; Goyal, Akul; Liu, Jason; Mink, Jaron; Michael, Noor; Gaur, Sneha; Bates, Adam; Hassan, Wajih Ul (December 2022, 38th Annual Computer Security Applications Conference)

System logs are invaluable to forensic audits, but grow so large that in practice fine-grained logs are quickly discarded – if captured at all – preventing the real-world use of the provenance-based investigation techniques that have gained popularity in the literature. Encouragingly, forensically-informed methods for reducing the size of system logs are a subject of frequent study. Unfortunately, many of these techniques are designed for offline reduction in a central server, meaning that the up-front cost of log capture, storage, and transmission must still be paid at the endpoints. Moreover, to date these techniques exist as isolated (and, often, closed-source) implementations; there does not exist a comprehensive framework through which the combined benefits of multiple log reduction techniques can be enjoyed. In this work, we present FAuST, an audit daemon for performing streaming audit log reduction at system endpoints. After registering with a log source (e.g., via Linux Audit’s audisp utility), FAuST incrementally builds an in-memory provenance graph of recent system activity. During graph construction, log reduction techniques that can be applied to local subgraphs are invoked immediately using event callback handlers, while techniques meant for application on the global graph are invoked in periodic epochs. We evaluate FAuST, loaded with eight different log reduction modules from the literature, against the DARPA Transparent Computing datasets. Our experiments demonstrate the efficient performance of FAuST and identify certain subsets of reduction techniques that are synergistic with one another. Thus, FAuST dramatically simplifies the evaluation and deployment of log reduction techniques.
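The split between per-event callbacks and periodic global passes described in this abstract can be sketched as follows. This is not FAuST's code or API: the `StreamingReducer` class, its method names, and the two toy reduction rules are hypothetical stand-ins, and the rules are not the modules evaluated in the paper.

```python
class StreamingReducer:
    """Toy streaming reducer: local handlers run per event, global passes per epoch."""

    def __init__(self, epoch_size):
        self.epoch_size = epoch_size
        self.local_handlers = []   # each returns False to drop the incoming event
        self.global_passes = []    # each maps the full edge list to a reduced one
        self.edges = []            # in-memory graph, stored as (subject, op, object)
        self._seen = 0

    def ingest(self, event):
        self._seen += 1
        # Local techniques are applied immediately as the event arrives.
        if all(handler(self.edges, event) for handler in self.local_handlers):
            self.edges.append(event)
        # Global techniques are applied once per epoch over the whole graph.
        if self._seen % self.epoch_size == 0:
            for reduce_graph in self.global_passes:
                self.edges = reduce_graph(self.edges)

def drop_repeated_edges(edges, event):
    """Local rule (illustrative): keep an event only if its edge is not already stored."""
    return event not in edges  # linear scan, acceptable for a sketch

def prune_deleted_files(edges):
    """Global rule (illustrative): remove all edges to files unlinked so far."""
    deleted = {obj for _, op, obj in edges if op == "unlink"}
    return [edge for edge in edges if edge[2] not in deleted]

reducer = StreamingReducer(epoch_size=4)
reducer.local_handlers.append(drop_repeated_edges)
reducer.global_passes.append(prune_deleted_files)

stream = [
    ("vim", "write", "/tmp/a.swp"),
    ("vim", "write", "/tmp/a.swp"),
    ("vim", "write", "notes.txt"),
    ("vim", "unlink", "/tmp/a.swp"),
]
for event in stream:
    reducer.ingest(event)
print(reducer.edges)
```

Keeping the local rules cheap matters in this design because they run on every event at the endpoint, while the costlier whole-graph passes are amortized over an epoch.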
Full Text Available