skip to main content


Title: Tactical Provenance Analysis for Endpoint Detection and Response Systems
Recent advances in causality analysis have enabled investigators to trace multi-stage attacks using provenance graphs. Based on system-layer audit logs (e.g., syscalls), these approaches omit vital sources of application context (e.g., email addresses, HTTP response codes) that can be found in higher layers of the system. Although such information is often essential to understanding attack behaviors, it is difficult to incorporate this evidence into causal analysis engines because of the semantic gap that exists between system layers. To address that shortcoming, we propose the notion of universal provenance, which encodes all forensically relevant causal dependencies regardless of their layer of origin. To transparently realize that vision on commodity systems, we present OmegaLog, a provenance tracker that bridges the semantic gap between system and application logging contexts. OmegaLog analyzes program binaries to identify and model application-layer logging behaviors, enabling accurate reconciliation of application events with system-layer accesses. OmegaLog then intercepts applications’ runtime logging activities and grafts those events onto the system-layer provenance graph, allowing investigators to reason more precisely about the nature of attacks. We demonstrate that our system is widely applicable to existing software projects and can transparently facilitate execution partitioning of provenance graphs without any training or developer intervention. Evaluation on real-world attack scenarios shows that our technique generates concise provenance graphs with rich semantic information relative to the state-of-the-art, with an average runtime overhead of 4%  more » « less
Award ID(s):
1750024 1657534
NSF-PAR ID:
10146533
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of the IEEE Symposium on Security and Privacy
ISSN:
1063-9578
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Recent advances in causality analysis have enabled investigators to trace multi-stage attacks using whole- system provenance graphs. Based on system-layer audit logs (e.g., syscalls), these approaches omit vital sources of application context (e.g., email addresses, HTTP response codes) that can found in higher layers of the system. Although this information is often essential to understanding attack behaviors, incorporating this evidence into causal analysis engines is difficult due to the semantic gap that exists between system layers. To address this shortcoming, we propose the notion of universal provenance, which encodes all forensically-relevant causal dependencies regardless of their layer of origin. To transparently realize this vision on commodity systems, we present ωLOG (“Omega Log”), a provenance tracking mechanism that bridges the semantic gap between system and application logging contexts. ωLOG analyzes program binaries to identify and model application-layer logging behaviors, enabling application events to be accurately reconciled with system-layer accesses. ωLOG then intercepts applications’ runtime logging activities and grafts those events onto the system-layer provenance graph, allowing investigators to reason more precisely about the nature of attacks. We demonstrate that ωLOG is widely-applicable to existing software projects and can transparently facilitate execution partitioning of dependency graphs without any training or developer intervention. Evaluation on real-world attack scenarios shows that universal provenance graphs are concise and rich with semantic information as compared to the state-of-the-art, with 12% average runtime overhead. 
    more » « less
  2. Provenance-based causal analysis of audit logs has proven to be an invaluable method of investigating system intrusions. However, it also suffers from dependency explosion, whereby long-running processes accumulate many dependencies that are hard to unravel. Execution unit partitioning addresses this by segmenting dependencies into units of work, such as isolating the events that processed a single HTTP request. Unfortunately, we discover that current designs have a semantic gap problem due to how system calls and application log messages are used to infer complex internal program states. We demonstrate how attackers can modify existing code exploits to control event partitioning, breaking links in the attack and framing innocent users. We also show how our techniques circumvent existing program and log integrity defenses. We then propose a new design for execution unit partitioning that leverages additional runtime data to yield verified partitions that resist manipulation. Our design overcomes the technical challenges of minimizing additional overhead while accurately connecting low level code instructions to high level audit events, in part with the use of commodity hardware processor tracing. We implement a prototype of our design for Linux, MARSARA, and extensively evaluate it on 14 real-world programs, targeted with expertly crafted exploits. MARSARA's verified partitions successfully capture all the attack provenances while only reintroducing 2.82% of false dependencies, in the worst case, with an average overhead of 8.7%. Using a new metric called Partitioning Attack Surface, we show that MARSARA eliminates 47,642 more repartitioning gadgets per program than integrity defenses like CFI, demonstrating our prototype's effectiveness and the novelty of the attacks it prevents. 
    more » « less
  3. Investigating the nature of system intrusions in large distributed systems remains a notoriously difficult challenge. While monitoring tools (e.g., Firewalls, IDS) provide preliminary alerts through easy-to-use administrative interfaces, attack reconstruction still requires that administrators sift through gigabytes of system audit logs stored locally on hundreds of machines. At present, two fundamental obstacles prevent synergy between system-layer auditing and modern cluster monitoring tools: 1) the sheer volume of audit data generated in a data center is prohibitively costly to transmit to a central node, and 2) system- layer auditing poses a “needle-in-a-haystack” problem, such that hundreds of employee hours may be required to diagnose a single intrusion. This paper presents Winnower, a scalable system for audit-based cluster monitoring that addresses these challenges. Our key insight is that, for tasks that are replicated across nodes in a distributed application, a model can be defined over audit logs to succinctly summarize the behavior of many nodes, thus eliminating the need to transmit redundant audit records to a central monitoring node. Specifically, Winnower parses audit records into provenance graphs that describe the actions of individual nodes, then performs grammatical inference over individual graphs using a novel adaptation of Deterministic Finite Automata (DFA) Learning to produce a behavioral model of many nodes at once. This provenance model can be efficiently transmitted to a central node and used to identify anomalous events in the cluster. We have implemented Winnower for Docker Swarm container clusters and evaluate our system against real-world applications and attacks. We show that Winnower dramatically reduces storage and network overhead associated with aggregating system audit logs, by as much as 98%, without sacrificing the important information needed for attack investigation. Winnower thus represents a significant step forward for security monitoring in distributed systems. 
    more » « less
  4. System auditing is a central concern when investigating and responding to security incidents. Unfortunately, attackers regularly engage in anti-forensic activities after a break-in, covering their tracks from the system logs in order to frustrate the efforts of investigators. While a variety of tamper-evident logging solutions have appeared throughout the industry and the literature, these techniques do not meet the operational and scalability requirements of system-layer audit frameworks. In this work, we introduce Custos, a practical framework for the detection of tampering in system logs. Custos consists of a tamper-evident logging layer and a decentralized auditing protocol. The former enables the verification of log integrity with minimal changes to the underlying logging framework, while the latter enables near real-time detection of log integrity violations within an enterprise-class network. Custos is made practical by the observation that we can decouple the costs of cryptographic log commitments from the act of creating and storing log events, without trading off security, leveraging features of off-the-shelf trusted execution environments. Supporting over one million events per second, we show that Custos' tamper-evident logging protocol is three orders of magnitude (1000×) faster than prior solutions and incurs only between 2% and 7% runtime overhead over insecure logging on intensive workloads. Further, we show that Custos' auditing protocol can detect violations in near real-time even in the presence of a powerful distributed adversary and with minimal (3%) network overhead. Our case study on a real-world APT attack scenario demonstrates that Custos forces anti-forensic attackers into a "lose-lose" situation, where they can either be covert and not tamper with logs (which can be used for forensics), or erase logs but then be detected by Custos. 
    more » « less
  5. Large enterprises are increasingly relying on threat detection softwares (e.g., Intrusion Detection Systems) to allow them to spot suspicious activities. These softwares generate alerts which must be investigated by cyber analysts to figure out if they are true attacks. Unfortunately, in practice, there are more alerts than cyber analysts can properly investigate. This leads to a “threat alert fatigue” or information overload problem where cyber analysts miss true attack alerts in the noise of false alarms. In this paper, we present NoDoze to combat this challenge using contextual and historical information of generated threat alert in an enterprise. NoDoze first generates a causal dependency graph of an alert event. Then, it assigns an anomaly score to each event in the dependency graph based on the frequency with which related events have happened before in the enterprise. NoDoze then propagates those scores along the edges of the graph using a novel network diffusion algorithm and generates a subgraph with an aggregate anomaly score which is used to triage alerts. Evaluation on our dataset of 364 threat alerts shows that NoDoze decreases the volume of false alarms by 86%, saving more than 90 hours of analysts’ time, which was required to investigate those false alarms. Furthermore, NoDoze generated dependency graphs of true alerts are 2 orders of magnitude smaller than those generated by traditional tools without sacrificing the vital information needed for the investigation. Our system has a low average runtime overhead and can be deployed with any threat detection software. 
    more » « less