skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Visualizing Web Application Execution Logs to Improve Software Security Defect Localization
Interactive web-based applications play an important role for both service providers and consumers. However, web applications tend to be complex, produce high-volume data, and are often ripe for attack. Attack analysis and remediation are complicated by adversary obfuscation and the difficulty in assembling and analyzing logs. In this work, we explore the web application analysis task through log file fusion, distillation, and visualization. Our approach consists of visualizing the logs of web and database traffic with detailed function execution traces. We establish causal links between events and their associated behaviors. We evaluate the effectiveness of this process using data volume reduction statistics, user interaction models, and usage scenarios. Across a set of scenarios, we find that our techniques can filter at least 97.5% of log data and reduce analysis time by 93-96%.  more » « less
Award ID(s):
1814402
PAR ID:
10332304
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
IEEE Conference on Software Analysis, Evolution and Reengineering (SANER) Workshop on Validation, Analysis and Evolution of Software Tests
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The volume, variety, and velocity of different data, e.g., simulation data, observation data, and social media data, are growing ever faster, posing grand challenges for data discovery. An increasing trend in data discovery is to mine hidden relationships among users and metadata from the web usage logs to support the data discovery process. Web usage log mining is the process of reconstructing sessions from raw logs and finding interesting patterns or implicit linkages. The mining results play an important role in improving quality of search-related components, e.g., ranking, query suggestion, and recommendation. While researches were done in the data discovery domain, collecting and analyzing logs efficiently remains a challenge because (1) the volume of web usage logs continues to grow as long as users access the data; (2) the dynamic volume of logs requires on-demand computing resources for mining tasks; (3) the mining process is compute-intensive and time-intensive. To speed up the mining process, we propose a cloud-based log-mining framework using Apache Spark and Elasticsearch. In addition, a data partition paradigm, logPartitioner, is designed to solve the data imbalance problem in data parallelism. As a proof of concept, oceanographic data search and access logs are chosen to validate performance of the proposed parallel log-mining framework. 
    more » « less
  2. null (Ed.)
    Auditing is an increasingly essential tool for the defense of computing systems, but the unwieldy nature of log data imposes significant burdens on administrators and analysts. To address this issue, a variety of techniques have been proposed for approximating the contents of raw audit logs, facilitating efficient storage and analysis. However, the security value of these approximated logs is difficult to measure—relative to the original log, it is unclear if these techniques retain the forensic evidence needed to effectively investigate threats. Unfortunately, prior work has only investigated this issue anecdotally, demonstrating sufficient evidence is retained for specific attack scenarios. In this work, we address this gap in the literature through formalizing metrics for quantifying the forensic validity of an approximated audit log under differing threat models. In addition to providing quantifiable security arguments for prior work, we also identify a novel point in the approximation design space—that log events describing typical (benign) system activity can be aggressively approximated, while events that encode anomalous behavior should be preserved with lossless fidelity. We instantiate this notion of Attack-Preserving forensic validity in LogApprox, a new approximation technique that eliminates the redundancy of voluminous file I/O associated with benign process activities. We evaluate LogApprox alongside a corpus of exemplar approximation techniques from prior work and demonstrate that LogApprox achieves comparable log reduction rates while retaining 100% of attack-identifying log events. Additionally, we utilize this evaluation to illuminate the inherent trade-off between performance and utility within existing approximation techniques. This work thus establishes trustworthy foundations for the design of the next generation of efficient auditing frameworks. 
    more » « less
  3. Cybercrime scene reconstruction that aims to reconstruct a previous execution of the cyber attack delivery process is an important capability for cyber forensics (e.g., post mortem analysis of the cyber attack executions). Unfortunately, existing techniques such as log-based forensics or record-and-replay techniques are not suitable to handle complex and long-running modern applications for cybercrime scene reconstruction and post mortem forensic analysis. Specifically, log-based cyber forensics techniques often suffer from a lack of inspection capability and do not provide details of how the attack unfolded. Record-and-replay techniques impose significant runtime overhead, often require significant modifications on end-user systems, and demand to replay the entire recorded execution from the beginning. In this paper, we propose C2SR, a novel technique that can reconstruct an attack delivery chain (i.e., cybercrime scene) for post-mortem forensic analysis. It provides a highly desired capability: interactable partial execution reconstruction. In particular, it reproduces a partial execution of interest from a large execution trace of a long-running program. The reconstructed execution is also interactable, allowing forensic analysts to leverage debugging and analysis tools that did not exist on the recorded machine. The key intuition behind C2SR is partitioning an execution trace by resources and reproducing resource accesses that are consistent with the original execution. It tolerates user interactions required for inspections that do not cause inconsistent resource accesses. Our evaluation results on 26 real-world programs show that C2SR has low runtime overhead (less than 5.47%) and acceptable space overhead. We also demonstrate with four realistic attack scenarios that C2SR successfully reconstructs partial executions of long-running applications such as web browsers, and it can remarkably reduce the user’s efforts to understand the incident. 
    more » « less
  4. Researchers have investigated a number of strategies for capturing and analyzing data analyst event logs in order to design better tools, identify failure points, and guide users. However, this remains challenging because individual- and session-level behavioral differences lead to an explosion of complexity and there are few guarantees that log observations map to user cognition. In this paper we introduce a technique for segmenting sequential analyst event logs which combines data, interaction, and user features in order to create discrete blocks of goal-directed activity. Using measures of inter-dependency and comparisons between analysis states, these blocks identify patterns in interaction logs coupled with the current view that users are examining. Through an analysis of publicly available data and data from a lab study across a variety of analysis tasks, we validate that our segmentation approach aligns with users’ changing goals and tasks. Finally, we identify several downstream applications for our approach. 
    more » « less
  5. null (Ed.)
    For system logs to aid in security investigations, they must be beyond the reach of the adversary. Unfortunately, attackers that have escalated privilege on a host are typically able to delete and modify log events at will. In response to this threat, a variety of secure logging systems have appeared over the years that attempt to provide tamper-resistance (e.g., write once read many drives, remote storage servers) or tamper-evidence (e.g., cryptographic proofs) for system logs. These solutions expose an interface through which events are committed to a secure log, at which point they enjoy protection from future tampering. However, all proposals to date have relied on the assumption that an event's occurrence is concomitant with its commitment to the secured log. In this work, we challenge this assumption by presenting and validating a race condition attack on the integrity of audit frameworks. Our attack exploits the intrinsically asynchronous nature of I/O and IPC activity, demonstrating that an attacker can snatch events about their intrusion out of message buffers after they have occurred but before they are committed to the log, thus bypassing existing protections. We present a first step towards defending against our attack by introducing KennyLoggings, the first kernel- based tamper-evident logging system that satisfies the synchronous integrity property, meaning that it guarantees tamper-evidence of events upon their occurrence. We implement KennyLoggings on top of the Linux kernel and show that it imposes between 8% and 11% overhead on log-intensive application workloads. 
    more » « less