skip to main content


Title: SysGen: system state corpus generator
Security investigations often rely on forensic tools to deliver the necessary supporting evidence. It is therefore imperative that forensic tools are scientifically tested in both their accuracy and capabilities. The primary means to develop and validate forensic tools is by evaluating them against a set of known answers (i.e., a data corpus). While researchers have long recognized the need for standardized forensic corpora, there are few such tools or datasets available, particularly for database management systems (DBMS). In fact, there are currently no publicly available tools that can generate a DBMS dataset for forensic testing. In this paper, we share SysGen, a customizeable data generator and a pre-built corpus that offers a reference for most major relational DBMSes. The pre-built corpus includes individual DBMS files, the full disk snapshot, the RAM snapshot, and network packets taken from a set of clean virtual machines. SysGen can be easily adapted to execute a custom workload scenario, capturing a new data corpus; it can also create other variations of full system snapshots, even beyond DBMS testing.  more » « less
Award ID(s):
1656268
NSF-PAR ID:
10210364
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
ARES '20: Proceedings of the 15th International Conference on Availability, Reliability and Security
Volume:
15th
Issue:
45
Page Range / eLocation ID:
1-6
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The majority of sensitive and personal user data is stored in different Database Management Systems (DBMS). For example, Oracle is frequently used to store corporate data, MySQL serves as the back-end storage for most webstores, and SQLite stores personal data such as SMS messages on a phone or browser bookmarks. Each DBMS manages its own storage (within the operating system), thus databases require their own set of forensic tools. While database carving solutions have been built by multiple research groups, forensic investigators today still lack the tools necessary to analyze DBMS forensic artifacts. The unique nature of database storage and the resulting forensic artifacts require established standards for artifact storage and viewing mechanisms in order for such advanced analysis tools to be developed. In this paper, we present 1) a standard storage format, Database Forensic File Format (DB3F), for database forensic tools output that follows the guidelines established by other (file system) forensic tools, and 2) a view and search toolkit, Database Forensic Toolkit (DF-Toolkit), that enables the analysis of data stored in our database forensic format. Using our prototype implementation, we demonstrate that our toolkit follows the state-of-the-art design used by current forensic tools and offers easy-to-interpret database artifact search capabilities. 
    more » « less
  2. Cyberattacks continue to evolve and adapt to state-of-the-art security mechanisms. Therefore, it is critical for security experts to routinely inspect audit logs to detect complex security breaches. However, if a system was compromised during a cyberattack, the validity of the audit logs themselves cannot necessarily be trusted. Specifically, for a database management system (DBMS), an attacker with elevated privileges may temporarily disable the audit logs, bypassing logging altogether along with any tamper-proof logging mechanisms. Thus, security experts need techniques to validate logs independent of a potentially compromised system to detect security breaches. This paper demonstrates that SQL query operations produce a repeatable set of patterns within DBMS process memory. Operations such as full table scans, index accesses, or joins each produce their own set of distinct forensic artifacts in memory. Given these known patterns, we propose that collecting forensic artifacts from a trusted memory snapshot allows one to reverse-engineer query activity and validate audit logs independent of the DBMS itself and outside the scope of a database administrator's privileges. We rely on the fact the queries must ultimately be processed in memory regardless of any security mechanisms they may have bypassed. This work is generalized to all relational DBMSes by using two representative DBMSes, Oracle and MySQL. 
    more » « less
  3. Database Management Systems (DBMSes) secure data against regular users through defensive mechanisms such as access control, and against privileged users with detection mechanisms such as audit logging. Interestingly, these security mechanisms are built into the DBMS and are thus only useful for monitoring or stopping operations that are executed through the DBMS API. Any access that involves directly modifying database files (at file system level) would, by definition, bypass any and all security layers built into the DBMS itself. In this paper, we propose and evaluate an approach that detects direct modifications to database files that have already bypassed the DBMS and its internal security mechanisms. Our approach applies forensic analysis to first validate database indexes and then compares index state with data in the DBMS tables. We show that indexes are much more difficult to modify and can be further fortified with hashing. Our approach supports most relational DBMSes by leveraging index structures that are already built into the system to detect database storage tampering that would currently remain undetectable. 
    more » « less
  4. Database Management Systems (DBMSes) secure data against regular users through defensive mechanisms such as access control, and against privileged users with detection mechanisms such as audit logging. Interestingly, these security mechanisms are built into the DBMS and are thus only useful for monitoring or stopping operations that are executed through the DBMS API. Any access that involves directly modifying database files (at file system level) would, by definition, bypass any and all security layers built into the DBMS itself. In this paper,we propose and evaluate an approach that detects direct modifications to database files that have already bypassed the DBMS and its internal security mechanisms. Our approach applies forensic analysis to first validate database indexes and then compares index state with data in the DBMS tables. We show that indexes are much more difficult to modify and can be further fortified with hashing. Our approach supports most relational DBMSes by leveraging index structures that are already built into the system to detect database storage tampering that would currently remain undetectable. 
    more » « less
  5. Where provenance is a relationship between a data item and the location from which this data was copied. In a DBMS, a typical use of where provenance is in establishing a copy-by-address relationship between the output of a query and the particular data value(s) that originated it. Normal DBMS operations create a variety of auxiliary copies of the data (e.g., indexes, MVs, cached copies). These copies exist over time with relationships that evolve continuously – (A) indexes maintain the copy with a reference to the origin value, (B) MVs maintain the copy without a reference to the source table, (C) cached copies are created once and are never maintained. A query may be answered from any of these auxiliary copies; however, this where provenance is not computed or maintained. In this paper, we describe sources from which forensic analysis of storage can derive where provenance of table data. We also argue that this computed where provenance can be useful (and perhaps necessary) for accurate forensic reports and evidence from maliciously altered databases or validation of corrupted DBMS storage. 
    more » « less