

Search for: All records

Award ID contains: 1656268

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. The contents of RAM in an operating system (OS) are a critical source of evidence for malware detection or system performance profiling. Digital forensics has long focused on reconstructing OS RAM structures to detect malware patterns at runtime. In an ongoing arms race, these RAM reconstruction approaches must be designed for the attack they are trying to detect. Even though database management systems (DBMS) are collectively responsible for storing and processing most data in organizations, the equivalent problem of memory reconstruction has not been considered for DBMS-managed RAM. In this paper, we propose and evaluate a systematic approach to reverse engineer data structures and access patterns in DBMS RAM. Rather than develop a solution for specific scenarios, we describe an approach to detect and track any RAM area in a DBMS. We evaluate our approach with the four most common RAM areas in well-known DBMSes; this paper describes the design of each area-specific query workload and the process to capture and quantify that area at runtime. We further evaluate our approach by observing the RAM data flow in the presence of built-in DBMS encryption. We present an overview of available DBMS encryption mechanisms, their relative advantages and disadvantages, and then illustrate the practical implications for the four memory areas.
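    The area-tracking idea lends itself to a simple illustration: run a workload that plants recognizable values, snapshot the DBMS process memory, and scan the snapshot for those values to locate and size the area they occupy. The Python sketch below assumes a raw memory dump file and an invented marker format; it is not the paper's tooling.

        import re

        MARKER = b"FORENSIC_MARKER_"   # assumed payload planted by the query workload
        PAGE_SIZE = 8192               # assumed DBMS page size

        def locate_marker_pages(snapshot_path: str) -> list[int]:
            """Return page-aligned offsets in the RAM snapshot that contain a marker."""
            with open(snapshot_path, "rb") as f:
                image = f.read()
            offsets = [m.start() for m in re.finditer(re.escape(MARKER), image)]
            # Collapse hits to distinct pages to estimate how much of the
            # target area (e.g., the buffer cache) the workload occupies.
            return sorted({off - (off % PAGE_SIZE) for off in offsets})

        if __name__ == "__main__":
            pages = locate_marker_pages("dbms_ram.dump")
            print(f"{len(pages)} pages (~{len(pages) * PAGE_SIZE // 1024} KiB) hold markers")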
  2. Security investigations often rely on forensic tools to deliver the necessary supporting evidence. It is therefore imperative that forensic tools are scientifically tested for both their accuracy and their capabilities. The primary means to develop and validate forensic tools is to evaluate them against a set of known answers (i.e., a data corpus). While researchers have long recognized the need for standardized forensic corpora, there are few such tools or datasets available, particularly for database management systems (DBMS). In fact, there are currently no publicly available tools that can generate a DBMS dataset for forensic testing. In this paper, we share SysGen, a customizable data generator and a pre-built corpus that offers a reference for most major relational DBMSes. The pre-built corpus includes individual DBMS files, the full disk snapshot, the RAM snapshot, and network packets taken from a set of clean virtual machines. SysGen can be easily adapted to execute a custom workload scenario, capturing a new data corpus; it can also create other variations of full system snapshots, even beyond DBMS testing.
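    The "known answers" requirement is the crux: every artifact in the corpus must be traceable back to the workload that produced it. A minimal sketch of that idea in Python, with an invented schema rather than SysGen's actual generator:

        import random

        def generate_workload(seed: int = 42, rows: int = 100) -> list[str]:
            rng = random.Random(seed)  # fixed seed => fully reproducible corpus
            stmts = ["CREATE TABLE customer (id INT PRIMARY KEY, balance INT);"]
            for i in range(rows):
                stmts.append(f"INSERT INTO customer VALUES ({i}, {rng.randint(0, 10000)});")
            # Deletes leave recoverable "deleted row" artifacts in storage,
            # giving forensic tools ground truth to validate against.
            for i in rng.sample(range(rows), rows // 10):
                stmts.append(f"DELETE FROM customer WHERE id = {i};")
            return stmts

        if __name__ == "__main__":
            with open("workload.sql", "w") as f:
                f.write("\n".join(generate_workload()))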
  3. The majority of sensitive and personal user data is stored in different Database Management Systems (DBMS). For example, Oracle is frequently used to store corporate data, MySQL serves as the back-end storage for most webstores, and SQLite stores personal data such as SMS messages on a phone or browser bookmarks. Each DBMS manages its own storage (within the operating system), so each requires its own set of forensic tools. While database carving solutions have been built by multiple research groups, forensic investigators today still lack the tools necessary to analyze DBMS forensic artifacts. The unique nature of database storage and the resulting forensic artifacts requires established standards for artifact storage and viewing mechanisms before such advanced analysis tools can be developed. In this paper, we present 1) a standard storage format, Database Forensic File Format (DB3F), for database forensic tool output that follows the guidelines established by other (file system) forensic tools, and 2) a view and search toolkit, Database Forensic Toolkit (DF-Toolkit), that enables the analysis of data stored in our database forensic format. Using our prototype implementation, we demonstrate that our toolkit follows the state-of-the-art design used by current forensic tools and offers easy-to-interpret database artifact search capabilities.
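    To make the standardization point concrete, the sketch below shows what a carved-page record in a DB3F-style output might look like. The field names are assumptions for illustration, not the actual DB3F schema.

        import json

        carved_page = {
            "source": {"image": "disk.img", "offset": 1056768},  # where the page was found
            "dbms": {"name": "PostgreSQL", "page_size": 8192},
            "rows": [
                {"status": "active",  "values": [101, "alice"]},
                {"status": "deleted", "values": [102, "bob"]},   # recoverable deleted row
            ],
        }

        # One JSON object per carved page keeps the output human-readable
        # and easy for a viewer/search toolkit such as DF-Toolkit to index.
        with open("carved_pages.jsonl", "a") as f:
            f.write(json.dumps(carved_page) + "\n")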
  4. The pervasive use of databases for the storage of critical and sensitive information in many organizations has led to an increase in the rate at which databases are exploited in computer crimes. While there are several techniques and tools available for database forensic analysis, such tools usually assume a priori database preparation, such as relying on tamper-detection software to already be in place and on the use of detailed logging. Further, such tools are built into the DBMS and thus can be compromised or corrupted along with the database itself. In practice, investigators need forensic and security audit tools that work on poorly-configured systems and make no assumptions about the extent of damage or malicious hacking in a database. In this paper, we present our database forensics methods, which are capable of examining database content from a storage (disk or RAM) image without using any log or file system metadata. We describe how these methods can be used to detect security breaches in an untrusted environment where the security threat arose from a privileged user (or someone who has obtained such privileges). Finally, we argue that a comprehensive and independent audit framework is necessary in order to detect and counteract threats in an environment where the security breach originates from an administrator (either at the database or the operating system level).
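    Working from a raw image without logs or file system metadata means recognizing database pages by their content alone. The sketch below uses SQLite only because its on-disk format is public: it scans an image for the documented 16-byte header string to find candidate databases. Real carving would continue by parsing page headers and cell pointers.

        SQLITE_MAGIC = b"SQLite format 3\x00"  # documented SQLite header string
        SECTOR = 512                           # assumed alignment of carved files

        def find_sqlite_headers(image_path: str) -> list[int]:
            with open(image_path, "rb") as f:
                data = f.read()
            hits = []
            pos = data.find(SQLITE_MAGIC)
            while pos != -1:
                if pos % SECTOR == 0:          # headers start on sector boundaries
                    hits.append(pos)
                pos = data.find(SQLITE_MAGIC, pos + 1)
            return hits

        if __name__ == "__main__":
            for off in find_sqlite_headers("evidence.img"):
                print(f"candidate SQLite database at offset {off:#x}")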
  5. Commercial cloud database services increase the availability of data and provide reliable access to data. Routine database maintenance tasks such as clustering, however, increase the cost of hosting data on commercial cloud instances. Clustering causes an I/O burst; clustering in one shot depletes the I/O credit accumulated by an instance and increases the cost of hosting data. An unclustered database decreases query performance by scanning large amounts of data, gradually depleting I/O credits. In this paper, we introduce Physical Location Index Plus (PLI+), an indexing method for databases hosted on commercial clouds. PLI+ relies on internal knowledge of the data layout, building a physical location index that maps a range of physical co-locations to a range of attribute values to create approximately sorted buckets. As new data is inserted, writes are partitioned in memory based on the incoming data distribution; the data is then written to physical locations on disk in block-based partitions to favor large-granularity I/O. Incoming SQL queries on indexed attribute values are rewritten in terms of physical location ranges. As a result, PLI+ does not decrease query performance on an unclustered cloud database instance; DBAs may choose to cluster the instance only when sufficient I/O credit is available, thus delaying the need for clustering. We evaluate the query performance of PLI+ by comparing it with clustered indexes, unclustered (secondary) indexes, and log-structured merge trees on real datasets. Experiments show that PLI+ significantly delays clustering and yet does not degrade query performance, achieving a higher level of sortedness than unclustered indexes and log-structured merge trees. We also evaluate the quality of clustering by introducing a measure of interval sortedness, and we measure the size of the index.
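    The core mechanism is a coarse map from attribute-value ranges to the physical blocks that approximately hold them, so a value-range predicate can be rewritten as a block-range scan. A toy sketch with invented bucket boundaries:

        from bisect import bisect_left

        # (max_value_in_bucket, first_block, last_block), sorted by value
        buckets = [
            (1000, 0, 99),     # values <= 1000 live in blocks 0..99
            (5000, 100, 349),  # values 1001..5000 live in blocks 100..349
            (9999, 350, 499),  # values 5001..9999 live in blocks 350..499
        ]
        upper_bounds = [b[0] for b in buckets]

        def blocks_for_range(lo: int, hi: int) -> tuple[int, int]:
            """Map a predicate lo <= value <= hi to a contiguous block range."""
            first = min(bisect_left(upper_bounds, lo), len(buckets) - 1)
            last = min(bisect_left(upper_bounds, hi), len(buckets) - 1)
            return buckets[first][1], buckets[last][2]

        # SELECT ... WHERE val BETWEEN 900 AND 4000 is rewritten to scan
        # only blocks 0..349 instead of the whole table.
        print(blocks_for_range(900, 4000))  # -> (0, 349)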
  6. Where-provenance is a relationship between a data item and the location from which the data was copied. In a DBMS, a typical use of where-provenance is in establishing a copy-by-address relationship between the output of a query and the particular data value(s) that originated it. Normal DBMS operations create a variety of auxiliary copies of the data, e.g., indexes, materialized views (MVs), and cached copies. These copies exist over time with relationships that evolve continuously: (A) indexes maintain the copy with a reference to the origin value, (B) MVs maintain the copy without a reference to the source table, and (C) cached copies are created once and never maintained. A query may be answered from any of these auxiliary copies, yet this where-provenance is neither computed nor maintained. In this paper, we describe sources from which forensic analysis of storage can derive the where-provenance of table data. We also argue that this computed where-provenance can be useful (and perhaps necessary) for producing accurate forensic reports and evidence from maliciously altered databases, and for validating corrupted DBMS storage.
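    A simplified way to picture storage-derived where-provenance: while carving a snapshot, record every physical location at which a value occurs (table page, index, MV, cache); each recorded copy is then a candidate source for a query answer, and a divergent copy is evidence worth reporting. The structures below are invented for the sketch.

        from collections import defaultdict

        # (storage area, byte offset, value) triples as a carver might emit them;
        # hard-coded here for the sketch
        carved_values = [
            ("table:customer",   0x4000, "alice"),
            ("index:cust_name",  0x9200, "alice"),
            ("mv:top_customers", 0xC800, "alice"),
        ]

        provenance: dict[str, list[tuple[str, int]]] = defaultdict(list)
        for area, offset, value in carved_values:
            provenance[value].append((area, offset))

        # Every physical copy of 'alice' could have answered a query returning it;
        # a stale or mismatched copy points at tampering or corruption.
        print(provenance["alice"])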
  7. The pervasive use of databases for the storage of critical and sensitive information in many organizations has led to an increase in the rate at which databases are exploited in computer crimes. While there are several techniques and tools available for database forensics, they mostly assume a priori database preparation, such as relying on tamper-detection software to already be in place or on the use of detailed logging. Instead, investigators need forensic tools and techniques that work on poorly-configured databases and make no assumptions about the extent of damage in a database. In this paper, we present our database forensics methods, which are capable of examining database content from a database image without using any log or system metadata. We describe how these methods can be used to detect security breaches in untrusted environments where the security threat arose from a privileged user (or someone who has obtained such privileges).
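    One concrete breach signal that such log-free methods can surface: rows recoverable from carved storage that no longer appear in the live table indicate deletions that left no audit trail. A toy comparison with invented rows:

        live_rows = {(101, "alice"), (103, "carol")}                  # from querying the DBMS
        carved_rows = {(101, "alice"), (102, "bob"), (103, "carol")}  # from the storage image

        # Rows present on disk but absent from the table point at
        # unlogged deletions, e.g., by a privileged user.
        print(carved_rows - live_rows)  # {(102, 'bob')}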
  8. Database Management Systems (DBMSes) secure data against regular users through defensive mechanisms such as access control, and against privileged users with detection mechanisms such as audit logging. Interestingly, these security mechanisms are built into the DBMS and are thus only useful for monitoring or stopping operations that are executed through the DBMS API. Any access that directly modifies database files (at the file system level) would, by definition, bypass any and all security layers built into the DBMS itself. In this paper, we propose and evaluate an approach that detects direct modifications to database files that have already bypassed the DBMS and its internal security mechanisms. Our approach applies forensic analysis to first validate database indexes and then compares the index state with the data in the DBMS tables. We show that indexes are much more difficult to modify and can be further fortified with hashing. Our approach supports most relational DBMSes by leveraging index structures that are already built into the system, detecting database storage tampering that would currently remain undetectable.
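    The index-versus-table cross-check can be tried end to end in SQLite, whose INDEXED BY and NOT INDEXED clauses pin the access path of a query. The schema below is invented; the paper's approach targets index structures across DBMSes generally.

        import hashlib
        import sqlite3

        con = sqlite3.connect(":memory:")
        con.executescript("""
            CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT);
            CREATE INDEX idx_owner ON account(owner);
            INSERT INTO account VALUES (1, 'alice'), (2, 'bob');
        """)

        def digest(rows) -> str:
            h = hashlib.sha256()
            for (owner,) in rows:
                h.update(owner.encode())
            return h.hexdigest()

        # Force one scan through the index and one through the table;
        # a direct file edit that changed a table page but not the
        # index page would make the two digests diverge.
        via_index = digest(con.execute(
            "SELECT owner FROM account INDEXED BY idx_owner ORDER BY owner"))
        via_table = digest(con.execute(
            "SELECT owner FROM account NOT INDEXED ORDER BY owner"))

        print("storage consistent" if via_index == via_table else "possible tampering")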