Database Management Systems (DBMSes) secure data against regular users through defensive mechanisms such as access control, and against privileged users with detection mechanisms such as audit logging. Interestingly, these security mechanisms are built into the DBMS and are thus only useful for monitoring or stopping operations that are executed through the DBMS API. Any access that involves directly modifying database files (at file system level) would, by definition, bypass any and all security layers built into the DBMS itself. In this paper, we propose and evaluate an approach that detects direct modifications to database files that have already bypassed the DBMS and its internal security mechanisms. Our approach applies forensic analysis to first validate database indexes and then compares index state with data in the DBMS tables. We show that indexes are much more difficult to modify and can be further fortified with hashing. Our approach supports most relational DBMSes by leveraging index structures that are already built into the system to detect database storage tampering that would currently remain undetectable.
Where Provenance in Database Storage
Where provenance is a relationship between a data item and the location from which this data was copied. In a DBMS, a typical use of where provenance is in establishing a copy-by-address relationship between the output of a query and the particular data value(s) that originated it. Normal DBMS operations create a variety of auxiliary copies of the data (e.g., indexes, MVs, cached copies). These copies exist over time with relationships that evolve continuously – (A) indexes maintain the copy with a reference to the origin value, (B) MVs maintain the copy without a reference to the source table, (C) cached copies are created once and are never maintained. A query may be answered from any of these auxiliary copies; however, this where provenance is not computed or maintained. In this paper, we describe sources from which forensic analysis of storage can derive where provenance of table data. We also argue that this computed where provenance can be useful (and perhaps necessary) for accurate forensic reports and evidence from maliciously altered databases or validation of corrupted DBMS storage.
- Award ID(s):
- 1656268
- Publication Date:
- NSF-PAR ID:
- 10078871
- Journal Name:
- IPAW 2018
- Page Range or eLocation-ID:
- 231-235
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Database Management Systems (DBMSes) secure data against regular users through defensive mechanisms such as access control, and against privileged users with detection mechanisms such as audit logging. Interestingly, these security mechanisms are built into the DBMS and are thus only useful for monitoring or stopping operations that are executed through the DBMS API. Any access that involves directly modifying database files (at file system level) would, by definition, bypass any and all security layers built into the DBMS itself. In this paper,we propose and evaluate an approach that detects direct modifications to database files that have already bypassed the DBMS and its internal security mechanisms. Our approach applies forensic analysis to first validate database indexes and then compares index state with data in the DBMS tables. We show that indexes are much more difficult to modify and can be further fortified with hashing. Our approach supports most relational DBMSes by leveraging index structures that are already built into the system to detect database storage tampering that would currently remain undetectable.
-
Abstract
<p>Binder is a publicly accessible online service for executing interactive notebooks based on Git repositories. Binder dynamically builds and deploys containers following a recipe stored in the repository, then gives the user a browser-based notebook interface. The Binder group periodically releases a log of container launches from the public Binder service. Archives of launch records are available here. These records do not include identifiable information like IP addresses, but do give the source repo being launched along with some other metadata. The main content of this dataset is in the <code>binder.sqlite</code> file. This SQLite database includes launch records from 2018-11-03 to 2021-06-06 in the <code>events</code> table, which has the following schema.</p> <code>CREATE TABLE events( version INTEGER, timestamp TEXT, provider TEXT, spec TEXT, origin TEXT, ref TEXT, guessed_ref TEXT ); CREATE INDEX idx_timestamp ON events(timestamp); </code> <ul><li><code>version</code> indicates the version of the record as assigned by Binder. The <code>origin</code> field became available with version 3, and the <code>ref</code> field with version 4. Older records where this information was not recorded will have the corresponding fields set to null.</li><li><code>timestamp</code> is the ISO timestamp of the launch</li><li><code>provider</code> gives the type of source repo being launched ("GitHub" is by far the most common). The rest of the -
Not Known (Ed.)Several treatment plants were sampled for raw influent, primary clarifier sludge, return activated sludge (RAS), and anaerobically digested sludge throughout nine weeks during the summer of the COVID-19 pandemic. Primary clarifier sludge had a significantly higher number of SARS-CoV-2 gene copy number per liter (GC/L) than other sludge samples, within a range from 1.0x105 to 1.0x106 GC/L. Gene copy numbers in raw influent significantly correlated with gene copy numbers in RAS in Silver Creek (p-value = 0.007, R2 = 0.681) and East Canyon (p-value = 0.009, R2 = 0.775) WRFs; both of which lack primary clarifiers or industrial pretreatment processes. This data indicates that SARS-CoV-2 gene copies tend to partition into primary clarifier sludges, at which point a significant portion of them are removed through sedimentation. Furthermore, it was found that East Canyon WRF gene copy numbers in influent were a significant predictor of daily cases (p-value = 0.0322, R2 = 0.561), and gene copy numbers in RAS were a significant predictor of weekly cases (p-value = 0.0597, R2 = 0.449). However, gene copy numbers found in primary sludge samples from other plants significantly predicted the number of COVID-19 cases for the following week (t = 2.279) and the weekmore »
-
Making logical copies, or clones, of files and directories is critical to many real-world applications and workflows, including backups, virtual machines, and containers. An ideal clone implementation meets the following performance goals: (1) creating the clone has low latency; (2) reads are fast in all versions (i.e., spatial locality is always maintained, even after modifications); (3) writes are fast in all versions; (4) the overall system is space efficient. Implementing a clone operation that realizes all four properties, which we call a nimble clone , is a long-standing open problem. This article describes nimble clones in B-ϵ-tree File System (BetrFS), an open-source, full-path-indexed, and write-optimized file system. The key observation behind our work is that standard copy-on-write heuristics can be too coarse to be space efficient, or too fine-grained to preserve locality. On the other hand, a write-optimized key-value store, such as a Bε-tree or an log-structured merge-tree (LSM)-tree, can decouple the logical application of updates from the granularity at which data is physically copied. In our write-optimized clone implementation, data sharing among clones is only broken when a clone has changed enough to warrant making a copy, a policy we call copy-on-abundant-write . We demonstrate that the algorithmic workmore »