skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Breathing New Life into an Old Tree: Resolving Logging Dilemma of B + -tree on Modern Computational Storage Drives
Having dominated databases and various data management systems for decades,B+-tree is infamously subject to a logging dilemma: One could improveB+-tree speed performance by equipping it with a larger log, which nevertheless will degrade its crash recovery speed. Such a logging dilemma is particularly prominent in the presence of modern workloads that involve intensive small writes. In this paper, we propose a novel solution, calledper-page loggingbasedB+-tree, which leverages the emerging computational storage drive (CSD) with built-in transparent compression to fundamentally resolve the logging dilemma. Our key idea is to divide the large single log into many small (e.g., 4KB), highly compressibleper-page logs, each being statically bounded with aB+-tree page. All per-page logs together form a very large over-provisioned log space forB+-tree to improve its operational speed performance. Meanwhile, during crash recovery,B+-tree does not need to scan any per-page logs, leading to a recovery latency independent from the total log size. We have developed and open-sourced a fully functional prototype. Our evaluation results show that, under small-write intensive workloads, our design solution can improveB+-tree operational throughput by up to 625.6% and maintain a crash recovery time of as low as 19.2 ms, while incurring a minimal storage overhead of only 0.5-1.6%.  more » « less
Award ID(s):
2210755 2210754 2006617
PAR ID:
10515268
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
The VLDB Endowment
Date Published:
Journal Name:
Proceedings of the VLDB Endowment
Volume:
17
Issue:
2
ISSN:
2150-8097
Page Range / eLocation ID:
134 to 147
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Non-volatile Memory (NVM) offers the opportunity to build large, durable B+ trees with markedly higher performance and faster post-crash recovery than is possible with traditional disk- or flash-based persistence. Unfortunately, cache flush and fence instructions, required for crash consistency and failure atomicity on many machines, introduce substantial overhead not present in non-persistent trees, and force additional NVM reads and writes. The overhead is particularly pronounced in workloads that benefit from cache reuse due to good temporal locality or small working sets---traits commonly observed in real-world applications. In this paper, we propose a buffered durable B+ tree (BD+Tree) that improves performance and reduces NVM traffic viarelaxedpersistence. Execution of a BD+Tree is divided intoepochsof a few milliseconds each; if a crash occurs in epoche,the tree recovers to its state as of the end of epoche-2. (The persistence boundary can always be made current with an explicit sync operation, which quickly advances the epoch by 2.) NVM writes within an epoch are aggregated for delayed persistence, thereby increasing cache reuse and reducing traffic to NVM. In comparison to state-of-the-art persistent B+ trees, our micro-benchmark experiments show that BD+Tree can improve throughput by up to 2.4x and reduce NVM writes by up to 90% when working sets are small or workloads exhibit strong temporal locality. On real-world workloads that benefit from cache reuse, BD+Tree realizes throughput improvements of 1.1--2.4x and up to a 99% decrease in NVM writes. Even on uniform workloads, with working sets that significantly exceed cache capacity, BD+Tree still improves throughput by 1--1.3x. The performance advantage of BD+Tree increases with larger caches, suggesting ongoing benefits as CPUs evolve toward gigabyte cache capacities. 
    more » « less
  2. System auditing is a central concern when investigating and responding to security incidents. Unfortunately, attackers regularly engage in anti-forensic activities after a break-in, covering their tracks from the system logs in order to frustrate the efforts of investigators. While a variety of tamper-evident logging solutions have appeared throughout the industry and the literature, these techniques do not meet the operational and scalability requirements of system-layer audit frameworks. In this work, we introduce Custos, a practical framework for the detection of tampering in system logs. Custos consists of a tamper-evident logging layer and a decentralized auditing protocol. The former enables the verification of log integrity with minimal changes to the underlying logging framework, while the latter enables near real-time detection of log integrity violations within an enterprise-class network. Custos is made practical by the observation that we can decouple the costs of cryptographic log commitments from the act of creating and storing log events, without trading off security, leveraging features of off-the-shelf trusted execution environments. Supporting over one million events per second, we show that Custos' tamper-evident logging protocol is three orders of magnitude (1000×) faster than prior solutions and incurs only between 2% and 7% runtime overhead over insecure logging on intensive workloads. Further, we show that Custos' auditing protocol can detect violations in near real-time even in the presence of a powerful distributed adversary and with minimal (3%) network overhead. Our case study on a real-world APT attack scenario demonstrates that Custos forces anti-forensic attackers into a "lose-lose" situation, where they can either be covert and not tamper with logs (which can be used for forensics), or erase logs but then be detected by Custos. 
    more » « less
  3. Boki is a new serverless runtime that exports a shared log API to serverless functions. Boki shared logs enable stateful serverless applications to manage their state with durability, consistency, and fault tolerance. Boki shared logs achieve high throughput and low latency. The key enabler is themetalog, a novel mechanism that allows Boki to address ordering, consistency and fault tolerance independently. The metalog orders shared log records with high throughput and it provides read consistency while allowing service providers to optimize the write and read path of the shared log in different ways. To demonstrate the value of shared logs for stateful serverless applications, we build Boki support libraries that implement fault-tolerant workflows, durable object storage, and message queues. Our evaluation shows that shared logs can speed up important serverless workloads by up to 4.2 ×. 
    more » « less
  4. null (Ed.)
    For system logs to aid in security investigations, they must be beyond the reach of the adversary. Unfortunately, attackers that have escalated privilege on a host are typically able to delete and modify log events at will. In response to this threat, a variety of secure logging systems have appeared over the years that attempt to provide tamper-resistance (e.g., write once read many drives, remote storage servers) or tamper-evidence (e.g., cryptographic proofs) for system logs. These solutions expose an interface through which events are committed to a secure log, at which point they enjoy protection from future tampering. However, all proposals to date have relied on the assumption that an event's occurrence is concomitant with its commitment to the secured log. In this work, we challenge this assumption by presenting and validating a race condition attack on the integrity of audit frameworks. Our attack exploits the intrinsically asynchronous nature of I/O and IPC activity, demonstrating that an attacker can snatch events about their intrusion out of message buffers after they have occurred but before they are committed to the log, thus bypassing existing protections. We present a first step towards defending against our attack by introducing KennyLoggings, the first kernel- based tamper-evident logging system that satisfies the synchronous integrity property, meaning that it guarantees tamper-evidence of events upon their occurrence. We implement KennyLoggings on top of the Linux kernel and show that it imposes between 8% and 11% overhead on log-intensive application workloads. 
    more » « less
  5. null (Ed.)
    Many applications require update-intensive work-loads on spatial objects, e.g., social-network services and shared-riding services that track moving objects (devices). By buffering insert and delete operations in memory, the Log Structured Merge Tree (LSM) has been used widely in various systems because of its ability to handle insert-intensive workloads. While the focus on LSM has been on key-value stores and their optimizations, there is a need to study how to efficiently support LSM-based secondary indexes. We investigate the augmentation of a main-memory-based memo structure into an LSM secondary index structure to handle update-intensive workloads efficiently. We conduct this study in the context of an R-tree-based secondary index. In particular, we introduce the LSM RUM-tree that demonstrates the use of an Update Memo in an LSM-based R-tree to enhance the performance of the R-tree's insert, delete, update, and search operations. The LSM RUM-tree introduces novel strategies to reduce the size of the Update Memo to be a light-weight in-memory structure that is suitable for handling update-intensive workloads without introducing significant over-head. Experimental results using real spatial data demonstrate that the LSM RUM-tree achieves up to 9.6x speedup on update operations and up to 2400x speedup on query processing over the existing LSM R-tree implementations. 
    more » « less