NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Benchmarking, Analyzing, and Optimizing WA of Partial Compaction in RocksDB

Wei, Ran; Zhu, Zichen; Kryczka, Andrew; Zhuang, Jay; Athanassoulis, Manos (March 2025, Advances in database technology)

Free, publicly-accessible full text available March 1, 2026
QuIT your B+-tree for the Quick Insertion Tree

Raman, Aneesh; Karatsenidis, Konstantinos; Xie, Shaolin; Olma, Matthaios; Sarkar, Subhadeep; Athanassoulis, Manos (March 2025, OpenProceedings.org)

Free, publicly-accessible full text available March 1, 2026
CUBIT: Concurrent Updatable Bitmap Indexing

https://doi.org/10.14778/3705829.3705854

Wang, Junchang; Athanassoulis, Manos (October 2024, Proceedings of the VLDB Endowment)

Bitmap indexes are widely used for read-intensive analytical workloads because they are clustered and offer efficient reads with a small memory footprint. However, they are generally inefficient to update. As analytical applications are increasingly fused with transactional applications, leading to the emergence of hybrid transactional/analytical processing (HTAP), it is desirable that bitmap indexes support efficient concurrent real-time updates. In this paper, we propose Concurrent Updatable Bitmap indexing (CUBIT) that offers efficient real-time updates that scale with the number of CPU cores used and do not interfere with queries. Our design relies on three principles. First, we employ a horizontal bitwise representation of updated bits, which enables efficient atomic updates without locking entire bitvectors. Second, we propose a lightweight snapshotting mechanism that allows queries to run on separate snapshots and provides a wait-free progress guarantee. Third, we consolidate updates in a latch-free manner, providing a strong progress guarantee. Our evaluation shows that CUBIT offers 3--16× higher throughput and 3--220× lower latency than state-of-the-art updatable bitmap indexes. CUBIT's update-friendly nature widens the applicability of bitmap indexing. Experimenting with OLAP workloads with standard, batched updates shows that CUBIT overcomes the maintenance downtime and outperforms DuckDB by 1.2--2.7× on TPC-H. For HTAP workloads with real-time updates, CUBIT achieves 2--11× performance improvement over the state-of-the-art approaches.
more » « less
Full Text Available
Benchmarking Learned and LSM Indexes for Data Sortedness

https://doi.org/10.1145/3662165.3662764

Raman, Aneesh; Huynh, Andy; Lu, Jinqi; Athanassoulis, Manos (June 2024, ACM)

Full Text Available
KVBench: A Key-Value Benchmarking Suite

https://doi.org/10.1145/3662165.3662765

Zhu, Zichen; Saha, Arpita; Athanassoulis, Manos; Sarkar, Subhadeep (June 2024, ACM)

Full Text Available
CAVE: Concurrency-Aware Graph Processing on SSDs

https://doi.org/10.1145/3654928

Papon, Tarikul Islam; Chen, Taishan; Zhang, Shuo; Athanassoulis, Manos (May 2024, Proceedings of the ACM on Management of Data)

Large-scale graph analytics has become increasingly common in areas like social networks, physical sciences, transportation networks, and recommendation systems. Since many such practical graphs do not fit in main memory, graph analytics performance depends on efficiently utilizing underlying storage devices. These out-of-core graph processing systems employ sharding and sub-graph partitioning to optimize for storage while relying on efficient sequential access of traditional hard disks. However, today's storage is increasingly based on solid-state drives (SSDs) that exhibit high internal parallelism and efficient random accesses. Yet, state-of-the-art graph processing systems do not explicitly exploit those properties, resulting in subpar performance. In this paper, we develop CAVE, the first graph processing engine that optimally exploits underlying SSD-based storage by harnessing the available storage device parallelism via carefully selecting which I/Os to graph data can be issued concurrently. Thus, CAVE traverses multiple paths and processes multiple nodes and edges concurrently, achieving parallelization at a granular level. We identify two key ways to parallelize graph traversal algorithms based on the graph structure and algorithm: intra-subgraph and inter-subgraph parallelization. The first identifies subgraphs that contain vertices that can be accessed in parallel, while the latter identifies subgraphs that can be processed in their entirety in parallel. To showcase the benefit of our approach, we build within CAVE parallelized versions of five popular graph algorithms (Breadth-First Search, Depth-First Search, Weakly Connected Components, PageRank, Random Walk) that exploit the full bandwidth of the underlying device. CAVE uses a blocked file format based on adjacency lists and employs a concurrent cache pool that is essential to the parallelization of graph algorithms. By experimenting with different types of graphs on three SSD devices, we demonstrate that CAVE utilizes the available parallelism, and scales to diverse real-world graph datasets. CAVE achieves up to one order of magnitude speedup compared to the popular out-of-core systems Mosaic and GridGraph, and up to three orders of magnitude speedup in runtime compared to GraphChi.
more » « less
Full Text Available
Towards flexibility and robustness of LSM trees

https://doi.org/10.1007/s00778-023-00826-9

Huynh, Andy; Chaudhari, Harshal_A; Terzi, Evimaria; Athanassoulis, Manos (January 2024, The VLDB Journal)

Abstract Log-structured merge trees (LSM trees) are increasingly used as part of the storage engine behind several data systems, and are frequently deployed in the cloud. As the number of applications relying on LSM-based storage backends increases, the problem of performance tuning of LSM trees receives increasing attention. We consider bothnominaltunings—where workload and execution environment are accurately known a priori—androbusttunings—which consideruncertaintyin the workload knowledge. This type of workload uncertainty is common in modern applications, notably in shared infrastructure environments like the public cloud. To address this problem, we introduceEndure, a new paradigm for tuning LSM trees in the presence of workload uncertainty. Specifically, we focus on the impact of the choice of compaction policy, size ratio, and memory allocation on the overall performance.Endureconsiders a robust formulation of the throughput maximization problem and recommends a tuning that offers near-optimal throughput when the executed workload is not the same, instead in aneighborhoodof the expected workload. Additionally, we explore the robustness of flexible LSM designs by proposing a new unified design called K-LSM that encompasses existing designs. We deploy our robust tuning system,Endure, on a state-of-the-art key-value store, RocksDB, and demonstrate throughput improvements of up to 5$$\times $$ $\times$ in the presence of uncertainty. Our results indicate that the tunings obtained byEndureare more robust than tunings obtained under our expanded LSM design space. This indicates that robustness may not be inherent to a design, instead, it is an outcome of a tuning process that explicitly accounts for uncertainty.
more » « less
Data Structures for Data-Intensive Applications: Tradeoffs and Design Guidelines

https://doi.org/10.1561/1900000059

Athanassoulis, Manos; Idreos, Stratos; Shasha, Dennis (July 2023, Foundations and Trends® in Databases)

Full Text Available
Enabling Timely and Persistent Deletion in LSM-Engines

https://doi.org/10.1145/3599724

Sarkar, Subhadeep; Papon, Tarikul Islam; Staratzis, Dimitris; Zhu, Zichen; Athanassoulis, Manos (September 2023, ACM Transactions on Database Systems)

Data-intensive applications have fueled the evolution oflog-structured merge (LSM)based key-value engines that employ theout-of-placeparadigm to support high ingestion rates with low read/write interference. These benefits, however, come at the cost oftreating deletes as second-class citizens. A delete operation inserts atombstonethat invalidates older instances of the deleted key. State-of-the-art LSM-engines do not provide guarantees as to how fast a tombstone will propagate topersist the deletion. Further, LSM-engines only support deletion on the sort key. To delete on another attribute (e.g., timestamp), the entire tree is read and re-written, leading to undesired latency spikes and increasing the overall operational cost of a database. Efficient and persistent deletion is key to support: (i) streaming systems operating on a window of data, (ii) privacy with latency guarantees on data deletion, and (iii)en massecloud deployment of data systems. Further, we document that LSM-based key-value engines perform suboptimally in the presence of deletes in a workload. Tombstone-driven logical deletes, by design, are unable to purge the deleted entries in a timely manner, and retaining the invalidated entries perpetually affects the overall performance of LSM-engines in terms of space amplification, write amplification, and read performance. Moreover, the potentially unbounded latency for persistent deletes brings in critical privacy concerns in light of the data privacy protection regulations, such as theright to be forgottenin EU’s GDPR, theright to deletein California’s CCPA and CPRA, anddeletion rightin Virginia’s VCDPA. Toward this, we introduce the delete design space for LSM-trees and highlight the performance implications of the different classes of delete operations. To address these challenges, in this article, we build a new key-value storage engine,Lethe⁺, that uses a very small amount of additional metadata, a set of new delete-aware compaction policies, and a new physical data layout that weaves the sort and the delete key order. We show thatLethe⁺supports any user-defined threshold for the delete persistence latency offeringhigher read throughput(1.17× -1.4×) andlower space amplification(2.1× -9.8×), with a modest increase in write amplification (between 4% and 25%) that can be further amortized to less than 1%. In addition,Lethe⁺supports efficient range deletes on asecondary delete keyby dropping entire data pages without sacrificing read performance or employing a costly full tree merge.
more » « less
Full Text Available
Acheron: Persisting Tombstones in LSM Engines

https://doi.org/10.1145/3555041.3589719

Zhu, Zichen; Sarkar, Subhadeep; Athanassoulis, Manos (June 2023, Proceedings of ACM SIGMOD)

Full Text Available

« Prev Next »

Search for: All records