-
Large-scale graph analytics has become increasingly common in areas like social networks, physical sciences, transportation networks, and recommendation systems. Since many such practical graphs do not fit in main memory, graph analytics performance depends on efficiently utilizing the underlying storage devices. Existing out-of-core graph processing systems employ sharding and sub-graph partitioning to optimize for storage while relying on the efficient sequential access of traditional hard disks. However, today's storage is increasingly based on solid-state drives (SSDs) that exhibit high internal parallelism and efficient random accesses. Yet, state-of-the-art graph processing systems do not explicitly exploit those properties, resulting in subpar performance. In this paper, we develop CAVE, the first graph processing engine that optimally exploits underlying SSD-based storage by harnessing the available storage device parallelism, carefully selecting which I/Os to graph data can be issued concurrently. Thus, CAVE traverses multiple paths and processes multiple nodes and edges concurrently, achieving fine-grained parallelization. We identify two key ways to parallelize graph traversal algorithms based on the graph structure and the algorithm: intra-subgraph and inter-subgraph parallelization. The former identifies subgraphs that contain vertices that can be accessed in parallel, while the latter identifies subgraphs that can be processed in their entirety in parallel. To showcase the benefit of our approach, we build within CAVE parallelized versions of five popular graph algorithms (Breadth-First Search, Depth-First Search, Weakly Connected Components, PageRank, and Random Walk) that exploit the full bandwidth of the underlying device. CAVE uses a blocked file format based on adjacency lists and employs a concurrent cache pool that is essential to the parallelization of graph algorithms. By experimenting with different types of graphs on three SSD devices, we demonstrate that CAVE utilizes the available parallelism and scales to diverse real-world graph datasets. CAVE achieves up to one order of magnitude speedup compared to the popular out-of-core systems Mosaic and GridGraph, and up to three orders of magnitude speedup in runtime compared to GraphChi.
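To make the frontier-level concurrency concrete, the sketch below emulates intra-subgraph parallelization in plain Python: the adjacency lists of all vertices in the current BFS frontier are requested through a thread pool, standing in for the concurrent SSD I/Os the engine would issue. This is a minimal illustration, not CAVE's implementation; the in-memory GRAPH and the fetch_adjacency helper are assumptions that stand in for its blocked, adjacency-list-based on-disk format and cache pool.

```python
# Frontier-parallel BFS sketch in the spirit of intra-subgraph parallelization:
# the adjacency lists of all frontier vertices are fetched concurrently,
# mimicking concurrent SSD I/Os. GRAPH and fetch_adjacency are illustrative
# stand-ins for a blocked, adjacency-list-based on-disk format.
from concurrent.futures import ThreadPoolExecutor

GRAPH = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [5], 5: []}

def fetch_adjacency(vertex):
    # A real engine would issue a read for the block holding `vertex`;
    # here the "storage" is an in-memory dictionary.
    return GRAPH[vertex]

def parallel_bfs(source, io_workers=8):
    visited = {source}
    order = [source]
    frontier = [source]
    with ThreadPoolExecutor(max_workers=io_workers) as pool:
        while frontier:
            # One concurrent fetch per frontier vertex.
            adjacency_lists = list(pool.map(fetch_adjacency, frontier))
            frontier = []
            for neighbors in adjacency_lists:
                for v in neighbors:
                    if v not in visited:
                        visited.add(v)
                        order.append(v)
                        frontier.append(v)
    return order

if __name__ == "__main__":
    print(parallel_bfs(0))  # [0, 1, 2, 3, 4, 5]
```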
-
Data-intensive applications have fueled the evolution of log-structured merge (LSM) based key-value engines that employ the out-of-place paradigm to support high ingestion rates with low read/write interference. These benefits, however, come at the cost of treating deletes as second-class citizens. A delete operation inserts a tombstone that invalidates older instances of the deleted key. State-of-the-art LSM-engines do not provide guarantees as to how fast a tombstone will propagate to persist the deletion. Further, LSM-engines only support deletion on the sort key. To delete on another attribute (e.g., timestamp), the entire tree is read and re-written, leading to undesired latency spikes and increasing the overall operational cost of a database. Efficient and persistent deletion is key to support: (i) streaming systems operating on a window of data, (ii) privacy with latency guarantees on data deletion, and (iii) en masse cloud deployment of data systems. Further, we document that LSM-based key-value engines perform suboptimally in the presence of deletes in a workload. Tombstone-driven logical deletes, by design, are unable to purge the deleted entries in a timely manner, and retaining the invalidated entries perpetually affects the overall performance of LSM-engines in terms of space amplification, write amplification, and read performance. Moreover, the potentially unbounded latency for persistent deletes brings in critical privacy concerns in light of the data privacy protection regulations, such as the right to be forgotten in EU's GDPR, the right to delete in California's CCPA and CPRA, and the deletion right in Virginia's VCDPA. Toward this, we introduce the delete design space for LSM-trees and highlight the performance implications of the different classes of delete operations. To address these challenges, in this article, we build a new key-value storage engine, Lethe+, that uses a very small amount of additional metadata, a set of new delete-aware compaction policies, and a new physical data layout that weaves the sort and the delete key order. We show that Lethe+ supports any user-defined threshold for the delete persistence latency, offering higher read throughput (1.17×-1.4×) and lower space amplification (2.1×-9.8×), with a modest increase in write amplification (between 4% and 25%) that can be further amortized to less than 1%. In addition, Lethe+ supports efficient range deletes on a secondary delete key by dropping entire data pages without sacrificing read performance or employing a costly full tree merge.
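The tombstone mechanism and the delete-persistence threshold described above can be illustrated with a toy in-memory model. The class below is not Lethe+'s implementation; the names (ToyLSM, overdue_tombstones) and the timestamp-based deadline are hypothetical, but it shows how an out-of-place delete merely shadows older versions until a later, delete-aware compaction purges them.

```python
# Toy illustration of tombstone-driven logical deletes in an LSM-style store:
# a delete only inserts a marker, and the invalidated value survives in older
# runs until a compaction purges it. Names and the timestamp-based persistence
# deadline are illustrative, not Lethe+'s actual design.
import time

TOMBSTONE = object()

class ToyLSM:
    def __init__(self, delete_persistence_threshold_s=3600):
        self.memtable = {}          # key -> (value, write_time)
        self.sorted_runs = []       # older, immutable levels (newest first)
        self.delete_threshold = delete_persistence_threshold_s

    def put(self, key, value):
        self.memtable[key] = (value, time.time())

    def delete(self, key):
        # Out-of-place delete: insert a tombstone instead of erasing data.
        self.memtable[key] = (TOMBSTONE, time.time())

    def get(self, key):
        # Newest version wins; a tombstone hides older instances of the key.
        for level in [self.memtable] + self.sorted_runs:
            if key in level:
                value, _ = level[key]
                return None if value is TOMBSTONE else value
        return None

    def flush(self):
        self.sorted_runs.insert(0, dict(self.memtable))
        self.memtable = {}

    def overdue_tombstones(self):
        # Tombstones older than the user-defined persistence threshold would
        # be prioritized by a delete-aware compaction policy.
        now = time.time()
        return [k for run in self.sorted_runs
                for k, (v, t) in run.items()
                if v is TOMBSTONE and now - t > self.delete_threshold]
```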
-
A key design decision for data systems is whether they follow the row-store or the column-store paradigm. The former supports transactional workloads, while the latter is better for analytical queries. This decision has a significant impact on the entire data system architecture. The multiple-decade-long journey of these two designs has led to a new family of hybrid transactional/analytical processing (HTAP) architectures. Several efforts have been proposed to reap the benefits of both worlds by proposing systems that maintain multiple copies of data (in different physical layouts) and convert them into the desired layout as required. Due to data duplication, the additional necessary bookkeeping, and the cost of converting data between different layouts, these systems compromise between efficient analytics and data freshness. We depart from existing designs by proposing a radically new approach. We ask the question: “What if we could access any layout and ship only the relevant data through the memory hierarchy by transparently converting rows to (arbitrary groups of) columns?” To achieve this functionality, we capitalize on the reinvigorated trend of hardware specialization (that has been accelerated due to the tapering of Moore's law) to propose Relational Fabric, a near-data vertical partitioner that allows memory or storage components to perform on-the-fly transparent data transformation. By exposing an intuitive API, Relational Fabric pushes vertical partitioning to the hardware, which profoundly impacts the process of designing and building data systems. (A) There is no need for data duplication and layout conversion, making HTAP systems viable using a single layout. (B) It simplifies the memory and storage manager that needs to maintain and update a single data layout. (C) It reduces unnecessary data movement through the memory hierarchy, allowing for better hardware utilization and, ultimately, better performance. In this paper, we present Relational Fabric for both memory and storage. We present our initial results on Relational Fabric for in-memory systems and discuss the challenges of building this hardware and the opportunities it brings for simplicity and innovation in the data system software stack, including physical design, query optimization, query evaluation, and concurrency control.
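As a rough illustration of the "intuitive API" mentioned above, the snippet below emulates in software what a near-data vertical partitioner would do in hardware: the caller requests an arbitrary column group over a row-major base table and receives only those columns. The function name fetch_column_group and the toy table are assumptions for illustration, not the actual Relational Fabric interface.

```python
# Hedged sketch of requesting an arbitrary column group over a row-major base
# table. In Relational Fabric the projection would happen near-data, in
# hardware; here it is emulated in software, and all names are hypothetical.
def fetch_column_group(row_table, column_ids):
    """Return the requested columns, column-major, from a row-major table."""
    return [[row[c] for row in row_table] for c in column_ids]

# Row-major base table: (id, name, price, stock)
base = [
    (1, "bolt",   0.10, 500),
    (2, "washer", 0.05, 900),
    (3, "screw",  0.12, 750),
]

# An analytical query touching only price and stock ships only those columns.
price_col, stock_col = fetch_column_group(base, [2, 3])
print(sum(p * s for p, s in zip(price_col, stock_col)))
```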
-
This paper outlines the vision for a new type of software-shaped platforms, or SOSH platforms for short, that can be implemented in commercial CPU+FPGA platforms. At the core of the SOSH paradigm is the idea of exposing direct control over the flow of data exchanged between hardware components in embedded Systems-on-Chip (SoCs). Data flow manipulation primitives are synthesized in reprogrammable hardware and interposed between central processors, memory modules, and I/O devices. A new layer of system software is then introduced to leverage such primitives and to achieve fine-grained control and introspection over the interaction of SoC resources. By turning memory and I/O data flows into manageable entities, a new degree of internal awareness can be achieved in complex systems. We first review recent works that are well aligned with the concept of data flow manipulation primitives that can be deployed in SOSH platforms. Next, we outline future research avenues concerning the use of the SOSH paradigm for workload profiling and prediction, for implementing advanced memory models, and for security threat identification and mitigation.
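As a software analogy of a data flow manipulation primitive (which in SOSH would be synthesized in the FPGA fabric rather than run on a CPU), the sketch below interposes a small, reconfigurable stage between a producer stream and its consumer: it can transform words, drop them according to a policy, and expose an introspection counter to system software. All names and the example policy are hypothetical.

```python
# Software analogy of a SOSH-style data-flow manipulation primitive: a
# reconfigurable stage interposed between a producer (e.g., a DMA-capable
# I/O device) and a consumer (e.g., a memory region) that can observe,
# transform, or drop traffic. Names and policy are illustrative only.
class FlowPrimitive:
    def __init__(self, transform=None, predicate=None):
        self.transform = transform or (lambda word: word)
        self.predicate = predicate or (lambda word: True)
        self.observed = 0            # introspection counter for system software

    def forward(self, words):
        for w in words:
            self.observed += 1
            if self.predicate(w):    # e.g., a security filtering policy
                yield self.transform(w)

# Example policy: mask the low byte of every word and drop words whose high
# bit is set (purely illustrative).
primitive = FlowPrimitive(
    transform=lambda w: w & ~0xFF,
    predicate=lambda w: not (w & 0x8000_0000),
)
device_stream = [0x0000_1234, 0x8000_0001, 0x0042_00FF]
print([hex(w) for w in primitive.forward(device_stream)])
print("words observed:", primitive.observed)
```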
-
Transactional and analytical database management systems (DBMS) typically employ different data layouts: row-stores for the former and column-stores for the latter. In order to bridge the requirements of the two without maintaining two systems and two (or more) copies of the data, our proposed system, Relational Memory, employs specialized hardware that transforms the base row table into arbitrary column groups at query execution time. This approach maximizes cache locality and is easy to use via a simple abstraction that allows transparent on-the-fly data transformation. Here, we demonstrate how to deploy and use Relational Memory via four representative scenarios. The demonstration uses the full-stack implementation of Relational Memory on the Xilinx Zynq UltraScale+ MPSoC platform. Conference participants will interact with Relational Memory deployed on the actual platform.
-
Analytical database systems are typically designed to use a column-first data layout so that they access only the desired fields. On the other hand, storing data row-first works well for accessing, inserting, or updating entire rows. Transforming rows to columns at runtime is expensive; hence, many analytical systems ingest data in row-first form and transform it in the background into columns to facilitate future analytical queries. How would this design change if we could always efficiently access only the desired set of columns? To address this question, we present a radically new approach to data transformation from rows to columns. We build upon recent advancements in embedded platforms with re-programmable logic to design native in-memory access on rows and columns. Our approach, termed Relational Memory (RM), relies on an FPGA-based accelerator that sits between the CPU and main memory and transparently transforms base data into any group of columns with minimal overhead at runtime. This design allows accessing any group of columns as if it already exists in memory. We implement and deploy RM on real hardware and show that we can access the desired columns up to 1.63× faster compared to a row-wise layout, while matching the performance of pure columnar access for low projectivity and outperforming it by up to 2.23× as projectivity (and tuple reconstruction cost) increases. Overall, RM allows the CPU to access the optimal data layout, radically reducing unnecessary data movement without high data transformation costs, thus simplifying software complexity and physical design while accelerating query execution.
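A back-of-the-envelope sketch of why this matters: the script below compares the bytes that must move through the memory hierarchy for a scan when whole rows are dragged along versus when only the requested column group is shipped, which is what RM provides transparently. The table dimensions are illustrative assumptions, not figures from the paper's evaluation.

```python
# Bytes moved for a full-table scan under (a) a row layout, which reads every
# field of every tuple, versus (b) shipping only the projected column group.
# Table dimensions are illustrative, not measurements from the paper.
NUM_TUPLES = 1_000_000
NUM_COLUMNS = 16
FIELD_BYTES = 4

def bytes_moved_row_layout():
    # A row scan reads whole tuples regardless of projectivity.
    return NUM_TUPLES * NUM_COLUMNS * FIELD_BYTES

def bytes_moved_column_group(projected_columns):
    # Shipping only the requested column group scales with projectivity.
    return NUM_TUPLES * projected_columns * FIELD_BYTES

for projected in (1, 4, 8, 16):
    row_bytes = bytes_moved_row_layout()
    col_bytes = bytes_moved_column_group(projected)
    print(f"projectivity {projected:2d}/{NUM_COLUMNS}: "
          f"row layout {row_bytes / 2**20:6.1f} MiB vs "
          f"column group {col_bytes / 2**20:6.1f} MiB")
```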