

Search for: All records

Award ID contains: 2227669

Note: Clicking a Digital Object Identifier (DOI) number takes you to an external site maintained by the publisher. Some full-text articles may not yet be available free of charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.

  1. Abstract: In the pre-big-data era, many traditional databases supported spatial queries via spatial indexes. Modern applications, however, are seeing rapid growth in both the volume and the ingestion rate of spatial data. Many big data systems use the Log-Structured Merge (LSM) tree as their storage structure to support write-intensive, large-volume workloads, but LSM trees are usually optimized only for single-dimensional data. Prior research has studied how spatial indexes can be supported on LSM systems, but it has focused mainly on the local index organization, that is, how data is organized inside a single LSM component. This paper studies various aspects of LSM spatial indexing, including spatial merge policies, which determine when and how spatial components are merged. Three stack-based merge policies and one leveled merge policy are studied and implemented on a common big data system, Apache AsterixDB. Their write and read performance is evaluated on various workloads, and our findings and recommendations are discussed. A key finding is that the leveled policy underperforms the stack-based merge policies for most types of spatial workloads.
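To make the stack-based merge idea in entry 1 concrete, here is a minimal C++ sketch of a size-ratio merge decision over a stack of LSM components. It is only an illustration under assumed inputs (component sizes listed newest-first, a tunable size ratio, and a hypothetical componentsToMerge helper); it is not AsterixDB's merge-policy code nor the exact policies evaluated in the paper.

```cpp
// Sketch of a stack-based (size-ratio) merge decision. Component sizes are
// listed newest-first; `ratio` is a hypothetical tuning parameter. This is
// NOT AsterixDB's merge-policy implementation.
#include <cstddef>
#include <iostream>
#include <vector>

// Returns the prefix of components (newest-first) that should be merged into
// one component, or an empty vector if no merge is needed yet. A merge is
// triggered when the newer components together are at least `ratio` times the
// size of the next older component.
std::vector<std::size_t> componentsToMerge(const std::vector<std::size_t>& sizes,
                                           double ratio) {
    std::size_t newerTotal = 0;
    for (std::size_t i = 0; i + 1 < sizes.size(); ++i) {
        newerTotal += sizes[i];
        if (static_cast<double>(newerTotal) >=
            ratio * static_cast<double>(sizes[i + 1])) {
            // Merge components 0..i+1 (the newer prefix plus the older one).
            return std::vector<std::size_t>(sizes.begin(), sizes.begin() + i + 2);
        }
    }
    return {};
}

int main() {
    // Hypothetical component sizes in MB, newest first.
    std::vector<std::size_t> sizes = {40, 32, 512};
    std::vector<std::size_t> merge = componentsToMerge(sizes, 1.0);
    std::cout << "merge " << merge.size() << " components\n";  // prints: merge 2 components
}
```

A leveled policy instead keeps components organized into size-bounded levels and merges into the next level on overflow; per the abstract above, that approach underperformed the stack-based policies for most of the spatial workloads tested.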
  2. Free, publicly-accessible full text available July 16, 2026
  3. This paper proposes efficient solutions for k-core decomposition with high parallelism. The problem of k-core decomposition is fundamental in graph analysis and has applications across various domains. However, existing algorithms face significant challenges in achieving work-efficiency in theory and/or high parallelism in practice, and suffer from various performance bottlenecks. We present a simple, work-efficient parallel framework for k-core decomposition that is easy to implement and adaptable to various strategies for improving work-efficiency. We introduce two techniques to enhance parallelism: a sampling scheme to reduce contention on high-degree vertices, and vertical granularity control (VGC) to mitigate scheduling overhead for low-degree vertices. Furthermore, we design a hierarchical bucket structure to optimize performance for graphs with high coreness values. We evaluate our algorithm on a diverse set of real-world and synthetic graphs. Compared to state-of-the-art parallel algorithms, including ParK, PKC, and Julienne, our approach demonstrates superior performance on 23 out of 25 graphs when tested on a 96-core machine. Our algorithm shows speedups of up to 315× over ParK, 33.4× over PKC, and 52.5× over Julienne. 
    Free, publicly-accessible full text available June 17, 2026
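The framework in entry 3 is parallel and work-efficient; as a point of reference only, the sequential peeling algorithm underlying k-core decomposition can be sketched in C++ as below. The adjacency-list representation and function name are illustrative assumptions and are unrelated to the paper's implementation or its sampling and VGC techniques.

```cpp
// Sequential reference for k-core decomposition (coreness of every vertex),
// illustrating the peeling idea behind entry 3. The paper's contribution is a
// work-efficient *parallel* framework; this sketch only shows the underlying
// decomposition on an assumed adjacency-list graph.
#include <algorithm>
#include <functional>
#include <iostream>
#include <queue>
#include <utility>
#include <vector>

// Returns coreness[v] for each vertex of an undirected graph given as
// adjacency lists, peeling vertices in non-decreasing degree order.
std::vector<int> coreDecomposition(const std::vector<std::vector<int>>& adj) {
    int n = static_cast<int>(adj.size());
    std::vector<int> deg(n), core(n, 0);
    std::vector<bool> removed(n, false);
    // Min-priority queue of (current degree, vertex); stale entries are skipped.
    using Entry = std::pair<int, int>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pq;
    for (int v = 0; v < n; ++v) {
        deg[v] = static_cast<int>(adj[v].size());
        pq.push({deg[v], v});
    }
    int k = 0;
    while (!pq.empty()) {
        auto [d, v] = pq.top();
        pq.pop();
        if (removed[v] || d != deg[v]) continue;  // stale queue entry
        removed[v] = true;
        k = std::max(k, deg[v]);
        core[v] = k;                              // coreness of v
        for (int u : adj[v]) {
            if (!removed[u] && deg[u] > deg[v]) {
                pq.push({--deg[u], u});           // peel: lower u's degree
            }
        }
    }
    return core;
}

int main() {
    // Triangle {0,1,2} plus a pendant vertex 3 attached to 0.
    std::vector<std::vector<int>> adj = {{1, 2, 3}, {0, 2}, {0, 1}, {0}};
    for (int c : coreDecomposition(adj)) std::cout << c << ' ';  // 2 2 2 1
    std::cout << '\n';
}
```

The difficulty the paper targets is that this peeling proceeds round by round, with contention concentrating on high-degree vertices and scheduling overhead dominating for low-degree ones, which is what the abstract's sampling scheme and vertical granularity control address.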
  4. Free, publicly-accessible full text available June 8, 2026
  5. The kd-tree is one of the most widely used data structures for managing multi-dimensional data. Due to ever-growing data volumes, it is imperative to support parallelism in kd-trees. However, we observed challenges in existing parallel kd-tree implementations for both construction and updates. The goal of this paper is to develop efficient in-memory kd-trees that offer high parallelism and cache efficiency. We propose the Pkd-tree (Parallel kd-tree), a parallel kd-tree that is efficient both in theory and in practice. The Pkd-tree supports parallel tree construction, batch updates (insertion and deletion), and various queries, including k-nearest-neighbor search, range query, and range count. We prove that our algorithms have strong theoretical bounds on work (sequential time complexity), span (parallelism), and cache complexity. Our key techniques include 1) an efficient construction algorithm that optimizes work, span, and cache complexity simultaneously, and 2) reconstruction-based update algorithms that guarantee the tree remains weight-balanced. With these new algorithmic insights and careful engineering, we achieve a highly optimized implementation of the Pkd-tree. We tested the Pkd-tree on various synthetic and real-world datasets, including both uniform and highly skewed data, and compared it with state-of-the-art parallel kd-tree implementations. In all tests, the Pkd-tree delivers better or competitive query performance while being consistently much faster in construction and updates than all baselines. We have released our code.
    Free, publicly-accessible full text available February 10, 2026
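Finally, a minimal sequential kd-tree sketch in C++, showing median-split construction and a rectangular range count, two of the operations entry 5 lists. It is only an illustration of the underlying data structure, not the Pkd-tree: the real structure is parallel, cache-efficient, and weight-balanced under batch updates, none of which this sketch attempts, and the type and function names are assumptions.

```cpp
// Minimal sequential 2-d kd-tree: median-split construction plus an
// axis-aligned range count. Illustrative only; not the Pkd-tree.
#include <algorithm>
#include <array>
#include <iostream>
#include <memory>
#include <vector>

using Point = std::array<double, 2>;

struct Node {
    Point pt;
    int axis;                       // splitting dimension (0 = x, 1 = y)
    std::unique_ptr<Node> left, right;
};

// Builds the tree by recursively splitting at the median along alternating axes.
std::unique_ptr<Node> build(std::vector<Point>& pts, int lo, int hi, int axis) {
    if (lo >= hi) return nullptr;
    int mid = (lo + hi) / 2;
    std::nth_element(pts.begin() + lo, pts.begin() + mid, pts.begin() + hi,
                     [axis](const Point& a, const Point& b) { return a[axis] < b[axis]; });
    auto node = std::make_unique<Node>();
    node->pt = pts[mid];
    node->axis = axis;
    node->left = build(pts, lo, mid, 1 - axis);
    node->right = build(pts, mid + 1, hi, 1 - axis);
    return node;
}

// Counts points inside the rectangle [lo, hi], pruning subtrees that lie
// entirely outside the query range along the node's splitting axis.
int rangeCount(const Node* node, const Point& lo, const Point& hi) {
    if (!node) return 0;
    int count = (node->pt[0] >= lo[0] && node->pt[0] <= hi[0] &&
                 node->pt[1] >= lo[1] && node->pt[1] <= hi[1]) ? 1 : 0;
    if (node->pt[node->axis] >= lo[node->axis])
        count += rangeCount(node->left.get(), lo, hi);
    if (node->pt[node->axis] <= hi[node->axis])
        count += rangeCount(node->right.get(), lo, hi);
    return count;
}

int main() {
    std::vector<Point> pts = {{1, 1}, {2, 5}, {4, 3}, {7, 2}, {5, 6}, {8, 8}};
    auto root = build(pts, 0, static_cast<int>(pts.size()), 0);
    std::cout << rangeCount(root.get(), {2, 2}, {7, 7}) << "\n";  // expect 4
}
```

The median split keeps this static tree balanced; maintaining balance under batched insertions and deletions is where entry 5's reconstruction-based update algorithms come in.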