NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Optimal Batch-Dynamic kd-trees for Processing-in-Memory with Applications

https://doi.org/10.1145/3694906.3743318

Zhao, Yiwei; Kang, Hongbo; Gu, Yan; Blelloch, Guy E; Dhulipala, Laxman; McGuffey, Charles; Gibbons, Phillip B (July 2025, ACM)

Full Text Available
Parallel kd-tree with Batch Updates

https://doi.org/10.1145/3709712

Men, Ziyang; Shen, Zheqi; Gu, Yan; Sun, Yihan (February 2025, Proceedings of the ACM on Management of Data)

The kd-tree is one of the most widely used data structures to manage multi-dimensional data. Due to the ever-growing data volume, it is imperative to consider parallelism in kd-trees. However, we observed challenges in existing parallel kd-tree implementations, for both constructions and updates. The goal of this paper is to develop efficient in-memory kd-trees by supporting high parallelism and cache-efficiency. We propose the Pkd-tree (Parallel kd-tree), a parallel kd-tree that is efficient both in theory and in practice. The Pkd-tree supports parallel tree construction, batch update (insertion and deletion), and various queries including k-nearest neighbor search, range query, and range count. We proved that our algorithms have strong theoretical bounds in work (sequential time complexity), span (parallelism), and cache complexity. Our key techniques include 1) an efficient construction algorithm that optimizes work, span, and cache complexity simultaneously, and 2) reconstruction-based update algorithms that guarantee the tree to be weight-balanced. With the new algorithmic insights and careful engineering effort, we achieved a highly optimized implementation of the Pkd-tree. We tested Pkd-tree with various synthetic and real-world datasets, including both uniform and highly skewed data. We compare the Pkd-tree with state-of-the-art parallel kd-tree implementations. In all tests, with better or competitive query performance, Pkd-tree is much faster in construction and updates consistently than all baselines. We released our code.
more » « less
Full Text Available
Parallel Cluster-BFS and Applications to Shortest Paths

https://doi.org/10.1137/1.9781611978339.4

Wang, Letong; Blelloch, Guy; Gu, Yan; Sun, Yihan (January 2025, Society for Industrial and Applied Mathematics)

Full Text Available
New Algorithms for Incremental Minimum Spanning Trees and Temporal Graph Applications

https://doi.org/10.1137/1.9781611978759.22

Ding, Xiangyun; Gu, Yan; Sun, Yihan (January 2025, Society for Industrial and Applied Mathematics)

Full Text Available
Optimal Parallel Algorithms for Dendrogram Computation and Single-Linkage Clustering

https://doi.org/10.1145/3626183.3659973

Dhulipala, Laxman; Dong, Xiaojun; Gowda, Kishen N; Gu, Yan (June 2024, Annual ACM Symposium on Parallelism in Algorithms and Architectures)
Brief Announcement: PASGAL: Parallel And Scalable Graph Algorithm Library

https://doi.org/10.1145/3626183.3660258

Dong, Xiaojun; Gu, Yan; Sun, Yihan; Wang, Letong (June 2024, ACM)

Full Text Available
Parallel and (Nearly) Work-Efficient Dynamic Programming

https://doi.org/10.1145/3626183.3659958

Ding, Xiangyun; Gu, Yan; Sun, Yihan (June 2024, ACM)

Full Text Available
Parallel Integer Sort: Theory and Practice

https://doi.org/10.1145/3627535.3638483

Dong, Xiaojun; Dhulipala, Laxman; Gu, Yan; Sun, Yihan (February 2024, ACM)

Full Text Available
ParlayANN: Scalable and Deterministic Parallel Graph-Based Approximate Nearest Neighbor Search Algorithms

https://doi.org/10.1145/3627535.3638475

Manohar, Magdalen Dobson; Shen, Zheqi; Blelloch, Guy; Dhulipala, Laxman; Gu, Yan; Simhadri, Harsha Vardhan; Sun, Yihan (February 2024, ACM)

Full Text Available
Fast and Space-Efficient Parallel Algorithms for Influence Maximization

https://doi.org/10.14778/3632093.3632104

Wang, Letong; Ding, Xiangyun; Gu, Yan; Sun, Yihan (November 2023, Proceedings of the VLDB Endowment)

Influence Maximization (IM) is a crucial problem in data science. The goal is to find a fixed-size set of highly influentialseedvertices on a network to maximize the influence spread along the edges. While IM is NP-hard on commonly used diffusion models, a greedy algorithm can achieve (1 - 1/e)-approximation by repeatedly selecting the vertex with the highestmarginal gainin influence as the seed. However, we observe two performance issues in the existing work that prevent them from scaling to today's large-scale graphs: space-inefficient memorization to estimate marginal gain, and time-inefficient seed selection process due to a lack of parallelism. This paper significantly improves the scalability of IM using two key techniques. The first is asketch-compressiontechnique for the independent cascading model on undirected graphs. It allows combining the simulation and sketching approaches to achieve a time-space tradeoff. The second technique includes new data structures for parallel seed selection. Using our new approaches, we implementedPaC-IM: Parallel and Compressed IM. We comparePaC-IMwith state-of-the-art parallel IM systems on a 96-core machine with 1.5TB memory.PaC-IMcan process the ClueWeb graph with 978M vertices and 75B edges in about 2 hours. On average, across all tested graphs, our uncompressed version is 5--18x faster and about 1.4x more space-efficient than existing parallel IM systems. Using compression further saves 3.8x space with only 70% overhead in time on average.
more » « less
Full Text Available

Search for: All records