NSF PAR Search | NSF Public Access Repository


LibRTS: A Spatial Indexing Library by Ray Tracing

https://doi.org/10.1145/3710848.3710850

Geng, Liang; Lee, Rubao; Zhang, Xiaodong (February 2025, ACM)

The Ray-Tracing (RT) core has become a widely integrated feature in modern GPUs to accelerate ray-tracing rendering. Recent research has shown that RT cores can also be repurposed to accelerate non-rendering workloads. Since the RT core essentially serves as a hardware accelerator for Bounding Volume Hierarchy (BVH) tree traversal, it holds the potential to significantly improve the performance of spatial workloads. However, the specialized RT programming model poses challenges for using RT cores in these scenarios. Inspired by the core functionality of RT cores, we designed and implemented LibRTS, a spatial index library that leverages RT cores to accelerate spatial queries. LibRTS supports both point and range queries and remains mutable to accommodate changing data. Instead of relying on a case-by-case approach, LibRTS provides a general, high-performance spatial indexing framework for spatial data processing. By formulating spatial queries as RT-suitable problems and overcoming load-balancing challenges, LibRTS delivers superior query performance through RT cores without requiring developers to master complex programming on this specialized hardware. Compared to CPU and GPU spatial libraries, LibRTS achieves speedups of up to 85.1x for point queries, 94.0x for range-contains queries, and 11.0x for range-intersects queries. In a real-world application, point-in-polygon testing, LibRTS also surpasses the state-of-the-art RT method by up to 3.8x.
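To make the link between BVH traversal and spatial queries concrete, the following is a minimal CPU-only sketch in Python of a point query answered by descending a bounding volume hierarchy built over rectangles. It is not the LibRTS API and does not use RT cores; the names `Node`, `build`, and `point_query`, the median-split build strategy, and the sample rectangles are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

@dataclass
class Node:
    box: Box                        # bounding box of everything below this node
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    item: Optional[int] = None      # rectangle index; set only on leaves

def union(a: Box, b: Box) -> Box:
    """Smallest box enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def contains(box: Box, x: float, y: float) -> bool:
    """True if the point (x, y) lies inside the closed box."""
    return box[0] <= x <= box[2] and box[1] <= y <= box[3]

def build(rects: List[Box], ids: Optional[List[int]] = None, depth: int = 0) -> Node:
    """Build a BVH by median split on alternating axes (hypothetical strategy)."""
    if ids is None:
        ids = list(range(len(rects)))
    if not ids:
        raise ValueError("need at least one rectangle")
    if len(ids) == 1:
        return Node(box=rects[ids[0]], item=ids[0])
    axis = depth % 2
    # Sort by (twice) the box center along the current axis.
    ids = sorted(ids, key=lambda i: rects[i][axis] + rects[i][axis + 2])
    mid = len(ids) // 2
    left = build(rects, ids[:mid], depth + 1)
    right = build(rects, ids[mid:], depth + 1)
    return Node(box=union(left.box, right.box), left=left, right=right)

def point_query(root: Node, x: float, y: float) -> List[int]:
    """Return sorted indices of rectangles containing (x, y)."""
    hits, stack = [], [root]
    while stack:
        node = stack.pop()
        if not contains(node.box, x, y):
            continue                 # prune the whole subtree
        if node.item is not None:
            hits.append(node.item)
        else:
            stack.extend((node.left, node.right))
    return sorted(hits)

rects = [(0, 0, 2, 2), (1, 1, 3, 3), (5, 5, 6, 6)]
root = build(rects)
print(point_query(root, 1.5, 1.5))  # [0, 1]
```

The pruning step in `point_query` is the part that RT hardware is described as accelerating: subtrees whose bounding box cannot contain the query are skipped without visiting their leaves.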
Free, publicly-accessible full text available February 28, 2026
RayJoin: Fast and Precise Spatial Join

https://doi.org/10.1145/3650200.3656610

Geng, Liang; Lee, Rubao; Zhang, Xiaodong (May 2024, https://doi.org/10.1145/3650200.3656610)

Full Text Available
RR-Compound: RDMA-Fused gRPC for Low Latency, High Throughput, and Easy Interface

https://doi.org/10.1109/TPDS.2024.3404394

Geng, Liang; Wang, Hao; Meng, Jingsong; Fan, Dayi; Ben-Romdhane, Sami; Pichumani, Hari Kadayam; Phegade, Vinay; Zhang, Xiaodong (August 2024, IEEE Transactions on Parallel and Distributed Systems)

We have developed open-source software called RR-Compound that offers low latency, high throughput, and an easy-to-use interface.
Full Text Available
An RDMA-enabled In-memory Computing Platform for R-tree on Clusters

https://doi.org/10.1145/3503513

Xiao, Mengbai; Wang, Hao; Geng, Liang; Lee, Rubao; Zhang, Xiaodong (June 2022, ACM Transactions on Spatial Algorithms and Systems)

R-tree is a foundational data structure used in spatial databases and scientific databases. With the advancement of networks and computer architectures, in-memory data processing for R-tree in distributed systems has become a common platform. We have observed new performance challenges in processing R-trees as the volume of multidimensional data grows. Specifically, an R-tree server can be heavily overloaded while the network and client CPU are lightly loaded, and vice versa. In this article, we present the design and implementation of Catfish, an RDMA-enabled R-tree for low latency and high throughput that adaptively uses the available network bandwidth and computing resources to balance the workloads between clients and servers. We design and implement two basic mechanisms for using RDMA in a client-server R-tree data processing system. First, in the fast messaging design, we use RDMA writes to send R-tree requests to the server and let server threads process R-tree requests to achieve low query latency. Second, in the RDMA offloading design, we use RDMA reads to offload tree traversal from the server to the client, which relieves the server when it is overloaded. We further develop an adaptive scheme to effectively switch an R-tree search between fast messaging and RDMA offloading, maximizing the overall performance. Our experiments show that the adaptive solution of Catfish on InfiniBand significantly outperforms an R-tree that uses only fast messaging or only RDMA offloading in both latency and throughput. Catfish can also deliver up to one order of magnitude higher performance than the traditional schemes using TCP/IP on 1 and 40 Gbps Ethernet. We make a strong case for using RDMA to effectively balance workloads in distributed systems for low latency and high throughput.
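The idea of switching a search between the two mechanisms can be illustrated with a small Python sketch. The abstract does not say which signal or rule Catfish uses, so the server CPU utilization input, the 0.8 threshold, and the names `choose_mode`, `FAST_MESSAGING`, and `RDMA_OFFLOADING` are all hypothetical, chosen only to show the shape of such a policy.

```python
FAST_MESSAGING = "fast_messaging"      # server threads traverse the R-tree
RDMA_OFFLOADING = "rdma_offloading"    # client traverses via one-sided reads

def choose_mode(server_cpu_util: float, threshold: float = 0.8) -> str:
    """Pick a search mode from an assumed server load signal in [0, 1].

    Hypothetical rule: keep work on the server while it has spare
    capacity, and shift traversal to the client once it is overloaded.
    """
    if not 0.0 <= server_cpu_util <= 1.0:
        raise ValueError("server_cpu_util must be in [0, 1]")
    return RDMA_OFFLOADING if server_cpu_util >= threshold else FAST_MESSAGING

print(choose_mode(0.35))  # fast_messaging
print(choose_mode(0.95))  # rdma_offloading
```

A real policy would likely also account for network bandwidth and client CPU load, which the abstract names as the other resources being balanced.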
Full Text Available
Automating Incremental and Asynchronous Evaluation for Recursive Aggregate Data Processing

Wang, Qiange; Zhang, Yanfeng; Wang, Hao; Geng, Liang; Lee, Rubao; Zhang, Xiaodong; Yu, Ge (June 2020, Proceedings of ACM SIGMOD Conference on Management of Data)
In database and large-scale data analytics, recursive aggregate processing plays an important role. It is generally implemented under a framework of incremental computing and executed synchronously and/or asynchronously. We identify three barriers in existing recursive aggregate data processing. First, the processing scope is largely limited to monotonic programs. Second, checking the conditions for monotonicity and for the correctness of async processing is complex and done manually. Third, execution engines may be suboptimal due to the separation of sync and async execution. In this paper, we lay an analytical foundation for conditions to check whether a recursive aggregate program, monotonic or even non-monotonic, can be executed incrementally and asynchronously with a correct result. We design and implement a condition verification tool that can automatically check whether a given program satisfies the conditions. We further propose a unified sync-async engine to execute these programs for high performance. To integrate all these methods, we have developed a distributed Datalog system called PowerLog. Our evaluation shows that PowerLog can outperform three representative Datalog systems on both monotonic and non-monotonic recursive programs.
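As a small illustration of what incremental evaluation of a recursive aggregate looks like, the Python sketch below computes single-source shortest paths, a recursive program with a `min` aggregate, by propagating only the values that changed. It covers the simple monotonic case only and is not PowerLog's engine or its condition checker; the function name `sssp_incremental` and the sample graph are assumptions for the example.

```python
import math
from collections import deque
from typing import Dict, Hashable, List, Tuple

Graph = Dict[Hashable, List[Tuple[Hashable, float]]]  # vertex -> [(neighbor, weight)]

def sssp_incremental(edges: Graph, source: Hashable) -> Dict[Hashable, float]:
    """Shortest distances from source, assuming non-negative edge weights.

    Each step takes one vertex whose distance recently improved (the
    "delta") and pushes that improvement to its neighbors, instead of
    recomputing every distance in every round.
    """
    dist = {source: 0.0}
    worklist = deque([source])       # vertices with unpropagated changes
    while worklist:
        u = worklist.popleft()
        for v, w in edges.get(u, []):
            cand = dist[u] + w
            if cand < dist.get(v, math.inf):   # min aggregate: keep the best value
                dist[v] = cand
                worklist.append(v)
    return dist

graph = {"a": [("b", 1.0), ("c", 4.0)], "b": [("c", 2.0)], "c": []}
print(sssp_incremental(graph, "a"))  # {'a': 0.0, 'b': 1.0, 'c': 3.0}
```

Because `min` only ever lowers a distance, updates can be applied in any order and still converge to the same result, which is the kind of property that makes asynchronous execution safe for monotonic programs.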
Full Text Available
