NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

A Case Study for Ray Tracing Cores: Performance Insights with Breadth-First Search and Triangle Counting in Graphs

https://doi.org/10.1145/3727108

Xiao, Zhixiong; Xiao, Mengbai; Yuan, Yuan; Yu, Dongxiao; Lee, Rubao; Zhang, Xiaodong (May 2025, Proceedings of the ACM on Measurement and Analysis of Computing Systems)

The emerging Ray-tracing cores on GPUs have been repurposed for non-ray-tracing tasks by researchers recently. In this paper, we explore the benefits and effectiveness of executing graph algorithms on RT cores. We re-design breadth-first search and triangle counting on the new hardware as graph algorithm representatives. Our implementations focus on how to convert the graph operations to bounding volume hierarchy construction and ray generation, which are computational paradigms specific to ray tracing. We evaluate our RT-based methods on a wide range of real-world datasets. The results do not show the advantage of the RT-based methods over CUDA-based methods. We extend the experiments to the set intersection workload on synthesized datasets, and the RT-based method shows superior performance when the skew ratio is high. By carefully comparing the RT-based and CUDA-based binary search, we discover that RT cores are more efficient at searching for elements, but this comes with a constant and non-trivial overhead of the execution pipeline. Furthermore, the overhead of BVH construction is substantially higher than sorting on CUDA cores for large datasets. Our case studies unveil several rules of adapting graph algorithms to ray-tracing cores that might benefit future evolution of the emerging hardware towards general-computing tasks.
more » « less
Free, publicly-accessible full text available May 27, 2026
LibRTS: A Spatial Indexing Library by Ray Tracing

https://doi.org/10.1145/3710848.3710850

Geng, Liang; Lee, Rubao; Zhang, Xiaodong (February 2025, ACM)

The Ray-Tracing (RT) core has become a widely integrated feature in modern GPUs to accelerate ray-tracing rendering. Recent research has shown that RT cores can also be repurposed to accelerate non-rendering workloads. Since the RT core essentially serves as a hardware accelerator for Bounding Volume Hierarchy (BVH) tree traversal, it holds the potential to significantly improve the performance of spatial workloads. However, the specialized RT programming model poses challenges for using RT cores in these scenarios. Inspired by the core functionality of RT cores, we designed and implemented LibRTS, a spatial index library that leverages RT cores to accelerate spatial queries. LibRTS supports both point and range queries and remains mutable to accommodate changing data. Instead of relying on a case-by-case approach, LibRTS provides a general, highperformance spatial indexing framework for spatial data processing. By formulating spatial queries as RT-suitable problems and overcoming load-balancing challenges, LibRTS delivers superior query performance through RT cores without requiring developers to master complex programming on this specialized hardware. Compared to CPU and GPU spatial libraries, LibRTS achieves speedups of up to 85.1x for point queries, 94.0x for range-contains queries, and 11.0x for range-intersects queries. In a real-world application, pointin-polygon testing, LibRTS also surpasses the state-of-the-art RT method by up to 3.8x.
more » « less
Free, publicly-accessible full text available February 28, 2026
High-Performance Spatial Data Analytics: Systematic R&D for Scale-Out and Scale-Up Solutions from the Past to Now

https://doi.org/10.14778/3685800.3685912

Wang, Fusheng; Lee, Rubao; Teng, Dejun; Zhang, Xiaodong; Saltz, Joel (August 2024, Proceedings of the VLDB Endowment)

We released open-source software Hadoop-GIS in 2011, and presented and published the work in VLDB 2013. This work initiated the development of a new spatial data analytical ecosystem characterized by its large-scale capacity in both computing and data storage, high scalability, compatibility with low-cost commodity processors in clusters and open-source software. After more than a decade of research and development, this ecosystem has matured and is now serving many applications across various fields. In this paper, we provide the background on why we started this project and give an overview of the original Hadoop-GIS software architecture, along with its unique technical contributions and legacy. We present the evolution of the ecosystem and its current state-of-the-art, which has been influenced by the Hadoop-GIS project. We also describe the ongoing efforts to further enhance this ecosystem with hardware accelerations to meet the increasing demands for low latency and high throughput in various spatial data analysis tasks. Finally, we will summarize the insights gained and lessons learned over more than a decade in pursuing high-performance spatial data analytics.
more » « less
Full Text Available
RayJoin: Fast and Precise Spatial Join

https://doi.org/10.1145/3650200.3656610

Geng, Liang; Lee, Rubao; Zhang, Xiaodong (May 2024, https://doi.org/10.1145/3650200.3656610)

Full Text Available
X-TED: Massive Parallelization of Tree Edit Distance

https://doi.org/10.14778/3654621.3654634

Fan, Dayi; Lee, Rubao; Zhang, Xiaodong (March 2024, Proceedings of the VLDB Endowment)

The tree edit distance (TED) has been found in a wide spectrum of applications in artificial intelligence, bioinformatics, and other areas, which serves as a metric to quantify the dissimilarity between two trees. As applications continue to scale in data size, with a growing demand for fast response time, TED has become even more increasingly data- and computing-intensive. Over the years, researchers have made dedicated efforts to improve sequential TED algorithms by reducing their high complexity. However, achieving efficient parallel TED computation in both algorithm and implementation is challenging due to its dynamic programming nature involving non-trivial issues of data dependency, runtime execution pattern changes, and optimal utilization of limited parallel resources. Having comprehensively investigated the bottlenecks in the existing parallel TED algorithms, we develop a massive parallel computation framework for TED and its implementation on GPU, which is called X-TED. For a given TED computation, X-TED applies a fast preprocessing algorithm to identify dependency relationships among millions of dynamic programming tables. Subsequently, it adopts a dynamic parallel strategy to handle various processing stages, aiming to best utilize GPU cores and the limited device memory in an adaptive and automatic way. Our intensive experimental results demonstrate that X-TED surpasses all existing solutions, achieving up to 42x speedup over the state-of-the-art sequential AP-TED, and outperforming the existing multicore parallel MC-TED by an average speedup of 31x.
more » « less
Full Text Available
RTScan: Efficient Scan with Ray Tracing Cores

https://doi.org/10.14778/3648160.3648183

Lv, Yangming; Zhang, Kai; Wang, Ziming; Zhang, Xiaodong; Lee, Rubao; He, Zhenying; Jing, Yinan; Wang, X Sean (February 2024, Proceedings of the VLDB Endowment)

Indexing is a core technique for accelerating predicate evaluation in databases. After many years of effort, the indexing performance has reached its peak on the existing hardware infrastructure. We propose to use ray tracing (RT) cores to move the indexing performance and efficiency to another level by addressing the following technical challenges: (1) the lack of an efficient mapping of predicate evaluation to a ray tracing job and (2) the poor performance by the heavy and imbalanced ray load when processing skewed datasets. These challenges set obstacles to effectively exploiting RT cores for predicate evaluation. In this paper, we propose RTScan, an approach that leverages RT cores to accelerate index scans. RTScan transforms the evaluation of conjunctive predicates into an efficient ray tracing job in a three-dimensional space. A set of techniques are designed in RTScan, i.e., Uniform Encoding, Data Sieving, and Matrix RT Refine, which significantly enhances the parallelism of scans on RT cores while lightening and balancing the ray load. With the proposed techniques, RTScan achieves high performance for datasets with either uniform or skewed distributions and queries with different selectivities. Extensive evaluations demonstrate that RTScan enhances the scan performance on RT cores by five orders of magnitude and outperforms the state-of-the-art approach on CPU by up to 4.6×.
more » « less
Full Text Available
Maze: A Cost-Efficient Video Deduplication System at Web-scale

https://doi.org/10.1145/3503161.3548145

Qin, An; Xiao, Mengbai; Huang, Ben; Zhang, Xiaodong (January 2022, MM '22: Proceedings of the 30th ACM International Conference on Multimedia)

With the advancement and dominant service of Internet videos, the content-based video deduplication system becomes an essential and dependent infrastructure for Internet video service. However, the explosively growing video data on the Internet challenges the system design and implementation for its scalability in several ways. (1) Although the quantization-based indexing techniques are effective for searching visual features at a large scale, the costly re-training over the complete dataset must be done periodically. (2) The high-dimensional vectors for visual features demand increasingly large SSD space, degrading I/O performance. (3) Videos crawled from the Internet are diverse, and visually similar videos are not necessarily the duplicates, increasing deduplication complexity. (4) Most videos are edited ones. The duplicate contents are more likely discovered as clips inside the videos, demanding processing techniques with close attention to details. To address above-mentioned issues, we propose Maze, a full-fledged video deduplication system. Maze has an ANNS layer that indexes and searches the high dimensional feature vectors. The architecture of the ANNS layer supports efficient reads and writes and eliminates the data migration caused by re-training. Maze adopts the CNN-based feature and the ORB feature as the visual features, which are optimized for the specific video deduplication task. The features are compact and fully reside in the memory. Acoustic features are also incorporated in Maze so that the visually similar videos but having different audio tracks are recognizable. A clip-based matching algorithm is developed to discover duplicate contents at a fine granularity. Maze has been deployed as a production system for two years. It has indexed 1.3 billion videos and is indexing ~800 thousand videos per day. For the ANNS layer, the average read latency is 4 seconds and the average write latency is at most 4.84 seconds. The re-training over the complete dataset is no longer required no matter how many new data sets are added, eliminating the costly data migration between nodes. Maze recognizes the duplicate live streaming videos with both the similar appearance and the similar audio at a recall of 98%. Most importantly, Maze is also cost-effective. For example, the compact feature design helps save 5800 SSDs and the computation resources devoted to running the whole system decrease to 250K standard cores per billion videos.
more » « less
Full Text Available

Search for: All records