NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

High-Performance Computing for Graph AI: A Top-Down Perspective

Ji, Yuede (June 2025, IEEE)

A graph, made up of vertices and edges, is a natural representation for many real-world applications. Graph artificial intelligence (AI) techniques, especially graph neural networks (GNNs), are becoming increasingly important in modern machine learning and data analysis, as they can accurately represent high- dimensional features of vertices, edges, and structure information into low-dimensional embeddings. They have become a valuable area of study for students in fields like computer science, data science, and AI. However, the students are facing two challenges to grasp the knowledge of GNNs, including (i) learning GNNs often requires multidiscipline knowledge, and (ii) resources for learning GNNs are often fragmented. Motivated by that, we designed a self-contained course module on high-performance computing for graph AI: from a top-down perspective based on our study in this area for the past years. In particular, we divide them into four levels from the top to the bottom, including (i) level 1: graph theory basics, (ii) level 2: fundamental theories of GNNs, (iii) level 3: efficient graph AI computation framework, and (iv) level 4: GPU architecture and programming. In addition, we have disseminated part of this module into different educational activities, such as courses and tutorials. This paper is submitted for the Research to Education track of EduPar-25.
more » « less
Free, publicly-accessible full text available June 4, 2026
Optimized Parallel Breadth-First Search with Adaptive Strategies

https://doi.org/10.1145/3711708.3723449

Li, Chaoqun; Hu, Runbang; Du, Xiaojiang; Ji, Yuede (March 2025, ACM)

Breadth-First Search (BFS) is a fundamental graph traversal algorithm in a level-by-level pattern. It has been widely used in real-world applications, such as social network analysis, scientific computing, and web crawling. However, achieving high performance for BFS on large-scale graphs remains a challenging task due to irregular memory access patterns, diverse graph structures, and the necessity for efficient parallelization. This paper addresses these challenges by designing a highly optimized parallel BFS implementation based on the top-down and bottom-up traversal strategies. It further integrates several key innovations, including graph typea-ware computation strategy selection, graph pruning, twolevel bottom-up, and efficient parallel implementation. We evaluate our method on 11 diverse graphs in terms of size, diameter, and density. On a CPU server with 48 threads, our method achieves an average speedup of 9.5x over the serial BFS implementation. Also, on a synthetic dense graph, our method processes 9.3 billion edges per second, showing its efficiency in dense graph traversal.
more » « less
Free, publicly-accessible full text available March 1, 2026
Adaptive Optimizations for Parallel Single-Source Shortest Paths

https://doi.org/10.1145/3711708.3723450

Hu, Runbang; Li, Chaoqun; Du, Xiaojiang; Ji, Yuede (March 2025, ACM)

The single-source shortest path (SSSP) problem is essential in graph theory with applications in navigation, biology, social networks, and traffic analysis. The -Stepping algorithm enhances parallelism by grouping vertices into "buckets" based on their tentative distances. However, its performance depends on values and graph properties. This paper introduces an adaptive parallel Delta-Stepping implementation with three innovations: neighbor reordering, bucket fusion, and graph type-aware selection. Tested on 11 diverse graphs, it achieves an average 7.1× speedup over serial Dijkstra’s algorithm on a 48-thread CPU server.
more » « less
Free, publicly-accessible full text available March 1, 2026

Search for: All records