skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Award ID contains: 2508118

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. A graph, made up of vertices and edges, is a natural representation for many real-world applications. Graph artificial intelligence (AI) techniques, especially graph neural networks (GNNs), are becoming increasingly important in modern machine learning and data analysis, as they can accurately represent high- dimensional features of vertices, edges, and structure information into low-dimensional embeddings. They have become a valuable area of study for students in fields like computer science, data science, and AI. However, the students are facing two challenges to grasp the knowledge of GNNs, including (i) learning GNNs often requires multidiscipline knowledge, and (ii) resources for learning GNNs are often fragmented. Motivated by that, we designed a self-contained course module on high-performance computing for graph AI: from a top-down perspective based on our study in this area for the past years. In particular, we divide them into four levels from the top to the bottom, including (i) level 1: graph theory basics, (ii) level 2: fundamental theories of GNNs, (iii) level 3: efficient graph AI computation framework, and (iv) level 4: GPU architecture and programming. In addition, we have disseminated part of this module into different educational activities, such as courses and tutorials. This paper is submitted for the Research to Education track of EduPar-25. 
    more » « less
    Free, publicly-accessible full text available June 4, 2026
  2. Breadth-First Search (BFS) is a fundamental graph traversal algorithm in a level-by-level pattern. It has been widely used in real-world applications, such as social network analysis, scientific computing, and web crawling. However, achieving high performance for BFS on large-scale graphs remains a challenging task due to irregular memory access patterns, diverse graph structures, and the necessity for efficient parallelization. This paper addresses these challenges by designing a highly optimized parallel BFS implementation based on the top-down and bottom-up traversal strategies. It further integrates several key innovations, including graph typea-ware computation strategy selection, graph pruning, twolevel bottom-up, and efficient parallel implementation. We evaluate our method on 11 diverse graphs in terms of size, diameter, and density. On a CPU server with 48 threads, our method achieves an average speedup of 9.5x over the serial BFS implementation. Also, on a synthetic dense graph, our method processes 9.3 billion edges per second, showing its efficiency in dense graph traversal. 
    more » « less
    Free, publicly-accessible full text available March 1, 2026
  3. The single-source shortest path (SSSP) problem is essential in graph theory with applications in navigation, biology, social networks, and traffic analysis. The  -Stepping algorithm enhances parallelism by grouping vertices into "buckets" based on their tentative distances. However, its performance depends on values and graph properties. This paper introduces an adaptive parallel Delta-Stepping implementation with three innovations: neighbor reordering, bucket fusion, and graph type-aware selection. Tested on 11 diverse graphs, it achieves an average 7.1× speedup over serial Dijkstra’s algorithm on a 48-thread CPU server. 
    more » « less
    Free, publicly-accessible full text available March 1, 2026