Computing the single-source shortest path (SSSP) is one of the fundamental graph algorithms, and is used in many applications. Here, we focus on computing SSSP on large dynamic graphs, i.e. graphs whose structure evolves with time. We posit that instead of recomputing the SSSP for each set of changes on the dynamic graphs, it is more efficient to update the results based only on the region of change. To this end, we present a novel two-step shared-memory algorithm for updating SSSP on weighted large-scale graphs. The key idea of our algorithm is to identify changes, such as vertex/edge addition and deletion, that affect the shortest path computations and update only the parts of the graphs affected by the change. We provide the proof of correctness of our proposed algorithm. Our experiments on real and synthetic networks demonstrate that our algorithm is as much as 4X faster compared to computing SSSP with Galois, a state-of-the-art parallel graph analysis software for shared memory architectures. We also demonstrate how increasing the asynchrony can lead to even faster updates. To the best of our knowledge, this is one of the first practical parallel algorithms for updating networks on shared-memory systems, that is also scalable to large networks.
more »
« less
Parallel Vertex Color Update on Large Dynamic Networks
We present the first GPU-based parallel algorithm to efficiently update vertex coloring on large dynamic networks. For single GPU, we introduce the concept of loosely maintained vertex color update that reduces computation and memory requirements. For multiple GPUs, in distributed environments, we propose priority-based ordering of vertices to reduce the communication time. We prove the correctness of our algorithms and experimentally demonstrate that for graphs of over 16 million vertices and over 134 million edges on a single GPU, our dynamic algorithm is as much as 20x faster than state-of-the-art algorithm on static graphs. For larger graphs with over 130 million vertices and over 260 million edges, our distributed implementation with 8 GPUs produces updated color assignments within 160 milliseconds. In all cases, the proposed parallel algorithms produce comparable or fewer colors than state-of-the-art algorithms.
more »
« less
- PAR ID:
- 10451238
- Date Published:
- Journal Name:
- HiPC
- Page Range / eLocation ID:
- 115 to 124
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
We develop a distributed-memory parallel algorithm for performing batch updates on streaming graphs, where vertices and edges are continuously added or removed. Our algorithm leverages distributed sparse matrices as the core data structures, utilizing equivalent sparse matrix operations to execute graph updates. By reducing unnecessary communication among processes and employing shared-memory parallelism, we accelerate updates of distributed graphs. Additionally, we maintain a balanced load in the output matrix by permuting the resultant matrix during the update process. We demonstrate that our streaming update algorithm is at least 25 times faster than alternative linear-algebraic methods and scales linearly up to 4,096 cores (32 nodes) on a Cray EX supercomputer.more » « less
-
For fast processing of increasingly large graphs, triangle counting - a common building block of graph processing algorithms, is often performed on GPUs. However, applying massive parallelism to triangle counting is challenging due to the algorithm’s inherent irregular access patterns and workload imbalance. In this work, we propose WeTriC, a novel wedge-parallel triangle counting algorithm for GPUs, which, using fine(r)-grained parallelism through a lightweight static mapping of wedges to threads, improves load balancing and efficiency. Our theoretical analysis compares different parallelization granularities, while optimizations enhance caching, reduce work-per-intersection, and minimize overhead. Performance experiments indicate that WeTriC yields 5.63x and 4.69x speedup over optimized vertex-parallel and edge-parallel binary search triangle counting algorithms, respectively. Furthermore, we show that WeTriC consistently outperforms the state-of-the-art (i.e., on avg. 2.86x faster than Trust and 2.32x faster than GroupTC).more » « less
-
In this paper, we propose a novel method, GSM, to compute graph matching (subgraph isomorphism) on GPUs. Unlike previous formulations of graph matching, our approach is BFS-based and thus can be mapped onto GPUs in a massively parallel fashion. Our implementation uses the Gunrock program- ming model and we evaluate our implementation in runtime and memory consumption compared with previous state-of-the- art work. We sustain a peak traversed-edges-per-second (TEPS) rate of nearly 10 GTEPS. Our algorithm is the most scalable and parallel among all existing GPU implementations and also outperforms all existing CPU distributed implementations. This work specifically focuses on leveraging our implementation on the triangle counting problem for the Subgraph Isomorphism Graph Challenge, demonstrating a geometric mean speedup over the 2018 champion of 3.84xmore » « less
-
The transitive closure of a graph is a new graph where every vertex is directly connected to all vertices to which it had a path in the original graph. Transitive closures are useful for reachability and relationship querying. Finding the transitive closure can be computationally expensive and requires a large memory footprint as the output is typically larger than the input. Some of the original research on transitive closures assumed that graphs were dense and used dense adjacency matrices. We have since learned that many real-world networks are extremely sparse, and the existing methods do not scale. In this work, we introduce a new algorithm called Anti-section Transitive Closure (ATC) for finding the transitive closure of a graph. We present a new parallel edges operation – anti-sections – for finding new edges to reachable vertices. ATC scales to massively multithreaded systems such as NVIDIA’s GPU with tens of thousands of threads. We show that the anti-section operation shares some traits with the triangle counting intersection operation in graph analysis. Lastly, we view the transitive closure problem as a dynamic graph problem requiring edge insertions. By doing this, our memory footprint is smaller. We also show a method for creating the batches in parallel using two different techniques: dual-round and hash. Using these techniques and the Hornet dynamic graph data structure, we show our new algorithm on an NVIDIA Titan V GPU. We compare with other packages such as NetworkX, SEI-GBTL, SuiteSparse, and cuSparse.more » « less
An official website of the United States government

