Counting the frequency of subgraphs in large networks is a classic research question that reveals the underlying substructures of these networks for important applications. However, subgraph counting is a challenging problem, even for subgraph sizes as small as five, due to the combinatorial explosion in the number of possible occurrences. This article focuses on the five-cycle, which is an important special case of five-vertex subgraph counting and one of the most difficult to count efficiently. We design two new parallel five-cycle counting algorithms and prove that they are work efficient and achieve polylogarithmic span. Both algorithms are based on computing low out-degree orientations, which enables the efficient computation of directed two-paths and three-paths, and the algorithms differ in the ways in which they use this orientation to eliminate double-counting. Additionally, we present new parallel algorithms for obtaining unbiased estimates of five-cycle counts using graph sparsification. We develop fast multicore implementations of the algorithms and propose a work scheduling optimization to improve their performance. Our experiments on a variety of real-world graphs using a 36-core machine with two-way hyper-threading show that our best exact parallel algorithm achieves 10–46× self-relative speedup, outperforms our serial benchmarks by 10–32×, and outperforms the previous state-of-the-art serial algorithm by up to 818×. Our best approximate algorithm, for a reasonable probability parameter, achieves up to 20× self-relative speedup and is able to approximate five-cycle counts 9–189× faster than our best exact algorithm, with between 0.52% and 11.77% error.
more »
« less
A parallel algorithm for generating a random graph with a prescribed degree sequence
Random graphs (or networks) have gained a significant increase of interest due to its popularity in modeling and simulating many complex real-world systems. Degree sequence is one of the most important aspects of these systems. Random graphs with a given degree sequence can capture many characteristics like dependent edges and non-binomial degree distribution that are absent in many classical random graph models such as the Erdöos-Rényi graph model. In addition, they have important applications in uniform sampling of random graphs, counting the number of graphs having the same degree sequence, as well as in string theory, random matrix theory, and matching theory. In this paper, we present an OpenMP-based shared-memory parallel algorithm for generating a random graph with a prescribed degree sequence, which achieves a speedup of 20.4 with 32 cores. We also present a comparative study of several structural properties of the random graphs generated by our algorithm with that of the real-world graphs and random graphs generated by other popular methods. One of the steps in our parallel algorithm requires checking the Erdöos-Gallai characterization, i.e., whether there exists a graph obeying the given degree sequence, in parallel. This paper presents a non-trivial parallel algorithm for checking the Erdöos-Gallai characterization, which achieves a speedup of 23 with 32 cores.
more »
« less
- Award ID(s):
- 1633028
- PAR ID:
- 10067732
- Date Published:
- Journal Name:
- 2017 IEEE International Conference on Big Data (Big Data)
- Page Range / eLocation ID:
- 3312 to 3321
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Finding connected components in a graph is a fundamental problem in graph analysis. In this work, we present a novel minimum-mapping based Contour algorithm to efficiently solve the connectivity problem. We prove that the Contour algorithm with two or higher order operators can identify all connected components of an undirected graph within O(log d_max) iterations, with each iteration involving O(m) work, where d_max represents the largest diameter among all components in the given graph, and m is the total number of edges in the graph. Importantly, each iteration is highly parallelizable, making use of the efficient minimum-mapping operator applied to all edges. To further enhance its practical performance, we optimize the Contour algorithm through asynchronous updates, early convergence checking, eliminating atomic operations, and choosing more efficient mapping operators. Our implementation of the Contour algorithm has been integrated into the open-source framework Arachne. Arachne extends Arkouda for large-scale interactive graph analytics, providing a Python API powered by the high-productivity parallel language Chapel. Experimental results on both real-world and synthetic graphs demonstrate the superior performance of our proposed Contour algorithm compared to state-of-the-art large-scale parallel algorithm FastSV and the fastest shared memory algorithm ConnectIt. On average, Contour achieves a speedup of 7.3x and 1.4x compared to FastSV and ConnectIt, respectively. All code for the Contour algorithm and the Arachne framework is publicly available on GitHub {https://github.com/Bears-R-Us/arkouda-njit), ensuring transparency and reproducibility of our work.more » « less
-
Finding connected components in a graph is a fundamental problem in graph analysis. In this work, we present a novel minimum-mapping based Contour algorithm to efficiently solve the connectivity problem. We prove that the Contour algorithm with two or higher order operators can identify all connected components of an undirected graph within O(log d_max) iterations, with each iteration involving O(m) work, where d_max represents the largest diameter among all components in the given graph, and m is the total number of edges in the graph. Importantly, each iteration is highly parallelizable, making use of the efficient minimum-mapping operator applied to all edges. To further enhance its practical performance, we optimize the Contour algorithm through asynchronous updates, early convergence checking, eliminating atomic operations, and choosing more efficient mapping operators. Our implementation of the Contour algorithm has been integrated into the open-source framework Arachne. Arachne extends Arkouda for large-scale interactive graph analytics, providing a Python API powered by the high-productivity parallel language Chapel. Experimental results on both real-world and synthetic graphs demonstrate the superior performance of our proposed Contour algorithm compared to state-of-the-art large-scale parallel algorithm FastSV and the fastest shared memory algorithm ConnectIt. On average, Contour achieves a speedup of 7.3x and 1.4x compared to FastSV and ConnectIt, respectively. All code for the Contour algorithm and the Arachne framework is publicly available on GitHub {https://github.com/Bears-R-Us/arkouda-njit), ensuring transparency and reproducibility of our work.more » « less
-
Counting and finding triangles in graphs is often used in real-world analytics to characterize cohesiveness and identify communities in graphs. In this paper, we propose the novel concept of a cover-edge set that can be used to find triangles more efficiently. We use a breadth-first search (BFS) to quickly generate a compact cover-edge set. Novel sequential and parallel triangle counting algorithms are presented that employ cover-edge sets. The sequential algorithm avoids unnecessary triangle-checking operations, and the parallel algorithm is communication-efficient. The parallel algorithm can asymptotically reduce communication on massive graphs such as from real social networks and synthetic graphs from the Graph500 Benchmark. In our estimate from massive-scale Graph500 graphs, our new parallel algorithm can reduce the communication on a scale 36 graph by 1156x and on a scale 42 graph by 2368x.more » « less
-
Counting and finding triangles in graphs is often used in real-world analytics to characterize cohesiveness and identify communities in graphs. In this paper, we propose the novel concept of a cover-edge set that can be used to find triangles more efficiently. We use a breadth-first search (BFS) to quickly generate a compact cover-edge set. Novel sequential and parallel triangle counting algorithms are presented that employ cover-edge sets. The sequential algorithm avoids unnecessary triangle-checking operations, and the parallel algorithm is communication-efficient. The parallel algorithm can asymptotically reduce communication on massive graphs such as from real social networks and synthetic graphs from the Graph500 Benchmark. In our estimate from massive-scale Graph500 graphs, our new parallel algorithm can reduce the communication on a scale 36 graph by 1156x and on a scale 42 graph by 2368x.more » « less
An official website of the United States government

