skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Finding and counting tree-like subgraphs using MapReduce
Several variants of the subgraph isomorphism problem, e.g., finding, counting and estimating frequencies of subgraphs in networks arise in a number of real world applications, such as web analysis, disease diffusion prediction and social network analysis. These problems are computationally challenging in having to scale to very large networks with millions of vertices. In this paper, we present SAHAD, a MapReduce algorithm for detecting and counting trees of bounded size using the elegant color coding technique developed by N. Alon et al. SAHAD is a randomized algorithm, and we show rigorous bounds on the approximation quality and the performance of it. SAHAD scales to very large networks comprising of 107-108 edges and tree-like (acyclic) templates with up to 12 vertices. Further, we extend our results by implementing SAHAD in the Harp framework, which is more of a high performance computing environment. The new implementation gives 100x improvement in performance over the standard Hadoop implementation and achieves better performance than state-of-the-art MPI solutions on larger graphs.  more » « less
Award ID(s):
1633028 1405697
PAR ID:
10067733
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ;
Date Published:
Journal Name:
IEEE Transactions on Multi-Scale Computing Systems
ISSN:
2332-7766
Page Range / eLocation ID:
1 to 1
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Gørtz, Inge Li; Farach-Colton, Martin; Puglisi, Simon J; Herman, Grzegorz (Ed.)
    We present the first polynomial-time algorithm to exactly compute the number of labeled chordal graphs on n vertices. Our algorithm solves a more general problem: given n and ω as input, it computes the number of ω-colorable labeled chordal graphs on n vertices, using O(n⁷) arithmetic operations. A standard sampling-to-counting reduction then yields a polynomial-time exact sampler that generates an ω-colorable labeled chordal graph on n vertices uniformly at random. Our counting algorithm improves upon the previous best result by Wormald (1985), which computes the number of labeled chordal graphs on n vertices in time exponential in n. An implementation of the polynomial-time counting algorithm gives the number of labeled chordal graphs on up to 30 vertices in less than three minutes on a standard desktop computer. Previously, the number of labeled chordal graphs was only known for graphs on up to 15 vertices. 
    more » « less
  2. This paper introduces a novel, parallel, and scalable implementation of the VF2 algorithm for subgraph monomorphism developed in the high-productivity language Chapel. Efficient graph analysis in large and complex network datasets is crucial across numerous scientific domains. We address this need through our enhanced VF2 implementation, widely utilized in subgraph matching, and integrating it into Arachne—a Python-accessible, open-source, large-scale graph analysis framework. Leveraging the parallel computing capabilities of modern hardware architectures, our implementation achieves significant performance improvements. Benchmarks on synthetic and real-world datasets, including social, communication, and neuroscience networks, demonstrate speedups of up to 97X on 128 cores, compared to existing Python-based tools like NetworkX and DotMotif, which do not exploit parallelization. Our results on large-scale graphs demonstrate scalability and efficiency, establishing it as a viable tool for subgraph monomorphism, the backbone of numerous graph analytics such as motif counting and enumeration. Arachne, including our VF2 implementation, can be found on GitHub: https://github.com/Bears-R-Us/arkouda-njit. 
    more » « less
  3. The single-source shortest path (SSSP) problem is essential in graph theory with applications in navigation, biology, social networks, and traffic analysis. The  -Stepping algorithm enhances parallelism by grouping vertices into "buckets" based on their tentative distances. However, its performance depends on values and graph properties. This paper introduces an adaptive parallel Delta-Stepping implementation with three innovations: neighbor reordering, bucket fusion, and graph type-aware selection. Tested on 11 diverse graphs, it achieves an average 7.1× speedup over serial Dijkstra’s algorithm on a 48-thread CPU server. 
    more » « less
  4. Label Propagation is not only a well-known machine learning algorithm for classification, but it is also an effective method for discovering communities and connected components in networks. We propose a new Direction-Optimizing Label Propagation Algorithm (DOLPA) framework that enhances the performance of the standard Label Propagation Algorithm (LPA), increases its scalability, and extends its versatility and application scope. As a central feature, the DOLPA framework relies on the use of frontiers and alternates between label push and label pull operations to attain high performance. It is formulated in such a way that the same basic algorithm can be used for finding communities or connected components in graphs by only changing the objective function used. Additionally, DOLPA has parameters for tuning the processing order of vertices in a graph to reduce the number of edges visited and improve the quality of solution obtained. We present the design and implementation of the enhanced algorithm as well as our shared-memory parallelization of it using OpenMP. We also present an extensive experimental evaluation of our implementations using the LFR benchmark and real-world networks drawn from various domains. Compared with an implementation of LPA for community detection available in a widely used network analysis software, we achieve at most five times the F-Score while maintaining similar runtime for graphs with overlapping communities. We also compare DOLPA against an implementation of the Louvain method for community detection using the same LFR-graphs and show that DOLPA achieves about three times the F-Score at just 10% of the runtime. For connected component decomposition, our algorithm achieves orders of magnitude speedups over the basic LP-based algorithm on large diameter graphs, up to 13.2 × speedup over the Shiloach-Vishkin algorithm, and up to 1.6 × speedup over Afforest on an Intel Xeon processor using 40 threads. 
    more » « less
  5. In graph analytics, a truss is a cohesive subgraph based on the number of triangles supporting each edge. It is widely used for community detection applications such as social networks and security analysis, and the performance of truss analytics highly depends on its triangle counting method. This paper proposes a novel triangle counting kernel named Minimum Search (MS). Minimum Search can select two smaller adjacency lists out of three and uses fine-grained parallelism to improve the performance of triangle counting. Then, two basic algorithms, MS-based triangle counting, and MS-based support updating are developed. Based on the novel triangle counting kernel and the two basic algorithms above, three fundamental parallel truss analytics algorithms are designed and implemented to enable different kinds of graph truss analysis. These truss algorithms include an optimized K-Truss algorithm, a Max-Truss algorithm, and a Truss Decomposition algorithm. Moreover, all proposed algorithms have been implemented in the parallel language Chapel and integrated into an open-source framework, Arkouda. Through Arkouda, data scientists can efficiently conduct graph analysis through an easy-to-use Python interface and handle large-scale graph data in powerful back-end computing resources. Experimental results show that the proposed methods can significantly improve the performance of truss analysis on real-world graphs compared with the existing and widely adopted list intersection-based method. The implemented code is publicly available from GitHub (https://github.com/Bears-R-Us/arkoudanjit). } 
    more » « less