skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Finding and counting tree-like subgraphs using MapReduce
Several variants of the subgraph isomorphism problem, e.g., finding, counting and estimating frequencies of subgraphs in networks arise in a number of real world applications, such as web analysis, disease diffusion prediction and social network analysis. These problems are computationally challenging in having to scale to very large networks with millions of vertices. In this paper, we present SAHAD, a MapReduce algorithm for detecting and counting trees of bounded size using the elegant color coding technique developed by N. Alon et al. SAHAD is a randomized algorithm, and we show rigorous bounds on the approximation quality and the performance of it. SAHAD scales to very large networks comprising of 107-108 edges and tree-like (acyclic) templates with up to 12 vertices. Further, we extend our results by implementing SAHAD in the Harp framework, which is more of a high performance computing environment. The new implementation gives 100x improvement in performance over the standard Hadoop implementation and achieves better performance than state-of-the-art MPI solutions on larger graphs.  more » « less
Award ID(s):
1633028 1405697
PAR ID:
10067733
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ;
Date Published:
Journal Name:
IEEE Transactions on Multi-Scale Computing Systems
ISSN:
2332-7766
Page Range / eLocation ID:
1 to 1
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Gørtz, Inge Li; Farach-Colton, Martin; Puglisi, Simon J; Herman, Grzegorz (Ed.)
    We present the first polynomial-time algorithm to exactly compute the number of labeled chordal graphs on n vertices. Our algorithm solves a more general problem: given n and ω as input, it computes the number of ω-colorable labeled chordal graphs on n vertices, using O(n⁷) arithmetic operations. A standard sampling-to-counting reduction then yields a polynomial-time exact sampler that generates an ω-colorable labeled chordal graph on n vertices uniformly at random. Our counting algorithm improves upon the previous best result by Wormald (1985), which computes the number of labeled chordal graphs on n vertices in time exponential in n. An implementation of the polynomial-time counting algorithm gives the number of labeled chordal graphs on up to 30 vertices in less than three minutes on a standard desktop computer. Previously, the number of labeled chordal graphs was only known for graphs on up to 15 vertices. 
    more » « less
  2. Network alignment or graph matching—figuring out how vertices across different networks correspond to each other—is a key challenge in many fields, from protecting online privacy to mapping biological data, improving computer vision, and even understanding languages. However, this problem falls into the class of notoriously difficult quadratic assignment problems, which are NP-hard to solve or approximate. Despite these challenges, researchers Mao, Wu, Xu, and Yu have made a major breakthrough. In their paper, “Random Graph Matching at Otter’s Threshold via Counting Chandeliers,” they introduce an innovative algorithm that can successfully match two random networks whenever the square of their edge correlation exceeds Otter’s constant (≈0.338). Their key innovation lies in counting chandeliers—specially designed tree-like structures—to identify corresponding vertices across the networks. The algorithm correctly matches nearly all vertices with high probability and even achieves perfect matching whenever the data allows. This is the first-ever polynomial-time algorithm capable of achieving perfect and near-perfect matching with an explicit constant correlation for both dense and sparse networks, bridging a long-standing gap between statistical limits and algorithmic performance. 
    more » « less
  3. This paper introduces a novel, parallel, and scalable implementation of the VF2 algorithm for subgraph monomorphism developed in the high-productivity language Chapel. Efficient graph analysis in large and complex network datasets is crucial across numerous scientific domains. We address this need through our enhanced VF2 implementation, widely utilized in subgraph matching, and integrating it into Arachne—a Python-accessible, open-source, large-scale graph analysis framework. Leveraging the parallel computing capabilities of modern hardware architectures, our implementation achieves significant performance improvements. Benchmarks on synthetic and real-world datasets, including social, communication, and neuroscience networks, demonstrate speedups of up to 97X on 128 cores, compared to existing Python-based tools like NetworkX and DotMotif, which do not exploit parallelization. Our results on large-scale graphs demonstrate scalability and efficiency, establishing it as a viable tool for subgraph monomorphism, the backbone of numerous graph analytics such as motif counting and enumeration. Arachne, including our VF2 implementation, can be found on GitHub: https://github.com/Bears-R-Us/arkouda-njit. 
    more » « less
  4. The single-source shortest path (SSSP) problem is essential in graph theory with applications in navigation, biology, social networks, and traffic analysis. The  -Stepping algorithm enhances parallelism by grouping vertices into "buckets" based on their tentative distances. However, its performance depends on values and graph properties. This paper introduces an adaptive parallel Delta-Stepping implementation with three innovations: neighbor reordering, bucket fusion, and graph type-aware selection. Tested on 11 diverse graphs, it achieves an average 7.1× speedup over serial Dijkstra’s algorithm on a 48-thread CPU server. 
    more » « less
  5. Label Propagation is not only a well-known machine learning algorithm for classification, but it is also an effective method for discovering communities and connected components in networks. We propose a new Direction-Optimizing Label Propagation Algorithm (DOLPA) framework that enhances the performance of the standard Label Propagation Algorithm (LPA), increases its scalability, and extends its versatility and application scope. As a central feature, the DOLPA framework relies on the use of frontiers and alternates between label push and label pull operations to attain high performance. It is formulated in such a way that the same basic algorithm can be used for finding communities or connected components in graphs by only changing the objective function used. Additionally, DOLPA has parameters for tuning the processing order of vertices in a graph to reduce the number of edges visited and improve the quality of solution obtained. We present the design and implementation of the enhanced algorithm as well as our shared-memory parallelization of it using OpenMP. We also present an extensive experimental evaluation of our implementations using the LFR benchmark and real-world networks drawn from various domains. Compared with an implementation of LPA for community detection available in a widely used network analysis software, we achieve at most five times the F-Score while maintaining similar runtime for graphs with overlapping communities. We also compare DOLPA against an implementation of the Louvain method for community detection using the same LFR-graphs and show that DOLPA achieves about three times the F-Score at just 10% of the runtime. For connected component decomposition, our algorithm achieves orders of magnitude speedups over the basic LP-based algorithm on large diameter graphs, up to 13.2 × speedup over the Shiloach-Vishkin algorithm, and up to 1.6 × speedup over Afforest on an Intel Xeon processor using 40 threads. 
    more » « less