Title: Frequent Subgraph Mining Algorithms in Static and Temporal Graph-Transaction Settings: A Survey
Networks are widely used to model many types of systems. In the network mining literature, frequent subgraph mining is regarded as a core task. In this problem, the dataset is composed of networks representing multiple independent systems or one system at multiple time stamps. At the core of frequent subgraph mining are the graph and subgraph isomorphism problems. Because of the complexity of these problems, the frequent subgraph mining algorithms proposed in the literature employ various heuristics for candidate generation, pruning of duplicate subgraphs, and support computation. In this survey, we provide a classification of the proposed algorithms. Algorithms for static networks have found numerous applications and are therefore reviewed in detail. We also discuss how accounting for the temporality of data can affect the derived insights, a topic that has attracted substantial attention in recent years. However, prior surveys have not comprehensively examined algorithms for frequent subgraph mining in a database of temporal networks represented as network snapshots; we therefore review the algorithms proposed for mining frequent subgraphs in temporal networks. Moreover, most surveys have focused on main-memory algorithms. Here, we also review disk-based, parallel, and distributed algorithms proposed for mining frequent subgraphs.
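As a rough illustration of support computation in the graph-transaction setting described in the abstract, the following minimal Python sketch counts how many graphs in a database contain each single-edge pattern. The graph representation (a dict with `labels` and `edges`) and the function names `edge_patterns` and `frequent_edge_patterns` are assumptions made for this example and do not come from the survey. Restricting patterns to single undirected edges sidesteps the isomorphism tests that larger patterns would require.

```python
from collections import Counter


def edge_patterns(graph):
    """Return the set of distinct single-edge patterns in one graph.

    `graph` is a hypothetical representation: a dict with
    'labels' (vertex id -> label) and 'edges' (iterable of (u, v) pairs).
    """
    labels = graph["labels"]
    patterns = set()
    for u, v in graph["edges"]:
        # Sort endpoint labels so an undirected edge has one canonical form.
        patterns.add(tuple(sorted((labels[u], labels[v]))))
    return patterns


def frequent_edge_patterns(database, min_support):
    """Keep patterns that occur in at least `min_support` graphs."""
    support = Counter()
    for graph in database:
        # A set is passed in, so each graph contributes at most 1 per pattern.
        support.update(edge_patterns(graph))
    return {p: s for p, s in support.items() if s >= min_support}


# Toy database of three labeled graphs (illustrative data only).
database = [
    {"labels": {1: "A", 2: "B", 3: "C"}, "edges": [(1, 2), (2, 3)]},
    {"labels": {1: "B", 2: "A", 3: "A"}, "edges": [(1, 2), (1, 3)]},
    {"labels": {1: "C", 2: "B"}, "edges": [(1, 2)]},
]
result = frequent_edge_patterns(database, min_support=2)
```

Note that the second graph contains the pattern `("A", "B")` twice but adds only one to its support, since transaction-based support counts graphs, not occurrences.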
Award ID(s):
1741306
NSF-PAR ID:
10484247
Author(s) / Creator(s):
;
Publisher / Repository:
IEEE
Date Published:
Journal Name:
IEEE Transactions on Big Data
ISSN:
2372-2096
Subject(s) / Keyword(s):
Data Mining, Distributed Algorithms, Graph Theory, Duplicate Subgraphs, Frequent Subgraph Mining Algorithms, Mining Frequent Subgraphs, Mining Network Data, Multiple Independent Systems, Network Mining, Network Snapshots, Static Networks, Subgraph Isomorphism, Temporal Graph Transaction Settings, Temporal Networks, Heuristic Algorithms, Databases, Big Data, Vegetation, Search Problems, Topology, Subgraph Mining
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Holme, Peter (Ed.)
    Motifs are the fundamental components of complex systems. The topological structure of networks representing complex systems and the frequency and distribution of motifs in these networks are intertwined. The complexities associated with the graph and subgraph isomorphism problems, which lie at the core of frequent subgraph mining, directly impact the performance of motif discovery algorithms. Researchers have adopted different strategies for candidate generation, enumeration, and frequency computation to cope with these complexities. In addition, the past few years have seen increasing interest in the analysis and mining of temporal networks. In contrast to their static counterparts, these networks change over time through the insertion, deletion, or substitution of edges, vertices, or their attributes. In this article, we survey motif discovery algorithms proposed in the literature for mining static and temporal networks and review them according to their strategies for candidate generation and frequency computation. Given the large amount of network data generated by social media platforms, bioinformatics applications, and communication and transportation networks, and given the advances in distributed computing and big data technology, we also survey the algorithms proposed to resolve the CPU-bound and I/O-bound problems in mining static and temporal networks.
  2. Background: Given a collection of coexpression networks over a set of genes, identifying subnetworks that appear frequently is an important research problem known as mining frequent subgraphs. Maximal frequent subgraphs are a representative set of frequent subgraphs; a frequent subgraph is maximal if it does not have a super-graph that is frequent. In bioinformatics, methodologies for mining frequent and/or maximal frequent subgraphs can be used to discover interesting network motifs that elucidate complex interactions among genes, reflected through the edges of the frequent subnetworks. Further study of frequent coexpression subnetworks enhances the discovery of biological modules and biological signatures for gene expression and disease classification. Results: We propose a reverse search algorithm, called RASMA, for mining frequent and maximal frequent subgraphs in a given collection of graphs. A key innovation in RASMA is a connected subgraph enumerator that uses a reverse-search strategy to enumerate connected subgraphs of an undirected graph. Using this enumeration strategy, RASMA obtains all maximal frequent subgraphs very efficiently. To overcome the computationally prohibitive task of enumerating all frequent subgraphs while mining for the maximal frequent subgraphs, RASMA employs several pruning strategies that substantially improve its overall runtime performance. Experimental results show that on large gene coexpression networks, the proposed algorithm efficiently mines biologically relevant maximal frequent subgraphs. Conclusion: Extracting recurrent gene coexpression subnetworks from multiple gene expression experiments enables the discovery of functional modules and subnetwork biomarkers. We have proposed a reverse search algorithm for mining maximal frequent subnetworks. Enrichment analysis of the extracted maximal frequent subnetworks reveals that subnetworks that are frequent are highly enriched with known biological ontologies.
  3. A large amount of gene expression data has been collected for various environmental and biological conditions. Extracting co-expression subnetworks that recur across multiple co-expression networks has shown promise in functional gene annotation and biomarker discovery. However, frequent subgraph mining reports a large number of subnetworks. In this work, we propose to mine approximate dense frequent subgraphs. Our proposed approach reports representative frequent subgraphs that are also dense. Our experiments on real gene coexpression networks show that frequent subgraphs are biologically interesting, as evidenced by the large percentage of biologically enriched frequent dense subgraphs.
  4. Network embedding has become the cornerstone of a variety of mining tasks, such as classification, link prediction, clustering, anomaly detection, and many more, thanks to its superior ability to encode the intrinsic network characteristics in a compact low-dimensional space. Most existing methods focus on a single network and/or a single resolution, and generate embeddings of different network objects (node/subgraph/network) from different networks separately. A fundamental limitation of such methods is that the intrinsic relationships across different networks (e.g., two networks sharing the same or similar subgraphs) and across different resolutions (e.g., the node-subgraph membership) are ignored, resulting in disparate embeddings. Consequently, these methods perform sub-optimally on, or even become inapplicable to, some downstream mining tasks (e.g., role classification, network alignment, etc.). In this paper, we propose a unified framework, MrMine, to learn the representations of objects from multiple networks at three complementary resolutions (i.e., network, subgraph, and node) simultaneously. The key idea is to construct the cross-resolution cross-network context for each object. The proposed method bears two distinctive features. First, it enables and/or boosts various multi-network downstream mining tasks by having embeddings at different resolutions from different networks in the same embedding space. Second, our method is efficient and scalable, with an O(n log(n)) time complexity for the base algorithm and a linear time complexity w.r.t. the number of nodes and edges of the input networks for the accelerated version. Extensive experiments on real-world data show that our methods (1) enable and enhance a variety of multi-network mining tasks, and (2) scale up to million-node networks.
  5. Quasi-cliques are a type of dense subgraph that generalizes the notion of cliques and is important for applications such as community/module detection in various social and biological networks. However, the existing quasi-clique definition and algorithms are only applicable to undirected graphs. In this paper, we generalize the concept of quasi-cliques to directed graphs by proposing $(\gamma_1, \gamma_2)$-quasi-cliques, which have density requirements in both the inbound and outbound directions of each vertex in a quasi-clique subgraph. We propose an efficient recursive algorithm for finding maximal $(\gamma_1, \gamma_2)$-quasi-cliques that integrates many effective pruning rules, which are validated by ablation studies. We also study how to find the top-$k$ large quasi-cliques directly by bootstrapping the search from more compact quasi-cliques, in order to scale the mining to larger networks. The algorithms are parallelized with effective load balancing, and we demonstrate that they scale up effectively with the number of CPU cores.