G-thinker: a general distributed framework for finding qualified subgraphs in a big graph with load balancing
Authors:
; ; ; ; ;
Award ID(s):
Publication Date:
NSF-PAR ID:
10331912
Journal Name:
The VLDB journal
ISSN:
1066-8888
4. Quasi-cliques are a type of dense subgraphs that generalize the notion of cliques, important for applications such as community/module detection in various social and biological networks. However, the existing quasi-clique definition and algorithms are only applicable to undirected graphs. In this paper, we generalize the concept of quasi-cliques to directed graphs by proposing $(\gamma_1, \gamma_2)$-quasi-cliques which have density requirements in both inbound and outbound directions of each vertex in a quasi-clique subgraph. An efficient recursive algorithm is proposed to find maximal $(\gamma_1, \gamma_2)$-quasi-cliques which integrates many effective pruning rules that are validated by ablation studies. We also study the finding of top-$k$ large quasi-cliques directly by bootstrapping the search from more compact quasi-cliques, to scale the mining to larger networks. The algorithms are parallelized with effective load balancing, and we demonstrate that they can scale up effectively with the number of CPU cores.