G-thinker: A Distributed Framework for Mining Subgraphs in a Big Graph

Yan, Da; Guo, Guimu; Chowdhury, Md Mashiur; Özsu, Tamer; Ku, Wei-Shinn; Lui, John C.S.

Citation Details

Mining from a big graph those subgraphs that satisfy certain conditions is useful in many applications such as community detection and subgraph matching. These problems have a high time complexity, but existing systems that scale them are all IO-bound in execution. We propose the first truly CPU-bound distributed framework, called G-thinker, which adopts a user-friendly subgraph-centric vertex-pulling API for writing distributed subgraph mining algorithms. To utilize all CPU cores of a cluster, G-thinker features (1) a highly-concurrent vertex cache for parallel task access and (2) a lightweight task scheduling approach that ensures high task throughput. These designs overlap communication with computation to minimize CPU idle time. Extensive experiments demonstrate that G-thinker achieves orders of magnitude speedup even compared with the fastest existing subgraph-centric system, and that it scales well to much larger and denser real network data. G-thinker is open-sourced at http://bit.ly/gthinker with detailed documentation.
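The abstract names the two ideas, subgraph-centric tasks that pull the vertices they need and a vertex cache shared by concurrent tasks, without showing them. The minimal sketch below illustrates the general pattern under stated assumptions; it is not G-thinker's actual API. Everything in it is hypothetical: `GRAPH` is a toy in-memory graph standing in for a partitioned one, `StripedVertexCache` simulates a remote pull with a plain function call and uses per-bucket locks for concurrency, and triangle counting (`count_triangles_from`) is chosen only as a simple subgraph-mining example. Unlike the framework described above, pulls here block instead of being overlapped with computation.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy graph: vertex ID -> adjacency list (undirected).
GRAPH = {
    0: [1, 2, 3],
    1: [0, 2],
    2: [0, 1, 3],
    3: [0, 2],
}


class StripedVertexCache:
    """Hypothetical vertex cache. Adjacency lists are fetched on first
    request and kept in buckets, each guarded by its own lock, so tasks
    touching different buckets do not contend for the same lock."""

    def __init__(self, fetch, num_buckets=8):
        self._fetch = fetch  # stands in for pulling a vertex from a remote machine
        self._buckets = [dict() for _ in range(num_buckets)]
        self._locks = [threading.Lock() for _ in range(num_buckets)]
        self._count_lock = threading.Lock()
        self.fetch_count = 0  # number of simulated remote pulls

    def get(self, v):
        i = v % len(self._buckets)
        # Simplification: the bucket lock is held during the fetch, so each
        # vertex is pulled at most once but fetches in one bucket serialize.
        with self._locks[i]:
            if v not in self._buckets[i]:
                self._buckets[i][v] = tuple(self._fetch(v))
                with self._count_lock:
                    self.fetch_count += 1
            return self._buckets[i][v]


def count_triangles_from(v, cache):
    """One subgraph-centric task seeded at vertex v. It pulls the adjacency
    lists it needs and counts triangles v < u < w, so every triangle is
    counted by exactly one task."""
    higher = {u for u in cache.get(v) if u > v}
    count = 0
    for u in higher:
        for w in cache.get(u):  # vertex pull for neighbor u
            if w > u and w in higher:
                count += 1
    return count


def count_triangles(graph, workers=4):
    """Run one task per seed vertex on a thread pool sharing one cache.
    Returns (triangle count, number of simulated remote pulls)."""
    cache = StripedVertexCache(lambda v: graph[v])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        total = sum(pool.map(lambda v: count_triangles_from(v, cache), graph))
    return total, cache.fetch_count
```

For the toy graph, `count_triangles(GRAPH)` returns `(2, 4)`: the triangles {0, 1, 2} and {0, 2, 3} are found, and each of the four vertices is pulled once even though several tasks request vertex 2. Sharing one cache across tasks is what lets pulled vertices be reused rather than re-fetched.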

Award ID(s): 1755464

PAR ID: 10140007

Date Published: 2020-01-01

Journal Name: Proceedings of the 36th IEEE International Conference on Data Engineering (ICDE)

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text: Accepted Manuscript 1.0 (conference paper)
The DOI is not currently available.
