Hypothesis Testing for Automated Community Detection in Networks

Bickel, Peter J.; Sarkar, Purnamrita

doi:10.1111/rssb.12117

Summary

Community detection in networks is a key exploratory tool with applications in a diverse set of areas, ranging from finding communities in social and biological networks to identifying link farms in the World Wide Web. The problem of finding communities or clusters in a network has received much attention from statistics, physics and computer science. However, most clustering algorithms assume knowledge of the number of clusters k. We propose to determine k automatically in a graph generated from a stochastic block model by using a hypothesis test of independent interest. Our main contribution is twofold. First, we theoretically establish the limiting distribution of the principal eigenvalue of the suitably centred and scaled adjacency matrix and use that distribution for our test of the hypothesis that a random graph is of Erdős–Rényi (noise) type. Second, we use this test to design a recursive bipartitioning algorithm, which naturally uncovers nested community structure. Using simulations and quantifiable classification tasks on real-world networks with ground truth, we show that our algorithm outperforms state-of-the-art methods.
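
The test described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the centring and scaling (subtracting the estimated edge probability off the diagonal and dividing by the square root of (n - 1) p(1 - p)) are an assumed reading of "suitably centred and scaled", and the null distribution is approximated by Monte Carlo simulation of Erdős–Rényi graphs rather than by the analytic limiting distribution that the paper establishes.

```python
import numpy as np


def sample_er(n, p, rng):
    """Draw a symmetric, zero-diagonal Erdős–Rényi adjacency matrix."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper + upper.T).astype(float)


def scaled_principal_eigenvalue(A):
    """Largest eigenvalue of the centred and scaled adjacency matrix.

    Assumed normalisation (for illustration only): subtract the estimated
    edge probability from every off-diagonal entry, then divide by
    sqrt((n - 1) * p_hat * (1 - p_hat)).
    """
    n = A.shape[0]
    p_hat = A[np.triu_indices(n, k=1)].mean()
    if not 0.0 < p_hat < 1.0:
        raise ValueError("graph is empty or complete; statistic undefined")
    centred = A - p_hat * (np.ones((n, n)) - np.eye(n))
    scaled = centred / np.sqrt((n - 1) * p_hat * (1.0 - p_hat))
    # eigvalsh returns eigenvalues in ascending order for symmetric input.
    return np.linalg.eigvalsh(scaled)[-1]


def er_test_pvalue(A, n_null=200, seed=0):
    """Monte Carlo p-value for H0: the graph is Erdős–Rényi.

    The null is simulated at the estimated edge probability; this replaces
    the analytic limiting distribution used in the paper.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    p_hat = A[np.triu_indices(n, k=1)].mean()
    observed = scaled_principal_eigenvalue(A)
    null_stats = [
        scaled_principal_eigenvalue(sample_er(n, p_hat, rng))
        for _ in range(n_null)
    ]
    exceed = sum(s >= observed for s in null_stats)
    return (1 + exceed) / (n_null + 1)
```

A small p-value suggests that the graph has community structure and could be split further; in a recursive bipartitioning scheme of the kind the summary describes, the same test would then be applied to each part until the null hypothesis is no longer rejected.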

NSF-PAR ID: 10397454

Author(s) / Creator(s): Bickel, Peter J.; Sarkar, Purnamrita

Publisher / Repository: Oxford University Press

Date Published: 2015-05-21

Journal Name: Journal of the Royal Statistical Society Series B: Statistical Methodology

Volume: 78

Issue: 1

ISSN: 1369-7412

Page Range / eLocation ID: p. 253-273

Sponsoring Org: National Science Foundation

Journal Article:
https://doi.org/10.1111/rssb.12117