The quality of network clustering is often measured with a commonly used metric known as “modularity”. Modularity compares the clusters found in a network to those present in a random graph (a “null model”). Unfortunately, modularity is ill-suited to studying spatially embedded networks, since a random graph contains no basic geometrical notions: the null model assigns a nonzero probability to an edge between any pair of nodes, regardless of the distance between them. Here, we propose a variant of modularity that does not rely on a null model. To demonstrate the essentials of our method, we analyze networks generated from granular ensembles. We show that our method performs better than the most commonly used Newman-Girvan (NG) modularity in detecting the best (physically transparent) partitions in those systems. Our measure also properly detects hierarchical structures whenever they are present.
Network comparison and the within-ensemble graph distance
Quantifying the differences between networks is a challenging and ever-present problem in network science. In recent years, a multitude of diverse, ad hoc solutions to this problem have been introduced. Here, we propose that simple and well-understood ensembles of random networks—such as Erdős–Rényi graphs, random geometric graphs, Watts–Strogatz graphs, the configuration model and preferential attachment networks—are natural benchmarks for network comparison methods. Moreover, we show that the expected distance between two networks independently sampled from a generative model is a useful property that encapsulates many key features of that model. To illustrate our results, we calculate this within-ensemble graph distance and related quantities for classic network models (and several parameterizations thereof) using 20 distance measures commonly used to compare graphs. The within-ensemble graph distance provides a new framework for developers of graph distances to better understand their creations and for practitioners to better choose an appropriate tool for their particular task.
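As a rough illustration of the quantity described above, the sketch below estimates the expected distance between two graphs sampled independently from the same model. It is not the authors' implementation. It assumes the Erdős–Rényi model G(n, p) as the ensemble and the Hamming distance (the number of node pairs on which two graphs disagree) as the graph distance, both chosen only for simplicity. The function names are hypothetical.

```python
import itertools
import random


def sample_er_edges(n, p, rng):
    """Sample a G(n, p) Erdős–Rényi graph; return its undirected edge set."""
    return {
        (i, j)
        for i, j in itertools.combinations(range(n), 2)
        if rng.random() < p
    }


def hamming_distance(edges_a, edges_b):
    """Count node pairs that are an edge in exactly one of the two graphs."""
    return len(edges_a ^ edges_b)


def within_ensemble_distance(n, p, num_pairs, seed=0):
    """Monte Carlo estimate of E[d(G, G')] for independent G, G' ~ G(n, p)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(num_pairs):
        g1 = sample_er_edges(n, p, rng)
        g2 = sample_er_edges(n, p, rng)
        total += hamming_distance(g1, g2)
    return total / num_pairs


def expected_hamming_er(n, p):
    """Closed form for this toy case (an assumption of the sketch):
    each of the n(n-1)/2 node pairs differs with probability 2p(1-p)."""
    return n * (n - 1) / 2 * 2 * p * (1 - p)


if __name__ == "__main__":
    n, p = 30, 0.2
    print("estimate:", within_ensemble_distance(n, p, num_pairs=400))
    print("closed form:", expected_hamming_er(n, p))
```

For this particular pairing of model and distance the expectation has a simple closed form, so the Monte Carlo estimate can be checked directly. For other distances or ensembles, only the sampling function and the distance function would need to change.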
- Award ID(s): 1741355
- PAR ID: 10463786
- Journal Name: Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences
- Volume: 476
- Issue: 2243
- ISSN: 1364-5021
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Bach, Francis (Ed.) A graph homomorphism is a map between two graphs that preserves adjacency relations. We consider the problem of sampling a random graph homomorphism from a graph into a large network. We propose two complementary MCMC algorithms for sampling random graph homomorphisms and establish bounds on their mixing times and the concentration of their time averages. Based on our sampling algorithms, we propose a novel framework for network data analysis that circumvents some of the drawbacks in methods based on independent and neighborhood sampling. Various time averages of the MCMC trajectory give us various computable observables, including well-known ones such as homomorphism density and average clustering coefficient and their generalizations. Furthermore, we show that these network observables are stable with respect to a suitably renormalized cut distance between networks. We provide various examples and simulations demonstrating our framework through synthetic networks. We also demonstrate the performance of our framework on the tasks of network clustering and subgraph classification on the Facebook100 dataset and on Word Adjacency Networks of a set of classic novels.
- Benito, Rosa Maria ; Cherifi, Chantal ; Cherifi, Hocine ; Moro, Esteban ; Rocha, Luis M. (Ed.) To characterize the “average” of a set of graphs, one can compute the sample Fréchet mean. We prove the following result: if we use the Hamming distance to compute distances between graphs, then the Fréchet mean of an ensemble of inhomogeneous random graphs is obtained by thresholding the expected adjacency matrix: an edge exists between the vertices i and j in the Fréchet mean graph if and only if the corresponding entry of the expected adjacency matrix is greater than 1/2. We prove that the result also holds for the sample Fréchet mean when the expected adjacency matrix is replaced with the sample mean adjacency matrix. This novel theoretical result has some significant practical consequences; for instance, the Fréchet mean of an ensemble of sparse inhomogeneous random graphs is the empty graph.
- Inferring graph structure from observations on the nodes is an important and popular network science task. Departing from the more common inference of a single graph, we study the problem of jointly inferring multiple graphs from the observation of signals at their nodes (graph signals), which are assumed to be stationary in the sought graphs. Graph stationarity implies that the mapping between the covariance of the signals and the sparse matrix representing the underlying graph is given by a matrix polynomial. A prominent example is that of Markov random fields, where the inverse of the covariance yields the sparse matrix of interest. From a modeling perspective, stationary graph signals can be used to model linear network processes evolving on a set of (not necessarily known) networks. Leveraging that matrix polynomials commute, a convex optimization method along with sufficient conditions that guarantee the recovery of the true graphs are provided when perfect covariance information is available. Particularly important from an empirical viewpoint, we provide high-probability bounds on the recovery error as a function of the number of signals observed and other key problem parameters. Numerical experiments demonstrate the effectiveness of the proposed method with perfect covariance information as well as its robustness in the noisy regime.