skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Sampling random graph homomorphisms and applications to network data analysis
A graph homomorphism is a map between two graphs that preserves adjacency relations. We consider the problem of sampling a random graph homomorphism from a graph into a large network. We propose two complementary MCMC algorithms for sampling random graph homomorphisms and establish bounds on their mixing times and the concentration of their time averages. Based on our sampling algorithms, we propose a novel framework for network data analysis that circumvents some of the drawbacks in methods based on independent and neighborhood sampling. Various time averages of the MCMC trajectory give us various computable observables, including well-known ones such as homomorphism density and average clustering coefficient and their generalizations. Furthermore, we show that these network observables are stable with respect to a suitably renormalized cut dis- tance between networks. We provide various examples and simulations demonstrating our framework through synthetic networks. We also demonstrate the performance of our frame- work on the tasks of network clustering and subgraph classification on the Facebook100 dataset and on Word Adjacency Networks of a set of classic novels.  more » « less
Award ID(s):
1740761
PAR ID:
10397669
Author(s) / Creator(s):
; ;
Editor(s):
Bach, Francis
Date Published:
Journal Name:
Journal of machine learning research
Volume:
24
Issue:
9
ISSN:
1532-4435
Page Range / eLocation ID:
1-79
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Francis Bach (Ed.)
    A graph homomorphism is a map between two graphs that preserves adjacency relations. We consider the problem of sampling a random graph homomorphism from a graph into a large network. We propose two complementary MCMC algorithms for sampling random graph homomorphisms and establish bounds on their mixing times and the concentration of their time averages. Based on our sampling algorithms, we propose a novel framework for network data analysis that circumvents some of the drawbacks in methods based on independent and neighborhood sampling. Various time averages of the MCMC trajectory give us various computable observables, including well-known ones such as homomorphism density and average clustering coefficient and their generalizations. Furthermore, we show that these network observables are stable with respect to a suitably renormalized cut dis- tance between networks. We provide various examples and simulations demonstrating our framework through synthetic networks. We also demonstrate the performance of our frame- work on the tasks of network clustering and subgraph classification on the Facebook100 dataset and on Word Adjacency Networks of a set of classic novels. 
    more » « less
  2. Markov Chain Monte Carlo (MCMC) has been the de facto technique for sampling and inference of large graphs such as online social networks. At the heart of MCMC lies the ability to construct an ergodic Markov chain that attains any given stationary distribution \pi, often in the form of random walks or crawling agents on the graph. Most of the works around MCMC, however, presume that the graph is undirected or has reciprocal edges, and become inapplicable when the graph is directed and non-reciprocal. Here we develop a similar framework for directed graphs called Non- Markovian Monte Carlo (NMMC) by establishing a mapping to convert \pi into the quasi-stationary distribution of a carefully constructed transient Markov chain on an extended state space. As applications, we demonstrate how to achieve any given distribution \pi on a directed graph and estimate the eigenvector centrality using a set of non-Markovian, history-dependent random walks on the same graph in a distributed manner.We also provide numerical results on various real-world directed graphs to confirm our theoretical findings, and present several practical enhancements to make our NMMC method ready for practical use inmost directed graphs. To the best of our knowledge, the proposed NMMC framework for directed graphs is the first of its kind, unlocking all the limitations set by the standard MCMC methods for undirected graphs. 
    more » « less
  3. Chordal graphs are a widely studied graph class, with applications in several areas of computer science, including structural learning of Bayesian networks. Many problems that are hard on general graphs become solvable on chordal graphs. The random generation of instances of chordal graphs for testing these algorithms is often required. Nevertheless, there are only few known algorithms that generate random chordal graphs, and, as far as we know, none of them generate chordal graphs uniformly at random (where each chordal graph appears with equal probability). In this paper we propose a Markov chain Monte Carlo (MCMC) method to sample connected chordal graphs uniformly at random. Additionally, we propose a Markov chain that generates connected chordal graphs with a bounded treewidth uniformly at random. Bounding the treewidth parameter (which bounds the largest clique) has direct implications on the running time of various algorithms on chordal graphs. For each of the proposed Markov chains we prove that they are ergodic and therefore converge to the uniform distribution. Finally, as initial evidence that the Markov chains have the potential to mix rapidly, we prove that the chain on graphs with bounded treewidth mixes rapidly for trees (chordal graphs with treewidth bound of one). 
    more » « less
  4. Vidick, Thomas (Ed.)
    Graph homomorphism has been an important research topic since its introduction [14]. Stated in the language of binary relational structures in that paper [14], Lovász proved a fundamental theorem that the graph homomorphism function G 7→ hom(G, H) for 0-1 valued H (as the adjacency matrix of a graph) determines the isomorphism type of H. In the past 50 years various extensions have been proved by Lovász and others [15, 9, 1, 19, 17]. These extend the basic 0-1 case to admit vertex and edge weights; but always with some restrictions such as all vertex weights must be positive. In this paper we prove a general form of this theorem where H can have arbitrary vertex and edge weights. An innovative aspect is that we prove this by a surprisingly simple and unified argument. This bypasses various technical obstacles and unifies and extends all previous known versions of this theorem on graphs. The constructive proof of our theorem can be used to make various complexity dichotomy theorems for graph homomorphism effective, i.e., it provides an algorithm that for any H either outputs a P-time algorithm solving hom(·, H) or a P-time reduction from a canonical #P-hard problem to hom(·, H). 
    more » « less
  5. Mestre, Julián; Wirth, Anthony (Ed.)
    Counting the number of homomorphisms of a pattern graph H in a large input graph G is a fundamental problem in computer science. In many applications in databases, bioinformatics, and network science, we need more than just the total count. We wish to compute, for each vertex v of G, the number of H-homomorphisms that v participates in. This problem is referred to as homomorphism orbit counting, as it relates to the orbits of vertices of H under its automorphisms. Given the need for fast algorithms for this problem, we study when near-linear time algorithms are possible. A natural restriction is to assume that the input graph G has bounded degeneracy, a commonly observed property in modern massive networks. Can we characterize the patterns H for which homomorphism orbit counting can be done in near-linear time? We discover a dichotomy theorem that resolves this problem. For pattern H, let 𝓁 be the length of the longest induced path between any two vertices of the same orbit (under the automorphisms of H). If 𝓁 ≤ 5, then H-homomorphism orbit counting can be done in near-linear time for bounded degeneracy graphs. If 𝓁 > 5, then (assuming fine-grained complexity conjectures) there is no near-linear time algorithm for this problem. We build on existing work on dichotomy theorems for counting the total H-homomorphism count. Surprisingly, there exist (and we characterize) patterns H for which the total homomorphism count can be computed in near-linear time, but the corresponding orbit counting problem cannot be done in near-linear time. 
    more » « less