skip to main content

Title: Non-Markovian Monte Carlo on Directed Graphs
Markov Chain Monte Carlo (MCMC) has been the de facto technique for sampling and inference of large graphs such as online social networks. At the heart of MCMC lies the ability to construct an ergodic Markov chain that attains any given stationary distribution \pi, often in the form of random walks or crawling agents on the graph. Most of the works around MCMC, however, presume that the graph is undirected or has reciprocal edges, and become inapplicable when the graph is directed and non-reciprocal. Here we develop a similar framework for directed graphs called Non- Markovian Monte Carlo (NMMC) by establishing a mapping to convert \pi into the quasi-stationary distribution of a carefully constructed transient Markov chain on an extended state space. As applications, we demonstrate how to achieve any given distribution \pi on a directed graph and estimate the eigenvector centrality using a set of non-Markovian, history-dependent random walks on the same graph in a distributed manner.We also provide numerical results on various real-world directed graphs to confirm our theoretical findings, and present several practical enhancements to make our NMMC method ready for practical use inmost directed graphs. To the best of our knowledge, the proposed NMMC framework for directed graphs is the first of its kind, unlocking all the limitations set by the standard MCMC methods for undirected graphs.  more » « less
Award ID(s):
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
ACM SIGMETRICS performance evaluation review
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This work considers the general task of estimating the sum of a bounded function over the edges of a graph, given neighborhood query access and where access to the entire network is prohibitively expensive. To estimate this sum, prior work proposes Markov chain Monte Carlo (MCMC) methods that use random walks started at some seed vertex and whose equilibrium distribution is the uniform distribution over all edges, eliminating the need to iterate over all edges. Unfortunately, these existing estimators are not scalable to massive real-world graphs. In this paper, we introduce Ripple, an MCMC-based estimator that achieves unprecedented scalability by stratifying the Markov chain state space into ordered strata with a new technique that we denote sequential stratified regenerations. We show that the Ripple estimator is consistent, highly parallelizable, and scales well. We empirically evaluate our method by applying Ripple to the task of estimating connected, induced subgraph counts given some input graph. Therein, we demonstrate that Ripple is accurate and can estimate counts of up to 12-node subgraphs, which is a task at a scale that has been considered unreachable, not only by prior MCMC-based methods but also by other sampling approaches. For instance, in this target application, we present results in which the Markov chain state space is as large as 1043, for which Ripple computes estimates in less than 4 h, on average. 
    more » « less
  2. Chordal graphs are a widely studied graph class, with applications in several areas of computer science, including structural learning of Bayesian networks. Many problems that are hard on general graphs become solvable on chordal graphs. The random generation of instances of chordal graphs for testing these algorithms is often required. Nevertheless, there are only few known algorithms that generate random chordal graphs, and, as far as we know, none of them generate chordal graphs uniformly at random (where each chordal graph appears with equal probability). In this paper we propose a Markov chain Monte Carlo (MCMC) method to sample connected chordal graphs uniformly at random. Additionally, we propose a Markov chain that generates connected chordal graphs with a bounded treewidth uniformly at random. Bounding the treewidth parameter (which bounds the largest clique) has direct implications on the running time of various algorithms on chordal graphs. For each of the proposed Markov chains we prove that they are ergodic and therefore converge to the uniform distribution. Finally, as initial evidence that the Markov chains have the potential to mix rapidly, we prove that the chain on graphs with bounded treewidth mixes rapidly for trees (chordal graphs with treewidth bound of one). 
    more » « less
  3. In this paper, we introduce a new, spectral notion of approximation between directed graphs, which we call singular value (SV) approximation. SV-approximation is stronger than previous notions of spectral approximation considered in the literature, including spectral approximation of Laplacians for undirected graphs [ST04], standard approximation for directed graphs [CKP + 17], and unit-circle (UC) approximation for directed graphs [AKM + 20]. Further, SV approximation enjoys several useful properties not possessed by previous notions of approximation, e.g., it is preserved under products of randomwalk matrices and bounded matrices. We provide a nearly linear-time algorithm for SV-sparsifying (and hence UC-sparsifying) Eulerian directed graphs, as well as ℓ-step random walks on such graphs, for any ℓ≤poly(n). Combined with the Eulerian scaling algorithms of [CKK + 18], given an arbitrary (not necessarily Eulerian) directed graph and a set S of vertices, we can approximate the stationary probability mass of the (S,Sc) cut in an ℓ-step random walk to within a multiplicative error of 1/polylog(n) and an additive error of 1/poly(n) in nearly linear time. As a starting point for these results, we provide a simple black-box reduction from SV-sparsifying Eulerian directed graphs to SV-sparsifying undirected graphs; such a directed-to-undirected reduction was not known for previous notions of spectral approximation. 
    more » « less
  4. Bessani, Alysson ; Défago, Xavier ; Nakamura, Junya ; Wada, Koichi ; Yamauchi, Yukiko (Ed.)
    Markov Chain Monte Carlo (MCMC) algorithms are a widely-used algorithmic tool for sampling from high-dimensional distributions, a notable example is the equilibirum distribution of graphical models. The Glauber dynamics, also known as the Gibbs sampler, is the simplest example of an MCMC algorithm; the transitions of the chain update the configuration at a randomly chosen coordinate at each step. Several works have studied distributed versions of the Glauber dynamics and we extend these efforts to a more general family of Markov chains. An important combinatorial problem in the study of MCMC algorithms is random colorings. Given a graph G of maximum degree Δ and an integer k ≥ Δ+1, the goal is to generate a random proper vertex k-coloring of G. Jerrum (1995) proved that the Glauber dynamics has O(nlog{n}) mixing time when k > 2Δ. Fischer and Ghaffari (2018), and independently Feng, Hayes, and Yin (2018), presented a parallel and distributed version of the Glauber dynamics which converges in O(log{n}) rounds for k > (2+ε)Δ for any ε > 0. We improve this result to k > (11/6-δ)Δ for a fixed δ > 0. This matches the state of the art for randomly sampling colorings of general graphs in the sequential setting. Whereas previous works focused on distributed variants of the Glauber dynamics, our work presents a parallel and distributed version of the more general flip dynamics presented by Vigoda (2000) (and refined by Chen, Delcourt, Moitra, Perarnau, and Postle (2019)), which recolors local maximal two-colored components in each step. 
    more » « less
  5. Markov chain Monte Carlo (MCMC) methods generate samples that are asymptotically distributed from a target distribution of interest as the number of iterations goes to infinity. Various theoretical results provide upper bounds on the distance between the target and marginal distribution after a fixed number of iterations. These upper bounds are on a case by case basis and typically involve intractable quantities, which limits their use for practitioners. We introduce L-lag couplings to generate computable, non-asymptotic upper bound estimates for the total variation or the Wasserstein distance of general Markov chains. We apply L-lag couplings to the tasks of (i) determining MCMC burn-in, (ii) comparing different MCMC algorithms with the same target, and (iii) comparing exact and approximate MCMC. Lastly, we (iv) assess the bias of sequential Monte Carlo and self-normalized importance samplers. 
    more » « less