This work considers the general task of estimating the sum of a bounded function over the edges of a graph, given neighborhood query access, in settings where access to the entire network is prohibitively expensive. To estimate this sum, prior work proposes Markov chain Monte Carlo (MCMC) methods that use random walks started at some seed vertex and whose equilibrium distribution is the uniform distribution over all edges, eliminating the need to iterate over all edges. Unfortunately, these existing estimators do not scale to massive real-world graphs. In this paper, we introduce Ripple, an MCMC-based estimator that achieves unprecedented scalability by stratifying the Markov chain state space into ordered strata with a new technique that we denote sequential stratified regenerations. We show that the Ripple estimator is consistent, highly parallelizable, and scales well. We empirically evaluate our method by applying Ripple to the task of estimating connected, induced subgraph counts given some input graph. Therein, we demonstrate that Ripple is accurate and can estimate counts of up to 12-node subgraphs, a task at a scale that has been considered unreachable, not only by prior MCMC-based methods but also by other sampling approaches. For instance, in this target application, we present results in which the Markov chain state space is as large as 10^43, for which Ripple computes estimates in less than 4 hours, on average.
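The random-walk idea behind such edge-sum estimators can be illustrated with a toy sketch. This is not Ripple itself (which adds stratified regenerations on top); it only shows the basic fact the abstract relies on: in equilibrium, a simple random walk on an undirected graph traverses each edge with equal probability, so a running average over traversed edges estimates the uniform edge average. All names and the toy graph below are illustrative assumptions, not code from the paper.

```python
import random

def random_walk_edge_average(adj, f, seed, steps, burn_in=1_000):
    """Running average of f over the edges traversed by a simple random walk.

    In equilibrium the walk traverses every edge with equal probability,
    so for symmetric f this converges to the uniform edge average
    without ever enumerating the edge set.
    """
    u = seed
    for _ in range(burn_in):          # let the walk approach equilibrium
        u = random.choice(adj[u])
    total = 0.0
    for _ in range(steps):
        v = random.choice(adj[u])
        total += f(u, v)              # f evaluated on the traversed edge
        u = v
    return total / steps

# Toy graph: triangle {0, 1, 2} plus a pendant vertex 3 attached to 0.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
f = lambda u, v: 1.0 if 0 in (u, v) else 0.0   # counts edges touching vertex 0
m = sum(len(ns) for ns in adj.values()) // 2   # edge count (4), assumed known here
estimate = m * random_walk_edge_average(adj, f, seed=0, steps=20_000)
```

In this toy graph the true value is 3 (three edges touch vertex 0); scaling the walk's average by the edge count recovers it. The point of methods like Ripple is precisely that on massive state spaces such a naive single walk becomes impractical.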
Non-Markovian Monte Carlo on Directed Graphs
Markov Chain Monte Carlo (MCMC) has been the de facto technique for sampling and inference on large graphs such as online social networks. At the heart of MCMC lies the ability to construct an ergodic Markov chain that attains any given stationary distribution π, often in the form of random walks or crawling agents on the graph. Most of the works around MCMC, however, presume that the graph is undirected or has reciprocal edges, and become inapplicable when the graph is directed and non-reciprocal. Here we develop a similar framework for directed graphs called Non-Markovian Monte Carlo (NMMC) by establishing a mapping to convert π into the quasi-stationary distribution of a carefully constructed transient Markov chain on an extended state space. As applications, we demonstrate how to achieve any given distribution π on a directed graph and estimate the eigenvector centrality using a set of non-Markovian, history-dependent random walks on the same graph in a distributed manner. We also provide numerical results on various real-world directed graphs to confirm our theoretical findings, and present several practical enhancements to make our NMMC method ready for practical use in most directed graphs. To the best of our knowledge, the proposed NMMC framework for directed graphs is the first of its kind, unlocking all the limitations set by the standard MCMC methods for undirected graphs.
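The quasi-stationary idea can be illustrated with a minimal, history-dependent sketch. This is not the paper's NMMC algorithm; it only demonstrates the flavor of the construction: run a transient chain with an absorbing state and, upon absorption, regenerate from the walk's own past trajectory, making the walk non-Markovian on the original state space. The toy transition matrix and function names are assumptions for illustration.

```python
import random
from collections import Counter

def quasi_stationary_estimate(P, absorbing, start, steps):
    """Estimate the quasi-stationary distribution of a transient chain.

    History-dependent rule: whenever the walk hits the absorbing state,
    it restarts from a state drawn uniformly from its own history, so the
    occupation frequencies approximate the quasi-stationary distribution.
    """
    history = [start]
    x = start
    for _ in range(steps):
        states, weights = zip(*P[x].items())
        x = random.choices(states, weights=weights)[0]
        if x == absorbing:
            x = random.choice(history)   # regenerate from the walk's past
        history.append(x)
    counts = Counter(history)
    return {s: c / len(history) for s, c in counts.items()}

# Toy transient chain: two symmetric states, each absorbed w.p. 0.2,
# so by symmetry the quasi-stationary distribution is (0.5, 0.5).
P = {"a": {"a": 0.4, "b": 0.4, "stop": 0.2},
     "b": {"a": 0.4, "b": 0.4, "stop": 0.2}}
```

The occupation frequencies of "a" and "b" converge to 1/2 each, the quasi-stationary distribution of this symmetric toy chain.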
 Award ID(s): 1824518
 NSF-PAR ID: 10121293
 Date Published:
 Journal Name: ACM SIGMETRICS Performance Evaluation Review
 ISSN: 1557-9484
 Format(s): Medium: X
 Sponsoring Org: National Science Foundation
More Like this


Chordal graphs are a widely studied graph class, with applications in several areas of computer science, including structural learning of Bayesian networks. Many problems that are hard on general graphs become solvable on chordal graphs. The random generation of instances of chordal graphs for testing these algorithms is often required. Nevertheless, there are only a few known algorithms that generate random chordal graphs, and, as far as we know, none of them generates chordal graphs uniformly at random (where each chordal graph appears with equal probability). In this paper we propose a Markov chain Monte Carlo (MCMC) method to sample connected chordal graphs uniformly at random. Additionally, we propose a Markov chain that generates connected chordal graphs with bounded treewidth uniformly at random. Bounding the treewidth parameter (which bounds the largest clique) has direct implications on the running time of various algorithms on chordal graphs. For each of the proposed Markov chains we prove that they are ergodic and therefore converge to the uniform distribution. Finally, as initial evidence that the Markov chains have the potential to mix rapidly, we prove that the chain on graphs with bounded treewidth mixes rapidly for trees (chordal graphs with treewidth bound of one).
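A naive chain in this spirit can be sketched as follows. This is an illustrative assumption, not the chains proposed in the paper (whose moves are designed to guarantee ergodicity and uniform sampling): propose toggling a random vertex pair and accept only if the graph remains connected and chordal. Because the proposal is symmetric and acceptance is a hard constraint, the uniform distribution is stationary on the reachable set, though ergodicity is not guaranteed for this toy version. The chordality test uses Maximum Cardinality Search (Tarjan & Yannakakis).

```python
import random
from collections import deque

def is_chordal(adj):
    """Maximum Cardinality Search yields an ordering whose reverse is a
    perfect elimination ordering iff the graph is chordal."""
    weight = {v: 0 for v in adj}
    order, numbered = [], set()
    for _ in adj:
        v = max((u for u in adj if u not in numbered), key=lambda u: weight[u])
        order.append(v)
        numbered.add(v)
        for u in adj[v]:
            if u not in numbered:
                weight[u] += 1
    pos = {v: i for i, v in enumerate(order)}
    for v in adj:
        earlier = [u for u in adj[v] if pos[u] < pos[v]]
        if earlier:
            p = max(earlier, key=lambda u: pos[u])  # eliminated right after v
            if any(u != p and u not in adj[p] for u in earlier):
                return False                        # missing chord
    return True

def is_connected(adj):
    start = next(iter(adj))
    seen, queue = {start}, deque([start])
    while queue:
        for u in adj[queue.popleft()]:
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return len(seen) == len(adj)

def toggle_step(adj):
    """One Metropolis-style move: toggle a random pair, accept iff the
    graph stays connected and chordal (symmetric proposal)."""
    u, v = random.sample(list(adj), 2)
    if v in adj[u]:
        adj[u].discard(v); adj[v].discard(u)
        if not (is_connected(adj) and is_chordal(adj)):
            adj[u].add(v); adj[v].add(u)        # reject: restore the edge
    else:
        adj[u].add(v); adj[v].add(u)            # adding never disconnects,
        if not is_chordal(adj):                 # so only chordality is checked
            adj[u].discard(v); adj[v].discard(u)
```

Starting from any connected chordal graph (e.g., a path) and iterating `toggle_step` preserves both invariants by construction.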

In this paper, we introduce a new, spectral notion of approximation between directed graphs, which we call singular-value (SV) approximation. SV-approximation is stronger than previous notions of spectral approximation considered in the literature, including spectral approximation of Laplacians for undirected graphs [ST04], standard approximation for directed graphs [CKP+17], and unit-circle (UC) approximation for directed graphs [AKM+20]. Further, SV-approximation enjoys several useful properties not possessed by previous notions of approximation, e.g., it is preserved under products of random-walk matrices and bounded matrices. We provide a nearly linear-time algorithm for SV-sparsifying (and hence UC-sparsifying) Eulerian directed graphs, as well as ℓ-step random walks on such graphs, for any ℓ ≤ poly(n). Combined with the Eulerian scaling algorithms of [CKK+18], given an arbitrary (not necessarily Eulerian) directed graph and a set S of vertices, we can approximate the stationary probability mass of the (S, S^c) cut in an ℓ-step random walk to within a multiplicative error of 1/polylog(n) and an additive error of 1/poly(n) in nearly linear time. As a starting point for these results, we provide a simple black-box reduction from SV-sparsifying Eulerian directed graphs to SV-sparsifying undirected graphs; such a directed-to-undirected reduction was not known for previous notions of spectral approximation.

Bessani, Alysson; Défago, Xavier; Nakamura, Junya; Wada, Koichi; Yamauchi, Yukiko (Eds.) Markov Chain Monte Carlo (MCMC) algorithms are a widely used algorithmic tool for sampling from high-dimensional distributions; a notable example is the equilibrium distribution of graphical models. The Glauber dynamics, also known as the Gibbs sampler, is the simplest example of an MCMC algorithm; the transitions of the chain update the configuration at a randomly chosen coordinate at each step. Several works have studied distributed versions of the Glauber dynamics, and we extend these efforts to a more general family of Markov chains. An important combinatorial problem in the study of MCMC algorithms is random colorings. Given a graph G of maximum degree Δ and an integer k ≥ Δ+1, the goal is to generate a random proper vertex k-coloring of G. Jerrum (1995) proved that the Glauber dynamics has O(n log n) mixing time when k > 2Δ. Fischer and Ghaffari (2018), and independently Feng, Hayes, and Yin (2018), presented a parallel and distributed version of the Glauber dynamics which converges in O(log n) rounds for k > (2+ε)Δ for any ε > 0. We improve this result to k > (11/6 − δ)Δ for a fixed δ > 0. This matches the state of the art for randomly sampling colorings of general graphs in the sequential setting. Whereas previous works focused on distributed variants of the Glauber dynamics, our work presents a parallel and distributed version of the more general flip dynamics presented by Vigoda (2000) (and refined by Chen, Delcourt, Moitra, Perarnau, and Postle (2019)), which recolors local maximal two-colored components in each step.
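A single heat-bath Glauber update for colorings is simple to state in code. The sketch below is the sequential chain the abstract starts from, with hypothetical names, not the parallel/distributed flip dynamics developed in the paper: each step picks a uniformly random vertex and recolors it uniformly among the colors unused by its neighbors.

```python
import random

def glauber_step(adj, coloring, k):
    """One heat-bath Glauber update: pick a uniformly random vertex and
    recolor it uniformly among the colors unused by its neighbors."""
    v = random.choice(list(adj))
    used = {coloring[u] for u in adj[v]}
    # Nonempty whenever k >= max degree + 1, so the coloring stays proper.
    coloring[v] = random.choice([c for c in range(k) if c not in used])

def sample_coloring(adj, k, steps=10_000):
    """Run the chain from a greedy proper coloring (valid for k > max degree)."""
    coloring = {}
    for v in adj:
        used = {coloring[u] for u in adj[v] if u in coloring}
        coloring[v] = next(c for c in range(k) if c not in used)
    for _ in range(steps):
        glauber_step(adj, coloring, k)
    return coloring
```

For k > 2Δ this chain mixes in O(n log n) steps (Jerrum 1995); the paper's contribution is running a more general flip dynamics in O(log n) parallel rounds under the weaker condition k > (11/6 − δ)Δ.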

Markov chain Monte Carlo (MCMC) methods generate samples that are asymptotically distributed from a target distribution of interest as the number of iterations goes to infinity. Various theoretical results provide upper bounds on the distance between the target and marginal distribution after a fixed number of iterations. These upper bounds are derived on a case-by-case basis and typically involve intractable quantities, which limits their use for practitioners. We introduce L-lag couplings to generate computable, non-asymptotic upper bound estimates for the total variation or the Wasserstein distance of general Markov chains. We apply L-lag couplings to the tasks of (i) determining MCMC burn-in, (ii) comparing different MCMC algorithms with the same target, and (iii) comparing exact and approximate MCMC. Lastly, we (iv) assess the bias of sequential Monte Carlo and self-normalized importance samplers.
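The classical coupling inequality that L-lag couplings turn into a computable estimator can be demonstrated on a toy chain. This is a hedged sketch with hypothetical names, not the paper's L-lag construction: two lazy reflected walks on a path share the same randomness, and the total variation distance between their time-t marginals is at most P(τ > t), where τ is their meeting time. Starting one chain from stationarity would turn this into a bound on burn-in bias.

```python
import random

def move(z, u, n):
    """Lazy reflected walk on {0, ..., n-1} driven by a uniform variate u."""
    if u < 0.5:
        return z                          # hold with probability 1/2
    step = 1 if u < 0.75 else -1
    return min(n - 1, max(0, z + step))   # reflect (clamp) at the ends

def meeting_time(n, x0, y0, max_steps=10_000):
    """First time two walks sharing the same randomness coincide."""
    x, y = x0, y0
    for t in range(1, max_steps + 1):
        u = random.random()               # common randomness couples the walks
        x, y = move(x, u, n), move(y, u, n)
        if x == y:
            return t
    return max_steps

def tv_upper_bound(n, t, reps=2_000):
    """Monte Carlo estimate of P(tau > t), which upper-bounds the total
    variation distance between the two chains' time-t marginals."""
    return sum(meeting_time(n, 0, n - 1) > t for _ in range(reps)) / reps
```

With shared randomness the gap between the walks shrinks only when one of them is clamped at a boundary, yet the bound still decays quickly; the paper's L-lag couplings generalize this quantity into unbiased, computable estimates for general MCMC algorithms.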