For a graph G on n vertices, naively sampling the position of a random walk of at time t requires work Ω(t). We desire local access algorithms supporting positionG(t) queries, which return the position of a random walk from some fixed start vertex s at time t, where the joint distribution of returned positions is 1/ poly(n) close to those of a uniformly random walk in ℓ1 distance.
We first give an algorithm for local access to random walks on a given undirected dregular graph with eO( 1 1−λ √ n) runtime per query, where λ is the secondlargest eigenvalue of the random walk matrix of the graph in absolute value. Since random dregular graphs G(n, d) are expanders with high probability, this gives an eO(√ n) algorithm for a graph drawn from G(n, d) whp, which improves on the naive method for small numbers of queries. We then prove that no algorithm with subconstant error given probe access to an input dregular graph can have runtime better than Ω(√ n/ log(n)) per query in expectation when the input graph is drawn from G(n, d), obtaining a nearly matching lower bound. We further show an Ω(n1/4) runtime per query lower bound even with an oblivious adversary (i.e. when the query sequence is fixed in advance). We then show that for families of graphs with additional group theoretic structure, dramatically better results can be achieved. We give local access to walks on smalldegree abelian Cayley graphs, including cycles and hypercubes, with runtime polylog(n) per query. This also allows for efficient
local access to walks on polylog degree expanders. We show that our techniques apply to graphs with high degree by extending or results to graphs constructed using the tensor product (giving fast local access to walks on degree nϵ graphs for any ϵ ∈ (0, 1]) and Cartesian product.
more »
« less
Simulating Random Walks in Random Streams
The random order graph streaming model has received significant attention recently, with problems such as matching size estimation, component counting, and the evaluation of bounded degree constant query testable properties shown to admit surprisingly space efficient algorithms.
The main result of this paper is a space efficient single pass random order streaming algorithm for simulating nearly independent random walks that start at uniformly random vertices. We show that the distribution of kstep walks from b vertices chosen uniformly at random can be approximated up to error ∊ per walk using ￼ words of space with a single pass over a randomly ordered stream of edges, solving an open problem of Peng and Sohler [SODA '18]. Applications of our result include the estimation of the average return probability of the kstep walk (the trace of the kth power of the random walk matrix) as well as the estimation of PageRank. We complement our algorithm with a strong impossibility result for directed graphs.
more »
« less
 Award ID(s):
 1751040
 NSFPAR ID:
 10336438
 Date Published:
 Journal Name:
 Proceedings of the Annual ACMSIAM Symposium on Discrete Algorithms
 ISSN:
 10719040
 Page Range / eLocation ID:
 30913126
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this


Abstract We study an example of a hitandrun random walk on the symmetric group $\mathbf S_n$ . Our starting point is the wellunderstood toptorandom shuffle. In the hitandrun version, at each single step , after picking the point of insertion j uniformly at random in $\{1,\ldots,n\}$ , the top card is inserted in the j th position k times in a row, where k is uniform in $\{0,1,\ldots,j1\}$ . The question is, does this accelerate mixing significantly or not? We show that, in $L^2$ and supnorm, this accelerates mixing at most by a constant factor (independent of n ). Analyzing this problem in total variation is an interesting open question. We show that, in general, hitandrun random walks on finite groups have nonnegative spectrum.more » « less

Finding the reduceddimensional structure is critical to understanding complex networks. Existing approaches such as spectral clustering are applicable only when the full network is explicitly observed. In this paper, we focus on the online factorization and partition of implicit large lumpable networks based on observations from an associated random walk. We formulate this into a nonconvex stochastic factorization problem and propose an efficient and scalable stochastic generalized Hebbian algorithm (GHA). The algorithm is able to process random walk data in a streaming fashion and learn a lowdimensional representation for each vertex. By applying a diffusion approximation analysis, we show that the continuoustime limiting process of the stochastic algorithm converges globally to the “principal components” of the Markov chain. We also establish a finitesample error bound that matches the nonimprovable stateofart result for online factorization. Once learned the lowdimensional state representations, we further apply clustering techniques to recover the network partition. We show that when the associated Markov process is lumpable, one can recover the partition exactly with high probability given sufficient data. We apply the proposed approach to model the traffic flow of Manhattan as citywide random walks. By using our algorithm to analyze the taxi trip data, we discover a latent partition of the Manhattan city that closely matches the traffic dynamics.more » « less

Guruswami, Venkatesan (Ed.)Traditional fraud detection is often based on finding statistical anomalies in data sets and transaction histories. A sophisticated fraudster, aware of the exact kinds of tests being deployed, might be difficult or impossible to catch. We are interested in paradigms for fraud detection that are provably robust against any adversary, no matter how sophisticated. In other words, the detection strategy should rely on signals in the data that are inherent in the goals the adversary is trying to achieve. Specifically, we consider a fraud detection game centered on a random walk on a graph. We assume this random walk is implemented by having a player at each vertex, who can be honest or not. In particular, when the random walk reaches a vertex owned by an honest player, it proceeds to a uniformly random neighbor at the next timestep. However, when the random walk reaches a dishonest player, it instead proceeds to an arbitrary neighbor chosen by an omniscient Adversary. The game is played between the Adversary and a Referee who sees the trajectory of the random walk. At any point during the random walk, if the Referee determines that a {specific} vertex is controlled by a dishonest player, the Referee accuses that player, and therefore wins the game. The Referee is allowed to make the occasional incorrect accusation, but must follow a policy that makes such mistakes with small probability of error. The goal of the adversary is to make the cover time large, ideally infinite, i.e., the walk should never reach at least one vertex. We consider the following basic question: how much can the omniscient Adversary delay the cover time without getting caught? Our main result is a tight upper bound on this delay factor. We also discuss possible applications of our results to settings such as Rotor Walks, Leader Election, and Sybil Defense.more » « less

We propose datadriven onepass streaming algorithms for estimating the number of triangles and four cycles, two fundamental problems in graph analytics that are widely studied in the graph data stream literature. Recently, Hsu et al. (2019a) and Jiang et al. (2020) applied machine learning techniques in other data stream problems, using a trained oracle that can predict certain properties of the stream elements to improve on prior “classical” algorithms that did not use oracles. In this paper, we explore the power of a “heavy edge” oracle in multiple graph edge streaming models. In the adjacency list model, we present a onepass triangle counting algorithm improving upon the previous space upper bounds without such an oracle. In the arbitrary order model, we present algorithms for both triangle and four cycle estimation with fewer passes and the same space complexity as in previous algorithms, and we show several of these bounds are optimal. We analyze our algorithms under several noise models, showing that the algorithms perform well even when the oracle errs. Our methodology expands upon prior work on “classical” streaming algorithms, as previous multipass and random order streaming algorithms can be seen as special cases of our algorithms, where the first pass or random order was used to implement the heavy edge oracle. Lastly, our experiments demonstrate advantages of the proposed method compared to stateoftheart streaming algorithms.more » « less