For a graph G on n vertices, naively sampling the position of a random walk of at time t requires work Ω(t). We desire local access algorithms supporting positionG(t) queries, which return the position of a random walk from some fixed start vertex s at time t, where the joint distribution of returned positions is 1/ poly(n) close to those of a uniformly random walk in ℓ1 distance. We first give an algorithm for local access to random walks on a given undirected d-regular graph with eO( 1 1−λ √ n) runtime per query, where λ is the second-largest eigenvalue of the random walk matrix of the graph in absolute value. Since random d-regular graphs G(n, d) are expanders with high probability, this gives an eO(√ n) algorithm for a graph drawn from G(n, d) whp, which improves on the naive method for small numbers of queries. We then prove that no algorithm with subconstant error given probe access to an input d-regular graph can have runtime better than Ω(√ n/ log(n)) per query in expectation when the input graph is drawn from G(n, d), obtaining a nearly matching lower bound. We further show an Ω(n1/4) runtime per query lower bound even with an oblivious adversary (i.e. when the query sequence is fixed in advance). We then show that for families of graphs with additional group theoretic structure, dramatically better results can be achieved. We give local access to walks on small-degree abelian Cayley graphs, including cycles and hypercubes, with runtime polylog(n) per query. This also allows for efficient local access to walks on polylog degree expanders. We show that our techniques apply to graphs with high degree by extending or results to graphs constructed using the tensor product (giving fast local access to walks on degree nϵ graphs for any ϵ ∈ (0, 1]) and Cartesian product.
more »
« less
Simulating Random Walks in Random Streams
The random order graph streaming model has received significant attention recently, with problems such as matching size estimation, component counting, and the evaluation of bounded degree constant query testable properties shown to admit surprisingly space efficient algorithms. The main result of this paper is a space efficient single pass random order streaming algorithm for simulating nearly independent random walks that start at uniformly random vertices. We show that the distribution of k-step walks from b vertices chosen uniformly at random can be approximated up to error ∊ per walk using  words of space with a single pass over a randomly ordered stream of edges, solving an open problem of Peng and Sohler [SODA '18]. Applications of our result include the estimation of the average return probability of the k-step walk (the trace of the kth power of the random walk matrix) as well as the estimation of PageRank. We complement our algorithm with a strong impossibility result for directed graphs.
more »
« less
- Award ID(s):
- 1751040
- PAR ID:
- 10336438
- Date Published:
- Journal Name:
- Proceedings of the Annual ACMSIAM Symposium on Discrete Algorithms
- ISSN:
- 1071-9040
- Page Range / eLocation ID:
- 3091-3126
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Abstract We study an example of a hit-and-run random walk on the symmetric group $$\mathbf S_n$$ . Our starting point is the well-understood top-to-random shuffle. In the hit-and-run version, at each single step , after picking the point of insertion j uniformly at random in $$\{1,\ldots,n\}$$ , the top card is inserted in the j th position k times in a row, where k is uniform in $$\{0,1,\ldots,j-1\}$$ . The question is, does this accelerate mixing significantly or not? We show that, in $L^2$ and sup-norm, this accelerates mixing at most by a constant factor (independent of n ). Analyzing this problem in total variation is an interesting open question. We show that, in general, hit-and-run random walks on finite groups have non-negative spectrum.more » « less
-
Chan, Timothy; Fischer, Johannes; Iacono, John; Herman, Grzegorz (Ed.)The maximum coverage problem is to select k sets, from a collection of m sets, such that the cardinality of their union, in a universe of size n, is maximized. We consider (1-1/e-ε)-approximation algorithms for this NP-hard problem in three standard data stream models. 1) Dynamic Model. The stream consists of a sequence of sets being inserted and deleted. Our multi-pass algorithm uses ε^{-2} k ⋅ polylog(n,m) space. The best previous result (Assadi and Khanna, SODA 2018) used (n +ε^{-4} k) polylog(n,m) space. While both algorithms use O(ε^{-1} log m) passes, our analysis shows that, when ε ≤ 1/log log m, it is possible to reduce the number of passes by a 1/log log m factor without incurring additional space. 2) Random Order Model. In this model, there are no deletions, and the sets forming the instance are uniformly randomly permuted to form the input stream. We show that a single pass and k polylog(n,m) space suffices for arbitrary small constant ε. The best previous result, by Warneke et al. (ESA 2023), used k² polylog(n,m) space. 3) Insert-Only Model. Lastly, our results, along with numerous previous results, use a sub-sampling technique introduced by McGregor and Vu (ICDT 2017) to sparsify the input instance. We explain how this technique and others used in the paper can be implemented such that the amortized update time of our algorithm is polylogarithmic. This also implies an improvement of the state-of-the-art insert only algorithms in terms of the update time: polylog(m,n) update time suffices, whereas the best previous result by Jaud et al. (SEA 2023) required update time that was linear in k.more » « less
-
Finding the reduced-dimensional structure is critical to understanding complex networks. Existing approaches such as spectral clustering are applicable only when the full network is explicitly observed. In this paper, we focus on the online factorization and partition of implicit large lumpable networks based on observations from an associated random walk. We formulate this into a nonconvex stochastic factorization problem and propose an efficient and scalable stochastic generalized Hebbian algorithm (GHA). The algorithm is able to process random walk data in a streaming fashion and learn a low-dimensional representation for each vertex. By applying a diffusion approximation analysis, we show that the continuous-time limiting process of the stochastic algorithm converges globally to the “principal components” of the Markov chain. We also establish a finite-sample error bound that matches the nonimprovable state-of-art result for online factorization. Once learned the low-dimensional state representations, we further apply clustering techniques to recover the network partition. We show that when the associated Markov process is lumpable, one can recover the partition exactly with high probability given sufficient data. We apply the proposed approach to model the traffic flow of Manhattan as city-wide random walks. By using our algorithm to analyze the taxi trip data, we discover a latent partition of the Manhattan city that closely matches the traffic dynamics.more » « less
-
Guruswami, Venkatesan (Ed.)Traditional fraud detection is often based on finding statistical anomalies in data sets and transaction histories. A sophisticated fraudster, aware of the exact kinds of tests being deployed, might be difficult or impossible to catch. We are interested in paradigms for fraud detection that are provably robust against any adversary, no matter how sophisticated. In other words, the detection strategy should rely on signals in the data that are inherent in the goals the adversary is trying to achieve. Specifically, we consider a fraud detection game centered on a random walk on a graph. We assume this random walk is implemented by having a player at each vertex, who can be honest or not. In particular, when the random walk reaches a vertex owned by an honest player, it proceeds to a uniformly random neighbor at the next timestep. However, when the random walk reaches a dishonest player, it instead proceeds to an arbitrary neighbor chosen by an omniscient Adversary. The game is played between the Adversary and a Referee who sees the trajectory of the random walk. At any point during the random walk, if the Referee determines that a {specific} vertex is controlled by a dishonest player, the Referee accuses that player, and therefore wins the game. The Referee is allowed to make the occasional incorrect accusation, but must follow a policy that makes such mistakes with small probability of error. The goal of the adversary is to make the cover time large, ideally infinite, i.e., the walk should never reach at least one vertex. We consider the following basic question: how much can the omniscient Adversary delay the cover time without getting caught? Our main result is a tight upper bound on this delay factor. We also discuss possible applications of our results to settings such as Rotor Walks, Leader Election, and Sybil Defense.more » « less