skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Local Access to Huge Random Objects Through Partial Sampling
Consider an algorithm performing a computation on a huge random object (for example a random graph or a "long" random walk). Is it necessary to generate the entire object prior to the computation, or is it possible to provide query access to the object and sample it incrementally "on-the-fly" (as requested by the algorithm)? Such an implementation should emulate the random object by answering queries in a manner consistent with an instance of the random object sampled from the true distribution (or close to it). This paradigm is useful when the algorithm is sub-linear and thus, sampling the entire object up front would ruin its efficiency. Our first set of results focus on undirected graphs with independent edge probabilities, i.e. each edge is chosen as an independent Bernoulli random variable. We provide a general implementation for this model under certain assumptions. Then, we use this to obtain the first efficient local implementations for the Erdös-Rényi G(n,p) model for all values of p, and the Stochastic Block model. As in previous local-access implementations for random graphs, we support Vertex-Pair and Next-Neighbor queries. In addition, we introduce a new Random-Neighbor query. Next, we give the first local-access implementation for All-Neighbors queries in the (sparse and directed) Kleinberg’s Small-World model. Our implementations require no pre-processing time, and answer each query using O(poly(log n)) time, random bits, and additional space. Next, we show how to implement random Catalan objects, specifically focusing on Dyck paths (balanced random walks on the integer line that are always non-negative). Here, we support Height queries to find the location of the walk, and First-Return queries to find the time when the walk returns to a specified location. This in turn can be used to implement Next-Neighbor queries on random rooted ordered trees, and Matching-Bracket queries on random well bracketed expressions (the Dyck language). Finally, we introduce two features to define a new model that: (1) allows multiple independent (and even simultaneous) instantiations of the same implementation, to be consistent with each other without the need for communication, (2) allows us to generate a richer class of random objects that do not have a succinct description. Specifically, we study uniformly random valid q-colorings of an input graph G with maximum degree Δ. This is in contrast to prior work in the area, where the relevant random objects are defined as a distribution with O(1) parameters (for example, n and p in the G(n,p) model). The distribution over valid colorings is instead specified via a "huge" input (the underlying graph G), that is far too large to be read by a sub-linear time algorithm. Instead, our implementation accesses G through local neighborhood probes, and is able to answer queries to the color of any given vertex in sub-linear time for q ≥ 9Δ, in a manner that is consistent with a specific random valid coloring of G. Furthermore, the implementation is memory-less, and can maintain consistency with non-communicating copies of itself.  more » « less
Award ID(s):
1741137
PAR ID:
10220504
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
11th Innovations in Theoretical Computer Science Conference, ITCS 2020
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Consider an algorithm performing a computation on a huge random object. Is it necessary to generate the entire object up front, or is it possible to provide query access to the object and sample it incrementally "on-the-fly"? Such an implementation should emulate the object by answering queries in a manner consistent with a random instance sampled from the true distribution. Our first set of results focus on undirected graphs with independent edge probabilities, under certain assumptions. Then, we use this to obtain the first efficient implementations for the Erdos-Renyi model and the Stochastic Block model. As in previous local-access implementations for random graphs, we support Vertex-Pair and Next-Neighbor queries. We also introduce a new Random-Neighbor query. Next, we show how to implement random Catalan objects, specifically focusing on Dyck paths (always positive random walks on the integer line). Here, we support Height queries to find the position of the walk, and First-Return queries to find the time when the walk returns to a specified height. This in turn can be used to implement Next-Neighbor queries on random rooted/binary trees, and Matching-Bracket queries on random well bracketed expressions. Finally, we define a new model that: (1) allows multiple independent simultaneous instantiations of the same implementation to be consistent with each other without communication (2) allows us to generate a richer class of random objects that do not have a succinct description. Specifically, we study uniformly random valid q-colorings of an input graph G with max degree Δ. The distribution over valid colorings is specified via a "huge" underlying graph G, that is far too large to be read in sub-linear time. Instead, we access G through local neighborhood probes. We are able to answer queries to the color of any vertex in sub-linear time for q>9Δ. 
    more » « less
  2. For a graph G on n vertices, naively sampling the position of a random walk of at time t requires work Ω(t). We desire local access algorithms supporting positionG(t) queries, which return the position of a random walk from some fixed start vertex s at time t, where the joint distribution of returned positions is 1/ poly(n) close to those of a uniformly random walk in ℓ1 distance. We first give an algorithm for local access to random walks on a given undirected d-regular graph with eO( 1 1−λ √ n) runtime per query, where λ is the second-largest eigenvalue of the random walk matrix of the graph in absolute value. Since random d-regular graphs G(n, d) are expanders with high probability, this gives an eO(√ n) algorithm for a graph drawn from G(n, d) whp, which improves on the naive method for small numbers of queries. We then prove that no algorithm with subconstant error given probe access to an input d-regular graph can have runtime better than Ω(√ n/ log(n)) per query in expectation when the input graph is drawn from G(n, d), obtaining a nearly matching lower bound. We further show an Ω(n1/4) runtime per query lower bound even with an oblivious adversary (i.e. when the query sequence is fixed in advance). We then show that for families of graphs with additional group theoretic structure, dramatically better results can be achieved. We give local access to walks on small-degree abelian Cayley graphs, including cycles and hypercubes, with runtime polylog(n) per query. This also allows for efficient local access to walks on polylog degree expanders. We show that our techniques apply to graphs with high degree by extending or results to graphs constructed using the tensor product (giving fast local access to walks on degree nϵ graphs for any ϵ ∈ (0, 1]) and Cartesian product. 
    more » « less
  3. null (Ed.)
    The problem of sparsifying a graph or a hypergraph while approximately preserving its cut structure has been extensively studied and has many applications. In a seminal work, Benczúr and Karger (1996) showed that given any n-vertex undirected weighted graph G and a parameter ε ∈ (0,1), there is a near-linear time algorithm that outputs a weighted subgraph G' of G of size Õ(n/ε²) such that the weight of every cut in G is preserved to within a (1 ± ε)-factor in G'. The graph G' is referred to as a (1 ± ε)-approximate cut sparsifier of G. Subsequent recent work has obtained a similar result for the more general problem of hypergraph cut sparsifiers. However, all known sparsification algorithms require Ω(n + m) time where n denotes the number of vertices and m denotes the number of hyperedges in the hypergraph. Since m can be exponentially large in n, a natural question is if it is possible to create a hypergraph cut sparsifier in time polynomial in n, independent of the number of edges. We resolve this question in the affirmative, giving the first sublinear time algorithm for this problem, given appropriate query access to the hypergraph. Specifically, we design an algorithm that constructs a (1 ± ε)-approximate cut sparsifier of a hypergraph H(V,E) in polynomial time in n, independent of the number of hyperedges, when given access to the hypergraph using the following two queries: 1) given any cut (S, ̄S), return the size |δ_E(S)| (cut value queries); and 2) given any cut (S, ̄S), return a uniformly at random edge crossing the cut (cut edge sample queries). Our algorithm outputs a sparsifier with Õ(n/ε²) edges, which is essentially optimal. We then extend our results to show that cut value and cut edge sample queries can also be used to construct hypergraph spectral sparsifiers in poly(n) time, independent of the number of hyperedges. We complement the algorithmic results above by showing that any algorithm that has access to only one of the above two types of queries can not give a hypergraph cut sparsifier in time that is polynomial in n. Finally, we show that our algorithmic results also hold if we replace the cut edge sample queries with a pair neighbor sample query that for any pair of vertices, returns a random edge incident on them. In contrast, we show that having access only to cut value queries and queries that return a random edge incident on a given single vertex, is not sufficient. 
    more » « less
  4. Dyck-reachability is a fundamental formulation for program analysis, which has been widely used to capture properly-matched-parenthesis program properties such as function calls/returns and field writes/reads. Bidirected Dyck-reachability is a relaxation of Dyck-reachability on bidirected graphs where each edge u → ( i v labeled by an open parenthesis “( i ” is accompanied with an inverse edge v → ) i u labeled by the corresponding close parenthesis “) i ”, and vice versa. In practice, many client analyses such as alias analysis adopt the bidirected Dyck-reachability formulation. Bidirected Dyck-reachability admits an optimal reachability algorithm. Specifically, given a graph with n nodes and m edges, the optimal bidirected Dyck-reachability algorithm computes all-pairs reachability information in O ( m ) time. This paper focuses on the dynamic version of bidirected Dyck-reachability. In particular, we consider the problem of maintaining all-pairs Dyck-reachability information in bidirected graphs under a sequence of edge insertions and deletions. Dynamic bidirected Dyck-reachability can formulate many program analysis problems in the presence of code changes. Unfortunately, solving dynamic graph reachability problems is challenging. For example, even for maintaining transitive closure, the fastest deterministic dynamic algorithm requires O ( n 2 ) update time to achieve O (1) query time. All-pairs Dyck-reachability is a generalization of transitive closure. Despite extensive research on incremental computation, there is no algorithmic development on dynamic graph algorithms for program analysis with worst-case guarantees. Our work fills the gap and proposes the first dynamic algorithm for Dyck reachability on bidirected graphs. Our dynamic algorithms can handle each graph update ( i.e. , edge insertion and deletion) in O ( n ·α( n )) time and support any all-pairs reachability query in O (1) time, where α( n ) is the inverse Ackermann function. We have implemented and evaluated our dynamic algorithm on an alias analysis and a context-sensitive data-dependence analysis for Java. We compare our dynamic algorithms against a straightforward approach based on the O ( m )-time optimal bidirected Dyck-reachability algorithm and a recent incremental Datalog solver. Experimental results show that our algorithm achieves orders of magnitude speedup over both approaches. 
    more » « less
  5. We consider the question of orienting the edges in a graph G such that every vertex has bounded out-degree. For graphs of arboricity α, there is an orientation in which every vertex has out-degree at most α and, moreover, the best possible maximum out-degree of an orientation is at least α - 1. We are thus interested in algorithms that can achieve a maximum out-degree of close to α. A widely studied approach for this problem in the distributed algorithms setting is a "peeling algorithm" that provides an orientation with maximum out-degree α(2+ε) in a logarithmic number of iterations. We consider this problem in the local computation algorithm (LCA) model, which quickly answers queries of the form "What is the orientation of edge (u,v)?" by probing the input graph. When the peeling algorithm is executed in the LCA setting by applying standard techniques, e.g., the Parnas-Ron paradigm, it requires Ω(n) probes per query on an n-vertex graph. In the case where G has unbounded degree, we show that any LCA that orients its edges to yield maximum out-degree r must use Ω(√ n/r) probes to G per query in the worst case, even if G is known to be a forest (that is, α = 1). We also show several algorithms with sublinear probe complexity when G has unbounded degree. When G is a tree such that the maximum degree Δ of G is bounded, we demonstrate an algorithm that uses Δ n^{1-log_Δ r + o(1)} probes to G per query. To obtain this result, we develop an edge-coloring approach that ultimately yields a graph-shattering-like result. We also use this shattering-like approach to demonstrate an LCA which 4-colors any tree using sublinear probes per query. 
    more » « less