skip to main content

Search for: All records

Creators/Authors contains: "Yodpinyanee, Anak"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Consider an algorithm performing a computation on a huge random object (for example a random graph or a "long" random walk). Is it necessary to generate the entire object prior to the computation, or is it possible to provide query access to the object and sample it incrementally "on-the-fly" (as requested by the algorithm)? Such an implementation should emulate the random object by answering queries in a manner consistent with an instance of the random object sampled from the true distribution (or close to it). This paradigm is useful when the algorithm is sub-linear and thus, sampling the entire objectmore »up front would ruin its efficiency. Our first set of results focus on undirected graphs with independent edge probabilities, i.e. each edge is chosen as an independent Bernoulli random variable. We provide a general implementation for this model under certain assumptions. Then, we use this to obtain the first efficient local implementations for the Erdös-Rényi G(n,p) model for all values of p, and the Stochastic Block model. As in previous local-access implementations for random graphs, we support Vertex-Pair and Next-Neighbor queries. In addition, we introduce a new Random-Neighbor query. Next, we give the first local-access implementation for All-Neighbors queries in the (sparse and directed) Kleinberg’s Small-World model. Our implementations require no pre-processing time, and answer each query using O(poly(log n)) time, random bits, and additional space. Next, we show how to implement random Catalan objects, specifically focusing on Dyck paths (balanced random walks on the integer line that are always non-negative). Here, we support Height queries to find the location of the walk, and First-Return queries to find the time when the walk returns to a specified location. This in turn can be used to implement Next-Neighbor queries on random rooted ordered trees, and Matching-Bracket queries on random well bracketed expressions (the Dyck language). Finally, we introduce two features to define a new model that: (1) allows multiple independent (and even simultaneous) instantiations of the same implementation, to be consistent with each other without the need for communication, (2) allows us to generate a richer class of random objects that do not have a succinct description. Specifically, we study uniformly random valid q-colorings of an input graph G with maximum degree Δ. This is in contrast to prior work in the area, where the relevant random objects are defined as a distribution with O(1) parameters (for example, n and p in the G(n,p) model). The distribution over valid colorings is instead specified via a "huge" input (the underlying graph G), that is far too large to be read by a sub-linear time algorithm. Instead, our implementation accesses G through local neighborhood probes, and is able to answer queries to the color of any given vertex in sub-linear time for q ≥ 9Δ, in a manner that is consistent with a specific random valid coloring of G. Furthermore, the implementation is memory-less, and can maintain consistency with non-communicating copies of itself.« less
  2. Consider an algorithm performing a computation on a huge random object. Is it necessary to generate the entire object up front, or is it possible to provide query access to the object and sample it incrementally "on-the-fly"? Such an implementation should emulate the object by answering queries in a manner consistent with a random instance sampled from the true distribution. Our first set of results focus on undirected graphs with independent edge probabilities, under certain assumptions. Then, we use this to obtain the first efficient implementations for the Erdos-Renyi model and the Stochastic Block model. As in previous local-access implementationsmore »for random graphs, we support Vertex-Pair and Next-Neighbor queries. We also introduce a new Random-Neighbor query. Next, we show how to implement random Catalan objects, specifically focusing on Dyck paths (always positive random walks on the integer line). Here, we support Height queries to find the position of the walk, and First-Return queries to find the time when the walk returns to a specified height. This in turn can be used to implement Next-Neighbor queries on random rooted/binary trees, and Matching-Bracket queries on random well bracketed expressions. Finally, we define a new model that: (1) allows multiple independent simultaneous instantiations of the same implementation to be consistent with each other without communication (2) allows us to generate a richer class of random objects that do not have a succinct description. Specifically, we study uniformly random valid q-colorings of an input graph G with max degree Δ. The distribution over valid colorings is specified via a "huge" underlying graph G, that is far too large to be read in sub-linear time. Instead, we access G through local neighborhood probes. We are able to answer queries to the color of any vertex in sub-linear time for q>9Δ.« less
  3. A graph spanner is a fundamental graph structure that faithfully preserves the pairwise distances in the input graph up to a small multiplicative stretch. The common objective in the computation of spanners is to achieve the best-known existential size-stretch trade-off efficiently. Classical models and algorithmic analysis of graph spanners essentially assume that the algorithm can read the input graph, construct the desired spanner, and write the answer to the output tape. However, when considering massive graphs containing millions or even billions of nodes not only the input graph, but also the output spanner might be too large for a singlemore »processor to store. To tackle this challenge, we initiate the study of local computation algorithms (LCAs) for graph spanners in general graphs, where the algorithm should locally decide whether a given edge (u,v)∈E belongs to the output spanner. Such LCAs give the user the `illusion' that a specific sparse spanner for the graph is maintained, without ever fully computing it. We present the following results: -For general n-vertex graphs and r∈{2,3}, there exists an LCA for (2r−1)-spanners with O˜(n1+1/r) edges and sublinear probe complexity of O˜(n1−1/2r). These size/stretch tradeoffs are best possible (up to polylogarithmic factors). -For every k≥1 and n-vertex graph with maximum degree Δ, there exists an LCA for O(k2) spanners with O˜(n1+1/k) edges, probe complexity of O˜(Δ4n2/3), and random seed of size polylog(n). This improves upon, and extends the work of [Lenzen-Levi, 2018]. We also complement our results by providing a polynomial lower bound on the probe complexity of LCAs for graph spanners that holds even for the simpler task of computing a sparse connected subgraph with o(m) edges.« less
  4. In this work, we consider the sample complexity required for testing the monotonicity of distributions over partial orders. A distribution p over a poset is {\em monotone} if, for any pair of domain elements x and y such that x⪯y, p(x)≤p(y). To understand the sample complexity of this problem, we introduce a new property called \emph{bigness} over a finite domain, where the distribution is T-big if the minimum probability for any domain element is at least T. We establish a lower bound of Ω(n/logn) for testing bigness of distributions on domains of size n. We then build on these lowermore »bounds to give Ω(n/logn) lower bounds for testing monotonicity over a matching poset of size n and significantly improved lower bounds over the hypercube poset. We give sublinear sample complexity bounds for testing bigness and for testing monotonicity over the matching poset. We then give a number of tools for analyzing upper bounds on the sample complexity of the monotonicity testing problem.« less
  5. We study the classic set cover problem from the perspective of sub-linear algorithms. Given access to a collection of m sets over n elements in the query model, we show that sub-linear algorithms derived from existing techniques have almost tight query complexities. On one hand, first we show an adaptation of the streaming algorithm presented in [17] to the sub-linear query model, that returns an α-approximate cover using Õ(m(n/k)^1/(α–1) + nk) queries to the input, where k denotes the value of a minimum set cover. We then complement this upper bound by proving that for lower values of k, themore »required number of queries is , even for estimating the optimal cover size. Moreover, we prove that even checking whether a given collection of sets covers all the elements would require Ω(nk) queries. These two lower bounds provide strong evidence that the upper bound is almost tight for certain values of the parameter k. On the other hand, we show that this bound is not optimal for larger values of the parameter k, as there exists a (1 + ε)-approximation algorithm with Õ(mn/kε^2) queries. We show that this bound is essentially tight for sufficiently small constant ε, by establishing a lower bound of query complexity. Our lower-bound results follow by carefully designing two distributions of instances that are hard to distinguish. In particular, our first lower bound involves a probabilistic construction of a certain set system with a minimum set cover of size αk, with the key property that a small number of “almost uniformly distributed” modifications can reduce the minimum set cover size down to k. Thus, these modifications are not detectable unless a large number of queries are asked. We believe that our probabilistic construction technique might find applications to lower bounds for other combinatorial optimization problems.« less
  6. We study the problem of estimating the value of sums of the form Sp≜∑(xip) when one has the ability to sample xi≥0 with probability proportional to its magnitude. When p=2, this problem is equivalent to estimating the selectivity of a self-join query in database systems when one can sample rows randomly. We also study the special case when {xi} is the degree sequence of a graph, which corresponds to counting the number of p-stars in a graph when one has the ability to sample edges randomly. Our algorithm for a (1±ε)-multiplicative approximation of Sp has query and time complexities O(mloglognϵ2S1/pp).more »Here, m=∑xi/2 is the number of edges in the graph, or equivalently, half the number of records in the database table. Similarly, n is the number of vertices in the graph and the number of unique values in the database table. We also provide tight lower bounds (up to polylogarithmic factors) in almost all cases, even when {xi} is a degree sequence and one is allowed to use the structure of the graph to try to get a better estimate. We are not aware of any prior lower bounds on the problem of join selectivity estimation. For the graph problem, prior work which assumed the ability to sample only vertices uniformly gave algorithms with matching lower bounds (Gonen et al. in SIAM J Comput 25:1365–1411, 2011). With the ability to sample edges randomly, we show that one can achieve faster algorithms for approximating the number of star subgraphs, bypassing the lower bounds in this prior work. For example, in the regime where Sp≤n, and p=2, our upper bound is O~(n/S1/2p), in contrast to their Ω(n/S1/3p) lower bound when no random edge queries are available. In addition, we consider the problem of counting the number of directed paths of length two when the graph is directed. This problem is equivalent to estimating the selectivity of a join query between two distinct tables. We prove that the general version of this problem cannot be solved in sublinear time. However, when the ratio between in-degree and out-degree is bounded—or equivalently, when the ratio between the number of occurrences of values in the two columns being joined is bounded—we give a sublinear time algorithm via a reduction to the undirected case.« less
  7. In the model of local computation algorithms (LCAs), we aim to compute the queried part of the output by examining only a small (sublinear) portion of the input. Many recently developed LCAs on graph problems achieve time and space complexities with very low dependence on n, the number of vertices. Nonetheless, these complexities are generally at least exponential in d, the upper bound on the degree of the input graph. Instead, we consider the case where parameter d can be moderately dependent on n, and aim for complexities with subexponential dependence on d, while maintaining polylogarithmic dependence on n. Wemore »present: -a randomized LCA for computing maximal independent sets whose time and space complexities are quasi-polynomial in d and polylogarithmic in n; -for constant ε>0, a randomized LCA that provides a (1−ε)-approximation to maximum matching with high probability, whose time and space complexities are polynomial in d and polylogarithmic in n.« less