We study the problem of estimating the value of sums of the form Sp≜∑(xip) when one has the ability to sample xi≥0 with probability proportional to its magnitude. When p=2, this problem is equivalent to estimating the selectivity of a selfjoin query in database systems when one can sample rows randomly. We also study the special case when {xi} is the degree sequence of a graph, which corresponds to counting the number of pstars in a graph when one has the ability to sample edges randomly. Our algorithm for a (1±ε)multiplicative approximation of Sp has query and time complexities O(mloglognϵ2S1/pp).more »
Set Cover in Sublinear Time
We study the classic set cover problem from the perspective of sublinear algorithms. Given access to a collection of m sets over n elements in the query model, we show that sublinear algorithms derived from existing techniques have almost tight query complexities.
On one hand, first we show an adaptation of the streaming algorithm presented in [17] to the sublinear query model, that returns an αapproximate cover using Õ(m(n/k)^1/(α–1) + nk) queries to the input, where k denotes the value of a minimum set cover. We then complement this upper bound by proving that for lower values of k, the required number of queries is , even for estimating the optimal cover size. Moreover, we prove that even checking whether a given collection of sets covers all the elements would require Ω(nk) queries. These two lower bounds provide strong evidence that the upper bound is almost tight for certain values of the parameter k.
On the other hand, we show that this bound is not optimal for larger values of the parameter k, as there exists a (1 + ε)approximation algorithm with Õ(mn/kε^2) queries. We show that this bound is essentially tight for sufficiently small constant ε, by establishing a lower bound more »
 Publication Date:
 NSFPAR ID:
 10065216
 Journal Name:
 Annual ACMSIAM Symposium on Discrete Algorithms
 Page Range or eLocationID:
 24672486
 Sponsoring Org:
 National Science Foundation
More Like this


We design a nonadaptive algorithm that, given a Boolean function f: {0, 1}^n → {0, 1} which is αfar from monotone, makes poly(n, 1/α) queries and returns an estimate that, with high probability, is an Otilde(\sqrt{n})approximation to the distance of f to monotonicity. Furthermore, we show that for any constant k > 0, approximating the distance to monotonicity up to n^(1/2−k)factor requires 2^{n^k} nonadaptive queries, thereby ruling out a poly(n, 1/α)query nonadaptive algorithm for such approximations. This answers a question of Seshadhri (Property Testing Review, 2014) for the case of nonadaptive algorithms. Approximating the distance to a property is closelymore »

Motivated by the increasing need to understand the distributed algorithmic foundations of largescale graph computations, we study some fundamental graph problems in a messagepassing model for distributed computing where k ≥ 2 machines jointly perform computations on graphs with n nodes (typically, n >> k). The input graph is assumed to be initially randomly partitioned among the k machines, a common implementation in many realworld systems. Communication is pointtopoint, and the goal is to minimize the number of communication rounds of the computation. Our main contribution is the General Lower Bound Theorem , a theorem that can be used tomore »

We present a new technique for efficiently removing almost all short cycles in a graph without unintentionally removing its triangles. Consequently, triangle finding problems do not become easy even in almost kcycle free graphs, for any constant k≥ 4. Triangle finding is at the base of many conditional lower bounds in P, mainly for distance computation problems, and the existence of many 4 or 5cycles in a worstcase instance had been the obstacle towards resolving major open questions. Hardness of approximation: Are there distance oracles with m1+o(1) preprocessing time and mo(1) query time that achieve a constant approximation? Existing algorithmsmore »

We consider the communication complexity of a number of distributed optimization problems. We start with the problem of solving a linear system. Suppose there is a coordinator together with s servers P1, …, Ps, the ith of which holds a subset A(i) x = b(i) of ni constraints of a linear system in d variables, and the coordinator would like to output an x ϵ ℝd for which A(i) x = b(i) for i = 1, …, s. We assume each coefficient of each constraint is specified using L bits. We first resolve the randomized and deterministic communication complexity inmore »