skip to main content

Title: Non-adaptive vs Adaptive Queries in the Dense Graph Testing Model
We study the relation between the query complexity of adaptive and non-adaptive testers in the dense graph model. It has been known for a couple of decades that the query complexity of non-adaptive testers is at most quadratic in the query complexity of adaptive testers. We show that this general result is essentially tight; that is, there exist graph properties for which any non-adaptive tester must have query complexity that is almost quadratic in the query complexity of the best general (i.e., adaptive) tester. More generally, for every q: N→N such that q(n)≤n−−√ and constant c∈[1,2], we show a graph property that is testable in Θ(q(n)) queries, but its non-adaptive query complexity is Θ(q(n)c), omitting poly(log n) factors and ignoring the effect of the proximity parameter ϵ. Furthermore, the upper bounds hold for one-sided error testers, and are at most quadratic in 1/ϵ. These results are obtained through the use of general reductions that transport properties of ordered structured (like bit strings) to those of unordered structures (like unlabeled graphs). The main features of these reductions are query-efficiency and preservation of distance to the properties. This method was initiated in our prior work (ECCC, TR20-149), and we significantly extend it more » here. « less
Authors:
;
Award ID(s):
1900460
Publication Date:
NSF-PAR ID:
10339955
Journal Name:
2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS)
Page Range or eLocation-ID:
269 to 275
Sponsoring Org:
National Science Foundation
More Like this
  1. Uniformity testing is one of the most well-studied problems in property testing, with many known test statistics, including ones based on counting collisions, singletons, and the empirical TV distance. It is known that the optimal sample complexity to distinguish the uniform distribution on m elements from any ϵ-far distribution with 1−δ probability is n=Θ(mlog(1/δ)√ϵ2+log(1/δ)ϵ2), which is achieved by the empirical TV tester. Yet in simulation, these theoretical analyses are misleading: in many cases, they do not correctly rank order the performance of existing testers, even in an asymptotic regime of all parameters tending to 0 or ∞. We explain this discrepancy by studying the \emph{constant factors} required by the algorithms. We show that the collisions tester achieves a sharp maximal constant in the number of standard deviations of separation between uniform and non-uniform inputs. We then introduce a new tester based on the Huber loss, and show that it not only matches this separation, but also has tails corresponding to a Gaussian with this separation. This leads to a sample complexity of (1+o(1))mlog(1/δ)√ϵ2 in the regime where this term is dominant, unlike all other existing testers.
  2. There has been significant study on the sample complexity of testing properties of distributions over large domains. For many properties, it is known that the sample complexity can be substantially smaller than the domain size. For example, over a domain of size n, distinguishing the uniform distribution from distributions that are far from uniform in ℓ1-distance uses only O(n−−√) samples. However, the picture is very different in the presence of arbitrary noise, even when the amount of noise is quite small. In this case, one must distinguish if samples are coming from a distribution that is ϵ-close to uniform from the case where the distribution is (1−ϵ)-far from uniform. The latter task requires nearly linear in n samples (Valiant, 2008; Valiant and Valiant, 2017a). In this work, we present a noise model that on one hand is more tractable for the testing problem, and on the other hand represents a rich class of noise families. In our model, the noisy distribution is a mixture of the original distribution and noise, where the latter is known to the tester either explicitly or via sample access; the form of the noise is also known \emph{a priori}. Focusing on the identity and closenessmore »testing problems leads to the following mixture testing question: Given samples of distributions p,q1,q2, can we test if p is a mixture of q1 and q2? We consider this general question in various scenarios that differ in terms of how the tester can access the distributions, and show that indeed this problem is more tractable. Our results show that the sample complexity of our testers are exactly the same as for the classical non-mixture case.« less
  3. There has been significant study on the sample complexity of testing properties of distributions over large domains. For many properties, it is known that the sample complexity can be substantially smaller than the domain size. For example, over a domain of size n, distinguishing the uniform distribution from distributions that are far from uniform in ℓ1-distance uses only O(n−−√) samples. However, the picture is very different in the presence of arbitrary noise, even when the amount of noise is quite small. In this case, one must distinguish if samples are coming from a distribution that is ϵ-close to uniform from the case where the distribution is (1−ϵ)-far from uniform. The latter task requires nearly linear in n samples (Valiant, 2008; Valiant and Valiant, 2017a). In this work, we present a noise model that on one hand is more tractable for the testing problem, and on the other hand represents a rich class of noise families. In our model, the noisy distribution is a mixture of the original distribution and noise, where the latter is known to the tester either explicitly or via sample access; the form of the noise is also known \emph{a priori}. Focusing on the identity and closenessmore »testing problems leads to the following mixture testing question: Given samples of distributions p,q1,q2, can we test if p is a mixture of q1 and q2? We consider this general question in various scenarios that differ in terms of how the tester can access the distributions, and show that indeed this problem is more tractable. Our results show that the sample complexity of our testers are exactly the same as for the classical non-mixture case.« less
  4. We present a sublinear time algorithm that allows one to sample multiple edges from a distribution that is pointwise ϵ-close to the uniform distribution, in an amortized-efficient fashion. We consider the adjacency list query model, where access to a graph G is given via degree and neighbor queries. The problem of sampling a single edge in this model has been raised by Eden and Rosenbaum (SOSA 18). Let n and m denote the number of vertices and edges of G, respectively. Eden and Rosenbaum provided upper and lower bounds of Θ∗(n/ √ m) for sampling a single edge in general graphs (where O ∗(·) suppresses poly(1/ϵ) and poly(log n) dependencies). We ask whether the query complexity lower bound for sampling a single edge can be circumvented when multiple samples are required. That is, can we get an improved amortized per-sample cost if we allow a preprocessing phase? We answer in the affirmative. We present an algorithm that, if one knows the number of required samples q in advance, has an overall cost that is sublinear in q, namely, O∗(√ q · (n/ √ m)), which is strictly preferable to O∗(q · (n/ √ m)) cost resulting from q invocations ofmore »the algorithm by Eden and Rosenbaum. Subsequent to a preliminary version of this work, Tětek and Thorup (arXiv, preprint) proved that this bound is essentially optimal.« less
  5. Constructing a spanning tree of a graph is one of the most basic tasks in graph theory. We consider a relaxed version of this problem in the setting of local algorithms. The relaxation is that the constructed subgraph is a sparse spanning subgraph containing at most (1+ϵ)n edges (where n is the number of vertices and ϵ is a given approximation/sparsity parameter). In the local setting, the goal is to quickly determine whether a given edge e belongs to such a subgraph, without constructing the whole subgraph, but rather by inspecting (querying) the local neighborhood of e. The challenge is to maintain consistency. That is, to provide answers concerning different edges according to the same spanning subgraph. We first show that for general bounded-degree graphs, the query complexity of any such algorithm must be Ω(n−−√). This lower bound holds for constant-degree graphs that have high expansion. Next we design an algorithm for (bounded-degree) graphs with high expansion, obtaining a result that roughly matches the lower bound. We then turn to study graphs that exclude a fixed minor (and are hence non-expanding). We design an algorithm for such graphs, which may have an unbounded maximum degree. The query complexity of thismore »algorithm is poly(1/ϵ,h) (independent of n and the maximum degree), where h is the number of vertices in the excluded minor. Though our two algorithms are designed for very different types of graphs (and have very different complexities), on a high-level there are several similarities, and we highlight both the similarities and the differences.« less