Uniformity testing is one of the most wellstudied problems in property testing, with many known test statistics, including ones based on counting collisions, singletons, and the empirical TV distance. It is known that the optimal sample complexity to distinguish the uniform distribution on m elements from any ϵfar distribution with 1−δ probability is n=Θ(mlog(1/δ)√ϵ2+log(1/δ)ϵ2), which is achieved by the empirical TV tester. Yet in simulation, these theoretical analyses are misleading: in many cases, they do not correctly rank order the performance of existing testers, even in an asymptotic regime of all parameters tending to 0 or ∞. We explain this discrepancy by studying the \emph{constant factors} required by the algorithms. We show that the collisions tester achieves a sharp maximal constant in the number of standard deviations of separation between uniform and nonuniform inputs. We then introduce a new tester based on the Huber loss, and show that it not only matches this separation, but also has tails corresponding to a Gaussian with this separation. This leads to a sample complexity of (1+o(1))mlog(1/δ)√ϵ2 in the regime where this term is dominant, unlike all other existing testers.
Nonadaptive vs Adaptive Queries in the Dense Graph Testing Model
We study the relation between the query complexity of adaptive and nonadaptive testers in the dense graph model. It has been known for a couple of decades that the query complexity of nonadaptive testers is at most quadratic in the query complexity of adaptive testers. We show that this general result is essentially tight; that is, there exist graph properties for which any nonadaptive tester must have query complexity that is almost quadratic in the query complexity of the best general (i.e., adaptive) tester. More generally, for every q: N→N such that q(n)≤n−−√ and constant c∈[1,2], we show a graph property that is testable in Θ(q(n)) queries, but its nonadaptive query complexity is Θ(q(n)c), omitting poly(log n) factors and ignoring the effect of the proximity parameter ϵ. Furthermore, the upper bounds hold for onesided error testers, and are at most quadratic in 1/ϵ. These results are obtained through the use of general reductions that transport properties of ordered structured (like bit strings) to those of unordered structures (like unlabeled graphs). The main features of these reductions are queryefficiency and preservation of distance to the properties. This method was initiated in our prior work (ECCC, TR20149), and we significantly extend it more »
 Award ID(s):
 1900460
 Publication Date:
 NSFPAR ID:
 10339955
 Journal Name:
 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS)
 Page Range or eLocationID:
 269 to 275
 Sponsoring Org:
 National Science Foundation
More Like this


There has been significant study on the sample complexity of testing properties of distributions over large domains. For many properties, it is known that the sample complexity can be substantially smaller than the domain size. For example, over a domain of size n, distinguishing the uniform distribution from distributions that are far from uniform in ℓ1distance uses only O(n−−√) samples. However, the picture is very different in the presence of arbitrary noise, even when the amount of noise is quite small. In this case, one must distinguish if samples are coming from a distribution that is ϵclose to uniform from the case where the distribution is (1−ϵ)far from uniform. The latter task requires nearly linear in n samples (Valiant, 2008; Valiant and Valiant, 2017a). In this work, we present a noise model that on one hand is more tractable for the testing problem, and on the other hand represents a rich class of noise families. In our model, the noisy distribution is a mixture of the original distribution and noise, where the latter is known to the tester either explicitly or via sample access; the form of the noise is also known \emph{a priori}. Focusing on the identity and closenessmore »

There has been significant study on the sample complexity of testing properties of distributions over large domains. For many properties, it is known that the sample complexity can be substantially smaller than the domain size. For example, over a domain of size n, distinguishing the uniform distribution from distributions that are far from uniform in ℓ1distance uses only O(n−−√) samples. However, the picture is very different in the presence of arbitrary noise, even when the amount of noise is quite small. In this case, one must distinguish if samples are coming from a distribution that is ϵclose to uniform from the case where the distribution is (1−ϵ)far from uniform. The latter task requires nearly linear in n samples (Valiant, 2008; Valiant and Valiant, 2017a). In this work, we present a noise model that on one hand is more tractable for the testing problem, and on the other hand represents a rich class of noise families. In our model, the noisy distribution is a mixture of the original distribution and noise, where the latter is known to the tester either explicitly or via sample access; the form of the noise is also known \emph{a priori}. Focusing on the identity and closenessmore »

We present a sublinear time algorithm that allows one to sample multiple edges from a distribution that is pointwise ϵclose to the uniform distribution, in an amortizedefficient fashion. We consider the adjacency list query model, where access to a graph G is given via degree and neighbor queries. The problem of sampling a single edge in this model has been raised by Eden and Rosenbaum (SOSA 18). Let n and m denote the number of vertices and edges of G, respectively. Eden and Rosenbaum provided upper and lower bounds of Θ∗(n/ √ m) for sampling a single edge in general graphs (where O ∗(·) suppresses poly(1/ϵ) and poly(log n) dependencies). We ask whether the query complexity lower bound for sampling a single edge can be circumvented when multiple samples are required. That is, can we get an improved amortized persample cost if we allow a preprocessing phase? We answer in the affirmative. We present an algorithm that, if one knows the number of required samples q in advance, has an overall cost that is sublinear in q, namely, O∗(√ q · (n/ √ m)), which is strictly preferable to O∗(q · (n/ √ m)) cost resulting from q invocations ofmore »

Constructing a spanning tree of a graph is one of the most basic tasks in graph theory. We consider a relaxed version of this problem in the setting of local algorithms. The relaxation is that the constructed subgraph is a sparse spanning subgraph containing at most (1+ϵ)n edges (where n is the number of vertices and ϵ is a given approximation/sparsity parameter). In the local setting, the goal is to quickly determine whether a given edge e belongs to such a subgraph, without constructing the whole subgraph, but rather by inspecting (querying) the local neighborhood of e. The challenge is to maintain consistency. That is, to provide answers concerning different edges according to the same spanning subgraph. We first show that for general boundeddegree graphs, the query complexity of any such algorithm must be Ω(n−−√). This lower bound holds for constantdegree graphs that have high expansion. Next we design an algorithm for (boundeddegree) graphs with high expansion, obtaining a result that roughly matches the lower bound. We then turn to study graphs that exclude a fixed minor (and are hence nonexpanding). We design an algorithm for such graphs, which may have an unbounded maximum degree. The query complexity of thismore »