
Abstract In group testing, the goal is to identify a subset of defective items within a larger set of items based on tests whose outcomes indicate whether at least one defective item is present. This problem is relevant in areas such as medical testing, DNA sequencing, communication protocols, and many more. In this paper, we study (i) a sparsity-constrained version of the problem, in which the testing procedure is subjected to one of the following two constraints: items are finitely divisible and thus may participate in at most $\gamma$ tests; or tests are size-constrained to pool no more than $\rho$ items per test; and (ii) a noisy version of the problem, where each test outcome is independently flipped with some constant probability. Under each of these settings, considering the for-each recovery guarantee with asymptotically vanishing error probability, we introduce a fast splitting algorithm and establish its near-optimality not only in terms of the number of tests, but also in terms of the decoding time. While the most basic formulations of our algorithms require $\varOmega(n)$ storage for each algorithm, we also provide low-storage variants based on hashing, with similar recovery guarantees.
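The test model described in this abstract can be illustrated with a minimal Python sketch. It only simulates test outcomes and checks the two sparsity constraints; it is not the paper's splitting algorithm. The function names (`run_tests`, `max_tests_per_item`, `max_pool_size`) and the representation of pools as lists of item indices are assumptions made for illustration.

```python
import random
from collections import Counter


def run_tests(pools, defectives, flip_prob=0.0, rng=None):
    """Simulate group tests (hypothetical helper, not from the paper).

    Each pool is a list of item indices. The noiseless outcome is 1 when
    the pool contains at least one defective item; each outcome is then
    flipped independently with probability flip_prob.
    """
    rng = rng or random.Random(0)
    outcomes = []
    for pool in pools:
        positive = any(item in defectives for item in pool)
        if rng.random() < flip_prob:
            positive = not positive
        outcomes.append(int(positive))
    return outcomes


def max_tests_per_item(pools):
    """Largest number of tests any single item joins (compare with gamma)."""
    counts = Counter(item for pool in pools for item in pool)
    return max(counts.values()) if counts else 0


def max_pool_size(pools):
    """Largest pool size (compare with rho)."""
    return max((len(pool) for pool in pools), default=0)


pools = [[0, 1], [2, 3], [1, 2]]
print(run_tests(pools, defectives={2}))   # noiseless outcomes
print(max_tests_per_item(pools), max_pool_size(pools))
```

A design satisfies the first constraint when `max_tests_per_item(pools)` is at most $\gamma$, and the second when `max_pool_size(pools)` is at most $\rho$.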

S. Koyejo ; S. Mohamed ; A. Agarwal ; D. Belgrave ; K. Cho ; A. Oh (Ed.)

Uniformity testing is one of the most well-studied problems in property testing, with many known test statistics, including ones based on counting collisions, singletons, and the empirical TV distance. It is known that the optimal sample complexity to distinguish the uniform distribution on $m$ elements from any $\epsilon$-far distribution with probability $1-\delta$ is $n = \Theta\big( \frac{\sqrt{m \log(1/\delta)}}{\epsilon^2} + \frac{\log(1/\delta)}{\epsilon^2} \big)$, which is achieved by the empirical TV tester. Yet in simulation, these theoretical analyses are misleading: in many cases, they do not correctly rank-order the performance of existing testers, even in an asymptotic regime of all parameters tending to $0$ or $\infty$. We explain this discrepancy by studying the constant factors required by the algorithms. We show that the collisions tester achieves a sharp maximal constant in the number of standard deviations of separation between uniform and non-uniform inputs. We then introduce a new tester based on the Huber loss, and show that it not only matches this separation, but also has tails corresponding to a Gaussian with this separation. This leads to a sample complexity of $(1+o(1)) \frac{\sqrt{m \log(1/\delta)}}{\epsilon^2}$ in the regime where this term is dominant, unlike all other existing testers.
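As a rough illustration of the collision statistic mentioned in this abstract, the sketch below counts colliding sample pairs and compares the rate against a threshold. Under the uniform distribution on $m$ elements the pairwise collision probability is $1/m$, while any distribution that is $\epsilon$-far in TV distance has collision probability at least $(1+4\epsilon^2)/m$. The midpoint threshold $(1+2\epsilon^2)/m$ and the function names are assumptions chosen for illustration; the sketch does not reproduce the constants analyzed in the paper, nor its Huber-loss tester.

```python
from collections import Counter


def collision_rate(samples):
    """Fraction of unordered sample pairs that collide (needs >= 2 samples)."""
    n = len(samples)
    counts = Counter(samples)
    colliding_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    return colliding_pairs / (n * (n - 1) / 2)


def looks_uniform(samples, m, eps):
    """Accept when the collision rate is below the assumed midpoint threshold.

    Threshold (1 + 2*eps**2) / m sits halfway between the uniform value 1/m
    and the eps-far lower bound (1 + 4*eps**2) / m.
    """
    return collision_rate(samples) <= (1 + 2 * eps ** 2) / m


print(collision_rate([0, 0, 1, 1]))
print(looks_uniform([0, 1, 2, 3], m=4, eps=0.5))
```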

Abstract In this paper, we consider the problem of noiseless non-adaptive probabilistic group testing, in which the goal is high-probability recovery of the defective set. We show that in the case of $n$ items among which $k$ are defective, the smallest possible number of tests equals $\min \{ C_{k,n} k \log n, n\}$ up to lower-order asymptotic terms, where $C_{k,n}$ is a uniformly bounded constant (varying depending on the scaling of $k$ with respect to $n$) with a simple explicit expression. The algorithmic upper bound follows from a minor adaptation of an existing analysis of the Definite Defectives algorithm, and the algorithm-independent lower bound builds on existing works for the regimes $k \le n^{1-\varOmega(1)}$ and $k = \varTheta(n)$. In sufficiently sparse regimes (including $k = o\big( \frac{n}{\log n} \big)$), our main result generalizes that of Coja-Oghlan et al. (2020) by avoiding the assumption $k \le n^{1-\varOmega(1)}$, whereas in sufficiently dense regimes (including $k = \omega\big( \frac{n}{\log n} \big)$), our main result shows that individual testing is asymptotically optimal for any nonzero target success probability, thus strengthening an existing result of Aldridge (2019, IEEE Trans. Inf. Theory, 65, 2058–2061) in terms of both the error probability and the assumed scaling of $k$.
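For readers unfamiliar with the Definite Defectives algorithm named in this abstract, the following is a minimal sketch of its standard noiseless form: items appearing in any negative test are ruled out, and an item is declared defective only when some positive test contains it as the sole remaining candidate. The function name and the list-of-pools input format are assumptions for illustration, and the sketch does not include the adaptation analyzed in the paper.

```python
def definite_defectives(pools, outcomes):
    """Standard noiseless Definite Defectives decoding (illustrative sketch).

    pools: list of lists of item indices; outcomes: list of 0/1 results.
    """
    # Step 1: every item in a negative test is definitely non-defective.
    nondefective = set()
    for pool, outcome in zip(pools, outcomes):
        if outcome == 0:
            nondefective.update(pool)

    # Step 2: a positive test with exactly one remaining candidate
    # proves that candidate is defective.
    declared = set()
    for pool, outcome in zip(pools, outcomes):
        if outcome == 1:
            candidates = [item for item in pool if item not in nondefective]
            if len(candidates) == 1:
                declared.add(candidates[0])
    return declared


pools = [[0, 1], [1, 2], [2, 3]]
outcomes = [0, 1, 1]  # consistent with item 2 being the only defective
print(definite_defectives(pools, outcomes))
```

In the noiseless setting this decoder never produces false positives; it can miss a defective item that never appears alone among the candidates of a positive test, as would happen to item 3 in the example if it were defective.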

We consider the problem of finding an approximate solution to $\ell_1$ regression while only observing a small number of labels. Given an $n \times d$ unlabeled data matrix $X$, we must choose a small set of $m \ll n$ rows to observe the labels of, then output an estimate $\hat{\beta}$ whose error on the original problem is within a $1+\varepsilon$ factor of optimal. We show that sampling from $X$ according to its Lewis weights and outputting the empirical minimizer succeeds with probability $1-\delta$ for $m > O\big( \frac{1}{\varepsilon^2} d \log \frac{d}{\varepsilon \delta} \big)$. This is analogous to the performance of sampling according to leverage scores for $\ell_2$ regression, but with exponentially better dependence on $\delta$. We also give a corresponding lower bound of $\Omega\big( \frac{d}{\varepsilon^2} + \big( d + \frac{1}{\varepsilon^2} \big) \log \frac{1}{\delta} \big)$.