


Award ID contains: 1751040


  1. Abstract

    In group testing, the goal is to identify a subset of defective items within a larger set of items based on tests whose outcomes indicate whether at least one defective item is present. This problem is relevant in areas such as medical testing, DNA sequencing, and communication protocols, among many others. In this paper, we study (i) a sparsity-constrained version of the problem, in which the testing procedure is subject to one of the following two constraints: items are finitely divisible and thus may participate in at most $\gamma $ tests; or tests are size-constrained to pool no more than $\rho $ items per test; and (ii) a noisy version of the problem, where each test outcome is independently flipped with some constant probability. Under each of these settings, considering the for-each recovery guarantee with asymptotically vanishing error probability, we introduce a fast splitting algorithm and establish its near-optimality not only in terms of the number of tests, but also in terms of the decoding time. While the most basic formulations of our algorithms each require $\varOmega (n)$ storage, we also provide low-storage variants based on hashing, with similar recovery guarantees.

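    To make the test model concrete, here is a minimal simulation sketch (not the paper's splitting algorithm): a pooled test is positive exactly when it contains at least one defective item, and in the noisy setting each outcome is flipped independently with a constant probability. The function and parameter names (`run_tests`, `flip_prob`) are ours, not the paper's.

```python
import numpy as np

def run_tests(pools, defective, flip_prob=0.0, rng=None):
    """Simulate non-adaptive pooled tests.

    pools     : list of item-index lists, one per test (under the sparsity
                constraints, each item appears in at most gamma pools, or
                each pool contains at most rho items)
    defective : set of defective item indices
    flip_prob : probability that each outcome is independently flipped
                (0.0 recovers the noiseless model)
    """
    rng = np.random.default_rng() if rng is None else rng
    outcomes = np.array([any(i in defective for i in pool) for pool in pools])
    if flip_prob > 0.0:
        outcomes ^= rng.random(len(pools)) < flip_prob
    return outcomes

# Example: 8 items, items {2, 5} defective, three pooled tests.
pools = [[0, 1, 2, 3], [4, 5, 6, 7], [0, 4]]
print(run_tests(pools, defective={2, 5}))                  # noiseless: [ True  True False]
print(run_tests(pools, defective={2, 5}, flip_prob=0.1))   # noisy: each outcome flipped w.p. 0.1
```
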
  2. Abstract

    In this paper, we consider the problem of noiseless non-adaptive probabilistic group testing, in which the goal is high-probability recovery of the defective set. We show that in the case of $n$ items among which $k$ are defective, the smallest possible number of tests equals $\min \{ C_{k,n} k \log n, n\}$ up to lower-order asymptotic terms, where $C_{k,n}$ is a uniformly bounded constant (varying depending on the scaling of $k$ with respect to $n$) with a simple explicit expression. The algorithmic upper bound follows from a minor adaptation of an existing analysis of the Definite Defectives algorithm, and the algorithm-independent lower bound builds on existing works for the regimes $k \le n^{1-\varOmega (1)}$ and $k = \varTheta (n)$. In sufficiently sparse regimes (including $k = o\big ( \frac{n}{\log n} \big )$), our main result generalizes that of Coja-Oghlan et al. (2020) by avoiding the assumption $k \le n^{1-\varOmega (1)}$, whereas in sufficiently dense regimes (including $k = \omega \big ( \frac{n}{\log n} \big )$), our main result shows that individual testing is asymptotically optimal for any non-zero target success probability, thus strengthening an existing result of Aldridge (2019, IEEE Trans. Inf. Theory, 65, 2058–2061) in terms of both the error probability and the assumed scaling of $k$.
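
    The Definite Defectives (DD) algorithm mentioned above has a simple two-step description; the sketch below follows the standard textbook formulation (our code, not the paper's): rule out every item that appears in a negative test, then declare defective any remaining candidate that is the sole candidate in some positive test.

```python
def definite_defectives(pools, outcomes, n):
    """Definite Defectives (DD) decoding for noiseless non-adaptive group testing.

    pools    : list of item-index collections, one per test
    outcomes : list of booleans, True where the test came back positive
    n        : total number of items
    Returns the set of items declared defective.
    """
    # Step 1: any item appearing in a negative test cannot be defective;
    # the remaining items are the "possible defectives" (PD).
    ruled_out = set()
    for pool, positive in zip(pools, outcomes):
        if not positive:
            ruled_out.update(pool)
    possible = set(range(n)) - ruled_out

    # Step 2: an item is declared definitely defective if it is the unique
    # possible defective in some positive test (no false positives occur
    # in the noiseless model).
    declared = set()
    for pool, positive in zip(pools, outcomes):
        if positive:
            candidates = set(pool) & possible
            if len(candidates) == 1:
                declared |= candidates
    return declared
```
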
  3. Uniformity testing is one of the most well-studied problems in property testing, with many known test statistics, including ones based on counting collisions, singletons, and the empirical TV distance. It is known that the optimal sample complexity to distinguish the uniform distribution on $m$ elements from any $\epsilon$-far distribution with $1-\delta$ probability is $n=\varTheta \big ( \frac{\sqrt{m\log (1/\delta )}}{\epsilon ^2}+\frac{\log (1/\delta )}{\epsilon ^2} \big )$, which is achieved by the empirical TV tester. Yet in simulation, these theoretical analyses are misleading: in many cases, they do not correctly rank order the performance of existing testers, even in an asymptotic regime of all parameters tending to $0$ or $\infty$. We explain this discrepancy by studying the \emph{constant factors} required by the algorithms. We show that the collisions tester achieves a sharp maximal constant in the number of standard deviations of separation between uniform and non-uniform inputs. We then introduce a new tester based on the Huber loss, and show that it not only matches this separation, but also has tails corresponding to a Gaussian with this separation. This leads to a sample complexity of $(1+o(1))\frac{\sqrt{m\log (1/\delta )}}{\epsilon ^2}$ in the regime where this term is dominant, unlike all other existing testers.
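
    For reference, the collisions tester discussed above can be sketched in a few lines: the fraction of colliding sample pairs is an unbiased estimate of $\|p\|_2^2$, which equals $1/m$ for the uniform distribution and is strictly larger for any far distribution. The threshold constant below is purely illustrative (pinning it down sharply is exactly what the paper's constant-factor analysis is about), and the names are ours.

```python
import numpy as np

def collision_rate(samples, m):
    """Fraction of sample pairs that collide: an unbiased estimate of ||p||_2^2."""
    counts = np.bincount(samples, minlength=m)
    n = len(samples)
    return np.sum(counts * (counts - 1)) / (n * (n - 1))

def collision_tester(samples, m, eps, c=2.0):
    """Declare 'far from uniform' when the collision rate noticeably exceeds
    the uniform value 1/m.  The constant c is illustrative only; the sharp
    choice (and the resulting separation in standard deviations) is the
    subject of the paper."""
    return collision_rate(samples, m) > (1.0 + c * eps**2) / m

rng = np.random.default_rng(0)
m = 1000
uniform_samples = rng.integers(m, size=5000)
print(collision_tester(uniform_samples, m, eps=0.25))   # typically False for uniform input
```
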
  4. The random order graph streaming model has received significant attention recently, with problems such as matching size estimation, component counting, and the evaluation of bounded-degree constant-query testable properties shown to admit surprisingly space-efficient algorithms. The main result of this paper is a space-efficient single-pass random order streaming algorithm for simulating nearly independent random walks that start at uniformly random vertices. We show that the distribution of $k$-step walks from $b$ vertices chosen uniformly at random can be approximated up to error $\epsilon$ per walk using  words of space with a single pass over a randomly ordered stream of edges, solving an open problem of Peng and Sohler [SODA '18]. Applications of our result include the estimation of the average return probability of the $k$-step walk (the trace of the $k$th power of the random walk matrix) as well as the estimation of PageRank. We complement our algorithm with a strong impossibility result for directed graphs.
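
    As a small illustration of one quantity the streaming algorithm estimates, the average $k$-step return probability equals $\frac{1}{n}\mathrm{tr}(P^k)$ for the random walk matrix $P$. The sketch below computes this exactly from a full adjacency matrix; the point of the paper is, of course, to estimate it from a single randomly ordered edge stream without storing the graph. Names are ours.

```python
import numpy as np

def avg_return_probability(adj, k):
    """(1/n) * trace(P^k), where P is the row-normalized adjacency
    (random walk) matrix of an undirected graph with no isolated vertices."""
    adj = np.asarray(adj, dtype=float)
    P = adj / adj.sum(axis=1, keepdims=True)      # random walk transition matrix
    return np.trace(np.linalg.matrix_power(P, k)) / adj.shape[0]

# 4-cycle: a 2-step walk returns to its start with probability 1/2.
cycle4 = np.array([[0, 1, 0, 1],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [1, 0, 1, 0]])
print(avg_return_probability(cycle4, 2))          # 0.5
```
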
  5. We consider the problem of finding an approximate solution to $\ell _1$ regression while only observing a small number of labels. Given an $n\times d$ unlabeled data matrix $X$, we must choose a small set of $m\ll n$ rows to observe the labels of, then output an estimate $\hat{\beta }$ whose error on the original problem is within a $1+\epsilon$ factor of optimal. We show that sampling from $X$ according to its Lewis weights and outputting the empirical minimizer succeeds with probability $1-\delta$ for $m > O\big ( \frac{1}{\epsilon ^2} d \log \frac{d}{\epsilon \delta } \big )$. This is analogous to the performance of sampling according to leverage scores for $\ell _2$ regression, but with exponentially better dependence on $\delta$. We also give a corresponding lower bound of $\varOmega \big ( \frac{d}{\epsilon ^2} + \big ( d + \frac{1}{\epsilon ^2} \big ) \log \frac{1}{\delta } \big )$.
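
    Below is a rough sketch of the sampling-and-solve procedure described above, under several assumptions of ours: the $\ell _1$ Lewis weights are approximated by the standard fixed-point iteration (our paraphrase of the usual Cohen and Peng style update, not the paper's exact procedure), labels are exposed through a hypothetical `y_oracle` callable so that only the sampled rows are queried, and the reweighted $\ell _1$ problem on the subsample is solved as a linear program with SciPy.

```python
import numpy as np
from scipy.optimize import linprog

def l1_lewis_weights(X, n_iter=20):
    """Approximate l1 Lewis weights via the fixed-point iteration
    w_i <- sqrt( x_i^T (X^T W^{-1} X)^{-1} x_i )  (our assumption of the
    standard update; see Cohen and Peng for the exact procedure)."""
    n, d = X.shape
    w = np.ones(n)
    for _ in range(n_iter):
        M = np.linalg.inv(X.T @ (X / w[:, None]))
        w = np.sqrt(np.einsum("ij,jk,ik->i", X, M, X))
    return w

def sampled_l1_regression(X, y_oracle, m, rng=None):
    """Sample m rows proportionally to (approximate) Lewis weights, query only
    those labels, and return the reweighted empirical l1 minimizer."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    p = l1_lewis_weights(X)
    p /= p.sum()                            # sampling probabilities
    idx = rng.choice(n, size=m, p=p)
    Xs, ys = X[idx], y_oracle(idx)          # labels observed only for sampled rows
    s = 1.0 / (m * p[idx])                  # importance-sampling reweighting

    # Weighted l1 regression as an LP:  minimize sum_i s_i * t_i
    # subject to  -t <= Xs @ beta - ys <= t,  with variables z = [beta; t].
    c = np.concatenate([np.zeros(d), s])
    A_ub = np.block([[ Xs, -np.eye(m)],
                     [-Xs, -np.eye(m)]])
    b_ub = np.concatenate([ys, -ys])
    bounds = [(None, None)] * d + [(0, None)] * m
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:d]

# Toy usage: true beta = [1, -2]; labels are queried only at the sampled rows.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 2))
beta_true = np.array([1.0, -2.0])
print(sampled_l1_regression(X, lambda idx: X[idx] @ beta_true, m=50, rng=rng))
```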