Title: Fast splitting algorithms for sparsity-constrained and noisy group testing
Abstract: In group testing, the goal is to identify a subset of defective items within a larger set of items based on tests whose outcomes indicate whether at least one defective item is present. This problem is relevant in areas such as medical testing, DNA sequencing, and communication protocols, among many others. In this paper, we study (i) a sparsity-constrained version of the problem, in which the testing procedure is subject to one of the following two constraints: items are finitely divisible and thus may participate in at most $$\gamma $$ tests; or tests are size-constrained to pool no more than $$\rho $$ items each; and (ii) a noisy version of the problem, in which each test outcome is independently flipped with some constant probability. Under each of these settings, considering the for-each recovery guarantee with asymptotically vanishing error probability, we introduce a fast splitting algorithm and establish its near-optimality not only in the number of tests but also in the decoding time. While the most basic formulations of our algorithms require $$\varOmega (n)$$ storage, we also provide low-storage variants based on hashing, with similar recovery guarantees.
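To make the splitting idea concrete, below is a minimal noiseless, adaptive sketch of generic binary splitting: test a pool, and if it is positive, halve it and recurse. The paper's algorithms are substantially more refined (nonadaptive tree structures, $$\gamma $$-divisibility and $$\rho $$-size constraints, noise tolerance), so this illustrates only the core idea; the function names are ours, not the paper's.

```python
def splitting_decode(items, is_positive):
    """Identify defectives by recursively halving positive pools.

    is_positive(group) models one pooled test: True iff the group
    contains at least one defective item.
    """
    if not is_positive(items):
        return []                 # a negative pool clears every item in it
    if len(items) == 1:
        return list(items)        # positive singleton: found a defective
    mid = len(items) // 2
    return (splitting_decode(items[:mid], is_positive)
            + splitting_decode(items[mid:], is_positive))

# Toy run: 3 defectives among 64 items.
defectives = {5, 23, 40}
outcome = lambda group: any(i in defectives for i in group)
print(sorted(splitting_decode(list(range(64)), outcome)))  # [5, 23, 40]
```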
Award ID(s): 1751040
PAR ID: 10390233
Author(s) / Creator(s):
Publisher / Repository: Oxford University Press
Date Published:
Journal Name: Information and Inference: A Journal of the IMA
Volume: 12
Issue: 2
ISSN: 2049-8772
Format(s): Medium: X
Size(s): p. 1141-1171
Sponsoring Org: National Science Foundation
More Like This
  1. Abstract: In this paper, we consider the problem of noiseless non-adaptive probabilistic group testing, in which the goal is high-probability recovery of the defective set. We show that in the case of $$n$$ items among which $$k$$ are defective, the smallest possible number of tests equals $$\min \{ C_{k,n} k \log n, n\}$$ up to lower-order asymptotic terms, where $$C_{k,n}$$ is a uniformly bounded constant (varying depending on the scaling of $$k$$ with respect to $$n$$) with a simple explicit expression. The algorithmic upper bound follows from a minor adaptation of an existing analysis of the Definite Defectives algorithm, and the algorithm-independent lower bound builds on existing works for the regimes $$k \le n^{1-\varOmega (1)}$$ and $$k = \varTheta (n)$$. In sufficiently sparse regimes (including $$k = o\big ( \frac{n}{\log n} \big )$$), our main result generalizes that of Coja-Oghlan et al. (2020) by avoiding the assumption $$k \le n^{1-\varOmega (1)}$$, whereas in sufficiently dense regimes (including $$k = \omega \big ( \frac{n}{\log n} \big )$$), our main result shows that individual testing is asymptotically optimal for any non-zero target success probability, thus strengthening an existing result of Aldridge (2019, IEEE Trans. Inf. Theory, 65, 2058–2061) in terms of both the error probability and the assumed scaling of $$k$$.
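The Definite Defectives (DD) decoder mentioned above is simple enough to sketch. Given a 0/1 test matrix and noiseless outcome vector, it first rules out every item appearing in a negative test, then declares defective any item that is the unique remaining candidate in some positive test. Below is a minimal NumPy version (our naming, not taken from the paper):

```python
import numpy as np

def definite_defectives(X, y):
    """DD decoding: X is a (T, n) 0/1 test matrix, y the 0/1 outcome array.

    Step 1: every item appearing in a negative test is non-defective.
    Step 2: an item that is the sole remaining candidate in some
    positive test is declared definitely defective.
    """
    possible = np.ones(X.shape[1], dtype=bool)
    for t in np.flatnonzero(y == 0):
        possible &= ~X[t].astype(bool)       # rule out items in negative tests
    defective = np.zeros(X.shape[1], dtype=bool)
    for t in np.flatnonzero(y == 1):
        candidates = np.flatnonzero(X[t].astype(bool) & possible)
        if len(candidates) == 1:             # unique explanation for this test
            defective[candidates[0]] = True
    return np.flatnonzero(defective)
```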
  2. In network tomography, one goal is to identify a small set of failed links in a network using as little information as possible. One way of setting up this problem is called graph-constrained group testing, a variant of the classical combinatorial group testing problem in which the admissible tests are additionally constrained by a graph; here, the graph is given by the underlying network topology. The main contribution of this work is to show that for most graphs, the constraints imposed by the graph are no constraint at all: the number of tests required to identify the failed links in graph-constrained group testing is near-optimal even for the corresponding group testing problem with no graph constraints. Our approach is based on a simple randomized construction of tests. To analyze our construction, we prove new results about the size of giant components in randomly sparsified graphs. Finally, we provide empirical results suggesting that our connected-subgraph tests perform well not just in theory but also in practice, including on a real-world network topology.
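As an illustration of the random-sparsification idea, the sketch below keeps each edge of the network independently with some probability, groups the surviving edges into connected components via union-find, and pools each component's edges as one test. The actual construction and its giant-component analysis in the paper are more careful; the function name and the keep_prob parameter are our assumptions.

```python
import random
from collections import defaultdict

def connected_subgraph_tests(edges, keep_prob, failed_links):
    """Build one round of pooled tests from a randomly sparsified graph.

    Each edge survives independently with probability keep_prob; the
    surviving edges are grouped into connected components, and each
    component's edge set forms one pool. A pool tests positive iff it
    contains a failed link. Edges are canonical (u, v) tuples.
    """
    kept = [e for e in edges if random.random() < keep_prob]
    parent = {}
    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]    # path halving
            u = parent[u]
        return u
    for u, v in kept:
        parent[find(u)] = find(v)            # union the endpoints
    pools = defaultdict(list)
    for e in kept:
        pools[find(e[0])].append(e)          # group edges by component
    return [(pool, any(e in failed_links for e in pool))
            for pool in pools.values()]
```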
  3. Nonadaptive Stochastic Score Classification. Sequential testing problems involve a system with several components, each of which is working with some independent probability. The working/failed status of each component can be determined by performing a test, which is usually expensive, so the goal is to perform tests in a carefully chosen sequence until the overall system status can be evaluated. These problems arise in a variety of applications, such as healthcare, manufacturing, and telecommunications. A common task in these applications is to categorize the system into one of several classes corresponding to the system status being poor, fair, good, excellent, etc. In "Nonadaptive Stochastic Score Classification and Explainable Half-Space Evaluation," Ghuge, Gupta, and Nagarajan provide the first constant-factor approximation algorithm for this problem. Moreover, the resulting policy is nonadaptive, which yields significant savings in computational time. The authors also validate their theoretical results via computational experiments, observing that their algorithm's cost is on average at most 50% more than an information-theoretic lower bound.
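To make the setup concrete, here is a toy simulation of a nonadaptive policy: components are tested in a fixed order, the achievable score range shrinks with each result, and testing stops once the range fits inside a single class. This models only the problem, not the authors' approximation algorithm; all names here are ours.

```python
import bisect, random

def classify_nonadaptive(order, weights, probs, thresholds):
    """Simulate one run of a fixed (nonadaptive) test order.

    The system score is the total weight of working components; classes
    are the intervals cut by the sorted `thresholds`. Components are
    tested in `order` until the achievable score range [lo, hi] lies in
    a single class. Returns (class index, number of tests performed).
    """
    lo, hi = 0.0, sum(weights)
    for used, i in enumerate(order, start=1):
        if random.random() < probs[i]:       # component i turns out to work
            lo += weights[i]
        else:                                # component i has failed
            hi -= weights[i]
        if bisect.bisect_right(thresholds, lo) == bisect.bisect_right(thresholds, hi):
            return bisect.bisect_right(thresholds, lo), used
    return bisect.bisect_right(thresholds, lo), len(order)
```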
  4. Random-based approaches and heuristics are commonly used in kernel concurrency testing due to the massive scale of modern kernels and the corresponding interleaving space. The lack of accurate and scalable approaches to analyze concurrent kernel executions makes existing testing approaches rely heavily on expensive dynamic executions to measure the effectiveness of a new test. Unfortunately, the high cost incurred by dynamic executions limits the breadth of the exploration and puts latency pressure on finding effective concurrent test inputs and schedules, hindering the overall testing effectiveness. This paper proposes Snowcat, a kernel concurrency testing framework that generates effective test inputs and schedules using a learned kernel block-coverage predictor. Using a graph neural network, the coverage predictor takes a concurrent test input and scheduling hints and outputs a prediction of whether certain important code blocks will be executed. Using this predictor, Snowcat can skip concurrent tests that are likely to be fruitless and prioritize the promising ones for actual dynamic execution. After testing the Linux kernel for over a week, Snowcat finds ~17% more potential data races by prioritizing tests of more fruitful schedules than existing work would have chosen. Snowcat can also find effective test inputs that expose new concurrency bugs with higher probability (1.4× to 2.6×), or reproduce known bugs 15× more quickly, than state-of-the-art testing tools. More importantly, Snowcat is shown to be more efficient at reaching a desirable level of race coverage in the continuous setting, as the Linux kernel evolves from version to version. In total, Snowcat discovered 17 new concurrency bugs in Linux kernel 6.1, of which 13 are confirmed and 6 are fixed.
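The predict-then-execute loop at the core of this approach can be sketched abstractly: score every candidate (input, schedule) pair with the learned coverage predictor, then spend the expensive dynamic-execution budget only on the highest-scoring candidates. The interfaces below are assumptions for illustration, not Snowcat's actual API.

```python
def prioritize_tests(candidates, predictor, execute, budget):
    """Rank candidate runs by predicted coverage; execute only the best.

    candidates: iterable of (test_input, schedule_hint) pairs.
    predictor:  learned model scoring a candidate's predicted coverage
                of the targeted code blocks (cheap to evaluate).
    execute:    performs the actual dynamic kernel run (expensive).
    budget:     number of dynamic executions we can afford.
    """
    scored = sorted(candidates, key=lambda c: predictor(*c), reverse=True)
    # Skip likely-fruitless candidates; run only the most promising ones.
    return [execute(inp, hint) for inp, hint in scored[:budget]]
```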
  5. Regression testing is increasingly important with the wide use of continuous integration. A desirable requirement for regression testing is that a test failure reliably indicates a problem in the code under test and not a false alarm from the test code or the testing infrastructure. However, some test failures are unreliable, stemming from flaky tests that can non-deterministically pass or fail for the same code under test. There are many types of flaky tests, with order-dependent tests being a prominent type. To help advance research on flaky tests, we present (1) a framework, iDFlakies, to detect and partially classify flaky tests; (2) a dataset of flaky tests in open-source projects; and (3) a study with our dataset. iDFlakies automates experimentation with our tool for Maven-based Java projects. Using iDFlakies, we build a dataset of 422 flaky tests, with 50.5% order-dependent and 49.5% not. Our study of these flaky tests examines the prevalence of the two types, the probability that a test-suite run has at least one failure due to flaky tests, and how different test reorderings affect the number of detected flaky tests. We envision that our work can spur research to alleviate the problem of flaky tests.
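A stripped-down version of the shuffle-and-rerun detection idea is sketched below: run the suite in random orders, flag any test that both passes and fails, and replay a failing order verbatim to see whether the failure reproduces (suggesting order dependence) or not. This illustrates the general technique rather than iDFlakies' implementation; the run interface is an assumption.

```python
import random

def find_flaky(tests, run, rounds=20):
    """Detect flaky tests by rerunning the suite in random orders.

    run(order) executes the suite in the given order and returns the set
    of failing test names. A test that both passes and fails across the
    rounds is flaky; replaying one of its failing orders verbatim hints
    at whether it is order-dependent (reproducible) or non-deterministic.
    """
    passed, failed, failing_order = set(), set(), {}
    for _ in range(rounds):
        order = random.sample(tests, len(tests))   # a fresh random order
        failures = run(order)
        failed |= failures
        passed |= set(order) - failures
        for t in failures:
            failing_order.setdefault(t, order)     # remember one failing order
    flaky = passed & failed
    # True => failure reproduces in the same order: likely order-dependent.
    return {t: (t in run(failing_order[t])) for t in flaky}
```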