skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Real-valued Group Testing for Quantitative Molecular Assays
Combinatorial group testing and compressed sensing both focus on recovering a sparse vector of dimensionality n from a much smaller number 𝑚<𝑛 of measurements. In the first approach, the problem is defined over the Boolean field – the goal is to recover a Boolean vector and measurements are Boolean; in the second approach, the unknown vector and the measurements are over the reals. Here, we focus on real-valued group testing setting that more closely fits modern testing protocols relying on quantitative measurements, such as qPCR, where the goal is recovery of a sparse, Boolean vector and the pooling matrix needs to be Boolean and sparse, but the unknown input signal vector and the measurement outcomes are nonnegative reals, and the matrix algebra implied in the test protocol is over the reals. With the recent renewed interest in group testing, focus has been on quantitative measurements resulting from qPCR, but the method proposed for sample pooling were based on matrices designed with Boolean measurements in mind. Here, we investigate constructing pooling matrices dedicated for the real-valued group testing. We provide conditions for pooling matrices to guarantee unambiguous decoding of positives in this setting. We also show a deterministic algorithm for constructing matrices meeting the proposed condition, for small matrix sizes that can be implemented using a laboratory robot. Using simulated data, we show that the proposed approach leads to matrices that can be applied for higher positivity rates than combinatorial group testing matrices considered for viral testing previously. We also validate the approach through wet lab experiments involving SARS-CoV-2 nasopharyngeal swab samples.  more » « less
Award ID(s):
2034995
PAR ID:
10377683
Author(s) / Creator(s):
; ; ; ; ;
Editor(s):
Pe'er, I.
Date Published:
Journal Name:
Research in Computational Molecular Biology RECOMB 2022
Page Range / eLocation ID:
126-142
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We consider the task of locally correcting, and locally list-correcting, multivariate linear functions over the domain {0,1}n over arbitrary fields and more generally Abelian groups. Such functions form error-correcting codes of relative distance 1/2 and we give local-correction algorithms correcting up to nearly 1/4-fraction errors making O(logn) queries. This query complexity is optimal up to poly(loglogn) factors. We also give local list-correcting algorithms correcting (1/2 − Δ)-fraction errors with OΔ(logn) queries. These results may be viewed as natural generalizations of the classical work of Goldreich and Levin whose work addresses the special case where the underlying group is â„€2. By extending to the case where the underlying group is, say, the reals, we give the first non-trivial locally correctable codes (LCCs) over the reals (with query complexity being sublinear in the dimension (also known as message length)). Previous works in the area mostly focused on the case where the domain is a vector space or a group and this lends to tools that exploit symmetry. Since our domains lack such symmetries, we encounter new challenges whose resolution may be of independent interest. The central challenge in constructing the local corrector is constructing “nearly balanced vectors” over {−1,1}n that span 1n — we show how to construct O(logn) vectors that do so, with entries in each vector summing to ±1. The challenge to the local-list-correction algorithms, given the local corrector, is principally combinatorial, i.e., in proving that the number of linear functions within any Hamming ball of radius (1/2−Δ) is OΔ(1). Getting this general result covering every Abelian group requires integrating a variety of known methods with some new combinatorial ingredients analyzing the structural properties of codewords that lie within small Hamming balls. 
    more » « less
  2. We consider the task of locally correcting, and locally list-correcting, multivariate linear functions over the domain {0,1}n over arbitrary fields and more generally Abelian groups. Such functions form error-correcting codes of relative distance 1/2 and we give local-correction algorithms correcting up to nearly 1/4-fraction errors making O(logn) queries. This query complexity is optimal up to poly(loglogn) factors. We also give local list-correcting algorithms correcting (1/2 − Δ)-fraction errors with OΔ(logn) queries. These results may be viewed as natural generalizations of the classical work of Goldreich and Levin whose work addresses the special case where the underlying group is â„€2. By extending to the case where the underlying group is, say, the reals, we give the first non-trivial locally correctable codes (LCCs) over the reals (with query complexity being sublinear in the dimension (also known as message length)). Previous works in the area mostly focused on the case where the domain is a vector space or a group and this lends to tools that exploit symmetry. Since our domains lack such symmetries, we encounter new challenges whose resolution may be of independent interest. The central challenge in constructing the local corrector is constructing “nearly balanced vectors” over {−1,1}n that span 1n — we show how to construct O(logn) vectors that do so, with entries in each vector summing to ±1. The challenge to the local-list-correction algorithms, given the local corrector, is principally combinatorial, i.e., in proving that the number of linear functions within any Hamming ball of radius (1/2−Δ) is OΔ(1). Getting this general result covering every Abelian group requires integrating a variety of known methods with some new combinatorial ingredients analyzing the structural properties of codewords that lie within small Hamming balls. 
    more » « less
  3. Abstract Can one recover a matrix efficiently from only matrix‐vector products? If so, how many are needed? This article describes algorithms to recover matrices with known structures, such as tridiagonal, Toeplitz, Toeplitz‐like, and hierarchical low‐rank, from matrix‐vector products. In particular, we derive a randomized algorithm for recovering an unknown hierarchical low‐rank matrix from only matrix‐vector products with high probability, where is the rank of the off‐diagonal blocks, and is a small oversampling parameter. We do this by carefully constructing randomized input vectors for our matrix‐vector products that exploit the hierarchical structure of the matrix. While existing algorithms for hierarchical matrix recovery use a recursive “peeling” procedure based on elimination, our approach uses a recursive projection procedure. 
    more » « less
  4. Irregular data structures, as exemplified with sparse matrices, have proved to be essential in modern computing. Numerous sparse formats have been investigated to improve the overall performance of Sparse Matrix-Vector multiply (SpMV). But in this work we propose instead to take a fundamentally different approach: to automatically build sets of regular sub-computations by mining for regular sub-regions in the irregular data structure. Our approach leads to code that is specialized to the sparsity structure of the input matrix, but which does not need anymore any indirection array, thereby improving SIMD vectorizability. We particularly focus on small sparse structures (below 10M nonzeros), and demonstrate substantial performance improvements and compaction capabilities compared to a classical CSR implementation and Intel MKL IE's SpMV implementation, evaluating on 200+ different matrices from the SuiteSparse repository. 
    more » « less
  5. The distance matrix of a dataset X of n points with respect to a distance function f represents all pairwise distances between points in X induced by f. Due to their wide applicability, distance matrices and related families of matrices have been the focus of many recent algorithmic works. We continue this line of research and take a broad view of algorithm design for distance matrices with the goal of designing fast algorithms, which are specifically tailored for distance matrices, for fundamental linear algebraic primitives. Our results include efficient algorithms for computing matrix-vector products for a wide class of distance matrices, such as the l1 metric for which we get a linear runtime, as well as a quadratic lower bound for any algorithm which computes a matrix-vector product for the l_infty case. Our upper bound results have many further downstream applications, including the fastest algorithm for computing a relative error low-rank approximation for the distance matrix induced by l1 and l2 functions and the fastest algorithm for computing an additive error lowrank approximation for the l2 metric, in addition to applications for fast matrix multiplication among others. We also give algorithms for constructing distance matrices and show that one can construct an approximate l2 distance matrix in time faster than the bound implied by the Johnson-Lindenstrauss lemma. 
    more » « less