skip to main content

Title: Optimal estimation of bacterial growth rates based on a permuted monotone matrix
Summary Motivated by the problem of estimating bacterial growth rates for genome assemblies from shotgun metagenomic data, we consider the permuted monotone matrix model $Y=\Theta\Pi+Z$ where $Y\in \mathbb{R}^{n\times p}$ is observed, $\Theta\in \mathbb{R}^{n\times p}$ is an unknown approximately rank-one signal matrix with monotone rows, $\Pi \in \mathbb{R}^{p\times p}$ is an unknown permutation matrix, and $Z\in \mathbb{R}^{n\times p}$ is the noise matrix. In this article we study estimation of the extreme values associated with the signal matrix $\Theta$, including its first and last columns and their difference. Treating these estimation problems as compound decision problems, minimax rate-optimal estimators are constructed using the spectral column-sorting method. Numerical experiments on simulated and synthetic microbiome metagenomic data are conducted, demonstrating the superiority of the proposed methods over existing alternatives. The methods are illustrated by comparing the growth rates of gut bacteria in inflammatory bowel disease patients and control subjects.  more » « less
Award ID(s):
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Page Range / eLocation ID:
693 to 708
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Abstract The duality principle for group representations developed in Dutkay et al. (J Funct Anal 257:1133–1143, 2009), Han and Larson (Bull Lond Math Soc 40:685–695, 2008) exhibits a fact that the well-known duality principle in Gabor analysis is not an isolated incident but a more general phenomenon residing in the context of group representation theory. There are two other well-known fundamental properties in Gabor analysis: the biorthogonality and the fundamental identity of Gabor analysis. The main purpose of this this paper is to show that these two fundamental properties remain to be true for general projective unitary group representations. Moreover, we also present a general duality theorem which shows that that muti-frame generators meet super-frame generators through a dual commutant pair of group representations. Applying it to the Gabor representations, we obtain that $$\{\pi _{\Lambda }(m, n)g_{1} \oplus \cdots \oplus \pi _{\Lambda }(m, n)g_{k}\}_{m, n \in {\mathbb {Z}}^{d}}$$ { π Λ ( m , n ) g 1 ⊕ ⋯ ⊕ π Λ ( m , n ) g k } m , n ∈ Z d is a frame for $$L^{2}({\mathbb {R}}\,^{d})\oplus \cdots \oplus L^{2}({\mathbb {R}}\,^{d})$$ L 2 ( R d ) ⊕ ⋯ ⊕ L 2 ( R d ) if and only if $$\cup _{i=1}^{k}\{\pi _{\Lambda ^{o}}(m, n)g_{i}\}_{m, n\in {\mathbb {Z}}^{d}}$$ ∪ i = 1 k { π Λ o ( m , n ) g i } m , n ∈ Z d is a Riesz sequence, and $$\cup _{i=1}^{k} \{\pi _{\Lambda }(m, n)g_{i}\}_{m, n\in {\mathbb {Z}}^{d}}$$ ∪ i = 1 k { π Λ ( m , n ) g i } m , n ∈ Z d is a frame for $$L^{2}({\mathbb {R}}\,^{d})$$ L 2 ( R d ) if and only if $$\{\pi _{\Lambda ^{o}}(m, n)g_{1} \oplus \cdots \oplus \pi _{\Lambda ^{o}}(m, n)g_{k}\}_{m, n \in {\mathbb {Z}}^{d}}$$ { π Λ o ( m , n ) g 1 ⊕ ⋯ ⊕ π Λ o ( m , n ) g k } m , n ∈ Z d is a Riesz sequence, where $$\pi _{\Lambda }$$ π Λ and $$\pi _{\Lambda ^{o}}$$ π Λ o is a pair of Gabor representations restricted to a time–frequency lattice $$\Lambda $$ Λ and its adjoint lattice $$\Lambda ^{o}$$ Λ o in $${\mathbb {R}}\,^{d}\times {\mathbb {R}}\,^{d}$$ R d × R d . 
    more » « less
  2. We consider the high-dimensional linear regression problem, where the algorithmic goal is to efficiently infer an unknown feature vector $\beta^*\in\mathbb{R}^p$ from its linear measurements, using a small number $n$ of samples. Unlike most of the literature, we make no sparsity assumption on $\beta^*$, but instead adopt a different regularization: In the noiseless setting, we assume $\beta^*$ consists of entries, which are either rational numbers with a common denominator $Q\in\mathbb{Z}^+$ (referred to as $Q-$rationality); or irrational numbers taking values in a rationally independent set of bounded cardinality, known to learner; collectively called as the mixed-range assumption. Using a novel combination of the Partial Sum of Least Squares (PSLQ) integer relation detection, and the Lenstra-Lenstra-Lov\'asz (LLL) lattice basis reduction algorithms, we propose a polynomial-time algorithm which provably recovers a $\beta^*\in\mathbb{R}^p$ enjoying the mixed-range assumption, from its linear measurements $Y=X\beta^*\in\mathbb{R}^n$ for a large class of distributions for the random entries of $X$, even with one measurement ($n=1$). In the noisy setting, we propose a polynomial-time, lattice-based algorithm, which recovers a $\beta^*\in\mathbb{R}^p$ enjoying the $Q-$rationality property, from its noisy measurements $Y=X\beta^*+W\in\mathbb{R}^n$, even from a single sample ($n=1$). We further establish that for large $Q$, and normal noise, this algorithm tolerates information-theoretically optimal level of noise. We then apply these ideas to develop a polynomial-time, single-sample algorithm for the phase retrieval problem. Our methods address the single-sample ($n=1$) regime, where the sparsity-based methods such as the Least Absolute Shrinkage and Selection Operator (LASSO) and the Basis Pursuit are known to fail. Furthermore, our results also reveal algorithmic connections between the high-dimensional linear regression problem, and the integer relation detection, randomized subset-sum, and shortest vector problems. 
    more » « less
  3. We give an integrability criterion on a real-valued non-increasing function $\unicode[STIX]{x1D713}$ guaranteeing that for almost all (or almost no) pairs $(A,\mathbf{b})$ , where $A$ is a real $m\times n$ matrix and $\mathbf{b}\in \mathbb{R}^{m}$ , the system $$\begin{eqnarray}\Vert A\mathbf{q}+\mathbf{b}-\mathbf{p}\Vert ^{m}<\unicode[STIX]{x1D713}(T),\quad \Vert \mathbf{q}\Vert ^{n} more » « less
  4. Abstract

    We study the problem of estimating a $k$-sparse signal ${\boldsymbol \beta }_{0}\in{\mathbb{R}}^{p}$ from a set of noisy observations $\mathbf{y}\in{\mathbb{R}}^{n}$ under the model $\mathbf{y}=\mathbf{X}{\boldsymbol \beta }+w$, where $\mathbf{X}\in{\mathbb{R}}^{n\times p}$ is the measurement matrix the row of which is drawn from distribution $N(0,{\boldsymbol \varSigma })$. We consider the class of $L_{q}$-regularized least squares (LQLS) given by the formulation $\hat{{\boldsymbol \beta }}(\lambda )=\text{argmin}_{{\boldsymbol \beta }\in{\mathbb{R}}^{p}}\frac{1}{2}\|\mathbf{y}-\mathbf{X}{\boldsymbol \beta }\|^{2}_{2}+\lambda \|{\boldsymbol \beta }\|_{q}^{q}$, where $\|\cdot \|_{q}$  $(0\le q\le 2)$ denotes the $L_{q}$-norm. In the setting $p,n,k\rightarrow \infty $ with fixed $k/p=\epsilon $ and $n/p=\delta $, we derive the asymptotic risk of $\hat{{\boldsymbol \beta }}(\lambda )$ for arbitrary covariance matrix ${\boldsymbol \varSigma }$ that generalizes the existing results for standard Gaussian design, i.e. $X_{ij}\overset{i.i.d}{\sim }N(0,1)$. The results were derived from the non-rigorous replica method. We perform a higher-order analysis for LQLS in the small-error regime in which the first dominant term can be used to determine the phase transition behavior of LQLS. Our results show that the first dominant term does not depend on the covariance structure of ${\boldsymbol \varSigma }$ in the cases $0\le q\lt 1$ and $1\lt q\le 2,$ which indicates that the correlations among predictors only affect the phase transition curve in the case $q=1$ a.k.a. LASSO. To study the influence of the covariance structure of ${\boldsymbol \varSigma }$ on the performance of LQLS in the cases $0\le q\lt 1$ and $1\lt q\le 2$, we derive the explicit formulas for the second dominant term in the expansion of the asymptotic risk in terms of small error. Extensive computational experiments confirm that our analytical predictions are consistent with numerical results.

    more » « less
  5. Abstract

    We study the statistical estimation problem of orthogonal group synchronization and rotation group synchronization. The model is $Y_{ij} = Z_i^* Z_j^{*T} + \sigma W_{ij}\in{\mathbb{R}}^{d\times d}$ where $W_{ij}$ is a Gaussian random matrix and $Z_i^*$ is either an orthogonal matrix or a rotation matrix, and each $Y_{ij}$ is observed independently with probability $p$. We analyze an iterative polar decomposition algorithm for the estimation of $Z^*$ and show it has an error of $(1+o(1))\frac{\sigma ^2 d(d-1)}{2np}$ when initialized by spectral methods. A matching minimax lower bound is further established that leads to the optimality of the proposed algorithm as it achieves the exact minimax risk.

    more » « less