NSF PAR Search | NSF Public Access Repository

Phylogenetic association analysis with conditional rank correlation

https://doi.org/10.1093/biomet/asad075

Wang, Shulei; Yuan, Bo; Cai, T. Tony; Li, Hongzhe (December 2023, Biometrika)

Summary Phylogenetic association analysis plays a crucial role in investigating the correlation between microbial compositions and specific outcomes of interest in microbiome studies. However, existing methods for testing such associations have limitations related to the assumption of a linear association in high-dimensional settings and the handling of confounding effects. Hence, there is a need for methods capable of characterizing complex associations, including nonmonotonic relationships. This article introduces a novel phylogenetic association analysis framework and associated tests to address these challenges by employing conditional rank correlation as a measure of association. The proposed tests account for confounders in a fully nonparametric manner, ensuring robustness against outliers and the ability to detect diverse dependencies. The proposed framework aggregates conditional rank correlations for subtrees using weighted sum and maximum approaches to capture both dense and sparse signals. The significance level of the test statistics is determined by calibration through a nearest-neighbour bootstrapping method, which is straightforward to implement and can accommodate additional datasets when these are available. The practical advantages of the proposed framework are demonstrated through numerical experiments using both simulated and real microbiome datasets.
Estimation and Inference for High-Dimensional Generalized Linear Models with Knowledge Transfer

https://doi.org/10.1080/01621459.2023.2184373

Li, Sai; Zhang, Linjun; Cai, T. Tony; Li, Hongzhe (April 2023, Journal of the American Statistical Association)

Full Text Available
Inference for High-Dimensional Linear Mixed-Effects Models: A Quasi-Likelihood Approach

https://doi.org/10.1080/01621459.2021.1888740

Li, Sai; Cai, T. Tony; Li, Hongzhe (October 2022, Journal of the American Statistical Association)

Full Text Available
Estimation of Simultaneous Signals Using Absolute Inner Product with Applications to Integrative Genomics

https://doi.org/10.5705/ss.202019.0445

Ma, Rong; Cai, T. Tony; Li, Hongzhe (January 2022, Statistica Sinica)

Full Text Available
Optimal Permutation Recovery in Permuted Monotone Matrix Model

https://doi.org/10.1080/01621459.2020.1713794

Ma, Rong; Cai, T. Tony; Li, Hongzhe (July 2021, Journal of the American Statistical Association)

Full Text Available
Optimal Estimation of Wasserstein Distance on a Tree With an Application to Microbiome Studies

https://doi.org/10.1080/01621459.2019.1699422

Wang, Shulei; Cai, T. Tony; Li, Hongzhe (July 2021, Journal of the American Statistical Association)

Full Text Available
Global and Simultaneous Hypothesis Testing for High-Dimensional Logistic Regression Models

https://doi.org/10.1080/01621459.2019.1699421

Ma, Rong; Cai, T. Tony; Li, Hongzhe (April 2021, Journal of the American Statistical Association)
Full Text Available
Optimal Structured Principal Subspace Estimation: Metric Entropy and Minimax Rates

Cai, T. Tony; Li, Hongzhe; Ma, Rong (January 2021, Journal of Machine Learning Research)
Driven by a wide range of applications, several principal subspace estimation problems have been studied individually under different structural constraints. This paper presents a unified framework for the statistical analysis of a general structured principal subspace estimation problem which includes as special cases sparse PCA/SVD, non-negative PCA/SVD, subspace constrained PCA/SVD, and spectral clustering. General minimax lower and upper bounds are established to characterize the interplay between the information-geometric complexity of the constraint set for the principal subspaces, the signal-to-noise ratio (SNR), and the dimensionality. The results yield interesting phase transition phenomena concerning the rates of convergence as a function of the SNRs and the fundamental limit for consistent estimation. Applying the general results to the specific settings yields the minimax rates of convergence for those problems, including the previously unknown optimal rates for sparse SVD, non-negative PCA/SVD and subspace constrained PCA/SVD.
Full Text Available
Optimal estimation of bacterial growth rates based on a permuted monotone matrix

https://doi.org/10.1093/biomet/asaa082

Ma, Rong; Cai, T. Tony; Li, Hongzhe (October 2020, Biometrika)

Summary Motivated by the problem of estimating bacterial growth rates for genome assemblies from shotgun metagenomic data, we consider the permuted monotone matrix model $Y=\Theta\Pi+Z$ where $Y\in \mathbb{R}^{n\times p}$ is observed, $\Theta\in \mathbb{R}^{n\times p}$ is an unknown approximately rank-one signal matrix with monotone rows, $\Pi \in \mathbb{R}^{p\times p}$ is an unknown permutation matrix, and $Z\in \mathbb{R}^{n\times p}$ is the noise matrix. In this article we study estimation of the extreme values associated with the signal matrix $\Theta$, including its first and last columns and their difference. Treating these estimation problems as compound decision problems, minimax rate-optimal estimators are constructed using the spectral column-sorting method. Numerical experiments on simulated and synthetic microbiome metagenomic data are conducted, demonstrating the superiority of the proposed methods over existing alternatives. The methods are illustrated by comparing the growth rates of gut bacteria in inflammatory bowel disease patients and control subjects.
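To make the model in the summary above concrete, the following is a minimal Python sketch, not the estimator from the article. It simulates the permuted monotone matrix model with an exactly rank-one signal and recovers the column order by sorting the entries of the leading right singular vector of the row-centred data. The function names, the uniform row scales, the evenly spaced column profile, and the row-centring step are all illustrative assumptions; the article's spectral column-sorting procedure and its rate-optimal estimators may differ in detail.

```python
import numpy as np


def simulate_permuted_monotone(n, p, noise_sd, rng):
    """Draw Y = Theta @ Pi + Z with a rank-one Theta whose rows increase."""
    a = rng.uniform(1.0, 2.0, size=n)   # positive row scales (assumption)
    b = np.linspace(0.0, 1.0, p)        # increasing column profile (assumption)
    theta = np.outer(a, b)              # rank one, every row is monotone
    perm = rng.permutation(p)
    pi = np.eye(p)[perm]                # permutation matrix: Y[:, perm[i]] holds Theta[:, i]
    y = theta @ pi + noise_sd * rng.standard_normal((n, p))
    return y, theta, perm


def spectral_column_order(y):
    """Order the columns of y by the leading right singular vector of row-centred y."""
    centred = y - y.mean(axis=1, keepdims=True)
    u, _, vt = np.linalg.svd(centred, full_matrices=False)
    v = vt[0]
    if u[:, 0].sum() < 0:               # resolve the sign ambiguity of the SVD
        v = -v
    return np.argsort(v)


rng = np.random.default_rng(0)
y, theta, perm = simulate_permuted_monotone(n=50, p=20, noise_sd=0.01, rng=rng)
order = spectral_column_order(y)

# Naive plug-in estimates of the first column, last column, and their difference.
first_col, last_col = y[:, order[0]], y[:, order[-1]]
range_est = last_col - first_col
```

In this low-noise setting the sorted order should coincide with the hidden permutation, so `first_col` and `last_col` are noisy copies of the first and last columns of the signal matrix. With heavier noise the plug-in step would be too crude, which is the regime the article's compound-decision treatment addresses.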
Full Text Available
Hypothesis testing for phylogenetic composition: a minimum-cost flow perspective

https://doi.org/10.1093/biomet/asaa061

Wang, Shulei; Cai, T. Tony; Li, Hongzhe (July 2020, Biometrika)

Summary Quantitative comparison of microbial composition from different populations is a fundamental task in various microbiome studies. We consider two-sample testing for microbial compositional data by leveraging phylogenetic information. Motivated by existing phylogenetic distances, we take a minimum-cost flow perspective to study such testing problems. We first show that multivariate analysis of variance with permutation using phylogenetic distances, one of the most commonly used methods in practice, is essentially a sum-of-squares type of test and has better power for dense alternatives. However, empirical evidence from real datasets suggests that the phylogenetic microbial composition difference between two populations is usually sparse. Motivated by this observation, we propose a new maximum type test, detector of active flow on a tree, and investigate its properties. We show that the proposed method is particularly powerful against sparse phylogenetic composition difference and enjoys certain optimality. The practical merit of the proposed method is demonstrated by simulation studies and an application to a human intestinal biopsy microbiome dataset on patients with ulcerative colitis.
Full Text Available
