Summary
Differential abundance tests for compositional data are fundamental tools in many biomedical applications, such as single-cell RNA-seq, bulk RNA-seq, and microbiome data analysis. However, because of the compositional constraint and the prevalence of zero counts, differential abundance analysis of compositional data remains a complicated and unsolved statistical problem. This article proposes a new test, the robust differential abundance test, to address these challenges. Compared with existing methods, the robust differential abundance test:
- is simple and computationally efficient;
- is robust to the zero counts that are prevalent in compositional datasets;
- accounts for the compositional nature of the data;
- has a theoretical guarantee of false discovery control in a general setting.

Furthermore, when covariates are observed, the test can be combined with covariate-balancing techniques to remove potential confounding effects and support reliable conclusions. The proposed test is applied to several numerical examples, and its merits are demonstrated on both simulated and real datasets.
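The toy example below shows why the compositional constraint complicates differential abundance testing. The numbers are hypothetical and do not come from the article. When only one taxon changes in absolute abundance, converting the data to proportions (closure) makes every other taxon appear to change as well. Ratios among the unchanged taxa are preserved.

```python
def closure(abundances):
    """Rescale nonnegative abundances so they sum to 1 (relative abundances)."""
    total = sum(abundances)
    return [a / total for a in abundances]

# Hypothetical absolute abundances of four taxa in two conditions:
# only taxon 0 truly changes (a fourfold increase).
control = [100, 100, 100, 100]
case = [400, 100, 100, 100]

p_control = closure(control)  # [0.25, 0.25, 0.25, 0.25]
p_case = closure(case)        # approx. [0.571, 0.143, 0.143, 0.143]

# Apparent fold change in proportions: taxon 0 looks about 2.29-fold up,
# and the unchanged taxa 1-3 look about 0.57-fold (i.e., decreased).
apparent_change = [pc / p0 for pc, p0 in zip(p_case, p_control)]

# Ratios between unchanged taxa survive closure.
ratio_case = p_case[1] / p_case[2]
```

A naive test on proportions would flag taxa 1 to 3 as decreased even though their absolute abundances never moved. This is the kind of false discovery a compositional method must guard against.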
Multiscale adaptive differential abundance analysis in microbial compositional data
Abstract
Motivation: Differential abundance analysis is an essential and commonly used tool for characterizing differences between microbial communities. However, identifying differentially abundant microbes remains challenging, for three reasons. First, observed microbiome data are inherently compositional. Second, they are excessively sparse. Third, they are distorted by experimental bias. Beyond these major challenges, the results of differential abundance analysis also depend heavily on the choice of analysis unit, which adds practical complexity to an already complicated problem.
Results: In this work, we introduce a new differential abundance test, the MsRDB test. It embeds the sequences into a metric space and integrates a multiscale adaptive strategy that exploits spatial structure to identify differentially abundant microbes. Compared with existing methods, the MsRDB test can detect differentially abundant microbes at the finest resolution offered by the data. It provides adequate detection power while remaining robust to zero counts, compositional effects, and experimental bias in microbial compositional datasets. Applications to both simulated and real microbial compositional datasets demonstrate the usefulness of the MsRDB test.
Availability and implementation: All analyses can be found at https://github.com/lakerwsl/MsRDB-Manuscript-Code.
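The abstract does not give the MsRDB algorithm. The sketch below is a rough, hypothetical illustration of what "exploiting spatial structure at multiple scales" can mean. It pools each sequence's counts with those of its neighbors within several radii of an embedding. The function name, the Euclidean distance, and the fixed radii are all assumptions. This is not the MsRDB procedure.

```python
import numpy as np

def multiscale_neighborhood_counts(coords, counts, radii):
    """Hypothetical multiscale pooling over a sequence embedding.

    coords: (n_seq, dim) embedding of the sequences (assumed given).
    counts: (n_samples, n_seq) read counts.
    radii:  increasing neighborhood radii, one per scale.
    Returns an array of shape (len(radii), n_samples, n_seq) whose entry
    [k, s, i] is the total count in sample s of all sequences within
    Euclidean distance radii[k] of sequence i (including i itself).
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))  # pairwise distances (n_seq, n_seq)
    pooled = []
    for r in radii:
        neighbor = (dist <= r).astype(float)  # symmetric 0/1 neighborhood matrix
        pooled.append(counts @ neighbor)      # sum counts over each neighborhood
    return np.stack(pooled)

# Three sequences on a line: the first two are close, the third is far away.
coords = np.array([[0.0], [1.0], [5.0]])
counts = np.array([[1.0, 2.0, 3.0]])
pooled = multiscale_neighborhood_counts(coords, counts, radii=[0.0, 1.5, 10.0])
```

At the smallest radius each sequence stands alone, and at the largest radius all sequences are pooled. A multiscale strategy of this general kind could then look for the finest scale at which a difference becomes detectable, trading resolution for power.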
- Award ID(s):
- 2113458
- PAR ID:
- 10418647
- Editor(s):
- Alkan, Can
- Date Published:
- Journal Name:
- Bioinformatics
- Volume:
- 39
- Issue:
- 4
- ISSN:
- 1367-4811
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More like this
-
Abstract
Differential abundance analysis is at the core of statistical analysis of microbiome data. The compositional nature of microbiome sequencing data makes false positive control challenging. Here, we show that compositional effects can be addressed by a simple, yet highly flexible and scalable, approach. The proposed method, LinDA, requires only two steps. It fits linear regression models on the centered log-ratio (CLR) transformed data, then corrects the bias due to compositional effects. We show that LinDA enjoys asymptotic false discovery rate (FDR) control and can be extended to mixed-effect models for correlated microbiome data. Using simulations and real examples, we demonstrate the effectiveness of LinDA.
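LinDA's two steps can be sketched in a few lines: a per-taxon regression on CLR-transformed data, followed by a correction for compositional bias. This is a minimal sketch under stated assumptions, not the LinDA software. Both the pseudocount of 0.5 and the use of the median slope as the bias estimate are simplifications chosen here. They rest on the assumption that most taxa are not differentially abundant. No standard errors or FDR control are computed.

```python
import numpy as np

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of a (samples x taxa) count matrix."""
    logx = np.log(counts + pseudocount)  # pseudocount handles zero counts
    return logx - logx.mean(axis=1, keepdims=True)

def clr_slopes_bias_corrected(counts, group):
    """Per-taxon OLS of CLR values on a 0/1 group indicator, then a shift correction.

    Assumption: most taxa are not differentially abundant, so the compositional
    bias (a common shift in all slopes) is estimated by the median slope.
    """
    z = clr(counts)
    X = np.column_stack([np.ones(len(group)), group])
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)  # shape (2, n_taxa): all taxa at once
    slopes = coef[1]
    return slopes - np.median(slopes)

# Toy data: 10 taxa, 20 samples; only taxon 0 rises tenfold in group 1.
counts = np.full((20, 10), 100.0)
group = np.array([0] * 10 + [1] * 10)
counts[group == 1, 0] = 1000.0
estimates = clr_slopes_bias_corrected(counts, group)
```

Before correction, the nine unchanged taxa share a spurious negative slope of -log(1000.5/100.5)/10. Subtracting the median removes that shift. Taxon 0 is left with an estimated log fold change of log(1000.5/100.5), about 2.30, and the other taxa end up at zero.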
-
Compositional data arise in many areas of research in the natural and biomedical sciences. One prominent example is the study of the human gut microbiome, where one can measure the relative abundances of many distinct microorganisms in a subject's gut. Practitioners often want to learn how the dependencies between microbes vary across populations or experimental conditions. In statistical terms, the goal is to estimate a covariance matrix for the (latent) log-abundances of the microbes in each population. However, the compositional nature of the data prevents the use of standard estimators for these covariance matrices.

In this article, we propose an estimator of multiple covariance matrices that allows information sharing across distinct populations of samples. Some existing estimators estimate the covariance matrices of interest indirectly. By contrast, our estimator is direct, guarantees positive definiteness, and solves a convex optimization problem. We compute it using a proximal-proximal gradient descent algorithm. Its asymptotic properties indicate that it can perform well in high-dimensional settings. Through simulation studies, and an analysis of microbiome data from subjects with myalgic encephalomyelitis/chronic fatigue syndrome, we show that our method provides more reliable estimates than competitors.
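A standard identity (not the article's estimator) shows why standard covariance estimators fail here. Let y hold the latent log-abundances and let G = I - (1/p) 1 1^T be the p x p centering matrix. The centered log-ratio of the observed composition is then G y, so its covariance is G Sigma G. That matrix is singular and does not determine Sigma. The sketch below checks this numerically on simulated data. The dimensions and the random seed are arbitrary choices.

```python
import numpy as np

def clr_of_composition(y):
    """CLR of the compositions implied by latent log-abundances y (n x p)."""
    comp = np.exp(y) / np.exp(y).sum(axis=1, keepdims=True)  # closure to proportions
    logc = np.log(comp)
    return logc - logc.mean(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n, p = 500, 5
A = rng.normal(size=(p, p))
y = rng.normal(size=(n, p)) @ A.T                    # latent log-abundances, covariance A A^T
G = np.eye(p) - np.ones((p, p)) / p                  # centering matrix
S = np.cov(y, rowvar=False)                          # sample covariance of the latent y
S_clr = np.cov(clr_of_composition(y), rowvar=False)  # what the compositions reveal
```

G annihilates the all-ones vector. As a result, any Sigma' = Sigma + 1 a^T + a 1^T yields the same G Sigma' G. This is why additional structure or assumptions are needed to estimate Sigma itself from compositional data.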
-
Abstract
Background: Studying the co-occurrence network structure of microbial samples is a critical approach to understanding the complex and delicate relationships among microbes, hosts, and diseases. Several further needs follow from this:
- tools that combine co-occurrence network analysis with differential abundance analysis, to reveal disease-related taxa–taxa relationships;
- partitioning of the co-occurrence network into smaller modules, to improve the functional annotation and interpretability of these taxa–taxa relationships;
- retention of the phylogenetic relationships among taxa when identifying differential abundance patterns, which can help resolve contradictory functions reported by different studies.

Results: In this article, we present Correlation and Consensus-based Cross-taxonomy Network Analysis (C3NA), a user-friendly R package. It investigates compositional microbial sequencing data to identify and compare co-occurrence patterns across taxonomic levels. C3NA contains two interactive graphical user interfaces (Shiny applications). One of them is dedicated to comparing two diagnoses, e.g., disease versus control. We used C3NA to analyze two well-studied diseases, colorectal cancer and Crohn's disease. We discovered clusters of study- and disease-dependent taxa that overlap with known functional taxa reported by other discovery studies and differential abundance analyses.
Conclusion: C3NA offers a new microbial data analysis pipeline for refined and enriched taxa–taxa co-occurrence network analyses. Its built-in Shiny applications further expand its usability for interactive investigation.
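The sketch below is a generic illustration of co-occurrence modules. It is not C3NA's correlation-and-consensus procedure, which the abstract does not detail, and it is written in Python rather than R. It correlates CLR-transformed abundances, links taxa whose absolute correlation reaches a threshold, and reports the connected components as modules. The function name, the threshold of 0.6, and the pseudocount are all assumptions.

```python
import numpy as np

def cooccurrence_modules(counts, threshold=0.6, pseudocount=0.5):
    """Generic co-occurrence modules from a (samples x taxa) count matrix.

    Taxa are linked when |Pearson correlation| of their CLR values is at least
    `threshold`; modules are the connected components of that graph.
    """
    logx = np.log(counts + pseudocount)
    z = logx - logx.mean(axis=1, keepdims=True)  # CLR transform
    corr = np.corrcoef(z, rowvar=False)          # taxa x taxa correlation matrix
    adj = np.abs(corr) >= threshold
    np.fill_diagonal(adj, False)
    labels = np.full(adj.shape[0], -1)           # -1 marks an unvisited taxon
    current = 0
    for start in range(adj.shape[0]):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first search over the thresholded graph
            i = stack.pop()
            for j in np.flatnonzero(adj[i]):
                if labels[j] < 0:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels
```

With few taxa, CLR-based correlations carry compositional artifacts. Both the threshold and the number of taxa therefore affect which modules appear, which is one reason consensus-style approaches are attractive.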
-
Srivastava, Abhishek (Ed.)
Introduction: Marine particles form in the surface ocean and sink through the water column into the deep ocean, sequestering carbon. Microorganisms inhabit these particles and consume their carbon. The East Pacific Rise (EPR) harbors both an Oxygen Deficient Zone (ODZ) and a non-buoyant plume region formed by hydrothermal vents on the ocean floor. This allows us to explore relationships between microbial community and particle size across a range of environments.
Methods: In this study, we quantified microbial diversity using a fractionation method that separated particles into seven fine-scale size fractions (0.2–1.2, 1.2–5, 5–20, 20–53, 53–180, 180–500, and >500 μm). We also included a spike-in standard for sequencing the 16S rRNA gene. Size-fractionating organic carbon into the same fractions enabled the calculation of bacterial 16S rRNA copies per μg C and per liter.
Results: Between the upper and the deep water column, bacterial 16S rRNA copies per μg C and per liter increased sharply on particles >180 μm. Although the total concentration of organic C in particles decreased in the deep water column, the density of bacteria on large particles increased at depth. The microbial community showed statistically significant variation with particle size and depth.

Quantitative abundance estimates showed that ostensibly obligate free-living microbes, such as SAR11 and Thaumarchaeota, were more abundant in the free-living fraction. They were also common and abundant in the particulate size fractions. Conversely, ostensibly obligate particle-attached bacteria, such as members of Bacteroidetes and Planctomycetes, were most abundant on particles but were also present in the free-living fraction.

Total bacterial abundance, and the abundance of many taxonomic groups, increased in the ODZ region, particularly in the free-living fraction. In contrast, in the non-buoyant plume, bacteria were highly abundant in the 5–20 and 20–53 μm fractions but less abundant in the 53–180 and 180–500 μm fractions.
Conclusion: Quantitative examination of microbial communities reveals the distribution of microbial taxa free of compositional effects. These data are consistent with existing models that suggest high levels of exchange between particle-attached and free-living communities.
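The abstract does not spell out the arithmetic behind "copies per μg C and per liter". The sketch below assumes a common spike-in normalization, which is not necessarily the study's exact protocol. Each taxon's read count is scaled by the ratio of known spike-in copies added to spike-in reads recovered. The result is then divided by the filtered volume or by the carbon mass of the size fraction. The function name and all numbers are hypothetical.

```python
def spike_in_abundance(taxon_reads, spike_reads, spike_copies_added,
                       volume_l, carbon_ug):
    """Hypothetical spike-in normalization for one size fraction.

    Assumes the spike-in standard is added at a known copy number and is
    amplified and sequenced with the same efficiency as native 16S genes.
    """
    if spike_reads <= 0:
        raise ValueError("spike-in standard not recovered; cannot scale reads")
    # Native gene copies represented by each read, inferred from spike-in recovery
    copies_per_read = spike_copies_added / spike_reads
    total_copies = taxon_reads * copies_per_read
    return {
        "copies": total_copies,
        "copies_per_liter": total_copies / volume_l,
        "copies_per_ug_C": total_copies / carbon_ug,
    }

# Hypothetical fraction: 5,000 taxon reads, 1,000 spike-in reads from 1e6 added copies,
# 10 L filtered, 2 μg of particulate organic carbon in the fraction.
result = spike_in_abundance(taxon_reads=5000, spike_reads=1000,
                            spike_copies_added=1e6, volume_l=10.0, carbon_ug=2.0)
```

Because every fraction is scaled by its own spike-in recovery, abundances become comparable across fractions and depths. That comparability is what lets the authors discuss absolute rather than relative changes.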