Abstract Linking sequence-derived microbial taxa abundances to host (patho-)physiology or habitat characteristics in a reproducible and interpretable manner has remained a formidable challenge for the analysis of microbiome survey data. Here, we introduce a flexible probabilistic modeling framework, VI-MIDAS (variational inference for microbiome survey data analysis), that enables joint estimation of context-dependent drivers and broad patterns of associations of microbial taxon abundances from microbiome survey data. VI-MIDAS comprises mechanisms for direct coupling of taxon abundances with covariates and taxa-specific latent coupling, which can incorporate spatio-temporal information and taxon–taxon interactions. We leverage mean-field variational inference for posterior VI-MIDAS model parameter estimation and illustrate model building and analysis using Tara Ocean Expedition survey data. Using VI-MIDAS’ latent embedding model and tools from network analysis, we show that marine microbial communities can be broadly categorized into five modules, including SAR11-, Nitrosopumilus-, and Alteromonadales-dominated communities, each associated with specific environmental and spatiotemporal signatures. VI-MIDAS also finds evidence for largely positive taxon–taxon associations in SAR11 or Rhodospirillales clades, and negative associations with Alteromonadales and Flavobacteriales classes. Our results indicate that VI-MIDAS provides a powerful integrative statistical analysis framework for discovering broad patterns of associations between microbial taxa and context-specific covariate data from microbiome survey data.
mbImpute: an accurate and robust imputation method for microbiome data
Abstract A critical challenge in microbiome data analysis is the existence of many non-biological zeros, which distort taxon abundance distributions, complicate data analysis, and jeopardize the reliability of scientific discoveries. To address this issue, we propose the first imputation method for microbiome data—mbImpute—to identify and recover likely non-biological zeros by borrowing information jointly from similar samples, similar taxa, and optional metadata including sample covariates and taxon phylogeny. We demonstrate that mbImpute improves the power of identifying disease-related taxa from microbiome data of type 2 diabetes and colorectal cancer, and mbImpute preserves non-zero distributions of taxa abundances.
- Award ID(s): 1846216
- PAR ID: 10253270
- Publisher / Repository: Springer Science + Business Media
- Date Published:
- Journal Name: Genome Biology
- Volume: 22
- Issue: 1
- ISSN: 1474-760X
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract Researchers view vast zeros in single-cell RNA-seq data differently: some regard zeros as biological signals representing no or low gene expression, while others regard zeros as missing data to be corrected. To help address the controversy, here we discuss the sources of biological and non-biological zeros; introduce five mechanisms of adding non-biological zeros in computational benchmarking; evaluate the impacts of non-biological zeros on data analysis; benchmark three input data types: observed counts, imputed counts, and binarized counts; discuss the open questions regarding non-biological zeros; and advocate the importance of transparent analysis.
-
Rudi, Knut (Ed.) ABSTRACT Many animals contain a species-rich and diverse gut microbiota that likely contributes to several host-supportive services that include diet processing and nutrient provisioning. Loss of microbiome taxa and their associated metabolic functions as a result of perturbations may result in loss of microbiome-level services and reduction of metabolic capacity. If metabolic functions are shared by multiple taxa (i.e., functional redundancy), including deeply divergent lineages, then the impact of taxon/function losses may be dampened. We examined to what degree alterations in phylotype diversity impact microbiome-level metabolic capacity. Feeding two nutritionally imbalanced diets to omnivorous Periplaneta americana over 8 weeks reduced the diversity of their phylotype-rich gut microbiomes by ~25% based on 16S rRNA gene amplicon sequencing, yet PICRUSt2-inferred metabolic pathway richness was largely unaffected due to their being polyphyletic. We concluded that the nonlinearity between taxon and metabolic functional losses is due to microbiome members sharing many well-characterized metabolic functions, with lineages remaining after perturbation potentially being capable of preventing microbiome “service outages” due to functional redundancy. IMPORTANCE Diet can affect gut microbiome taxonomic composition and diversity, but its impacts on community-level functional capabilities are less clear. Host health and fitness are increasingly being linked to microbiome composition, and further modeling of the relationship between microbiome taxonomic and metabolic functional capability is needed to inform these linkages. Invertebrate animal models like the omnivorous American cockroach are ideal for this inquiry because they are amenable to various diets and provide high replicates per treatment at low cost, thus enabling rigorous statistical analyses and hypothesis testing.
Microbiome taxonomic composition is diet-labile and diversity was reduced after feeding on unbalanced diets (i.e., post-treatment), but the predicted functional capacities of the post-treatment microbiomes were less affected, likely due to the resilience of several abundant taxa surviving the perturbation as well as many metabolic functions being shared by several taxa. These results suggest that both taxonomic and functional profiles should be considered when attempting to infer how perturbations are altering gut microbiome services and possible host outcomes.
-
Abstract Climate warming has increased permafrost thaw in arctic tundra and extended the duration of annual thaw (number of thaw days in summer) along soil profiles. Predicting the microbial response to permafrost thaw depends largely on knowing how increased thaw duration affects the composition of the soil microbiome. Here, we determined soil microbiome composition from the annually thawed surface active layer down through permafrost from two tundra types at each of three sites on the North Slope of Alaska, USA. Variations in soil microbial taxa were found between sites up to ~90 km apart, between tundra types, and between soil depths. Microbiome differences at a site were greatest across transitions from thawed to permafrost depths. Results from correlation analysis based on multi‐decadal thaw surveys show that differences in thaw duration by depth were significantly, positively correlated with the abundance of dominant taxa in the active layer and negatively correlated with dominant taxa in the permafrost. Microbiome composition within the transition zone was statistically similar to that in the permafrost, indicating that recent decades of intermittent thaw have not yet induced a shift from permafrost to active‐layer microbes. We suggest that thaw duration rather than thaw frequency has a greater impact on the composition of microbial taxa within arctic soils.
-
There has been a growing number of datasets exhibiting an excess of zero values that cannot be adequately modeled using standard probability distributions. For example, microbiome data and single-cell RNA sequencing data consist of count measurements in which the proportion of zeros exceeds what can be captured by standard distributions such as the Poisson or negative binomial, while also requiring appropriate modeling of the nonzero counts. Several models have been proposed to address zero-inflated datasets, including the zero-inflated negative binomial model, the hurdle negative binomial model, and the truncated latent Gaussian copula model. This study aims to compare various models and determine which one performs optimally under different conditions using both simulation studies and real data analyses. We are particularly interested in investigating how dependence among the variables, the level of zero-inflation or deflation, and the variance of the data affect model selection. KEYWORDS: Zero-Inflated Models; Hurdle Models; Truncated Latent Gaussian Copula Model; Microbiome Data; Gene-Sequencing Data; Zero-Inflation, Negative Binomial; Zero-Deflation
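As a rough illustration of how two of the model families named in the abstract above differ, the sketch below writes out the textbook probability mass functions of a zero-inflated negative binomial and a hurdle negative binomial. It is not taken from the study: the function names, the mean/dispersion parameterization (`mu`, `r`), and the example parameter values are all assumptions chosen for illustration.

```python
import math


def nb_pmf(k, mu, r):
    """Negative binomial pmf with mean mu and dispersion (size) r."""
    log_p = (
        math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
        + r * math.log(r / (r + mu))
        + k * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


def zinb_pmf(k, mu, r, pi):
    """Zero-inflated NB: with probability pi emit a structural zero,
    otherwise draw from the NB (which can itself produce zeros)."""
    base = (1.0 - pi) * nb_pmf(k, mu, r)
    return pi + base if k == 0 else base


def hurdle_nb_pmf(k, mu, r, pi0):
    """Hurdle NB: zeros occur with probability pi0; positive counts
    follow an NB truncated at zero and rescaled to mass 1 - pi0."""
    if k == 0:
        return pi0
    return (1.0 - pi0) * nb_pmf(k, mu, r) / (1.0 - nb_pmf(0, mu, r))


if __name__ == "__main__":
    mu, r = 2.0, 1.5  # hypothetical mean and dispersion
    print("NB      P(Y=0):", nb_pmf(0, mu, r))
    print("ZINB    P(Y=0):", zinb_pmf(0, mu, r, pi=0.3))
    # pi0 below the NB zero probability gives zero-deflation,
    # which the zero-inflated mixture cannot represent.
    print("Hurdle  P(Y=0):", hurdle_nb_pmf(0, mu, r, pi0=0.1))
```

In this parameterization the zero-inflated model can only add zeros on top of the negative binomial, whereas the hurdle model sets the zero probability freely, so it covers both inflation and deflation.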
