Abstract Researchers view vast zeros in single-cell RNA-seq data differently: some regard zeros as biological signals representing no or low gene expression, while others regard zeros as missing data to be corrected. To help address the controversy, here we discuss the sources of biological and non-biological zeros; introduce five mechanisms of adding non-biological zeros in computational benchmarking; evaluate the impacts of non-biological zeros on data analysis; benchmark three input data types: observed counts, imputed counts, and binarized counts; discuss the open questions regarding non-biological zeros; and advocate the importance of transparent analysis.
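As a minimal sketch of one of the three input data types named above, the snippet below derives binarized counts from an observed count matrix. It assumes that "binarized" means mapping every nonzero count to 1 and every zero to 0; the abstract does not spell out the rule, and the function names and toy matrix are hypothetical.

```python
# Hypothetical toy example: rows are cells, columns are genes.
# Assumption: binarization maps any nonzero count to 1 and zero to 0.

def binarize(counts):
    """Return a 0/1 matrix marking which entries of `counts` are nonzero."""
    return [[1 if c > 0 else 0 for c in row] for row in counts]


def zero_fraction(counts):
    """Return the fraction of entries in `counts` that are exactly zero."""
    total = sum(len(row) for row in counts)
    zeros = sum(1 for row in counts for c in row if c == 0)
    return zeros / total


observed = [
    [0, 3, 0],
    [5, 0, 0],
]
binary = binarize(observed)
print(binary)
print(zero_fraction(observed))
```

Binarization keeps the zero pattern unchanged, so the zero fraction of the binarized matrix equals that of the observed counts; only the magnitude of the nonzero entries is discarded.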
mbImpute: an accurate and robust imputation method for microbiome data
Abstract A critical challenge in microbiome data analysis is the existence of many non-biological zeros, which distort taxon abundance distributions, complicate data analysis, and jeopardize the reliability of scientific discoveries. To address this issue, we propose the first imputation method for microbiome data—mbImpute—to identify and recover likely non-biological zeros by borrowing information jointly from similar samples, similar taxa, and optional metadata including sample covariates and taxon phylogeny. We demonstrate that mbImpute improves the power of identifying disease-related taxa from microbiome data of type 2 diabetes and colorectal cancer, and mbImpute preserves non-zero distributions of taxa abundances.
- Award ID(s): 1846216
- PAR ID: 10253270
- Publisher / Repository: Springer Science + Business Media
- Date Published:
- Journal Name: Genome Biology
- Volume: 22
- Issue: 1
- ISSN: 1474-760X
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Rudi, Knut (Ed.) ABSTRACT Many animals contain a species-rich and diverse gut microbiota that likely contributes to several host-supportive services, including diet processing and nutrient provisioning. Loss of microbiome taxa and their associated metabolic functions as a result of perturbations may lead to loss of microbiome-level services and reduced metabolic capacity. If metabolic functions are shared by multiple taxa (i.e., functional redundancy), including deeply divergent lineages, then the impact of taxon/function losses may be dampened. We examined to what degree alterations in phylotype diversity impact microbiome-level metabolic capacity. Feeding two nutritionally imbalanced diets to omnivorous Periplaneta americana over 8 weeks reduced the diversity of their phylotype-rich gut microbiomes by ~25% based on 16S rRNA gene amplicon sequencing, yet PICRUSt2-inferred metabolic pathway richness was largely unaffected because these pathways are polyphyletic. We concluded that the nonlinearity between taxon and metabolic functional losses is due to microbiome members sharing many well-characterized metabolic functions, with lineages remaining after perturbation potentially being capable of preventing microbiome “service outages” due to functional redundancy.
IMPORTANCE Diet can affect gut microbiome taxonomic composition and diversity, but its impacts on community-level functional capabilities are less clear. Host health and fitness are increasingly being linked to microbiome composition, and further modeling of the relationship between microbiome taxonomic composition and metabolic functional capability is needed to inform these linkages. Invertebrate animal models like the omnivorous American cockroach are ideal for this inquiry because they are amenable to various diets and provide many replicates per treatment at low cost, thus enabling rigorous statistical analyses and hypothesis testing. Microbiome taxonomic composition is diet-labile and diversity was reduced after feeding on unbalanced diets (i.e., post-treatment), but the predicted functional capacities of the post-treatment microbiomes were less affected, likely due to the resilience of several abundant taxa surviving the perturbation as well as many metabolic functions being shared by several taxa. These results suggest that both taxonomic and functional profiles should be considered when attempting to infer how perturbations are altering gut microbiome services and possible host outcomes.
Abstract Climate warming has increased permafrost thaw in arctic tundra and extended the duration of annual thaw (number of thaw days in summer) along soil profiles. Predicting the microbial response to permafrost thaw depends largely on knowing how increased thaw duration affects the composition of the soil microbiome. Here, we determined soil microbiome composition from the annually thawed surface active layer down through permafrost from two tundra types at each of three sites on the North Slope of Alaska, USA. Variations in soil microbial taxa were found between sites up to ~90 km apart, between tundra types, and between soil depths. Microbiome differences at a site were greatest across transitions from thawed to permafrost depths. Results from correlation analysis based on multi‐decadal thaw surveys show that differences in thaw duration by depth were significantly, positively correlated with the abundance of dominant taxa in the active layer and negatively correlated with dominant taxa in the permafrost. Microbiome composition within the transition zone was statistically similar to that in the permafrost, indicating that recent decades of intermittent thaw have not yet induced a shift from permafrost to active‐layer microbes. We suggest that thaw duration rather than thaw frequency has a greater impact on the composition of microbial taxa within arctic soils.
There has been a growing number of datasets exhibiting an excess of zero values that cannot be adequately modeled using standard probability distributions. For example, microbiome data and single-cell RNA sequencing data consist of count measurements in which the proportion of zeros exceeds what can be captured by standard distributions such as the Poisson or negative binomial, while also requiring appropriate modeling of the nonzero counts. Several models have been proposed to address zero-inflated datasets, including the zero-inflated negative binomial model, the hurdle negative binomial model, and the truncated latent Gaussian copula model. This study aims to compare various models and determine which one performs optimally under different conditions using both simulation studies and real data analyses. We are particularly interested in investigating how dependence among the variables, the level of zero-inflation or deflation, and the variance of the data affect model selection. KEYWORDS: Zero-Inflated Models; Hurdle Models; Truncated Latent Gaussian Copula Model; Microbiome Data; Gene-Sequencing Data; Zero-Inflation, Negative Binomial; Zero-Deflation
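To make the notion of "excess zeros" concrete, here is a small sketch that compares the observed zero fraction of a count sample with the zero probability a Poisson distribution with the same mean would predict, and evaluates the zero probability of a zero-inflated Poisson. The Poisson is used only because it is the simplest case; the abstract's models are negative-binomial and copula based, and the function names, the mixing weight `pi`, and the toy data are illustrative assumptions.

```python
import math


def poisson_zero_prob(lam):
    """P(X = 0) for a Poisson distribution with mean `lam`."""
    return math.exp(-lam)


def zip_zero_prob(pi, lam):
    """P(X = 0) for a zero-inflated Poisson.

    With probability `pi` the count is a structural zero; otherwise it is
    drawn from Poisson(`lam`), which can itself produce a zero.
    """
    return pi + (1.0 - pi) * math.exp(-lam)


def observed_vs_poisson_zeros(counts):
    """Return (observed zero fraction, Poisson-expected zero fraction)."""
    n = len(counts)
    mean = sum(counts) / n
    observed = sum(1 for c in counts if c == 0) / n
    return observed, poisson_zero_prob(mean)


# Hypothetical toy sample: 6 zeros out of 10 counts, sample mean 2.0.
sample = [0, 0, 0, 0, 0, 0, 3, 5, 4, 8]
observed, expected = observed_vs_poisson_zeros(sample)
print(f"observed zero fraction: {observed:.3f}")
print(f"Poisson-expected zero fraction: {expected:.3f}")
```

When the observed fraction is well above the Poisson-expected one, as in this toy sample, a plain Poisson fit would understate the zeros; the zero-inflated form adds the mixing weight `pi` to absorb that gap.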
Abstract A critical task in microbiome data analysis is to explore the association between a scalar response of interest and a large number of microbial taxa that are summarized as compositional data at different taxonomic levels. Motivated by fine‐mapping of the microbiome, we propose a two‐step compositional knockoff filter to provide the effective finite‐sample false discovery rate (FDR) control in high‐dimensional linear log‐contrast regression analysis of microbiome compositional data. In the first step, we propose a new compositional screening procedure to remove insignificant microbial taxa while retaining the essential sum‐to‐zero constraint. In the second step, we extend the knockoff filter to identify the significant microbial taxa in the sparse regression model for compositional data. Thereby, a subset of the microbes is selected from the high‐dimensional microbial taxa as related to the response under a prespecified FDR threshold. We study the theoretical properties of the proposed two‐step procedure, including both sure screening and effective false discovery control. We demonstrate these properties in numerical simulation studies to compare our methods to some existing ones and show power gain of the new method while controlling the nominal FDR. The potential usefulness of the proposed method is also illustrated with application to an inflammatory bowel disease data set to identify microbial taxa that influence host gene expressions.
