Abstract: A critical challenge in microbiome data analysis is the existence of many non-biological zeros, which distort taxon abundance distributions, complicate data analysis, and jeopardize the reliability of scientific discoveries. To address this issue, we propose mbImpute, the first imputation method for microbiome data, which identifies and recovers likely non-biological zeros by borrowing information jointly from similar samples, similar taxa, and optional metadata, including sample covariates and taxon phylogeny. We demonstrate that mbImpute improves the power to identify disease-related taxa in microbiome data from type 2 diabetes and colorectal cancer studies, and that it preserves the non-zero distributions of taxon abundances.
Statistics or biology: the zero-inflation controversy about scRNA-seq data
Abstract: Researchers view the vast number of zeros in single-cell RNA-seq (scRNA-seq) data differently: some regard zeros as biological signals representing no or low gene expression, while others regard them as missing data to be corrected. To help address this controversy, we discuss the sources of biological and non-biological zeros; introduce five mechanisms for adding non-biological zeros in computational benchmarking; evaluate the impacts of non-biological zeros on data analysis; benchmark three input data types (observed counts, imputed counts, and binarized counts); discuss open questions regarding non-biological zeros; and emphasize the importance of transparent analysis.
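As a rough illustration of two of the ideas named in this abstract, the sketch below assumes a small genes-by-cells count matrix stored as nested Python lists. It shows how binarized counts are derived from observed counts, and how one hypothetical mechanism, uniform random dropout of non-zero counts, adds non-biological zeros. The dropout rule and the helper names `add_random_dropout`, `binarize`, and `zero_fraction` are assumptions for illustration only; they are not claimed to be any of the paper's five mechanisms. Imputed counts are not shown because they depend on the chosen imputation method.

```python
import random


def add_random_dropout(counts, rate, seed=0):
    """Hypothetical mechanism: set each non-zero count to zero with probability `rate`.

    `counts` is a genes-by-cells matrix given as a list of lists of non-negative ints.
    """
    rng = random.Random(seed)  # seeded generator for reproducible benchmarks
    return [[0 if c > 0 and rng.random() < rate else c for c in row] for row in counts]


def binarize(counts):
    """Binarized counts: 1 if the count is non-zero (gene detected), else 0."""
    return [[1 if c > 0 else 0 for c in row] for row in counts]


def zero_fraction(counts):
    """Proportion of entries in the matrix that are zero."""
    total = sum(len(row) for row in counts)
    zeros = sum(1 for row in counts for c in row if c == 0)
    return zeros / total


if __name__ == "__main__":
    observed = [[5, 0, 3], [2, 7, 0]]
    corrupted = add_random_dropout(observed, rate=0.3, seed=42)
    print(binarize(observed))  # [[1, 0, 1], [1, 1, 0]]
    print(zero_fraction(observed), zero_fraction(corrupted))
```

Because this hypothetical dropout only ever turns non-zero entries into zeros, the binarized version of the corrupted matrix can only lose detections relative to the original, which is one simple way non-biological zeros can distort downstream analyses.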
- PAR ID: 10379037
- Publisher / Repository: Springer Science + Business Media
- Date Published:
- Journal Name: Genome Biology
- Volume: 23
- Issue: 1
- ISSN: 1474-760X
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Summary: In many applications of two-component mixture models, such as the popular zero-inflated model for discrete-valued data, it is customary for the data analyst to evaluate the inherent heterogeneity in view of the observed data. To this end, the score test, acclaimed for its simplicity, is routinely performed. It has long been recognised that this test may behave erratically under model misspecification, but the implications of this behaviour remain poorly understood for popular two-component mixture models. For the special case of zero-inflated count models, we use data simulations and theoretical arguments to evaluate this behaviour and discuss its implications in settings where the working model is restrictive with regard to the true data-generating mechanism. We enrich this discussion with an analysis of count data in HIV research, where a one-component model is shown to fit the data reasonably well despite apparent extra zeros. These results suggest that a rejection of homogeneity does not imply that the underlying mixture model is appropriate. Rather, such a rejection simply implies that the mixture model should be carefully interpreted in light of potential model misspecifications and further evaluated against other competing models.
- Li, Yue (Ed.). Microbiome sequencing data normalization is crucial for eliminating technical bias and ensuring accurate downstream analysis. However, this process can be challenging due to the high frequency of zero counts in microbiome data. We propose a novel reference-based normalization method called normalization via rank similarity (RSim) that corrects sample-specific biases, even in the presence of many zero counts. Unlike other normalization methods, RSim does not require additional assumptions or treatments for the high prevalence of zero counts. This makes it robust and minimizes potential bias resulting from procedures that address zero counts, such as pseudo-counts. Our numerical experiments demonstrate that RSim reduces false discoveries, improves detection power, and reveals true biological signals in downstream tasks such as PCoA plotting, association analysis, and differential abundance analysis.
- Abstract: Quantum phase transitions are a fascinating area of condensed matter physics. Their extension through complexification not only broadens the scope of this field but also offers a new framework for understanding criticality and its statistical implications. This mini review provides a concise overview of recent developments in complexification, primarily covering finite-temperature and equilibrium quantum phase transitions, as well as their connection with dynamical quantum phase transitions and non-Hermitian physics, with a particular focus on the significance of Fisher zeros. Starting from the newly discovered self-similarity phenomenon associated with complex partition functions, we also briefly discuss research on self-similar systems. Finally, we offer an outlook on these topics.
- Abstract: Biological anthropologists have long engaged in qualitative data analysis (QDA), though such work is not always foregrounded. In this article, we discuss the role of rigorous and systematic QDA in biological anthropology and consider how it can be understood and advanced. We first establish what kinds of qualitative data and analysis are used in biological anthropology. We then review the ways QDA has been used in six subfields of biological anthropology: primatology, human biology, paleoanthropology, dental and skeletal biology, bioarchaeology, and anthropological genetics. We follow that with an overview of how to use QDA methods: three simple QDA methods (i.e., word-based analysis, theme analysis, and coding) and three QDA approaches for model-building and model-testing (i.e., content analysis, semantic network analysis, and grounded theory). With this foundation in place, we discuss how QDA can support transformative research in biological anthropology, emphasizing the valuable role of QDA in inductive and community-based research, including mixed-methods research designs, participatory action research, and abolition and Black feminist research. Finally, we consider how to close a QDA project, reflecting on the logistics, ethics, and limitations of qualitative data sharing, including how researchers can use the CARE Principles (Collective Benefit, Authority to Control, Responsibility, and Ethics) to support Indigenous data sovereignty.
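The related summary on two-component mixture models centers on the score test for zero inflation. As a minimal sketch, assuming the simplest setting of independent counts under an intercept-only Poisson working model, the score statistic for the zero-inflation weight $\omega$ at $\omega = 0$ has the closed form $S = (n_0/\hat p_0 - n)^2 / \{n[(1-\hat p_0)/\hat p_0 - \bar y]\}$, where $\bar y$ is the sample mean, $\hat p_0 = e^{-\bar y}$, and $n_0$ is the number of observed zeros; $S$ is compared with a $\chi^2_1$ distribution. The helper name `zip_score_test` is hypothetical, and this is not the code used in that work. As the summary cautions, a small p-value only signals a departure from the Poisson model, not that a zero-inflated model is the right one.

```python
import math


def zip_score_test(counts):
    """Score test of H0: Poisson vs H1: zero-inflated Poisson (intercept-only model).

    Hypothetical helper for illustration. Returns (statistic, p_value), where the
    p-value uses the chi-square(1) reference distribution.
    """
    n = len(counts)
    if n == 0:
        raise ValueError("need at least one observation")
    mean = sum(counts) / n  # Poisson MLE of the mean under H0
    if mean == 0:
        raise ValueError("all counts are zero; the test is undefined")
    p0 = math.exp(-mean)  # fitted probability of a zero under H0
    n0 = sum(1 for y in counts if y == 0)  # observed number of zeros
    # Score for the zero-inflation weight omega, evaluated at omega = 0.
    score = n0 / p0 - n
    # Efficient information for omega after accounting for estimating the mean.
    info = n * ((1 - p0) / p0 - mean)
    stat = score ** 2 / info
    # P(chi2_1 > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    data = [0] * 50 + [3] * 50  # far more zeros than a Poisson(1.5) would produce
    stat, p = zip_score_test(data)
    print(f"S = {stat:.2f}, p = {p:.3g}")
```

With covariates, the same idea applies, but the information term must be adjusted for all estimated regression coefficients rather than only the mean.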
