Abstract Microbial networks offer critical insights into community structure, ecological interactions and host–microbe dynamics. However, constructing reliable microbiome networks remains challenging due to variability among existing inference methods, limited overlap between inferred networks and the absence of a gold standard (a universally accepted reference for benchmarking) for validation. We developed CMiNet, an R package and interactive Shiny App (https://cminet.wid.wisc.edu) that enables consensus microbiome network construction by integrating up to 10 widely used inference algorithms. CMiNet supports both correlation‐based and conditional dependence‐based methods and provides users with flexible options to construct individual or consensus networks across different approaches. CMiNet integrates results from multiple inference methods through a voting strategy that retains edges supported by a user‐defined number of methods. To assess robustness, we complement this with a bootstrap analysis that quantifies edge stability under resampling. By jointly reporting method support and bootstrap confidence, CMiNet provides a reproducible framework that explicitly communicates both agreement across methods and stability under perturbation. We applied CMiNet to gut and soil microbiome datasets, constructing consensus networks that retained edges supported by multiple methods and confirmed by bootstrap reproducibility values. To identify disease‐associated taxa, we developed an integrative strategy that compared results across machine learning, differential abundance and network‐based approaches, ensuring that selected taxa were consistently recovered across methods. In the soil dataset, this analysis highlighted key taxa such as Ktedonobacteria, Acidobacteriae, Vicinamibacteria, MB‐A2‐108, Ignavibacteria and Anaerolineae, all of which were confirmed by multiple independent strategies.
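The voting and bootstrap steps described in the abstract can be illustrated with a minimal Python sketch. It is not the CMiNet implementation (CMiNet is an R package): the function names, the toy edge lists, the toy abundance table and the correlation-threshold inference method are all assumptions made for illustration only.

```python
import random
from collections import Counter
from itertools import combinations


def consensus_edges(networks, min_votes):
    """Return {edge: votes} for undirected edges found by >= min_votes methods."""
    votes = Counter()
    for edges in networks:
        # frozenset treats (a, b) and (b, a) as one edge; building a set first
        # stops a single method from voting twice for the same edge.
        votes.update({frozenset(edge) for edge in edges})
    return {edge: n for edge, n in votes.items() if n >= min_votes}


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / (sxx * syy) ** 0.5


def correlation_network(samples, threshold=0.8):
    """Hypothetical stand-in for one inference method: |r| >= threshold."""
    taxa = sorted(samples[0])
    columns = {t: [row[t] for row in samples] for t in taxa}
    return {
        frozenset((a, b))
        for a, b in combinations(taxa, 2)
        if abs(pearson(columns[a], columns[b])) >= threshold
    }


def bootstrap_support(samples, infer, n_boot=200, seed=0):
    """Fraction of bootstrap resamples (rows drawn with replacement) per edge."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_boot):
        resample = [rng.choice(samples) for _ in samples]
        counts.update(infer(resample))
    return {edge: c / n_boot for edge, c in counts.items()}


# Toy edge lists from three hypothetical methods.
method_a = [("A", "B"), ("B", "C")]
method_b = [("B", "A"), ("C", "D")]
method_c = [("A", "B"), ("B", "C"), ("C", "D")]
consensus = consensus_edges([method_a, method_b, method_c], min_votes=3)

# Toy abundance table: B is a linear function of A, C is unrelated.
samples = [{"A": i, "B": 2 * i + 1, "C": (i * i) % 5} for i in range(1, 9)]
support = bootstrap_support(samples, correlation_network)

for edge, n_votes in consensus.items():
    print(sorted(edge), "votes:", n_votes, "bootstrap:", support.get(edge, 0.0))
```

Lowering `min_votes` makes the consensus network denser, while the bootstrap fraction gives a separate view of how stable each edge is under resampling. In this sketch the bootstrap reruns only one toy method; rerunning every method on each resample would be the more thorough variant.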
C3NA: correlation and consensus-based cross-taxonomy network analysis for compositional microbial data
Abstract Background: Studying the co-occurrence network structure of microbial samples is one of the critical approaches to understanding the perplexing and delicate relationship between the microbe, host, and diseases. It is also critical to develop a tool that combines co-occurrence network and differential abundance analyses to reveal disease-related taxa–taxa relationships. In addition, the co-occurrence network needs to be tightened into smaller modules to improve functional annotation and the interpretability of these taxa–taxa relationships. Finally, the phylogenetic relationship among the taxa should be retained to identify differential abundance patterns, which can be used to resolve contradicting functions reported by different studies. Results: In this article, we present Correlation and Consensus-based Cross-taxonomy Network Analysis (C3NA), a user-friendly R package for investigating compositional microbial sequencing data to identify and compare co-occurrence patterns across different taxonomic levels. C3NA contains two interactive graphical user interfaces (Shiny applications), one of them dedicated to the comparison between two diagnoses, e.g., disease versus control. We used C3NA to analyze two well-studied diseases, colorectal cancer and Crohn’s disease. We discovered clusters of study- and disease-dependent taxa that overlap with known functional taxa reported by other discovery studies and differential abundance analyses. Conclusion: C3NA offers a new microbial data analysis pipeline for refined and enriched taxa–taxa co-occurrence network analyses, and its usability is further expanded via the built-in Shiny applications for interactive investigation.
- Award ID(s): 2133504
- PAR ID: 10530602
- Publisher / Repository: Springer Nature
- Date Published:
- Journal Name: BMC Bioinformatics
- Volume: 23
- Issue: 1
- ISSN: 1471-2105
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract Background: A common task in analyzing metatranscriptomics data is to identify microbial metabolic pathways with differential RNA abundances across multiple sample groups. With information from paired metagenomics data, some differential methods control for either DNA or taxa abundances to address their strong correlation with RNA abundance. However, it remains unknown if both factors need to be controlled for simultaneously. Results: We discovered that when either DNA or taxa abundance is controlled for, RNA abundance still has a strong partial correlation with the other factor. In both simulation studies and a real data analysis, we demonstrated that controlling for both DNA and taxa abundances leads to superior performance compared to only controlling for one factor. Conclusions: To fully address the confounding effects in analyzing metatranscriptomics data, both DNA and taxa abundances need to be controlled for in the differential analysis.
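The partial-correlation finding above can be illustrated with a small Python sketch on synthetic data. The data-generating model, coefficients, sample size and variable names below are assumptions chosen for illustration and are not taken from the study. The sketch uses the standard first-order partial correlation formula, r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)).

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5


# Synthetic, hypothetical data: RNA abundance depends on both DNA and taxa
# abundance, and DNA abundance is itself correlated with taxa abundance.
rng = random.Random(1)
n = 500
taxa = [rng.gauss(0, 1) for _ in range(n)]
dna = [0.5 * t + rng.gauss(0, 1) for t in taxa]
rna = [d + t + rng.gauss(0, 0.5) for d, t in zip(dna, taxa)]

rna_taxa_given_dna = partial_corr(rna, taxa, dna)
rna_dna_given_taxa = partial_corr(rna, dna, taxa)
print("RNA ~ taxa | DNA:", round(rna_taxa_given_dna, 3))
print("RNA ~ DNA | taxa:", round(rna_dna_given_taxa, 3))
```

Under this toy model, controlling for only one factor leaves a strong association with the other, which is the situation in which adjusting for both covariates becomes necessary.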
-
Faust, Karoline (Ed.) ABSTRACT: Microbes commonly organize into communities consisting of hundreds of species involved in complex interactions with each other. 16S ribosomal RNA (16S rRNA) amplicon profiling provides snapshots that reveal the phylogenies and abundance profiles of these microbial communities. These snapshots, when collected from multiple samples, can reveal the co-occurrence of microbes, providing a glimpse into the network of associations in these communities. However, the inference of networks from 16S data involves numerous steps, each requiring specific tools and parameter choices. Moreover, the extent to which these steps affect the final network is still unclear. In this study, we perform a meticulous analysis of each step of a pipeline that can convert 16S sequencing data into a network of microbial associations. Through this process, we map how different choices of algorithms and parameters affect the co-occurrence network and identify the steps that contribute substantially to the variance. We further determine the tools and parameters that generate robust co-occurrence networks and develop consensus network algorithms based on benchmarks with mock and synthetic data sets. The Microbial Co-occurrence Network Explorer, or MiCoNE (available at https://github.com/segrelab/MiCoNE), follows these default tools and parameters and can help explore the outcome of these combinations of choices on the inferred networks. We envisage that this pipeline could be used for integrating multiple data sets and generating comparative analyses and consensus networks that can guide our understanding of microbial community assembly in different biomes.
IMPORTANCE: Mapping the interrelationships between different species in a microbial community is important for understanding and controlling their structure and function. The surge in the high-throughput sequencing of microbial communities has led to the creation of thousands of data sets containing information about microbial abundances. These abundances can be transformed into co-occurrence networks, providing a glimpse into the associations within microbiomes. However, processing these data sets to obtain co-occurrence information relies on several complex steps, each of which involves numerous choices of tools and corresponding parameters. These multiple options pose questions about the robustness and uniqueness of the inferred networks. In this study, we address this workflow and provide a systematic analysis of how these choices of tools affect the final network and guidelines on appropriate tool selection for a particular data set. We also develop a consensus network algorithm that helps generate more robust co-occurrence networks based on benchmark synthetic data sets.
-
ABSTRACT Aim: Beta diversity quantifies the similarity of ecological assemblages. Its increase, known as biotic homogenisation, can be a consequence of biological invasions. However, species occurrence (presence/absence) and abundance‐based analyses can produce contradictory assessments of the magnitude and direction of changes in beta diversity. Previous work indicates these contradictions should be less frequent in nature than in theory, but a growing number of empirical studies report discrepancies between occurrence‐ and abundance‐based approaches. Understanding if these discrepancies represent a few isolated cases or are systematic across a diversity of ecosystems would allow us to better understand the general patterns, mechanisms and impacts of biotic homogenisation. Location: United States. Time Period: 1963–2020. Major Taxa Studied: Vascular plants. Methods: We used a dataset of more than 70,000 vegetation survey plots to assess differences in biotic homogenisation with and without invasion using both occurrence‐ and abundance‐based metrics of beta diversity. We estimated taxonomic biotic homogenisation by comparing beta diversity of invaded and uninvaded plots with both classes of metrics and investigated the characteristics of the non‐native species pool that influenced the likelihood that these metrics disagree. Results: In 78% of plot comparisons, occurrence‐ and abundance‐based calculations agreed in direction, and the two metrics were generally well correlated. Our empirical results are consistent with previous theory. Discrepancies between the metrics were more likely when the same non‐native species was at high cover at both plots compared for beta diversity, and when these plots were spatially distant. Main Conclusions: In about 20% of cases, our calculations revealed differences in direction (homogenisation vs. differentiation) when comparing occurrence‐ and abundance‐based metrics, indicating that the metrics are not interchangeable, especially when distances between plots are high and invader diversity is low. When data permit, combining the two approaches can offer insights into the role of invasions and extirpations in driving biotic homogenisation/differentiation.
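How an occurrence‐based and an abundance‐based metric can disagree in direction is easiest to see on a toy example. The Python sketch below uses Jaccard dissimilarity for occurrence and Bray-Curtis dissimilarity for abundance; the choice of these two metrics, the species names and all cover values are assumptions for illustration and are not taken from the study's data.

```python
def jaccard_dissimilarity(p, q):
    """Occurrence-based: 1 - shared species / all species present in either plot."""
    present_p = {s for s, cover in p.items() if cover > 0}
    present_q = {s for s, cover in q.items() if cover > 0}
    union = present_p | present_q
    if not union:
        return 0.0
    return 1 - len(present_p & present_q) / len(union)


def bray_curtis(p, q):
    """Abundance-based: 1 - 2 * (sum of shared minimum cover) / (total cover)."""
    species = set(p) | set(q)
    shared = sum(min(p.get(s, 0), q.get(s, 0)) for s in species)
    total = sum(p.values()) + sum(q.values())
    if total == 0:
        return 0.0
    return 1 - 2 * shared / total


# Hypothetical cover values for two plots, without and with invader "x".
uninvaded_1 = {"a": 10, "b": 10}
uninvaded_2 = {"a": 10, "c": 10}
invaded_1 = {"a": 10, "b": 10, "x": 2}
invaded_2 = {"a": 2, "c": 10, "x": 30}

jaccard_change = (jaccard_dissimilarity(invaded_1, invaded_2)
                  - jaccard_dissimilarity(uninvaded_1, uninvaded_2))
bray_change = (bray_curtis(invaded_1, invaded_2)
               - bray_curtis(uninvaded_1, uninvaded_2))

# A negative change means the plots became more similar (homogenisation);
# a positive change means they became less similar (differentiation).
print("Jaccard change:", round(jaccard_change, 3))
print("Bray-Curtis change:", round(bray_change, 3))
```

In this toy case the shared invader makes the species lists more alike, so Jaccard dissimilarity falls, while its very unequal cover makes the abundance profiles less alike, so Bray-Curtis dissimilarity rises.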
-
Abstract Background: Microorganisms are found in almost every environment, including soil, water, air and inside other organisms, such as animals and plants. While some microorganisms cause diseases, most of them help in biological processes such as decomposition, fermentation and nutrient cycling. Much research has been conducted on the study of microbial communities in various environments and how their interactions and relationships can provide insight into various diseases. Co-occurrence network inference algorithms help us understand the complex associations of microorganisms, especially bacteria. Existing network inference algorithms employ techniques such as correlation, regularized linear regression, and conditional dependence, which have different hyper-parameters that determine the sparsity of the network. These complex microbial communities form intricate ecological networks that are fundamental to ecosystem functioning and host health. Understanding these networks is crucial for developing targeted interventions in both environmental and clinical settings. The emergence of high-throughput sequencing technologies has generated unprecedented amounts of microbiome data, necessitating robust computational methods for network inference and validation. Results: Previous methods for evaluating the quality of the inferred network include using external data and network consistency across sub-samples, both of which have several drawbacks that limit their applicability in real microbiome composition data sets. We propose a novel cross-validation method to evaluate co-occurrence network inference algorithms, and new methods for applying existing algorithms to predict on test data. Our method demonstrates superior performance in handling compositional data and addressing the challenges of high dimensionality and sparsity inherent in real microbiome datasets. The proposed framework also provides robust estimates of network stability. Conclusions: Our empirical study shows that the proposed cross-validation method is useful for hyper-parameter selection (training) and comparing the quality of inferred networks between different algorithms (testing). This advancement represents a significant step forward in microbiome network analysis, providing researchers with a reliable tool for understanding complex microbial interactions. The method’s applicability extends beyond microbiome studies to other fields where network inference from high-dimensional compositional data is crucial, such as gene regulatory networks and ecological food webs. Our framework establishes a new standard for validation in network inference, potentially accelerating discoveries in microbial ecology and human health.

