skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.
Attention:The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 7:00 AM ET to 7:30 AM ET on Friday, April 24 due to maintenance. We apologize for the inconvenience.


Title: Inferring microbial co-occurrence networks from amplicon data: a systematic evaluation
ABSTRACT Microbes commonly organize into communities consisting of hundreds of species involved in complex interactions with each other. 16S ribosomal RNA (16S rRNA) amplicon profiling provides snapshots that reveal the phylogenies and abundance profiles of these microbial communities. These snapshots, when collected from multiple samples, can reveal the co-occurrence of microbes, providing a glimpse into the network of associations in these communities. However, the inference of networks from 16S data involves numerous steps, each requiring specific tools and parameter choices. Moreover, the extent to which these steps affect the final network is still unclear. In this study, we perform a meticulous analysis of each step of a pipeline that can convert 16S sequencing data into a network of microbial associations. Through this process, we map how different choices of algorithms and parameters affect the co-occurrence network and identify the steps that contribute substantially to the variance. We further determine the tools and parameters that generate robust co-occurrence networks and develop consensus network algorithms based on benchmarks with mock and synthetic data sets. The Microbial Co-occurrence Network Explorer, or MiCoNE (available athttps://github.com/segrelab/MiCoNE) follows these default tools and parameters and can help explore the outcome of these combinations of choices on the inferred networks. We envisage that this pipeline could be used for integrating multiple data sets and generating comparative analyses and consensus networks that can guide our understanding of microbial community assembly in different biomes. IMPORTANCEMapping the interrelationships between different species in a microbial community is important for understanding and controlling their structure and function. The surge in the high-throughput sequencing of microbial communities has led to the creation of thousands of data sets containing information about microbial abundances. These abundances can be transformed into co-occurrence networks, providing a glimpse into the associations within microbiomes. However, processing these data sets to obtain co-occurrence information relies on several complex steps, each of which involves numerous choices of tools and corresponding parameters. These multiple options pose questions about the robustness and uniqueness of the inferred networks. In this study, we address this workflow and provide a systematic analysis of how these choices of tools affect the final network and guidelines on appropriate tool selection for a particular data set. We also develop a consensus network algorithm that helps generate more robust co-occurrence networks based on benchmark synthetic data sets.  more » « less
Award ID(s):
2019589
PAR ID:
10559568
Author(s) / Creator(s):
; ; ; ; ;
Editor(s):
Faust, Karoline
Publisher / Repository:
ASM Journals
Date Published:
Journal Name:
mSystems
ISSN:
2379-5077
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract <bold>Background</bold>Microorganisms are found in almost every environment, including soil, water, air and inside other organisms, such as animals and plants. While some microorganisms cause diseases, most of them help in biological processes such as decomposition, fermentation and nutrient cycling. Much research has been conducted on the study of microbial communities in various environments and how their interactions and relationships can provide insight into various diseases. Co-occurrence network inference algorithms help us understand the complex associations of micro-organisms, especially bacteria. Existing network inference algorithms employ techniques such as correlation, regularized linear regression, and conditional dependence, which have different hyper-parameters that determine the sparsity of the network. These complex microbial communities form intricate ecological networks that are fundamental to ecosystem functioning and host health. Understanding these networks is crucial for developing targeted interventions in both environmental and clinical settings. The emergence of high-throughput sequencing technologies has generated unprecedented amounts of microbiome data, necessitating robust computational methods for network inference and validation. <bold>Results</bold>Previous methods for evaluating the quality of the inferred network include using external data, and network consistency across sub-samples, both of which have several drawbacks that limit their applicability in real microbiome composition data sets. We propose a novel cross-validation method to evaluate co-occurrence network inference algorithms, and new methods for applying existing algorithms to predict on test data. Our method demonstrates superior performance in handling compositional data and addressing the challenges of high dimensionality and sparsity inherent in real microbiome datasets. The proposed framework also provides robust estimates of network stability. <bold>Conclusions</bold>Our empirical study shows that the proposed cross-validation method is useful for hyper-parameter selection (training) and comparing the quality of inferred networks between different algorithms (testing). This advancement represents a significant step forward in microbiome network analysis, providing researchers with a reliable tool for understanding complex microbial interactions. The method’s applicability extends beyond microbiome studies to other fields where network inference from high-dimensional compositional data is crucial, such as gene regulatory networks and ecological food webs. Our framework establishes a new standard for validation in network inference, potentially accelerating discoveries in microbial ecology and human health. 
    more » « less
  2. Abstract The introduction of high-throughput chromosome conformation capture (Hi-C) into metagenomics enables reconstructing high-quality metagenome-assembled genomes (MAGs) from microbial communities. Despite recent advances in recovering eukaryotic, bacterial, and archaeal genomes using Hi-C contact maps, few of Hi-C-based methods are designed to retrieve viral genomes. Here we introduce ViralCC, a publicly available tool to recover complete viral genomes and detect virus-host pairs using Hi-C data. Compared to other Hi-C-based methods, ViralCC leverages the virus-host proximity structure as a complementary information source for the Hi-C interactions. Using mock and real metagenomic Hi-C datasets from several different microbial ecosystems, including the human gut, cow fecal, and wastewater, we demonstrate that ViralCC outperforms existing Hi-C-based binning methods as well as state-of-the-art tools specifically dedicated to metagenomic viral binning. ViralCC can also reveal the taxonomic structure of viruses and virus-host pairs in microbial communities. When applied to a real wastewater metagenomic Hi-C dataset, ViralCC constructs a phage-host network, which is further validated using CRISPR spacer analyses. ViralCC is an open-source pipeline available athttps://github.com/dyxstat/ViralCC. 
    more » « less
  3. Abstract BackgroundStudying the co-occurrence network structure of microbial samples is one of the critical approaches to understanding the perplexing and delicate relationship between the microbe, host, and diseases. It is also critical to develop a tool for investigating co-occurrence networks and differential abundance analyses to reveal the disease-related taxa–taxa relationship. In addition, it is also necessary to tighten the co-occurrence network into smaller modules to increase the ability for functional annotation and interpretability of  these taxa-taxa relationships.  Also, it is critical to retain the phylogenetic relationship among the taxa to identify differential abundance patterns, which can be used to resolve contradicting functions reported by different studies. ResultsIn this article, we present Correlation and Consensus-based Cross-taxonomy Network Analysis (C3NA), a user-friendly R package for investigating compositional microbial sequencing data to identify and compare co-occurrence patterns across different taxonomic levels. C3NA contains two interactive graphic user interfaces (Shiny applications), one of them dedicated to the comparison between two diagnoses, e.g., disease versus control. We used C3NA to analyze two well-studied diseases, colorectal cancer, and Crohn’s disease. We discovered clusters of study and disease-dependent taxa that overlap with known functional taxa studied by other discovery studies and differential abundance analyses. ConclusionC3NA offers a new microbial data analyses pipeline for refined and enriched taxa–taxa co-occurrence network analyses, and the usability was further expanded via the built-in Shiny applications for interactive investigation. 
    more » « less
  4. Advances in high-throughput technologies have enhanced our ability to describe microbial communities as they relate to human health and disease. Alongside the growth in sequencing data has come an influx of resources that synthesize knowledge surrounding microbial traits, functions, and metabolic potential with knowledge of how they may impact host pathways to influence disease phenotypes. These knowledge bases can enable the development of mechanistic explanations that may underlie correlations detected between microbial communities and disease. In this review, we survey existing resources and methodologies for the computational integration of broad classes of microbial and host knowledge. We evaluate these knowledge bases in their access methods, content, and source characteristics. We discuss challenges of the creation and utilization of knowledge bases including inconsistency of nomenclature assignment of taxa and metabolites across sources, whether the biological entities represented are rooted in ontologies or taxonomies, and how the structure and accessibility limit the diversity of applications and user types. We make this information available in a code and data repository at:https://github.com/lozuponelab/knowledge-source-mappings. Addressing these challenges will allow for the development of more effective tools for drawing from abundant knowledge to find new insights into microbial mechanisms in disease by fostering a systematic and unbiased exploration of existing information. 
    more » « less
  5. Abstract Microbial networks offer critical insights into community structure, ecological interactions and host–microbe dynamics. However, constructing reliable microbiome networks remains challenging due to variability among existing inference methods, limited overlap between inferred networks and the absence of a gold standard (a universally accepted reference for benchmarking) for validation.We developedCMiNet, an R package and interactive Shiny App(https://cminet.wid.wisc.edu) that enables consensus microbiome network construction by integrating up to 10 widely used inference algorithms.CMiNetsupports both correlation‐based and conditional dependence‐based methods and provides users with flexible options to construct individual or consensus networks across different approaches.CMiNetintegrates results from multiple inference methods through a voting strategy that retains edges supported by a user‐defined number of methods. To assess robustness, we complement this with a bootstrap analysis that quantifies edge stability under resampling. By jointly reporting method support and bootstrap confidence,CMiNetprovides a reproducible framework that explicitly communicates both agreement across methods and stability under perturbation.We appliedCMiNetto gut and soil microbiome datasets, constructing consensus networks that retained edges supported by multiple methods and confirmed by bootstrap reproducibility values. To identify disease‐associated taxa, we developed an integrative strategy that compared results across machine learning, differential abundance and network‐based approaches, ensuring that selected taxa were consistently recovered across methods. In the soil dataset, this analysis highlighted key taxa such asKtedonobacteria, Acidobacteriae, Vicinamibacteria, MB‐A2‐108, IgnavibacteriaandAnaerolineae, all of which were confirmed by multiple independent strategies. 
    more » « less