This content will become publicly available on December 1, 2026

Title: Missing mate pairs reported as poly-G reads can confound analyses of rare members of microbial assemblages
Using sequence reads from shotgun metagenomic analyses of cattle and sheep, we describe how failures in mate pairing during Illumina sequencing can interact with bioinformatics pipelines to produce spurious patterns among rare components of a metagenomic sample. We identified several shotgun metagenomic datasets, from different animals and different laboratories, in which the two members of a read pair matched a viral database at very different frequencies. We traced this bias to a set of high-quality poly-G reads that resulted from failures in generating read pairs during library preparation. These results reinforce the need to remove poly-G-rich reads when quality filtering shotgun metagenomic data.
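As an illustration of the filtering step recommended above, the following is a minimal Python sketch of one way to drop read pairs in which either mate is dominated by G. It is not the authors' pipeline: the function names, the `(name, sequence, quality)` tuple layout, and the thresholds (`min_fraction=0.9`, `min_tail=20`) are assumptions chosen for the example and would need tuning for real data.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical in-memory read representation: (name, sequence, quality string).
Read = Tuple[str, str, str]


def g_fraction(seq: str) -> float:
    """Return the fraction of bases in seq that are G (case-insensitive)."""
    if not seq:
        return 0.0
    return seq.upper().count("G") / len(seq)


def trailing_g_run(seq: str) -> int:
    """Return the length of the uninterrupted run of G at the 3' end."""
    upper = seq.upper()
    return len(upper) - len(upper.rstrip("G"))


def is_poly_g(seq: str, min_fraction: float = 0.9, min_tail: int = 20) -> bool:
    """Flag a read as poly-G if it is mostly G or ends in a long G run.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    return g_fraction(seq) >= min_fraction or trailing_g_run(seq) >= min_tail


def filter_pairs(
    pairs: Iterable[Tuple[Read, Read]],
    min_fraction: float = 0.9,
    min_tail: int = 20,
) -> Iterator[Tuple[Read, Read]]:
    """Yield only the read pairs in which neither mate is flagged as poly-G."""
    for r1, r2 in pairs:
        if is_poly_g(r1[1], min_fraction, min_tail):
            continue
        if is_poly_g(r2[1], min_fraction, min_tail):
            continue
        yield r1, r2
```

Because base quality is not consulted, a poly-G read is removed even when its quality scores are high, which is the case the abstract describes. Dropping the whole pair, rather than only the flagged mate, keeps the two output files synchronized.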
Award ID(s):
2241312
PAR ID:
10659598
Author(s) / Creator(s):
; ; ; ; ; ; ;
Publisher / Repository:
Ecological Genetics and Genomics
Date Published:
Journal Name:
Ecological Genetics and Genomics
Volume:
37
Issue:
C
ISSN:
2405-9854
Page Range / eLocation ID:
100399
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Shotgun sequencing is routinely employed to study bacteria in microbial communities. With the vast amount of shotgun sequencing reads generated in a metagenomic project, it is crucial to determine the microbial composition at the strain level. This study investigated 20 computational tools that attempt to infer bacterial strain genomes from shotgun reads. For the first time, we discussed the methodology behind these tools. We also systematically evaluated six novel-strain-targeting tools on the same datasets and found that BHap, mixtureS and StrainFinder performed better than other tools. Because the performance of the best tools is still suboptimal, we discussed future directions that may address the limitations.
  2. Motivation: Metagenomic binning aims to retrieve microbial genomes directly from ecosystems by clustering metagenomic contigs assembled from short reads into draft genomic bins. Traditional shotgun-based binning methods depend on the contigs’ composition and abundance profiles and are impaired by the paucity of enough samples to construct reliable co-abundance profiles. When applied to a single sample, shotgun-based binning methods struggle to distinguish closely related species only using composition information. As an alternative binning approach, Hi-C-based binning employs metagenomic Hi-C technique to measure the proximity contacts between metagenomic fragments. However, spurious inter-species Hi-C contacts inevitably generated by incorrect ligations of DNA fragments between species link the contigs from varying genomes, weakening the purity of final draft genomic bins. Therefore, it is imperative to develop a binning pipeline to overcome the shortcomings of both types of binning methods on a single sample. Results: We develop HiFine, a novel binning pipeline to refine the binning results of metagenomic contigs by integrating both Hi-C-based and shotgun-based binning tools. HiFine designs a strategy of fragmentation for the original bin sets derived from the Hi-C-based and shotgun-based binning methods, which considerably increases the purity of initial bins, followed by merging fragmented bins and recruiting unbinned contigs. We demonstrate that HiFine significantly improves the existing binning results of both types of binning methods and achieves better performance in constructing species genomes on publicly available datasets. To the best of our knowledge, HiFine is the first pipeline to integrate different types of tools for the binning of metagenomic contigs. Availability and implementation: HiFine is available at https://github.com/dyxstat/HiFine. Supplementary information: Supplementary data are available at Bioinformatics online.
  3. Background: Despite more than 60 years of research, the etiology of bacterial vaginosis (BV) remains controversial. In this pilot study, we used shotgun metagenomic sequencing to characterize vaginal microbial community changes before the development of incident BV (iBV). Methods: A cohort of African American women with a baseline healthy vaginal microbiome (no Amsel criteria, Nugent score 0–3 with no Gardnerella vaginalis morphotypes) were followed for 90 days with daily self-collected vaginal specimens for iBV (≥2 consecutive days of a Nugent score of 7–10). Shotgun metagenomic sequencing was performed on select vaginal specimens from 4 women, every other day for 12 days before iBV diagnosis. Sequencing data were analyzed through Kraken2 and bioBakery 3 workflows, and specimens were classified into community state types. Quantitative polymerase chain reaction was performed to compare the correlation of read counts with bacterial abundance. Results: Common BV-associated bacteria such as G. vaginalis, Prevotella bivia, and Fannyhessea vaginae were increasingly identified in the participants before iBV. Linear modeling indicated significant increases in G. vaginalis and F. vaginae relative abundance before iBV, whereas the relative abundance of Lactobacillus species declined over time. The Lactobacillus species decline correlated with the presence of Lactobacillus phages. We observed enrichment in bacterial adhesion factor genes on days before iBV. There were also significant correlations between bacterial read counts and abundances measured by quantitative polymerase chain reaction. Conclusions: This pilot study characterizes vaginal community dynamics before iBV and identifies key bacterial taxa and mechanisms potentially involved in the pathogenesis of iBV.
  4. The largest dataset of soil metagenomes has recently been released by the National Ecological Observatory Network (NEON), which performs annual shotgun sequencing of soils at 47 sites across the United States. NEON serves as a valuable educational resource, thanks to its open data and programming tutorials, but there is currently no introductory tutorial for accessing and analyzing the soil shotgun metagenomic dataset. Here, we describe methods for processing raw soil metagenome sequencing reads using a bioinformatics pipeline tailored to the high complexity and diversity of the soil microbiome. We describe the rationale, necessary resources, and implementation of steps such as cleaning raw reads, taxonomic classification, assembly into contigs or genomes, annotation of predicted genes using custom protein databases, and exporting data for downstream analysis. The workflow presented here aims to increase the accessibility of NEON’s shotgun metagenome data, which can provide important clues about soil microbial communities and their ecological roles. 
  5. Metagenomics has enabled sequencing of viral communities from a myriad of different environments. Viral metagenomic studies routinely uncover sequences with no recognizable homology to known coding regions or genomes. Nevertheless, complete viral genomes have been constructed directly from complex community metagenomes, often through tedious manual curation. To address this, we developed the software tool virMine to identify viral genomes from raw reads representative of viral or mixed (viral and bacterial) communities. virMine automates sequence read quality control, assembly, and annotation. Researchers can easily refine their search for a specific study system and/or feature(s) of interest. In contrast to other viral genome detection tools that often rely on the recognition of viral signature sequences, virMine is not restricted by the insufficient representation of viral diversity in public data repositories. Rather, viral genomes are identified through an iterative approach, first omitting non-viral sequences. Thus, both relatives of previously characterized viruses and novel species can be detected, including both eukaryotic viruses and bacteriophages. Here we present virMine and its analysis of synthetic communities as well as metagenomic data sets from three distinctly different environments: the gut microbiota, the urinary microbiota, and freshwater viromes. Several new viral genomes were identified and annotated, thus contributing to our understanding of viral genetic diversity in these three environments. 