

Title: Effects of error, chimera, bias, and GC content on the accuracy of amplicon sequencing
ABSTRACT Targeted amplicon sequencing is widely used in microbial ecology studies. However, sequencing artifacts and amplification biases are of great concern. To identify sources of these artifacts, a systematic analysis was performed using mock communities composed of 16S rRNA genes from 33 bacterial strains. Our results indicated that while sequencing errors were generally isolated to low-abundance operational taxonomic units, chimeric sequences were a major source of artifacts. Singleton and doubleton sequences were primarily chimeras. Formation of chimeric sequences was significantly correlated with the GC content of the targeted sequences. Low-GC-content mock community members exhibited lower rates of chimeric sequence formation. GC content also had a large impact on sequence recovery. The quantitative capacity was notably limited, with substantial variation in recovery and weak correlation between expected and observed strain abundances. Mock community strains with higher GC content had higher recovery rates than strains with lower GC content. Amplification bias was also observed due to differences in primer affinity. A two-step PCR strategy reduced the number of chimeric sequences by half. In addition, comparative analyses based on the mock communities showed that several widely used sequence processing pipelines and methods, including DADA2, Deblur, UCLUST, UNOISE, and UPARSE, had different advantages and disadvantages in artifact removal and rare species detection. These results are important for improving sequencing quality and reliability and for developing new algorithms to process targeted amplicon sequences.

IMPORTANCE Amplicon sequencing of targeted genes is the predominant approach to estimating the membership and structure of microbial communities. However, accurate reconstruction of community composition is difficult due to sequencing errors and other methodological biases, and effective approaches to overcome these challenges are essential. Using a mock community of 33 phylogenetically diverse strains, this study evaluated the effect of GC content on sequencing results and tested different approaches to improve overall sequencing accuracy. It also characterized the pros and cons of popular amplicon sequence data processing approaches. The sequencing results from this study can serve as a benchmarking data set for future algorithmic improvements. Furthermore, the new insights into sequencing error, chimera formation, and GC bias from this study will help enhance the quality of amplicon sequencing studies and support the development of new data analysis approaches.
Award ID(s): 2129235, 2025558
PAR ID: 10541601
Editor(s): Gilbert, Jack A
Publisher / Repository: American Society for Microbiology
Journal Name: mSystems
Volume: 8
Issue: 6
ISSN: 2379-5077
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract: 16S rRNA targeted amplicon sequencing is an established standard for elucidating microbial community composition. High‐throughput short‐read sequencing can capture only a portion of the 16S rRNA gene because of its limited read length. Third‐generation sequencing, by contrast, can read the 16S rRNA gene in its entirety and thus provide more precise taxonomic classification. Here, we present a protocol for generating full‐length 16S rRNA sequences with Oxford Nanopore Technologies (ONT) and a microbial community profile with Emu. We selected Emu for analyzing ONT sequences because it leverages information from the entire community to overcome errors due to incomplete reference databases and hardware limitations, ultimately obtaining species‐level resolution. This pipeline provides a low‐cost solution for characterizing microbiome composition by exploiting real‐time, long‐read ONT sequencing and tailored software for accurate characterization of microbial communities. © 2024 Wiley Periodicals LLC. Basic Protocol: Microbial community profiling with Emu. Support Protocol 1: Full‐length 16S rRNA microbial sequences with the Oxford Nanopore Technologies sequencing platform. Support Protocol 2: Building a custom reference database for Emu.
  2. Summary: Universal primers for SSU rRNA genes allow profiling of natural communities by simultaneously amplifying templates from Bacteria, Archaea, and Eukaryota in a single PCR. Despite their potential to show relative abundance for all rRNA genes, universal primers are rarely used. This is due to various concerns, including amplicon length variation and its effect on bioinformatic pipelines. We therefore developed 16S and 18S rRNA mock communities and a bioinformatic pipeline to validate this approach. Using these mocks, we show that universal primers (515Y/926R) outperformed eukaryote‐specific V4 primers in observed versus expected abundance correlations (slope = 0.88 vs. 0.67–0.79). We also show that mock community members with single mismatches to the primer were strongly underestimated (threefold to eightfold). In field samples, both primers yielded similar 18S beta‐diversity patterns (Mantel test, p < 0.001) but differed in the relative proportions of many rarer taxa. To test for length biases, we mixed mock communities (16S + 18S) before PCR and found a twofold underestimation of 18S sequences due to sequencing bias. After correcting for this twofold underestimation, we estimate that Southern California field samples (1.2–80 μm) contained on average 35% 18S, 28% chloroplast 16S, and 37% prokaryote 16S rRNA genes. These data demonstrate the potential for universal primers to generate comprehensive microbiome profiles.
  3. Abstract: Background: In light of the current biodiversity crisis, DNA barcoding is developing into an essential tool to quantify state shifts in global ecosystems. Current barcoding protocols often rely on short amplicon sequences. These yield accurate identification of biological entities in a community but provide limited phylogenetic resolution across broad taxonomic scales. However, the phylogenetic structure of communities is an essential component of biodiversity. Consequently, a barcoding approach is required that unites robust taxonomic assignment power and high phylogenetic utility. A possible solution is offered by sequencing long ribosomal DNA (rDNA) amplicons on the MinION platform (Oxford Nanopore Technologies). Findings: Using a dataset of various animal and plant species, with a focus on arthropods, we assemble a pipeline for long rDNA barcode analysis and introduce new software (MiniBar) to demultiplex dual-indexed Nanopore reads. We find that long rDNA sequences offer excellent phylogenetic and taxonomic resolution across broad taxonomic scales. We highlight the simplicity of our approach by field barcoding with a miniaturized, mobile laboratory in a remote rainforest. We also test the utility of long rDNA amplicons for analyzing community diversity through metabarcoding and find that they recover highly skewed diversity estimates. Conclusions: Sequencing dual-indexed, long rDNA amplicons on the MinION platform is a straightforward, cost-effective, portable, and universal approach for eukaryote DNA barcoding. Although bulk community analyses using long-amplicon approaches may introduce biases, the long rDNA amplicon approach is a powerful tool for the accurate recovery of taxonomic and phylogenetic diversity across biological communities.
  4. Members of the fungal genus Morchella are widely known for their important ecological roles and significant economic value. In this study, we used amplicon and genome sequencing to characterize bacterial communities associated with sexual fruiting bodies from wild specimens. We also characterized the communities of vegetative mycelium and sclerotia obtained from Morchella isolates grown in vitro. These investigations included diverse representatives from both the Elata and Esculenta Morchella clades. Unique bacterial community compositions were observed across the various structures examined, both within and across individual Morchella isolates or specimens. However, specific bacterial taxa were frequently detected in association with certain structures, providing support for an associated core bacterial community. Bacteria from the genera Pseudomonas and Ralstonia constituted the core bacterial associates of Morchella mycelia and sclerotia. Other genera (e.g., Pedobacter spp., Deviosa spp., and Bradyrhizobium spp.) constituted the core bacterial community of fruiting bodies. Furthermore, the importance of Pseudomonas as a key member of the bacteriome was supported by the isolation of several Pseudomonas strains from mycelia during in vitro cultivation. Four of the six mycelial-derived Pseudomonas isolates shared 16S rDNA sequence identity with amplicon sequences recovered directly from the examined fungal structures. Distinct interaction phenotypes (antagonistic or neutral) were observed in confrontation assays between these bacteria and various Morchella isolates. Genome sequences obtained from these Pseudomonas isolates revealed intriguing differences in gene content and annotated functions. These differences were specifically related to toxin-antitoxin systems, cell adhesion, chitinases, and insecticidal toxins, and they correlated with the interaction phenotypes. This study provides evidence that Pseudomonas spp. are frequently associated with Morchella and that these associations may greatly impact fungal physiology.
  5. Plant microbiome assembly involves a series of both plant–microbe and microbe–microbe interactions, but the latter are less often tested directly. Here, we investigate whether Streptomyces strains influence the assembly of other bacteria into root microbiomes. To do so, we used two synthetic communities (SynComs): a 21-member community including four Streptomyces strains and a 17-member community lacking those Streptomyces strains. We inoculated these SynComs on wild-type Arabidopsis thaliana Col-0. Differential abundance modeling on root endosphere 16S ribosomal RNA gene amplicon sequencing data then revealed altered abundance of four diverse SynCom members: Arthrobacter sp. 131, Agrobacterium sp. 33, Burkholderia sp. CL11, and Ralstonia sp. CL21. We tested the modeling results in seedling coinoculation experiments with the four Streptomyces strains and the differentially abundant members. These experiments confirmed the predicted decrease in abundance of Arthrobacter sp. 131, Agrobacterium sp. 33, and Ralstonia sp. CL21 when Streptomyces strains were present. We further characterized how the phytohormone salicylic acid (SA) mediates the Streptomyces strains' influence over seedling colonization by Agrobacterium sp. 33 and Burkholderia sp. CL11. The decreased colonization by Ralstonia sp. CL21 and Arthrobacter sp. 131 in the presence of Streptomyces spp. was not influenced by SA. However, direct antibiosis of Arthrobacter sp. 131 by Streptomyces was observed. These results highlight a role for Streptomyces-mediated microbial interactions during plant root microbiome assembly, as well as distinct mechanisms that mediate them. Understanding the role of microbial interactions during microbiome assembly will inform the production of beneficial microbial treatments for use in agricultural fields.