Title: Ranking the biases: The choice of OTUs vs. ASVs in 16S rRNA amplicon data analysis has stronger effects on diversity measures than rarefaction and OTU identity threshold
Advances in the analysis of amplicon sequence datasets have introduced a methodological shift in how research teams investigate microbial biodiversity, away from sequence identity-based clustering (producing Operational Taxonomic Units, OTUs) to denoising methods (producing amplicon sequence variants, ASVs). While denoising methods have several inherent properties that make them desirable compared to clustering-based methods, questions remain as to the influence that these pipelines have on the ecological patterns being assessed, especially when compared to other methodological choices made when processing data (e.g. rarefaction) and computing diversity indices. We compared the respective influences of two widely used methods, namely DADA2 (a denoising method) vs. Mothur (a clustering method) on 16S rRNA gene amplicon datasets (hypervariable region V4), and compared such effects to the rarefaction of the community table and OTU identity threshold (97% vs. 99%) on the ecological signals detected. We used a dataset comprising freshwater invertebrate (three Unionidae species) gut and environmental (sediment, seston) communities sampled in six rivers in the southeastern USA. We ranked the respective effects of each methodological choice on alpha and beta diversity, and taxonomic composition. The choice of the pipeline significantly influenced alpha and beta diversities and changed the ecological signal detected, especially on presence/absence indices such as the richness index and unweighted UniFrac. Interestingly, the discrepancy between OTU and ASV-based diversity metrics could be attenuated by the use of rarefaction. The identification of major classes and genera also revealed significant discrepancies across pipelines. Compared to the pipeline’s effect, OTU threshold and rarefaction had a minimal impact on all measurements.
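As an illustration of two terms used in the abstract, the sketch below rarefies a single sample to a fixed read depth and computes richness (a presence/absence index) and the Shannon index before and after. It is a minimal, standard-library sketch and not the authors' DADA2 or Mothur workflow; the function names, the toy count vector, and the depth of 100 reads are all assumptions made for the example.

```python
import math
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from per-taxon counts."""
    total = sum(counts)
    if depth > total:
        raise ValueError("depth exceeds the number of reads in the sample")
    # Expand the counts into one entry per read, labelled by taxon index.
    reads = [i for i, c in enumerate(counts) for _ in range(c)]
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    picked = Counter(rng.sample(reads, depth))
    return [picked.get(i, 0) for i in range(len(counts))]


def richness(counts):
    """Number of taxa observed at least once (presence/absence)."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon index H' using the natural logarithm."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical counts for one sample: a few dominant taxa and several rare ones.
sample = [500, 300, 150, 40, 5, 3, 1, 1]
rarefied = rarefy(sample, depth=100)

print("richness:", richness(sample), "->", richness(rarefied))
print("shannon: ", round(shannon(sample), 3), "->", round(shannon(rarefied), 3))
```

Rare taxa are the ones most likely to be lost when subsampling, which is one reason presence/absence indices such as richness tend to be more sensitive to processing choices than abundance-weighted indices.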
Award ID(s):
1831531
PAR ID:
10346895
Author(s) / Creator(s):
; ; ;
Editor(s):
Moreno-Hagelsieb, Gabriel
Date Published:
Journal Name:
PLOS ONE
Volume:
17
Issue:
2
ISSN:
1932-6203
Page Range / eLocation ID:
e0264443
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Fields, David (Ed.)
    Abstract: Community-based diversity analyses, such as metabarcoding, are increasingly popular in the field of metazoan zooplankton community ecology. However, some methodological uncertainties remain, such as the potential inflation of diversity estimates resulting from contamination by pseudogene sequences. Furthermore, primer affinity to specific taxonomic groups might skew community composition and structure during PCR. In this study, we estimated OTU (operational taxonomic unit) richness, Shannon’s H’, and the phylum-level community composition of samples from a coastal zooplankton community using four approaches: complementary DNA (cDNA) and genomic DNA (gDNA) mitochondrial COI (Cytochrome oxidase subunit I) gene amplicons, metatranscriptome sequencing, and morphological identification. Mismatch distribution results indicated that 90% is a good threshold percentage for differentiating intra- from interspecific variation. Moderate correlations appeared when comparing the species/OTU richness estimated by the different methods. Results strongly indicated that diversity inflation occurred in the samples amplified from gDNA because of mitochondrial pseudogene contamination (overall, gDNA amplicons produced twice the richness of cDNA amplicons). The unique community compositions observed in the PCR-based methods indicated that taxonomic amplification bias had occurred during the PCR. Therefore, it is recommended that PCR-free approaches be used whenever resolving community structure represents an essential aspect of the analysis.
  2. 16S rRNA gene profiling (amplicon sequencing) is a popular technique for understanding host-associated and environmental microbial communities. Most protocols for sequencing amplicon libraries follow a standardized pipeline that can differ slightly depending on laboratory facility and user. Given that the same variable region of the 16S gene is targeted, it is generally accepted that sequencing output from differing protocols is comparable, and this assumption underlies our ability to identify universal patterns in microbial dynamics through meta-analyses. However, discrepant results from a combined 16S rRNA gene dataset prepared by two labs whose protocols differed only in DNA polymerase and sequencing platform led us to scrutinize the outputs and challenge the idea of confidently combining them for standard microbiome analysis. Using technical replicates of reef-building coral samples from two species, Montipora aequituberculata and Porites lobata, we evaluated the consistency of alpha and beta diversity metrics between data resulting from these highly similar protocols. While we found minimal variation in alpha diversity between platforms, significant differences were revealed with most beta diversity metrics, dependent on host species. These inconsistencies persisted following removal of low abundance taxa and when comparing across higher taxonomic levels, suggesting that bacterial community differences associated with sequencing protocol are likely to be context dependent and difficult to correct without extensive validation work. The results of this study encourage caution in the statistical comparison and interpretation of studies that combine rRNA gene sequence data from distinct protocols and point to a need for further work identifying mechanistic causes of these observed differences.
  3. Recovered microbial community structure is known to be influenced by sample storage conditions and nucleic acid extraction methods, and the impact varies by sample type. Peat soils store a large portion of soil carbon and their microbiomes mediate climate feedbacks. Here, we tested three storage conditions and five extraction protocols on peat soils from three physicochemically distinct habitats in Stordalen Mire, Sweden, revealing significant methodological impacts on microbial (here, meaning bacteria and archaea) community structure. Initial preservation method impacted alpha but not beta diversity, with in-field storage in LifeGuard buffer yielding roughly two-thirds the richness of in-field flash-freezing or transport from the field on ice (all samples were stored at −80 °C after return from the field). Nucleic acid extraction method impacted both alpha and beta diversity; one method (the PowerSoil Total RNA Isolation kit with DNA Elution Accessory kit) diverged from the others (PowerMax Soil DNA Isolation kit-High Humic Acid Protocol, and three variations of a modified PowerMax Soil DNA/RNA isolation kit), capturing more diverse microbial taxa, with divergent community structures. Although habitat and sample depth still consistently dominated community variation, method-based biases in microbiome recovery for these climatologically relevant soils are significant, and underscore the importance of methodological consistency for accurate inter-study comparisons, long-term monitoring, and consistent ecological interpretations.
  4. Birol, Inanc (Ed.)
    Abstract. Motivation: Linking microbial community members to their ecological functions is a central goal of environmental microbiology. When assigned taxonomy, amplicon sequences of metabolic marker genes can suggest such links, thereby offering an overview of the phylogenetic structure underpinning particular ecosystem functions. However, inferring microbial taxonomy from metabolic marker gene sequences remains a challenge, particularly for the frequently sequenced nitrogen fixation marker gene, nitrogenase reductase (nifH). Horizontal gene transfer in recent nifH evolutionary history can confound taxonomic inferences drawn from the pairwise identity methods used in existing software. Other methods for inferring taxonomy are not standardized and require manual inspection that is difficult to scale. Results: We present Phylogenetic Placement for Inferring Taxonomy (PPIT), an R package that infers microbial taxonomy from nifH amplicons using both phylogenetic and sequence identity approaches. After users place query sequences on a reference nifH gene tree provided by PPIT (n = 6317 full-length nifH sequences), PPIT searches the phylogenetic neighborhood of each query sequence and attempts to infer microbial taxonomy. An inference is drawn only if references in the phylogenetic neighborhood (1) are taxonomically consistent and (2) share sufficient pairwise identity with the query, thereby avoiding erroneous inferences due to known horizontal gene transfer events. We find that PPIT returns a higher proportion of correct taxonomic inferences than BLAST-based approaches at the cost of fewer total inferences. We demonstrate PPIT on deep-sea sediment and find that Deltaproteobacteria are the most abundant potential diazotrophs. Using this dataset we show that emending PPIT inferences based on visual inspection of query sequence placement can achieve taxonomic inferences for nearly all sequences in a query set. We additionally discuss how users can apply PPIT to the analysis of other marker genes. Availability: PPIT is freely available to non-commercial users at https://github.com/bkapili/ppit. Installation includes a vignette that demonstrates package use and reproduces the nifH amplicon analysis discussed here. The raw nifH amplicon sequence data have been deposited in the GenBank, EMBL, and DDBJ databases under BioProject number PRJEB37167. Supplementary information: Supplementary data are available at Bioinformatics online.
  5. Firmicutes is an almost ubiquitous phylum. Several genera of this group, for instance, Geobacillus, are recognized for decomposing plant organic matter and for producing thermostable ligninolytic enzymes. Amplicon sequencing was used in this study to determine the prevalence and genetic diversity of the Firmicutes in two distinctly related environmental samples: South Dakota Landfill Compost (SDLC, 60 °C), and Sanford Underground Research Facility sediments (SURF, 45 °C). Although distinct microbial community compositions were observed, there was a dominance of Firmicutes in both the SDLC and SURF samples, followed by Proteobacteria. The abundant classes of bacteria in the SDLC site, within the phylum Firmicutes, were Bacilli (83.2%), and Clostridia (2.9%). In comparison, the sample from the SURF mine was dominated by the Clostridia (45.8%) and then Bacilli (20.1%). Within the class Bacilli, the SDLC sample had more diversity (a total of 11 genera with more than 1% operational taxonomic unit, OTU). On the other hand, SURF samples had just three genera, about 1% of the total population: Bacilli, Paenibacillus, and Solibacillus. With specific regard to Geobacillus, it was found to be present at a level of 0.07% and 2.5% in SURF and SDLC, respectively. Subsequently, culture isolations of endospore-forming Firmicutes members from these samples led to the isolation of a total of 117 isolates. According to colony morphologies, and identification based upon 16S rRNA and gyrB gene sequence analysis, we obtained 58 taxonomically distinct strains. Depending on the similarity indexes, a gyrB sequence comparison appeared more useful than 16S rRNA sequence analysis for inferring intra- and some intergeneric relationships between the isolates.