
This content will become publicly available on February 24, 2023

Title: Ranking the biases: The choice of OTUs vs. ASVs in 16S rRNA amplicon data analysis has stronger effects on diversity measures than rarefaction and OTU identity threshold
Advances in the analysis of amplicon sequence datasets have introduced a methodological shift in how research teams investigate microbial biodiversity, away from sequence identity-based clustering (producing Operational Taxonomic Units, OTUs) and toward denoising methods (producing amplicon sequence variants, ASVs). While denoising methods have several inherent properties that make them more desirable than clustering-based methods, questions remain about how these pipelines influence the ecological patterns being assessed, especially relative to other methodological choices made when processing data (e.g. rarefaction) and computing diversity indices. We compared the respective influences of two widely used methods, namely DADA2 (a denoising method) and Mothur (a clustering method), on 16S rRNA gene amplicon datasets (hypervariable region V4), and compared these effects with those of rarefying the community table and of the OTU identity threshold (97% vs. 99%) on the ecological signals detected. We used a dataset comprising gut communities of freshwater invertebrates (three Unionidae species) and environmental (sediment, seston) communities sampled in six rivers in the southeastern USA. We ranked the respective effects of each methodological choice on alpha diversity, beta diversity, and taxonomic composition. The choice of pipeline significantly influenced alpha and beta diversities and changed the ecological signal detected, especially for presence/absence indices such as the richness index and unweighted UniFrac. Interestingly, the discrepancy between OTU- and ASV-based diversity metrics could be attenuated by rarefaction. The identification of major classes and genera also revealed significant discrepancies across pipelines. Compared with the pipeline's effect, OTU threshold and rarefaction had a minimal impact on all measurements.
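Rarefaction and presence/absence indices are the quantities the abstract above ranks against the pipeline choice. The Python below is a minimal sketch, not the authors' DADA2/Mothur workflow: it rarefies one sample's feature counts (OTUs or ASVs) by random subsampling without replacement to a fixed depth, then computes observed richness and the Jaccard distance. Jaccard is used here as a non-phylogenetic presence/absence stand-in for unweighted UniFrac, which additionally weights shared and unique features by phylogenetic branch length. The function names, toy counts, and seeds are hypothetical.

```python
import random
from typing import Dict, List


def rarefy(counts: Dict[str, int], depth: int, seed: int = 0) -> Dict[str, int]:
    """Randomly subsample `depth` reads without replacement from a
    feature -> count table (one sample's column of a community table)."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError(f"depth {depth} exceeds sample size {total}")
    # Expand the table into one label per read, then draw without replacement.
    pool: List[str] = [feat for feat, n in counts.items() for _ in range(n)]
    picked = random.Random(seed).sample(pool, depth)
    rarefied: Dict[str, int] = {}
    for feat in picked:
        rarefied[feat] = rarefied.get(feat, 0) + 1
    return rarefied


def richness(counts: Dict[str, int]) -> int:
    """Observed richness: number of features (OTUs or ASVs) with a nonzero count."""
    return sum(1 for n in counts.values() if n > 0)


def jaccard_distance(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Presence/absence Jaccard distance, 1 - |A ∩ B| / |A ∪ B|; abundances are ignored."""
    pa = {f for f, n in a.items() if n > 0}
    pb = {f for f, n in b.items() if n > 0}
    union = pa | pb
    if not union:
        return 0.0
    return 1.0 - len(pa & pb) / len(union)


if __name__ == "__main__":
    # Hypothetical toy counts for two samples processed by the same pipeline.
    s1 = {"f1": 120, "f2": 40, "f3": 2, "f4": 1}
    s2 = {"f1": 30, "f2": 5, "f5": 15}
    depth = min(sum(s1.values()), sum(s2.values()))  # rarefy to the shallowest sample
    r1, r2 = rarefy(s1, depth, seed=1), rarefy(s2, depth, seed=2)
    print("richness raw:", richness(s1), richness(s2))
    print("richness rarefied:", richness(r1), richness(r2))
    print("Jaccard raw:", round(jaccard_distance(s1, s2), 3))
    print("Jaccard rarefied:", round(jaccard_distance(r1, r2), 3))
```

Rarefying to the shallowest sample discards reads from deeper samples, so rare features (like `f3` and `f4` above) may drop out; this is why presence/absence measures such as richness respond to rarefaction. A different `seed` gives a different subsample.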
Authors:
; ; ;
Editors:
Moreno-Hagelsieb, Gabriel
Award ID(s):
1831531
Publication Date:
NSF-PAR ID:
10346895
Journal Name:
PLOS ONE
Volume:
17
Issue:
2
Page Range or eLocation-ID:
e0264443
ISSN:
1932-6203
Sponsoring Org:
National Science Foundation
More Like this
  1. Fields, David (Ed.)
    Community-based diversity analyses, such as metabarcoding, are increasingly popular in the field of metazoan zooplankton community ecology. However, some methodological uncertainties remain, such as the potential inflation of diversity estimates by contamination with pseudogene sequences. Furthermore, primer affinity for specific taxonomic groups might skew community composition and structure during PCR. In this study, we estimated OTU (operational taxonomic unit) richness, Shannon's H', and the phylum-level community composition of samples from a coastal zooplankton community using four approaches: complementary DNA (cDNA) and genomic DNA (gDNA) mitochondrial COI (cytochrome c oxidase subunit I) gene amplicons, metatranscriptome sequencing, and morphological identification. Mismatch distribution analysis showed that 90% is a good threshold for distinguishing intraspecific from interspecific sequence differences. Moderate correlations were observed when comparing the species/OTU richness estimated by the different methods. The results strongly indicated that diversity inflation occurred in the samples amplified from gDNA because of mitochondrial pseudogene contamination (overall, gDNA amplicons produced twice the richness of cDNA amplicons). The unique community compositions observed with the PCR-based methods indicated that taxonomic amplification bias had occurred during PCR. Therefore, PCR-free approaches are recommended whenever resolving community structure is an essential aspect of the analysis.
  2. Biodiversity is changing at an accelerating rate at both local and regional scales. Beta diversity, which quantifies species turnover between these two scales, is emerging as a key driver of ecosystem function that can inform spatial conservation. Yet measuring biodiversity remains a major challenge, especially in aquatic ecosystems. Decoding the environmental DNA (eDNA) left behind by organisms offers the possibility of detecting species without direct observation, a Rosetta Stone for biodiversity. While eDNA has proven useful for illuminating diversity in aquatic ecosystems, its utility for measuring beta diversity over spatial scales small enough to be relevant to conservation is poorly known. Here we tested how eDNA performs relative to underwater visual census (UVC) in evaluating the beta diversity of marine communities. We paired UVC with 12S eDNA metabarcoding and used a spatially structured hierarchical sampling design to assess key spatial metrics of fish communities on temperate rocky reefs in southern California. eDNA provided a more detailed picture of the main sources of spatial variation in both taxonomic richness and community turnover, which arose primarily from strong species filtering within and among rocky reefs. As expected, eDNA detected more taxa at the regional scale (69 vs. 38), and these accumulated quickly over space and plateaued at only ~11 samples. Conversely, UVC discovered new taxa more slowly, with no sign of saturation. Based on historical records in the region (2000–2018), we found that 6.9 times more UVC samples than eDNA samples would be required to detect 50 taxa. Our results show that eDNA metabarcoding can outperform diver counts in capturing spatial patterns of biodiversity at fine scales, with less field effort and more power than traditional methods, supporting the notion that eDNA is a critical scientific tool for detecting biodiversity changes in aquatic ecosystems.
  3. Microorganisms are ubiquitous in the biosphere, playing a crucial role in both the biogeochemistry of the planet and human health. However, identifying these microorganisms and defining their function are challenging. Two approaches widely used in comparative metagenomics, 16S amplicon sequencing and whole genome shotgun sequencing (WGS), have provided access to DNA sequencing analysis to identify microorganisms and evaluate diversity and abundance in various environments. However, advances in parallel high-throughput DNA sequencing over the past decade have introduced major hurdles, namely standardization of methods, data storage, reproducible interoperability of results, and data sharing. The National Ecological Observatory Network (NEON), established by the National Science Foundation, enables all researchers to address questions at regional to continental scales around a variety of environmental challenges, and provides high-quality, integrated, and standardized data from field sites across the U.S. As the amount of metagenomic data continues to grow, standardized procedures that allow results to be assessed and compared across projects are becoming increasingly important in the field of metagenomics. We demonstrate the feasibility of using publicly available NEON soil metagenomic sequencing datasets in combination with the open-access Metagenomics Rapid Annotation using Subsystem Technology (MG-RAST) server to illustrate the advantages of WGS compared with 16S amplicon sequencing. Four WGS and four 16S amplicon sequence datasets, from surface soil samples collected at the same locations in Colorado between April and July 2014 and prepared by NEON investigators using standardized protocols, were selected for comparison. The dominant bacterial phyla detected across samples agreed between sequencing methodologies. However, WGS yielded greater microbial resolution and increased accuracy, and allowed identification of more genera of bacteria, archaea, viruses, and eukaryota, as well as putative functional genes that would have gone undetected using 16S amplicon sequencing. NEON open data will be useful for future studies characterizing and quantifying complex ecological processes associated with changing aquatic and terrestrial ecosystems.
  4. 16S rRNA gene profiling (amplicon sequencing) is a popular technique for understanding host-associated and environmental microbial communities. Most protocols for sequencing amplicon libraries follow a standardized pipeline that can differ slightly depending on laboratory facility and user. Given that the same variable region of the 16S gene is targeted, it is generally accepted that sequencing outputs from differing protocols are comparable, and this assumption underlies our ability to identify universal patterns in microbial dynamics through meta-analyses. However, discrepant results from a combined 16S rRNA gene dataset prepared by two labs whose protocols differed only in DNA polymerase and sequencing platform led us to scrutinize the outputs and question whether they can confidently be combined for standard microbiome analysis. Using technical replicates of samples from two reef-building coral species, Montipora aequituberculata and Porites lobata, we evaluated the consistency of alpha and beta diversity metrics between data resulting from these highly similar protocols. While we found minimal variation in alpha diversity between platforms, most beta diversity metrics revealed significant differences, dependent on host species. These inconsistencies persisted after removal of low-abundance taxa and when comparing across higher taxonomic levels, suggesting that bacterial community differences associated with sequencing protocol are likely to be context dependent and difficult to correct without extensive validation work. The results of this study encourage caution in the statistical comparison and interpretation of studies that combine rRNA gene sequence data from distinct protocols, and point to a need for further work identifying the mechanistic causes of these observed differences.
  5. Firmicutes is an almost ubiquitous phylum. Several genera of this group, for instance Geobacillus, are recognized for decomposing plant organic matter and for producing thermostable ligninolytic enzymes. Amplicon sequencing was used in this study to determine the prevalence and genetic diversity of the Firmicutes in two distinct environmental samples: South Dakota Landfill Compost (SDLC, 60 °C) and Sanford Underground Research Facility sediments (SURF, 45 °C). Although distinct microbial community compositions were observed, Firmicutes dominated both the SDLC and SURF samples, followed by Proteobacteria. Within the phylum Firmicutes, the most abundant classes at the SDLC site were Bacilli (83.2%) and Clostridia (2.9%). In comparison, the SURF sample was dominated by Clostridia (45.8%), followed by Bacilli (20.1%). Within the class Bacilli, the SDLC sample was more diverse (a total of 11 genera, each above 1% of operational taxonomic units, OTUs). In contrast, the SURF samples had just three genera at about 1% of the total population: Bacilli, Paenibacillus, and Solibacillus. Geobacillus specifically was present at 0.07% in SURF and 2.5% in SDLC. Subsequently, culture isolation of endospore-forming Firmicutes from these samples yielded a total of 117 isolates. Based on colony morphology and identification by 16S rRNA and gyrB gene sequence analysis, we obtained 58 taxonomically distinct strains. Judging by the similarity indices, gyrB sequence comparison appeared more useful than 16S rRNA sequence analysis for inferring intrageneric and some intergeneric relationships among the isolates.
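Two of the related abstracts above lean on standard diversity summaries: Shannon's H' (item 1) and how the number of detected taxa accumulates with sampling effort until it plateaus (item 2). The Python below is a minimal sketch of both under stated assumptions: per-sample counts are a plain feature -> count mapping, each sample's detections are a set of taxon labels, and the function names and data are hypothetical rather than taken from any of these studies.

```python
import math
from typing import Dict, Iterable, List, Set


def shannon(counts: Dict[str, int]) -> float:
    """Shannon's H' = -sum(p_i * ln p_i), with p_i the relative abundance of taxon i."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for n in counts.values():
        if n > 0:  # terms with p_i = 0 contribute nothing (0 * ln 0 is taken as 0)
            p = n / total
            h -= p * math.log(p)
    return h


def accumulation_curve(samples: Iterable[Set[str]]) -> List[int]:
    """Cumulative number of distinct taxa detected after each successive sample."""
    seen: Set[str] = set()
    curve: List[int] = []
    for taxa in samples:
        seen |= taxa
        curve.append(len(seen))
    return curve


if __name__ == "__main__":
    # Hypothetical data: OTU counts for one sample, and taxon sets from four samples.
    print(round(shannon({"otu1": 50, "otu2": 30, "otu3": 20}), 3))
    print(accumulation_curve([{"a", "b"}, {"b", "c"}, {"a", "d"}, {"d"}]))
```

A single accumulation curve depends on the order in which samples are added; in practice such curves are usually averaged over many random orderings before judging whether detections have saturated.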