

Title: Pitfalls and pointers: An accessible guide to marker gene amplicon sequencing in ecological applications
Abstract

Next‐Generation Sequencing (NGS) is a powerful tool that has been rapidly adopted by many ecologists studying microbial communities. Despite the exciting demonstration of NGS technology as a tool for ecological research, cryptic pitfalls inherent to its use can obscure correct interpretation of NGS data. Here, we provide an accessible overview of an NGS process that uses marker gene amplicon sequences (MGAS), which will allow scientists, particularly community ecologists, to make appropriate methodological choices and to understand the limits on inference about community composition and diversity that can be drawn from MGAS data.

We describe the MGAS pipeline, focusing specifically on cryptic sources of variation that have received less emphasis in the ecological literature, but which may substantially impact inference about microbial community diversity and composition. By simulating communities from published microbiome data, we demonstrate how these sources of variation can generate inaccurate or misleading patterns.

We specifically highlight two cryptic sources of variation that arise during the MGAS pipeline: sample dilution performed without the researcher’s awareness, and lane‐to‐lane variability. These sources of variation affect estimates of species presence and relative abundance, particularly for species with moderate to low abundances. Each can lead to errors in the estimation of both absolute and relative abundance within, and turnover among, microbial communities.
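To make the dilution and lane effects concrete, the sketch below treats sequencing as multinomial sampling of reads from a fixed library. The community (200 taxa with geometrically declining abundances) and the read depths are hypothetical, chosen only for illustration; this is not the simulation used in the paper. A sample diluted without the researcher’s awareness is modelled simply as one that receives fewer reads, and a second lane as an independent draw from the same library.

```python
import numpy as np

def expected_detected(props, depth):
    """Expected number of taxa with at least one read when `depth` reads are
    drawn multinomially; each taxon is missed with probability (1 - p)**depth."""
    props = np.asarray(props, dtype=float)
    return float(np.sum(1.0 - (1.0 - props) ** depth))

def sequence(props, depth, rng):
    """One simulated sequencing run: read counts per taxon."""
    return rng.multinomial(depth, props)

rng = np.random.default_rng(1)

# Hypothetical community: a few dominant taxa and a long tail of rare ones.
n_taxa = 200
weights = 0.95 ** np.arange(n_taxa)
props = weights / weights.sum()

# Assumed depths: dilution is modelled as a 10x drop in reads for the same library.
full_depth, diluted_depth = 50_000, 5_000

for depth in (full_depth, diluted_depth):
    print(f"{depth:>6} reads: ~{expected_detected(props, depth):.0f} of {n_taxa} taxa expected")

# Lane-to-lane variability: the same library sequenced on two lanes.
lane_a = sequence(props, full_depth, rng)
lane_b = sequence(props, full_depth, rng)
rare = slice(100, None)  # taxa ranked 101-200, the low-abundance tail
print("rare taxa seen on lane A only:", int(np.sum((lane_a[rare] > 0) & (lane_b[rare] == 0))))
print("rare taxa seen on lane B only:", int(np.sum((lane_b[rare] > 0) & (lane_a[rare] == 0))))
```

Because each taxon's read count is binomial, the chance of missing it, (1 − p)^depth, rises quickly as depth falls, so the low-abundance tail is the first part of the community to drop out or to differ between lanes.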

Awareness and understanding of what happens during MGAS generation, and specifically why it happens, are key to generating a strong dataset and building a robust community matrix. Requesting sample dilution information from the sequencing centre, including technical replicates across sequencing lanes, and understanding how sampling intensity and patterns in how taxa are distributed within communities shape the measurement of community richness, evenness and diversity are all critical for drawing correct ecological inferences from MGAS data.
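The dependence of richness, evenness and diversity on sampling intensity can be seen by computing standard indices on the same counts before and after rarefying to a lower depth. The sketch below uses observed richness S, Shannon's H′ and Pielou's evenness J′ = H′/ln S; the sample counts and the rarefaction depth are invented, and rarefaction is implemented with NumPy's `Generator.multivariate_hypergeometric` (NumPy 1.18 or later), which draws reads without replacement.

```python
import numpy as np

def richness(counts):
    """Observed richness S: number of taxa with at least one read."""
    return int(np.count_nonzero(counts))

def shannon(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over observed taxa."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log(p)))

def pielou(counts):
    """Pielou's evenness J' = H' / ln(S); undefined when S < 2."""
    s = richness(counts)
    return shannon(counts) / np.log(s) if s > 1 else float("nan")

def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from one sample's counts."""
    counts = np.asarray(counts, dtype=np.int64)
    if depth > counts.sum():
        raise ValueError("rarefaction depth exceeds library size")
    return rng.multivariate_hypergeometric(counts, depth)

rng = np.random.default_rng(7)
# Hypothetical sample: two dominant taxa and a tail of doubletons and singletons.
sample = np.array([400, 300, 120, 60, 40, 20, 10, 5, 2, 2, 1, 1, 1, 1])

for label, c in [("full", sample), ("rarefied to 100", rarefy(sample, 100, rng))]:
    print(f"{label:>16}: S={richness(c):2d}  H'={shannon(c):.2f}  J'={pielou(c):.2f}")
```

Richness is usually the most depth-sensitive of the three, because singletons and doubletons are the reads most likely to be lost when the sample is subsampled.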

Award ID(s): 2129332
NSF-PAR ID: 10446139
Author(s) / Creator(s):
 ;  ;  ;  ;  
Publisher / Repository: Wiley-Blackwell
Date Published:
Journal Name: Methods in Ecology and Evolution
Volume: 13
Issue: 2
ISSN: 2041-210X
Page Range / eLocation ID: p. 266-277
Sponsoring Org: National Science Foundation
More Like this
  1. 16S rRNA gene profiling (amplicon sequencing) is a popular technique for understanding host-associated and environmental microbial communities. Most protocols for sequencing amplicon libraries follow a standardized pipeline that can differ slightly depending on laboratory facility and user. Given that the same variable region of the 16S gene is targeted, it is generally accepted that sequencing output from differing protocols is comparable, and this assumption underlies our ability to identify universal patterns in microbial dynamics through meta-analyses. However, discrepant results from a combined 16S rRNA gene dataset prepared by two labs whose protocols differed only in DNA polymerase and sequencing platform led us to scrutinize the outputs and challenge the idea of confidently combining them for standard microbiome analysis. Using technical replicates of reef-building coral samples from two species, Montipora aequituberculata and Porites lobata, we evaluated the consistency of alpha and beta diversity metrics between data resulting from these highly similar protocols. While we found minimal variation in alpha diversity between platforms, most beta diversity metrics revealed significant differences, depending on host species. These inconsistencies persisted following removal of low abundance taxa and when comparing across higher taxonomic levels, suggesting that bacterial community differences associated with sequencing protocol are likely to be context dependent and difficult to correct without extensive validation work. The results of this study encourage caution in the statistical comparison and interpretation of studies that combine rRNA gene sequence data from distinct protocols and point to a need for further work identifying mechanistic causes of these observed differences.
  2. Abstract

    Soil microbial communities play critical roles in various ecosystem processes, but studies at a large spatial and temporal scale have been challenging due to the difficulty in finding the relevant samples in available data sets as well as the lack of standardization in sample collection and processing. The National Ecological Observatory Network (NEON) has been collecting soil microbial community data multiple times per year for 47 terrestrial sites in 20 eco‐climatic domains, producing one of the most extensive standardized sampling efforts for soil microbial biodiversity to date. Here, we introduce the neonMicrobe R package, a suite of downloading, preprocessing, data set assembly, and sensitivity analysis tools for NEON’s newly published 16S and ITS amplicon sequencing data products, which characterize soil bacterial and fungal communities, respectively. neonMicrobe is designed to make these data more accessible to ecologists without assuming prior experience with bioinformatic pipelines. We describe quality control steps used to remove quality‐flagged samples, report on sensitivity analyses used to determine appropriate quality filtering parameters for the DADA2 workflow, and demonstrate the immediate usability of the output data by conducting standard analyses of soil microbial diversity. The sequence abundance tables produced by neonMicrobe can be linked to NEON’s other data products (e.g., soil physical and chemical properties, plant community composition) and soil subsamples archived in the NEON Biorepository. We provide recommendations for incorporating neonMicrobe into reproducible scientific workflows, discuss technical considerations for large‐scale amplicon sequence analysis, and outline future directions for NEON‐enabled microbial ecology.
    In particular, we believe that NEON marker gene sequence data will allow researchers to answer outstanding questions about the spatial and temporal dynamics of soil microbial communities while explicitly accounting for scale dependence. We expect that the data produced by NEON and the neonMicrobe R package will act as a valuable ecological baseline to inform and contextualize future experimental and modeling endeavors.

     
  3. Abstract

    Estimating phenotypic distributions of populations and communities is central to many questions in ecology and evolution. These distributions can be characterized by their moments (mean, variance, skewness and kurtosis) or diversity metrics (e.g. functional richness). Typically, such moments and metrics are calculated using community‐weighted approaches (e.g. abundance‐weighted mean). We propose an alternative bootstrapping approach that allows flexibility in trait sampling and explicit incorporation of intraspecific variation, and show that this approach significantly improves estimation while allowing us to quantify uncertainty.

    We assess the performance of different approaches for estimating the moments of trait distributions across various sampling scenarios, taxa and datasets by comparing estimates derived from simulated samples with the true values calculated from full datasets. Simulations differ in sampling intensity (individuals per species), sampling biases (abundance, size), trait data source (local vs. global) and estimation method (two types of community‐weighting, two types of bootstrapping).

    We introduce the traitstrap R package, which contains a modular and extensible set of bootstrapping and weighted‐averaging functions that use community composition and trait data to estimate the moments of community trait distributions with their uncertainty. Importantly, the first function in the workflow, trait_fill, allows the user to specify hierarchical structures (e.g. plot within site, experiment vs. control, species within genus) to assign trait values to each taxon in each community sample.

    Across all taxa, simulations and metrics, bootstrapping approaches were more accurate and less biased than community‐weighted approaches. With bootstrapping, a sample size of 9 or more measurements per species per trait generally yielded 95% confidence intervals that included the true mean. Bootstrapping reduced average percent errors by 26%–74% relative to community‐weighting. Random sampling across all species outperformed both size‐ and abundance‐biased sampling.

    Our results suggest randomly sampling ~9 individuals per sampling unit and species, covering all species in the community and analysing the data using nonparametric bootstrapping generally enable reliable inference on trait distributions, including the central moments, of communities. By providing better estimates of community trait distributions, bootstrapping approaches can improve our ability to link traits to both the processes that generate them and their effects on ecosystems.

     
  4. Abstract

    Understanding how abiotic disturbance and biotic interactions determine pollinator and flowering‐plant diversity is critically important given global climate change and widespread pollinator declines. To predict responses of pollinators and flowering‐plant communities to changes in wildfire disturbance, a mechanistic understanding of how these two trophic levels respond to wildfire severity is needed.

    We compared site‐to‐site variation in community composition (β‐diversity), species richness and abundances of pollinators and flowering plants among landscapes with no recent wildfire (unburned), mixed‐severity wildfire and high‐severity wildfire in three sites across the Northern Rockies Ecoregion, USA. We used variation partitioning to assess the relative contributions of wildfire, other abiotic variables (climate, soils and topography) and biotic associations among plant and pollinator composition to community assembly of both trophic levels.

    Wildfire disturbance generally increased species richness and total abundance, but decreased β‐diversity, of both pollinators and flowering plants. However, reductions in β‐diversity from wildfire appeared to be driven by increased abundances following fires, which produced higher local species richness of pollinators and flowers in burned than in unburned landscapes. After accounting for differences in abundance, standardized effect sizes of β‐diversity were higher in burned than in unburned landscapes, suggesting that wildfire enhances non‐random assortment of pollinator and flowering‐plant species among local communities.

    Wildfire disturbance mediated the relative importance of mutualistic associations to β‐diversity of pollinators and flowering plants. The influence of pollinator β‐diversity on flowering‐plant β‐diversity increased with wildfire severity, whereas the influence of flowering‐plant β‐diversity on pollinator β‐diversity was greater in mixed‐severity than in high‐severity wildfire or unburned landscapes. Moreover, biotic associations among pollinator and plant species explained substantial variation in β‐diversity of both trophic levels beyond what could be explained by wildfire and all other abiotic and spatial factors combined.

    Synthesis. Wildfire disturbance and plant–pollinator interactions both strongly influenced the assembly of pollinator and flowering‐plant communities at local and regional scales. However, biotic interactions were generally more important drivers of community assembly in disturbed than in undisturbed landscapes. As wildfire regimes continue to change globally, predicting their effects on biodiversity will require a deeper understanding of the ecological processes that mediate biotic interactions among linked trophic levels.

     
  5. Abstract

    The hypothesis that biotic interactions are stronger at lower relative to higher latitudes has a rich history, drawing from ecological and evolutionary theory. While this hypothesis suggests that stronger interactions at lower latitudes may contribute to the maintenance of contemporary patterns of diversity, there remain few standardized biogeographic comparisons of community effects of species interactions.

    Using marine seagrasses as a focal ecosystem of conservation importance and sessile marine invertebrates as model prey, we tested the hypothesis that predation is stronger at lower latitudes and can shape contemporary patterns of prey diversity. To further advance understanding beyond prior studies, we also explored mechanisms that likely underlie a change in interaction outcomes with latitude.

    Multiple observational and experimental approaches were employed to test for effects of predators, and the mechanisms that may underlie these effects, in seagrass ecosystems of the western Atlantic Ocean spanning 30° of latitude from the temperate zone to the tropics.

    In predator exclusion experiments conducted in a temperate and a tropical region, predation decreased sessile invertebrate abundance, richness and diversity on both natural and standardized artificial seagrass at tropical but not temperate sites. Further, predation reduced invertebrate richness at both local and regional scales in the tropics. Additional experiments demonstrated that predation reduced invertebrate recruitment in the tropics but not the temperate zone. Finally, direct observations of predators showed higher but variable consumption rates on invertebrates at tropical relative to temperate latitudes.

    Together, these results demonstrate that strong predation in the tropics can have consequential impacts on prey communities through discrete effects on early life stages as well as longer‐term cumulative effects on community structure and diversity. Our detailed experiments also provide some of the first data linking large‐scale biogeographic patterns, community‐scale interaction outcomes and direct observation of predators in the temperate zone and tropics. Therefore, our results support the hypothesis that predation is stronger in the tropics, but also elucidate some of the causes and consequences of this variation in shaping contemporary patterns of diversity.

     
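For the coral 16S protocol comparison in item 1 above, one common way to quantify protocol-driven beta-diversity differences is the Bray–Curtis dissimilarity between technical replicates of the same sample. The sketch below computes it on relative abundances, optionally after dropping low-abundance taxa; the replicate counts, the 1% threshold and the helper names are hypothetical, and SciPy's `scipy.spatial.distance.braycurtis` returns the same value for the same inputs.

```python
import numpy as np

def relative(counts):
    """Convert read counts to relative abundances."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x - y| / sum(x + y); 0 = identical, 1 = no shared taxa."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.abs(x - y).sum() / (x + y).sum())

def drop_rare(rel_a, rel_b, threshold=0.01):
    """Hypothetical filter: keep taxa reaching `threshold` in either replicate, then renormalise."""
    keep = (rel_a >= threshold) | (rel_b >= threshold)
    return relative(rel_a[keep]), relative(rel_b[keep])

# Hypothetical technical replicates of one coral sample prepared with two protocols.
protocol_a = np.array([520, 240, 110, 60, 30, 25, 10, 5, 0, 0])
protocol_b = np.array([610, 150, 140, 20, 45, 0, 15, 8, 6, 6])

ra, rb = relative(protocol_a), relative(protocol_b)
print(f"all taxa:   BC = {bray_curtis(ra, rb):.3f}")
print(f"taxa >= 1%: BC = {bray_curtis(*drop_rare(ra, rb)):.3f}")
```

Comparing the two printed values shows whether a dissimilarity between replicates is carried by rare taxa or by shifts among the dominant ones.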
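The contrast drawn in item 3 above, between community-weighted estimates and bootstrapped estimates of trait distributions, can be sketched as follows. This is not the traitstrap implementation: it assumes a dictionary of individual trait measurements per species and a dictionary of species abundances (both hypothetical), computes the abundance-weighted mean of species means, and then bootstraps the community mean by allocating draws to species in proportion to abundance and resampling each species' measurements with replacement, returning a 95% percentile interval. The 9 measurements per species simply mirror the sample size discussed in that abstract.

```python
import numpy as np

def community_weighted_mean(traits, abundances):
    """Abundance-weighted mean of species-mean trait values (ignores intraspecific variation)."""
    species = list(abundances)
    w = np.array([abundances[s] for s in species], dtype=float)
    means = np.array([np.mean(traits[s]) for s in species])
    return float(np.sum(w * means) / w.sum())

def bootstrap_mean(traits, abundances, n_draws=200, n_boot=1000, rng=None):
    """Nonparametric bootstrap of the community trait mean with a 95% percentile interval."""
    rng = np.random.default_rng() if rng is None else rng
    species = list(abundances)
    w = np.array([abundances[s] for s in species], dtype=float)
    w /= w.sum()
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        # How many of the n_draws individuals come from each species, given abundance.
        per_species = rng.multinomial(n_draws, w)
        draws = np.concatenate([
            rng.choice(np.asarray(traits[s], dtype=float), size=k, replace=True)
            for s, k in zip(species, per_species)
        ])
        estimates[b] = draws.mean()
    lo, hi = np.percentile(estimates, [2.5, 97.5])
    return float(estimates.mean()), (float(lo), float(hi))

# Hypothetical data: 9 leaf-area measurements per species (cm^2) and plot abundances.
rng = np.random.default_rng(3)
traits = {
    "sp_a": rng.normal(12.0, 3.0, 9),
    "sp_b": rng.normal(20.0, 5.0, 9),
    "sp_c": rng.normal(6.0, 1.5, 9),
}
abundances = {"sp_a": 50, "sp_b": 10, "sp_c": 40}

print("community-weighted mean:", round(community_weighted_mean(traits, abundances), 2))
mean, (lo, hi) = bootstrap_mean(traits, abundances, rng=rng)
print(f"bootstrap mean: {mean:.2f}  95% CI [{lo:.2f}, {hi:.2f}]")
```

The community-weighted mean yields a single number, whereas the bootstrap propagates within-species variation into an interval, which is what allows uncertainty to be quantified.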
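Item 4 above distinguishes raw β‐diversity from its standardized effect size after accounting for differences in abundance. A common way to do this, and the one assumed here (the abstract does not state which null model was used), is to reassign individuals among sites at random while preserving each site's total abundance and each species' regional abundance, then compute SES = (β_obs − mean(β_null)) / sd(β_null). β is measured here with Whittaker's multiplicative index (γ richness divided by mean α richness), and the site‐by‐species matrix is invented.

```python
import numpy as np

def whittaker_beta(m):
    """Whittaker's beta: regional (gamma) richness / mean local (alpha) richness."""
    m = np.asarray(m)
    gamma = np.count_nonzero(m.sum(axis=0))
    alpha = np.count_nonzero(m, axis=1).mean()
    return gamma / alpha

def shuffle_individuals(m, rng):
    """Null community: reassign individuals at random, keeping site totals and species totals."""
    m = np.asarray(m, dtype=int)
    n_sites, n_species = m.shape
    species_of = np.repeat(np.arange(n_species), m.sum(axis=0))  # one entry per individual
    rng.shuffle(species_of)
    site_of = np.repeat(np.arange(n_sites), m.sum(axis=1))
    null = np.zeros_like(m)
    np.add.at(null, (site_of, species_of), 1)
    return null

def beta_ses(m, n_null=999, rng=None):
    """Standardized effect size of observed beta relative to the individual-based null model."""
    rng = np.random.default_rng() if rng is None else rng
    observed = whittaker_beta(m)
    null = np.array([whittaker_beta(shuffle_individuals(m, rng)) for _ in range(n_null)])
    return (observed - null.mean()) / null.std(ddof=1)

# Hypothetical counts: rows are sites, columns are pollinator species.
sites = np.array([
    [12, 8, 0, 0, 3],
    [10, 0, 6, 0, 2],
    [0, 0, 9, 7, 1],
])
print(f"observed beta = {whittaker_beta(sites):.2f}")
print(f"SES = {beta_ses(sites, rng=np.random.default_rng(11)):.2f}")
```

Because the null model keeps abundances fixed, a positive SES indicates more compositional turnover than random placement of the same individuals would produce, which is the sense in which abundance differences are "accounted for".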