Title: The precautionary principle and dietary DNA metabarcoding: commonly used abundance thresholds change ecological interpretation
Dietary DNA metabarcoding enables researchers to identify and characterize trophic interactions with a high degree of taxonomic precision. It is also sensitive to sources of bias and contamination in the field and lab. One of the earliest and most common strategies for dealing with such sensitivities has been to filter resulting sequence data to remove low-abundance sequences before conducting ecological analyses based on the presence or absence of food taxa. Although this step is now often perceived to be both necessary and sufficient for cleaning up datasets, evidence to support this perception is lacking and more attention needs to be paid to the related risk of introducing other undesirable errors. Using computer simulations, we demonstrate that common strategies to remove low-abundance sequences can erroneously eliminate true dietary sequences in ways that impact downstream dietary inferences. Using real data from well-studied wildlife populations in Yellowstone National Park, we further show how these strategies can markedly alter the composition of individual dietary profiles in ways that scale up to obscure ecological interpretations about dietary generalism, specialism, and niche partitioning. Although the practice of removing low-abundance sequences may continue to be a useful strategy to address a subset of research questions that focus on a subset of relatively abundant food resources, its continued widespread use risks generating misleading perceptions about the structure of trophic networks. Researchers working with dietary DNA metabarcoding data (or similar data such as environmental DNA, microbiomes, or pathobiomes) should be aware of potential drawbacks and consider alternative bioinformatic, experimental, and statistical solutions. We used fecal DNA metabarcoding to characterize the diets of bison and bighorn sheep in winter and summer.
Our analyses are based on 35 samples (median per species per season = 10) analyzed using the P6 loop of the chloroplast trnL(UAA) intron together with publicly available plant reference data (Illumina sequence read data are available at NCBI under BioProject PRJNA780500). We used obicut to trim reads with a minimum quality threshold of 30, and primers were removed from forward and reverse reads using cutadapt. All further sequence identifications were performed using obitools; forward and reverse sequences were aligned with the illuminapairedend command using a minimum alignment score of 40, and only joined sequences were retained. We used the obiuniq command to group identical sequences and tally them within samples, enabling us to quantify the relative read abundance (RRA) of each sequence. Sequences that occurred ≤2 times overall or that were ≤8 bp were discarded. Sequences were considered to be likely PCR artifacts if they were highly similar to another sequence (1 bp difference) and had a much lower abundance (0.05%) in the majority of samples in which they occurred; we discarded these sequences using the obiclean command. Overall, we characterized 357 plant sequences, and a subset of 355 sequences was retained in the dataset after rarefying samples to equal sequencing depth. We then applied relative read abundance thresholds from 0% to 5% to the fecal samples. We compared differences in the inferred dietary richness within and between species based on individual samples, based on average richness across samples, and based on the total richness of each population after accounting for differences in sample size. The readme file contains an explanation of each of the variables in the dataset. Information on the methodology can be found in the associated manuscript referenced above.
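The thresholding step described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the example read counts, and the choice to retain a sequence when its RRA is greater than or equal to the threshold are all assumptions made for illustration.

```python
def relative_read_abundance(counts):
    """Convert {sequence_id: read_count} into {sequence_id: proportion of reads}."""
    total = sum(counts.values())
    if total == 0:
        return {seq: 0.0 for seq in counts}
    return {seq: n / total for seq, n in counts.items()}


def richness_after_threshold(counts, threshold):
    """Count sequences present in a sample whose RRA is >= threshold.

    `threshold` is a proportion (0.01 means 1%). Treating the cutoff as
    inclusive is an assumption, not something stated in the record.
    """
    rra = relative_read_abundance(counts)
    return sum(1 for p in rra.values() if p > 0 and p >= threshold)


# Hypothetical read counts for one fecal sample (not taken from the dataset).
sample = {"seq_A": 900, "seq_B": 60, "seq_C": 35, "seq_D": 5}

for threshold in (0.0, 0.01, 0.05):
    kept = richness_after_threshold(sample, threshold)
    print(f"RRA threshold {threshold:.0%}: inferred richness = {kept}")
```

In this toy sample, raising the threshold removes the rarest sequences first, so inferred richness can only stay the same or fall; that is the effect on dietary richness the dataset is designed to examine.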
Award ID(s):
2033823 2046797 2026294 1930820
NSF-PAR ID:
10314003
Author(s) / Creator(s):
; ; ; ; ; ;
Publisher / Repository:
Dryad
Date Published:
Edition / Version:
3
Subject(s) / Keyword(s):
FOS: Biological sciences
Format(s):
Medium: X Size: 124634 bytes
Size(s):
124634 bytes
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Many applications in molecular ecology require the ability to match specific DNA sequences from single‐ or mixed‐species samples with a diagnostic reference library. Widely used methods for DNA barcoding and metabarcoding employ PCR and amplicon sequencing to identify taxa based on target sequences, but the target‐specific enrichment capabilities of CRISPR‐Cas systems may offer advantages in some applications. We identified 54,837 CRISPR‐Cas guide RNAs that may be useful for enriching chloroplast DNA across phylogenetically diverse plant species. We tested a subset of 17 guide RNAs in vitro to enrich plant DNA strands ranging in size from diagnostic DNA barcodes of 1,428 bp to entire chloroplast genomes of 121,284 bp. We used an Oxford Nanopore sequencer to evaluate sequencing success based on both single‐ and mixed‐species samples, which yielded mean chloroplast sequence lengths of 2,530–11,367 bp, depending on the experiment. In comparison to mixed‐species experiments, single‐species experiments yielded more on‐target sequence reads and greater mean pairwise identity between contigs and the plant species' reference genomes. Nevertheless, the mixed‐species experiments yielded sufficient data to provide a ≥48‐fold increase in sequence length and better estimates of relative abundance for a commercially prepared mixture of plant species compared to DNA metabarcoding based on the chloroplast trnL‐P6 marker. Prior work developed CRISPR‐based enrichment protocols for long‐read sequencing, and our experiments pioneered their use for plant DNA barcoding and chloroplast assemblies that may have advantages over workflows that require PCR and short‐read sequencing. Future work would benefit from continuing to develop in vitro and in silico methods for CRISPR‐based analyses of mixed‐species samples, especially when the appropriate reference genomes for contig assembly cannot be known a priori.

     
  2. Abstract

    Although protocols exist for the recovery of ancient DNA from land snail and marine bivalve shells, marine conch shells have yet to be studied from a paleogenomic perspective. We first present reference assemblies for both a 623.7 Mbp nuclear genome and a 15.4 kbp mitochondrial genome for Strombus pugilis, the West Indian fighting conch. We next detail a method to extract and sequence DNA from conch shells and apply it to conch from Bocas del Toro, Panama across three time periods: recently‐eaten and discarded (n = 3), Late Holocene (984–1258 before present [BP]) archaeological midden (n = 5), and mid‐Holocene (5711–7187 BP) paleontological fossil coral reef (n = 5). These results are compared to control DNA extracted from live‐caught tissue and fresh shells (n = 5). Using high‐throughput sequencing, we were able to obtain S. pugilis nuclear sequence reads from shells across all age periods: up to 92.5 thousand filtered reads per sample in live‐caught shell material, 4.57 thousand for modern discarded shells, 12.1 thousand reads for archaeological shells, and 114 reads in paleontological shells. We confirmed authenticity of the ancient DNA recovered from the archaeological and paleontological shells based on 5.7× higher average frequency of deamination‐driven misincorporations and 15% shorter average read lengths compared to the modern shells. Reads also mapped to the S. pugilis mitochondrial genome for all but the paleontological shells, with consistent ratios of mitochondrial to nuclear mapped reads across sample types. Our methods can be applied to diverse archaeological sites to facilitate reconstructions of the long‐term impacts of human behaviour on mollusc evolutionary biology.

     
  3. Abstract

    Many populations of consumers consist of relatively specialized individuals that eat only a subset of the foods consumed by the population at large. Although the ecological significance of individual‐level diet variation is recognized, such variation is difficult to document, and its underlying mechanisms are poorly understood. Optimal foraging theory provides a useful framework for predicting how individuals might select different diets, positing that animals balance the “opportunity cost” of stopping to eat an available food item against the cost of searching for something more nutritious; diet composition should be contingent on the distribution of food, and individual foragers should be more selective when they have greater energy reserves to invest in searching for high‐quality foods. We tested these predicted mechanisms of individual niche differentiation by quantifying environmental (resource heterogeneity) and organismal (nutritional condition) determinants of diet in a widespread browsing antelope (bushbuck, Tragelaphus sylvaticus) in an African floodplain‐savanna ecosystem. We quantified individuals' realized dietary niches (taxonomic richness and composition) using DNA metabarcoding of fecal samples collected repeatedly from 15 GPS‐collared animals (range 6–14 samples per individual, median 12). Bushbuck diets were structured by spatial heterogeneity and constrained by individual condition. We observed significant individual‐level partitioning of food plants by bushbuck both within and between two adjacent habitat types (floodplain and woodland). Individuals with home ranges that were closer together and/or had similar vegetation structure (measured using LiDAR) ate more similar diets, supporting the prediction that heterogeneous resource distribution promotes individual differentiation. Individuals in good nutritional condition had significantly narrower diets (fewer plant taxa), searched their home ranges more intensively (intensity‐of‐use index), and had higher‐quality diets (percent digestible protein) than those in poor condition, supporting the prediction that animals with greater endogenous reserves have narrower realized niches because they can invest more time in searching for nutritious foods. Our results support predictions from optimal foraging theory about the energetic basis of individual‐level dietary variation and provide a potentially generalizable framework for understanding how individuals' realized niche width is governed by animal behavior and physiology in heterogeneous landscapes.

     
  4. Individual animals should adjust diets according to food availability. We used DNA metabarcoding to construct individual-level dietary timeseries for elephants from two family groups in Kenya varying in habitat use, social position and reproductive status. We detected at least 367 dietary plant taxa, with up to 137 unique plant sequences in one fecal sample. Results matched well-established trends: elephants tended to eat more grass when it rained and other plants when dry. Nested within these switches from ‘grazing’ to ‘browsing’ strategies, dietary DNA revealed seasonal shifts in food richness, composition and overlap between individuals. Elephants of both families converged on relatively cohesive diets in dry seasons but varied in their maintenance of cohesion during wet seasons. Dietary cohesion throughout the timeseries of the subdominant ‘Artists’ family was stronger and more consistently positive compared to the dominant ‘Royals’ family. The greater degree of individuality within the dominant family's timeseries could reflect more divergent nutritional requirements associated with calf dependency and/or priority access to preferred habitats. Whereas theory predicts that individuals should specialize on different foods under resource scarcity, our data suggest family bonds may promote cohesion and foster the emergence of diverse feeding cultures reflecting links between social behaviour and nutrition. 
  5. Rationale

    Protein studies in archaeology and paleontology have been dominated by stable isotope studies to understand diet and trophic levels, but recent applications of proteomic techniques have resulted in a more complete understanding of protein diagenesis than stable isotopes alone. In stable isotope analyses, samples are retained or discarded based on their properties. Proteomics can directly determine what proteins are present within the sample and may be able to allow previously discarded samples to be analyzed.

    Methods

    Protein samples that had been previously analyzed for stable isotopes, including those with marginal and poor sample quality, were characterized by liquid chromatography/mass spectrometry using an LTQ Orbitrap Velos mass spectrometer after separation on a Dionex Ultimate 3000 LC system. Data were analyzed using MetaMorpheus and custom R scripts.

    Results

    We found a variety of proteins in addition to collagen, although collagen I was found in the majority of the samples (most samples >80%). We also found a positive correlation between total deamidation and wt% N, suggesting that deamidation may impact the overall nitrogen signal in bulk analyses. The amino acid profiles of samples, including those of marginal or poor stable isotope quality, reflect the expected collagen I percentages, allowing their use in single amino acid stable isotope analyses.

    Conclusions

    All the samples regardless of quality were found to have high concentrations of collagen I, making interpretations of dietary routing based on collagen I reasonably valid. The amino acid profiles on the marginal and poor samples reflect an expected collagen I profile and allow these samples to be recovered for single amino acid analyses.

     