Title: The precautionary principle and dietary DNA metabarcoding: commonly used abundance thresholds change ecological interpretation
Dietary DNA metabarcoding enables researchers to identify and characterize trophic interactions with a high degree of taxonomic precision. It is also sensitive to sources of bias and contamination in the field and lab. One of the earliest and most common strategies for dealing with such sensitivities has been to filter resulting sequence data to remove low-abundance sequences before conducting ecological analyses based on the presence or absence of food taxa. Although this step is now often perceived to be both necessary and sufficient for cleaning up datasets, evidence to support this perception is lacking and more attention needs to be paid to the related risk of introducing other undesirable errors. Using computer simulations, we demonstrate that common strategies to remove low-abundance sequences can erroneously eliminate true dietary sequences in ways that impact downstream dietary inferences. Using real data from well-studied wildlife populations in Yellowstone National Park, we further show how these strategies can markedly alter the composition of individual dietary profiles in ways that scale-up to obscure ecological interpretations about dietary generalism, specialism, and niche partitioning. Although the practice of removing low-abundance sequences may continue to be a useful strategy to address a subset of research questions that focus on a subset of relatively abundant food resources, its continued widespread use risks generating misleading perceptions about the structure of trophic networks. Researchers working with dietary DNA metabarcoding data—or similar data such as environmental DNA, microbiomes, or pathobiomes—should be aware of potential drawbacks and consider alternative bioinformatic, experimental, and statistical solutions. We used fecal DNA metabarcoding to characterize the diets of bison and bighorn sheep in winter and summer. 
Our analyses are based on 35 samples (median per species per season = 10) analyzed using the P6 loop of the chloroplast trnL(UAA) intron together with publicly available plant reference data; Illumina sequence read data are available at NCBI (BioProject: PRJNA780500). We used obicut to trim reads with a minimum quality threshold of 30 and removed primers from forward and reverse reads using cutadapt. All further sequence identifications were performed using obitools: forward and reverse sequences were aligned using the illuminapairedend command with a minimum alignment score of 40, and only joined sequences were retained. We used the obiuniq command to group identical sequences and tally them within samples, enabling us to quantify the relative read abundance (RRA) of each sequence. Sequences that occurred ≤2 times overall or that were ≤8 bp long were discarded. Sequences were considered likely PCR artifacts if they were highly similar to another sequence (1 bp difference) and had a much lower abundance (0.05%) in the majority of samples in which they occurred; we discarded these sequences using the obiclean command. Overall, we characterized 357 plant sequences, and a subset of 355 sequences was retained in the dataset after rarefying samples to equal sequencing depth. We then applied relative read abundance thresholds from 0% to 5% to the fecal samples. We compared differences in the inferred dietary richness within and between species based on individual samples, on average richness across samples, and on the total richness of each population after accounting for differences in sample size. The readme file contains an explanation of each of the variables in the dataset. Information on the methodology can be found in the associated manuscript referenced above.
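The thresholding step described above can be illustrated with a minimal Python sketch. It is not the authors' code: the sample names, sequence labels, and read counts are invented for illustration, the function names are hypothetical, and the choice to retain a sequence when its RRA is greater than or equal to the threshold (rather than strictly greater) is an assumption.

```python
def relative_read_abundance(counts):
    """Convert per-sequence read counts for one sample into proportions."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {seq: n / total for seq, n in counts.items()}


def filter_by_rra(counts, threshold):
    """Keep sequences whose RRA is at least `threshold` (a fraction, e.g. 0.01 = 1%).

    Assumption: the comparison is inclusive (>=); the source does not say.
    """
    rra = relative_read_abundance(counts)
    return {seq: n for seq, n in counts.items() if n > 0 and rra[seq] >= threshold}


def richness_by_threshold(samples, thresholds):
    """Count the sequences retained in each sample at each threshold."""
    return {
        t: {name: len(filter_by_rra(counts, t)) for name, counts in samples.items()}
        for t in thresholds
    }


# Hypothetical read counts, already rarefied to 1,000 reads per sample.
samples = {
    "bison_01": {"seqA": 900, "seqB": 80, "seqC": 15, "seqD": 5},
    "sheep_01": {"seqA": 400, "seqE": 350, "seqF": 242, "seqG": 8},
}
thresholds = [0.0, 0.01, 0.05]

result = richness_by_threshold(samples, thresholds)
for t in thresholds:
    print(f"threshold {t:.0%}: {result[t]}")
```

In this toy input, inferred richness falls from four sequences to two for the first sample as the threshold rises from 0% to 5%, while the second sample loses only one sequence. This is the kind of sample-dependent change in dietary richness that the dataset is designed to examine.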
Award ID(s):
2033823 2046797 2026294 1930820
PAR ID:
10314003
Publisher / Repository:
Dryad
Edition / Version:
3
Subject(s) / Keyword(s):
FOS: Biological sciences
Format(s):
Medium: X Size: 124634 bytes
Size(s):
124634 bytes
Sponsoring Org:
National Science Foundation
More Like this
  1. Many applications in molecular ecology require the ability to match specific DNA sequences from single‐ or mixed‐species samples with a diagnostic reference library. Widely used methods for DNA barcoding and metabarcoding employ PCR and amplicon sequencing to identify taxa based on target sequences, but the target‐specific enrichment capabilities of CRISPR‐Cas systems may offer advantages in some applications. We identified 54,837 CRISPR‐Cas guide RNAs that may be useful for enriching chloroplast DNA across phylogenetically diverse plant species. We tested a subset of 17 guide RNAs in vitro to enrich plant DNA strands ranging in size from diagnostic DNA barcodes of 1,428 bp to entire chloroplast genomes of 121,284 bp. We used an Oxford Nanopore sequencer to evaluate sequencing success based on both single‐ and mixed‐species samples, which yielded mean chloroplast sequence lengths of 2,530–11,367 bp, depending on the experiment. In comparison to mixed‐species experiments, single‐species experiments yielded more on‐target sequence reads and greater mean pairwise identity between contigs and the plant species' reference genomes. Nevertheless, the mixed‐species experiments yielded sufficient data to provide a ≥48‐fold increase in sequence length and better estimates of relative abundance for a commercially prepared mixture of plant species compared to DNA metabarcoding based on the chloroplast trnL‐P6 marker. Prior work developed CRISPR‐based enrichment protocols for long‐read sequencing, and our experiments pioneered its use for plant DNA barcoding and chloroplast assemblies that may have advantages over workflows that require PCR and short‐read sequencing. Future work would benefit from continuing to develop in vitro and in silico methods for CRISPR‐based analyses of mixed‐species samples, especially when the appropriate reference genomes for contig assembly cannot be known a priori.
  2. Biodiversity monitoring based on DNA metabarcoding depends on primer performance. Here, we develop a new metabarcoding primer pair that targets a ~318 bp fragment of the 28S rRNA gene. We validate the primer pair in assessing sponges, a notoriously challenging group for coral reef metabarcoding studies, by using mock and natural complex reef communities to examine its performance in species detection, amplification efficiency, and quantitative potential. Mock community experiments revealed a high number of sponge species (n = 94) spanning a broad taxonomic scope (15 orders), limited taxon-specific primer biases (only a single species exceeded a two-fold deviation from the expected number of reads), and suitability for quantitative metabarcoding: there was a significant relationship between read abundance and visual percent coverage of sponge taxa (R = 0.76). In the natural complex coral reef community experiments, commonly used COI metabarcoding primers detected only 30.9% of sponge species, while the new 28S primers increased detection to 79.4%. These new 28S primers detect a broader taxonomic array of species across phyla and classes within the complex cryptobiome of coral reef communities than the Leray-Geller COI primers. As biodiversity assessments using metabarcoding tools are increasingly being leveraged for environmental monitoring and to guide policymaking, these new 28S rRNA primers can improve biodiversity assessments for complex ecological coral reef communities.
  3. Individual animals should adjust diets according to food availability. We used DNA metabarcoding to construct individual-level dietary timeseries for elephants from two family groups in Kenya varying in habitat use, social position and reproductive status. We detected at least 367 dietary plant taxa, with up to 137 unique plant sequences in one fecal sample. Results matched well-established trends: elephants tended to eat more grass when it rained and other plants when dry. Nested within these switches from ‘grazing’ to ‘browsing’ strategies, dietary DNA revealed seasonal shifts in food richness, composition and overlap between individuals. Elephants of both families converged on relatively cohesive diets in dry seasons but varied in their maintenance of cohesion during wet seasons. Dietary cohesion throughout the timeseries of the subdominant ‘Artists’ family was stronger and more consistently positive compared to the dominant ‘Royals’ family. The greater degree of individuality within the dominant family's timeseries could reflect more divergent nutritional requirements associated with calf dependency and/or priority access to preferred habitats. Whereas theory predicts that individuals should specialize on different foods under resource scarcity, our data suggest family bonds may promote cohesion and foster the emergence of diverse feeding cultures reflecting links between social behaviour and nutrition. 
  4. Borenstein, Elhanan (Ed.)
    Recurrent neural networks with memory and attention mechanisms are widely used in natural language processing because they can capture short- and long-term sequential information for diverse tasks. We propose an integrated deep learning model for microbial DNA sequence data, which exploits convolutional neural networks, recurrent neural networks, and attention mechanisms to predict taxonomic classifications and sample-associated attributes, such as the relationship between the microbiome and host phenotype, on the read/sequence level. In this paper, we develop this novel deep learning approach and evaluate its application to amplicon sequences. We apply our approach to short DNA reads and full sequences of 16S ribosomal RNA (rRNA) marker genes, which identify the heterogeneity of a microbial community sample. We demonstrate that our implementation of a novel attention-based deep network architecture, Read2Pheno, achieves read-level phenotypic prediction. Training Read2Pheno models encodes sequences (reads) into dense, meaningful representations: learned embedded vectors output from the intermediate layer of the network model, which can provide biological insight when visualized. The attention layer of Read2Pheno models can also automatically identify nucleotide regions in reads/sequences that are particularly informative for classification. As such, this novel approach can avoid the pre/post-processing and manual interpretation required with conventional approaches to microbiome sequence classification. We further show, as proof-of-concept, that aggregating read-level information can robustly predict microbial community properties, host phenotype, and taxonomic classification, with performance at least comparable to conventional approaches. An implementation of the attention-based deep learning network is available at https://github.com/EESI/sequence_attention (a python package) and https://github.com/EESI/seq2att (a command line tool).
  5. Sex estimation of skeletons is fundamental to many archaeological studies. Currently, three approaches are available to estimate sex (osteology, genomics, or proteomics), but little is known about the relative reliability of these methods in applied settings. We present matching osteological, shotgun-genomic, and proteomic data to estimate the sex of 55 individuals, each with an independent radiocarbon date between 2,440 and 100 cal BP, from two ancestral Ohlone sites in Central California. Sex estimation was possible in 100% of this burial sample using proteomics, in 91% using genomics, and in 51% using osteology. Agreement between the methods was high; however, conflicts did occur. Genomic sex estimates were 100% consistent with proteomic and osteological estimates when DNA reads were above 100,000 total sequences. However, more than half the samples had DNA read numbers below this threshold, producing high rates of conflict with osteological and proteomic data: nine out of twenty conditional DNA sex estimates conflicted with proteomics. While the DNA signal decreased by an order of magnitude in the older burial samples, there was no decrease in proteomic signal. We conclude that proteomics provides an important complement to osteological and shotgun-genomic sex estimation.