Title: Beyond DNA barcoding: The unrealized potential of genome skim data in sample identification
Genetic tools are increasingly used to identify and discriminate between species. One key transition in this process was the recognition of the potential of the ca. 658 bp fragment of the organelle cytochrome c oxidase I (COI) as a barcode region, which revolutionized animal bioidentification and led, among other things, to the instigation of the Barcode of Life Database (BOLD), which currently contains barcodes from >7.9 million specimens. Following this discovery, suggestions for other organellar regions and markers, and the primers with which to amplify them, have been continuously proposed. Most recently, the field has taken the leap from PCR‐based generation of DNA references into shotgun sequencing‐based “genome skimming” alternatives, with the ultimate goal of assembling organellar reference genomes. Unfortunately, in genome skimming approaches, much of the nuclear genome (as much as 99% of the sequence data) is discarded, which is not only wasteful, but can also limit the power of discrimination at, or below, the species level. Here, we advocate that the full shotgun sequence data can be used to assign an identity (that we term for convenience its “DNA‐mark”) for both voucher and query samples, without requiring any computationally intensive pretreatment (e.g. assembly) of reads. We argue that if reference databases are populated with such “DNA‐marks,” it will enable future DNA‐based taxonomic identification to complement, or even replace, PCR of barcodes with genome skimming, and we discuss how such methodology ultimately could enable identification to population, or even individual, level.
Award ID(s):
1815485
PAR ID:
10166999
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Molecular Ecology
ISSN:
0962-1083
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Jia, Zhi-Yun (Ed.)
    Abstract We are far from knowing all species living on the planet. Understanding biodiversity is demanding and requires time and expertise. Most groups are understudied given problems of identifying and delimiting species. DNA barcoding emerged to overcome some of the difficulties in identifying species. Its limitations derive from incomplete taxonomic knowledge and the lack of comprehensive DNA barcode libraries for so many taxonomic groups. Here, we evaluate how useful barcoding is for identifying arthropods from highly diverse leaf litter communities in the southern Appalachian Mountains (USA). We used 3 reference databases and several automated classification methods on a data set including several arthropod groups. Acari, Araneae, Collembola, Coleoptera, Diptera, and Hymenoptera were well represented, showing different performances across methods and databases. Spiders performed the best, with correct identification rates to species and genus levels of ~50% across databases. Springtails performed poorly; no barcodes were identified to species or genus. Other groups showed poor to mediocre performance, from around 3% (mites) to 20% (beetles) of barcodes correctly identified to species, but also with some false identifications. In general, BOLD-based identification offered the best identification results but, in all cases except spiders, performance was poor, with less than a fifth of specimens correctly identified to genus or species. Our results indicate that the soil arthropod fauna is still insufficiently documented, with many species unrepresented in DNA barcode libraries. More effort toward integrative taxonomic characterization is needed to complete our reference libraries before we can rely on DNA barcoding as a universally applicable identification method.
  2. Abstract Because of the detrimental effects of terrestrial invasive plant species (TIPS) on native species, ecosystems, public health, and the economy, many countries have been actively looking for strategies to prevent the introduction and minimize the spread of TIPS. Fast and accurate detection of TIPS is essential to achieving these goals. Conventionally, invasive species monitoring has relied on morphological attributes. Recently, DNA‐based species identification (i.e., DNA barcoding) has become more attractive. To investigate whether DNA barcoding can aid in the detection and management of TIPS, we visited multiple nature areas in Southwest Michigan and collected a small piece of leaf tissue from 91 representative terrestrial plant species, most of which are invasive. We extracted DNA from the leaf samples, amplified four genomic loci (ITS, rbcL, matK, and trnH‐psbA) with PCR, and then purified and sequenced the PCR products. After careful examination of the sequencing data, we were able to identify reliable DNA barcode regions for most species and had an average PCR‐and‐sequencing success rate of 87.9%. We found that the species discrimination rate of a DNA barcode region is inversely related to the ease of PCR amplification and sequencing. Compared with rbcL and matK, ITS and trnH‐psbA have better species discrimination rates (80.6% and 63.2%, respectively). When ITS and trnH‐psbA are simultaneously used, the species discrimination rate increases to 97.1%. The high species/genus/family discrimination rates of DNA barcoding indicate that DNA barcoding can be successfully employed in TIPS identification. Further increases in the number of DNA barcode regions show little or no additional increases in the species discrimination rate, suggesting that dual‐barcode approaches (e.g., ITS + trnH‐psbA) might be the efficient and cost‐effective method in DNA‐based TIPS identification. Close inspection of nucleotide sequences at the four DNA barcode regions among related species demonstrates that DNA barcoding is especially useful in identifying TIPS that are morphologically similar to other species.
  3. Abstract Many applications in molecular ecology require the ability to match specific DNA sequences from single‐ or mixed‐species samples with a diagnostic reference library. Widely used methods for DNA barcoding and metabarcoding employ PCR and amplicon sequencing to identify taxa based on target sequences, but the target‐specific enrichment capabilities of CRISPR‐Cas systems may offer advantages in some applications. We identified 54,837 CRISPR‐Cas guide RNAs that may be useful for enriching chloroplast DNA across phylogenetically diverse plant species. We tested a subset of 17 guide RNAs in vitro to enrich plant DNA strands ranging in size from diagnostic DNA barcodes of 1,428 bp to entire chloroplast genomes of 121,284 bp. We used an Oxford Nanopore sequencer to evaluate sequencing success based on both single‐ and mixed‐species samples, which yielded mean chloroplast sequence lengths of 2,530–11,367 bp, depending on the experiment. In comparison to mixed‐species experiments, single‐species experiments yielded more on‐target sequence reads and greater mean pairwise identity between contigs and the plant species' reference genomes. Nevertheless, these mixed‐species experiments yielded sufficient data to provide a ≥48‐fold increase in sequence length and better estimates of relative abundance for a commercially prepared mixture of plant species compared to DNA metabarcoding based on the chloroplast trnL‐P6 marker. Prior work developed CRISPR‐based enrichment protocols for long‐read sequencing, and our experiments pioneered their use for plant DNA barcoding and chloroplast assemblies that may have advantages over workflows that require PCR and short‐read sequencing. Future work would benefit from continuing to develop in vitro and in silico methods for CRISPR‐based analyses of mixed‐species samples, especially when the appropriate reference genomes for contig assembly cannot be known a priori.
  4. The blue crab Callinectes sapidus is one of the most widely studied marine crustaceans due to its high economic value and ecological significance. Despite extensive research on the blue crab in North America, many questions remain about the distribution and abundance of the species in the subtropics and tropics. In many places, C. sapidus is sympatric with morphologically similar Callinectes spp., which has implications for seafood mislabeling. To enable rapid identification of the species, we designed and tested two PCR-based assays targeting the 12S rRNA mitochondrial gene. The first assay discriminates C. sapidus from other Callinectes spp. via post-PCR restriction digestion (PCR-RFLP) and the second assay discriminates among multiple Callinectes spp. through High Resolution Melting (HRM) analysis and supervised machine learning analyses. A total of 58 DNA samples from five Callinectes spp. (validated via 12S gene sequencing) were used for assay testing. The PCR-RFLP assay was 100% accurate in identifying C. sapidus from other Callinectes spp. HRM analysis of amplicons showed good discrimination among species, with distinct clusters formed between species with higher sequence homology. Linear discriminant analysis (LDA) classification of HRM curves was quite successful given the small dataset available, producing ∼90–91% mean accuracy in classification over all species with 100-fold cross validation. Much of the error came from misclassifications between C. similis and C. danae, which are ∼99% similar in sequence for the amplicon; collapsing them into a single class increased overall classification success to 94%. Error also arose from C. bocourti classifications, which had a reference set containing only three samples. Classification accuracy of C. sapidus alone via HRM was 97.5%. Overall, these assays show great promise as rapid and inexpensive methods to identify Callinectes spp. and have application for both ecological research and seafood identification or labeling.
  5. Biodiversity genomics research requires reliable organismal identification, which can be difficult based on morphology alone. DNA-based identification using DNA barcoding can provide confirmation of species identity and resolve taxonomic issues but is rarely used in studies generating reference genomes. Here, we describe the development and implementation of DNA barcoding for the Darwin Tree of Life Project (DToL), which aims to sequence and assemble high quality reference genomes for all eukaryotic species in Britain and Ireland. We present a standardised framework for DNA barcode sequencing and data interpretation that is then adapted for diverse organismal groups. DNA barcoding data from over 12,000 DToL specimens has identified up to 20% of samples requiring additional verification, with 2% of seed plants and 3.5% of animal specimens subsequently having their names changed. We also make recommendations for future developments using new sequencing approaches and streamlined bioinformatic approaches. 