Title: Beyond DNA barcoding: The unrealized potential of genome skim data in sample identification
Abstract Genetic tools are increasingly used to identify and discriminate between species. One key transition in this process was the recognition of the potential of the ca. 658 bp fragment of the organelle cytochrome c oxidase I (COI) as a barcode region, which revolutionized animal bioidentification and led, among other things, to the instigation of the Barcode of Life Database (BOLD), which currently contains barcodes from >7.9 million specimens. Following this discovery, suggestions for other organellar regions and markers, and the primers with which to amplify them, have been continuously proposed. Most recently, the field has taken the leap from PCR‐based generation of DNA references into shotgun sequencing‐based “genome skimming” alternatives, with the ultimate goal of assembling organellar reference genomes. Unfortunately, in genome skimming approaches, much of the nuclear genome (as much as 99% of the sequence data) is discarded, which is not only wasteful, but can also limit the power of discrimination at, or below, the species level. Here, we advocate that the full shotgun sequence data can be used to assign an identity (that we term for convenience its “DNA‐mark”) for both voucher and query samples, without requiring any computationally intensive pretreatment (e.g. assembly) of reads. We argue that if reference databases are populated with such “DNA‐marks,” it will enable future DNA‐based taxonomic identification to complement, or even replace, PCR of barcodes with genome skimming, and we discuss how such methodology ultimately could enable identification to population, or even individual, level.
Award ID(s):
1815485
PAR ID:
10456209
Publisher / Repository:
Wiley-Blackwell
Journal Name:
Molecular Ecology
Volume:
29
Issue:
14
ISSN:
0962-1083
Page Range / eLocation ID:
p. 2521-2534
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract We are far from knowing all species living on the planet. Understanding biodiversity is demanding and requires time and expertise. Most groups are understudied given problems of identifying and delimiting species. DNA barcoding emerged to overcome some of the difficulties in identifying species. Its limitations derive from incomplete taxonomic knowledge and the lack of comprehensive DNA barcode libraries for so many taxonomic groups. Here, we evaluate how useful barcoding is for identifying arthropods from highly diverse leaf litter communities in the southern Appalachian Mountains (USA). We used 3 reference databases and several automated classification methods on a data set including several arthropod groups. Acari, Araneae, Collembola, Coleoptera, Diptera, and Hymenoptera were well represented, showing different performances across methods and databases. Spiders performed the best, with correct identification rates to species and genus levels of ~50% across databases. Springtails performed poorly: no barcodes were identified to species or genus. Other groups showed poor to mediocre performance, from around 3% (mites) to 20% (beetles) of barcodes correctly identified to species, but also with some false identifications. In general, BOLD-based identification offered the best identification results but, in all cases except spiders, performance is poor, with less than a fifth of specimens correctly identified to genus or species. Our results indicate that the soil arthropod fauna is still insufficiently documented, with many species unrepresented in DNA barcode libraries. More effort toward integrative taxonomic characterization is needed to complete our reference libraries before we can rely on DNA barcoding as a universally applicable identification method.
  2. Abstract Because of the detrimental effects of terrestrial invasive plant species (TIPS) on native species, ecosystems, public health, and the economy, many countries have been actively looking for strategies to prevent the introduction and minimize the spread of TIPS. Fast and accurate detection of TIPS is essential to achieving these goals. Conventionally, invasive species monitoring has relied on morphological attributes. Recently, DNA‐based species identification (i.e., DNA barcoding) has become more attractive. To investigate whether DNA barcoding can aid in the detection and management of TIPS, we visited multiple nature areas in Southwest Michigan and collected a small piece of leaf tissue from 91 representative terrestrial plant species, most of which are invasive. We extracted DNA from the leaf samples, amplified four genomic loci (ITS, rbcL, matK, and trnH‐psbA) with PCR, and then purified and sequenced the PCR products. After careful examination of the sequencing data, we were able to identify reliable DNA barcode regions for most species and had an average PCR‐and‐sequencing success rate of 87.9%. We found that the species discrimination rate of a DNA barcode region is inversely related to the ease of PCR amplification and sequencing. Compared with rbcL and matK, ITS and trnH‐psbA have better species discrimination rates (80.6% and 63.2%, respectively). When ITS and trnH‐psbA are simultaneously used, the species discrimination rate increases to 97.1%. The high species/genus/family discrimination rates of DNA barcoding indicate that DNA barcoding can be successfully employed in TIPS identification. Further increases in the number of DNA barcode regions show little or no additional increases in the species discrimination rate, suggesting that dual‐barcode approaches (e.g., ITS + trnH‐psbA) might be an efficient and cost‐effective method in DNA‐based TIPS identification.
Close inspection of nucleotide sequences at the four DNA barcode regions among related species demonstrates that DNA barcoding is especially useful in identifying TIPS that are morphologically similar to other species. 
  3. Abstract Many applications in molecular ecology require the ability to match specific DNA sequences from single‐ or mixed‐species samples with a diagnostic reference library. Widely used methods for DNA barcoding and metabarcoding employ PCR and amplicon sequencing to identify taxa based on target sequences, but the target‐specific enrichment capabilities of CRISPR‐Cas systems may offer advantages in some applications. We identified 54,837 CRISPR‐Cas guide RNAs that may be useful for enriching chloroplast DNA across phylogenetically diverse plant species. We tested a subset of 17 guide RNAs in vitro to enrich plant DNA strands ranging in size from diagnostic DNA barcodes of 1,428 bp to entire chloroplast genomes of 121,284 bp. We used an Oxford Nanopore sequencer to evaluate sequencing success based on both single‐ and mixed‐species samples, which yielded mean chloroplast sequence lengths of 2,530–11,367 bp, depending on the experiment. In comparison to mixed‐species experiments, single‐species experiments yielded more on‐target sequence reads and greater mean pairwise identity between contigs and the plant species' reference genomes. Nevertheless, these mixed‐species experiments yielded sufficient data to provide ≥48‐fold increase in sequence length and better estimates of relative abundance for a commercially prepared mixture of plant species compared to DNA metabarcoding based on the chloroplast trnL‐P6 marker. Prior work developed CRISPR‐based enrichment protocols for long‐read sequencing and our experiments pioneered its use for plant DNA barcoding and chloroplast assemblies that may have advantages over workflows that require PCR and short‐read sequencing. Future work would benefit from continuing to develop in vitro and in silico methods for CRISPR‐based analyses of mixed‐species samples, especially when the appropriate reference genomes for contig assembly cannot be known a priori.
  4. Abstract The purpose of this study is to determine which taxonomic methods can elucidate clear and quantifiable differences between two cryptic ciliate species, and to test the utility of genome architecture as a new diagnostic character in the discrimination of otherwise indistinguishable taxa. Two cryptic tintinnid ciliates, Schmidingerella arcuata and Schmidingerella meunieri, are compared via traditional taxonomic characters including lorica morphometrics, ribosomal RNA (rRNA) gene barcodes and ecophysiological traits. In addition, single‐cell ‘omics analyses (single‐cell transcriptomics and genomics) are used to elucidate and compare patterns of micronuclear genome architecture between the congeners. The results include a highly similar lorica that is larger in S. meunieri, a 0%–0.5% difference in rRNA gene barcodes, two different and nine indistinguishable growth responses among 11 prey treatments, and distinct patterns of micronuclear genomic architecture for genes detected in both ciliates. Together, these results indicate that while minor differences exist between S. arcuata and S. meunieri in common indices of taxonomic identification (i.e., lorica morphology, DNA barcode sequences and ecophysiology), differences exist in their genomic architecture, which suggests potential genetic incompatibility. Different patterns of micronuclear architecture in genes shared by both isolates also enable the design of species‐specific primers, which are used in this study as unique “architectural barcodes” to demonstrate the co‐occurrence of both ciliates in samples collected from a NW Atlantic estuary. These results support the utility of genomic architecture as a tool in species delineation, especially in ciliates that are cryptic or otherwise difficult to differentiate using traditional methods of identification.
  5. Biodiversity genomics research requires reliable organismal identification, which can be difficult based on morphology alone. DNA-based identification using DNA barcoding can provide confirmation of species identity and resolve taxonomic issues but is rarely used in studies generating reference genomes. Here, we describe the development and implementation of DNA barcoding for the Darwin Tree of Life Project (DToL), which aims to sequence and assemble high quality reference genomes for all eukaryotic species in Britain and Ireland. We present a standardised framework for DNA barcode sequencing and data interpretation that is then adapted for diverse organismal groups. DNA barcoding data from over 12,000 DToL specimens has identified up to 20% of samples requiring additional verification, with 2% of seed plants and 3.5% of animal specimens subsequently having their names changed. We also make recommendations for future developments using new sequencing approaches and streamlined bioinformatic approaches. 