Biodiversity genomics research requires reliable organismal identification, which can be difficult based on morphology alone. DNA-based identification using DNA barcoding can provide confirmation of species identity and resolve taxonomic issues but is rarely used in studies generating reference genomes. Here, we describe the development and implementation of DNA barcoding for the Darwin Tree of Life Project (DToL), which aims to sequence and assemble high quality reference genomes for all eukaryotic species in Britain and Ireland. We present a standardised framework for DNA barcode sequencing and data interpretation that is then adapted for diverse organismal groups. DNA barcoding data from over 12,000 DToL specimens has identified up to 20% of samples requiring additional verification, with 2% of seed plants and 3.5% of animal specimens subsequently having their names changed. We also make recommendations for future developments using new sequencing approaches and streamlined bioinformatic approaches. 
                    
                            
A CRISPR‐based strategy for targeted sequencing in biodiversity science
                        
                    
    
Abstract Many applications in molecular ecology require the ability to match specific DNA sequences from single‐ or mixed‐species samples with a diagnostic reference library. Widely used methods for DNA barcoding and metabarcoding employ PCR and amplicon sequencing to identify taxa based on target sequences, but the target‐specific enrichment capabilities of CRISPR‐Cas systems may offer advantages in some applications. We identified 54,837 CRISPR‐Cas guide RNAs that may be useful for enriching chloroplast DNA across phylogenetically diverse plant species. We tested a subset of 17 guide RNAs in vitro to enrich plant DNA strands ranging in size from diagnostic DNA barcodes of 1,428 bp to entire chloroplast genomes of 121,284 bp. We used an Oxford Nanopore sequencer to evaluate sequencing success based on both single‐ and mixed‐species samples, which yielded mean chloroplast sequence lengths of 2,530–11,367 bp, depending on the experiment. In comparison to mixed‐species experiments, single‐species experiments yielded more on‐target sequence reads and greater mean pairwise identity between contigs and the plant species' reference genomes. Nevertheless, the mixed‐species experiments yielded sufficient data to provide a ≥48‐fold increase in sequence length and better estimates of relative abundance for a commercially prepared mixture of plant species compared to DNA metabarcoding based on the chloroplast trnL‐P6 marker. Prior work developed CRISPR‐based enrichment protocols for long‐read sequencing, and our experiments pioneered their use for plant DNA barcoding and chloroplast assemblies, which may have advantages over workflows that require PCR and short‐read sequencing. Future work would benefit from continuing to develop in vitro and in silico methods for CRISPR‐based analyses of mixed‐species samples, especially when the appropriate reference genomes for contig assembly cannot be known a priori.
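The abstract does not say how candidate guide RNAs were found, so the following is only a minimal sketch of one common in silico approach, not the authors' method. It assumes a Cas9-style target site (a 20-nt protospacer immediately followed by an NGG PAM); the function names `reverse_complement` and `find_candidate_guides`, the guide length, and the PAM are all illustrative assumptions.

```python
import re

def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string; unknown bases become N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))

def find_candidate_guides(seq: str, guide_len: int = 20):
    """List (strand, start, protospacer, pam) for every site that matches
    <guide_len unambiguous bases> followed by NGG.

    ASSUMPTION: a Cas9-style NGG PAM; the source abstract does not state
    which Cas enzyme or PAM was used.
    Start positions on the "-" strand are offsets into the
    reverse-complemented sequence, not into the input.
    """
    # A lookahead is used so that overlapping sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{guide_len}}})([ACGT]GG))")
    hits = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for match in pattern.finditer(s):
            hits.append((strand, match.start(), match.group(1), match.group(2)))
    return hits

if __name__ == "__main__":
    # Toy sequence, not real chloroplast data.
    toy = "ACGTACGTACGTACGTACGTAGGCTTACGATCGATCGGATCCGATCGTAGCTAGCTAGG"
    for hit in find_candidate_guides(toy):
        print(hit)
```

A practical screen would add further filters (for example, conservation of the site across the target plant species and absence of close off-target matches), which this sketch leaves out.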
        
    
- Award ID(s): 2046797
- PAR ID: 10482796
- Publisher / Repository: Wiley-Blackwell
- Date Published:
- Journal Name: Molecular Ecology Resources
- ISSN: 1755-098X
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- 
            Dietary DNA metabarcoding enables researchers to identify and characterize trophic interactions with a high degree of taxonomic precision. It is also sensitive to sources of bias and contamination in the field and lab. One of the earliest and most common strategies for dealing with such sensitivities has been to filter resulting sequence data to remove low-abundance sequences before conducting ecological analyses based on the presence or absence of food taxa. Although this step is now often perceived to be both necessary and sufficient for cleaning up datasets, evidence to support this perception is lacking and more attention needs to be paid to the related risk of introducing other undesirable errors. Using computer simulations, we demonstrate that common strategies to remove low-abundance sequences can erroneously eliminate true dietary sequences in ways that impact downstream dietary inferences. Using real data from well-studied wildlife populations in Yellowstone National Park, we further show how these strategies can markedly alter the composition of individual dietary profiles in ways that scale-up to obscure ecological interpretations about dietary generalism, specialism, and niche partitioning. Although the practice of removing low-abundance sequences may continue to be a useful strategy to address a subset of research questions that focus on a subset of relatively abundant food resources, its continued widespread use risks generating misleading perceptions about the structure of trophic networks. Researchers working with dietary DNA metabarcoding data—or similar data such as environmental DNA, microbiomes, or pathobiomes—should be aware of potential drawbacks and consider alternative bioinformatic, experimental, and statistical solutions. We used fecal DNA metabarcoding to characterize the diets of bison and bighorn sheep in winter and summer. 
Our analyses are based on 35 samples (median per species per season = 10) analyzed using the P6 loop of the chloroplast trnL(UAA) intron together with publicly available plant reference data (Illumina sequence read data are available at NCBI under BioProject PRJNA780500). Obicut was used to trim reads with a minimum quality threshold of 30, and primers were removed from forward and reverse reads using cutadapt. All further sequence identifications were performed using obitools; forward and reverse sequences were aligned using the illuminapairedend command using a minimum alignment score of 40, and only joined sequences were retained. We used the obiuniq command to group identical sequences and tally them within samples, enabling us to quantify the relative read abundance (RRA) of each sequence. Sequences that occurred ≤2 times overall or that were ≤8 bp were discarded. Sequences were considered to be likely PCR artifacts if they were highly similar to another sequence (1 bp difference) and had a much lower abundance (0.05%) in the majority of samples in which they occurred; we discarded these sequences using the obiclean command. Overall, we characterized 357 plant sequences and a subset of 355 sequences were retained in the dataset after rarefying samples to equal sequencing depth. We then applied relative read abundance thresholds from 0% to 5% to the fecal samples. We compared differences in the inferred dietary richness within and between species based on individual samples, based on average richness across samples, and based on the total richness of each population after accounting for differences in sample size. The readme file contains an explanation of each of the variables in the dataset. Information on the methodology can be found in the associated manuscript referenced above.
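The thresholding step described in this record can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the toy read counts, and the choice to keep sequences whose RRA is greater than or equal to the threshold are all hypothetical.

```python
from typing import Dict, Iterable

def relative_read_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert per-sequence read counts for one sample into proportions."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {seq: n / total for seq, n in counts.items()}

def apply_rra_threshold(counts: Dict[str, int], threshold: float) -> Dict[str, float]:
    """Keep sequences that are present and whose RRA is >= threshold.

    ASSUMPTION: the boundary is inclusive and RRA values are not
    renormalized after filtering; the source does not specify either.
    """
    rra = relative_read_abundance(counts)
    return {seq: p for seq, p in rra.items() if counts[seq] > 0 and p >= threshold}

def richness_by_threshold(
    samples: Dict[str, Dict[str, int]], thresholds: Iterable[float]
) -> Dict[float, Dict[str, int]]:
    """Inferred dietary richness (number of retained sequences) per sample
    at each threshold."""
    return {
        t: {name: len(apply_rra_threshold(counts, t)) for name, counts in samples.items()}
        for t in thresholds
    }

if __name__ == "__main__":
    # Toy counts, not data from the study.
    samples = {
        "bison_1": {"seqA": 900, "seqB": 90, "seqC": 10},
        "sheep_1": {"seqA": 400, "seqD": 350, "seqE": 250},
    }
    for t, richness in richness_by_threshold(samples, [0.0, 0.02, 0.05]).items():
        print(f"threshold={t:.2%}", richness)
```

Running the sketch over a range of thresholds shows how inferred richness per sample can fall as the cutoff rises, which is the effect the record evaluates.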
- 
            Abstract Because of the detrimental effects of terrestrial invasive plant species (TIPS) on native species, ecosystems, public health, and the economy, many countries have been actively looking for strategies to prevent the introduction and minimize the spread of TIPS. Fast and accurate detection of TIPS is essential to achieving these goals. Conventionally, invasive species monitoring has relied on morphological attributes. Recently, DNA‐based species identification (i.e., DNA barcoding) has become more attractive. To investigate whether DNA barcoding can aid in the detection and management of TIPS, we visited multiple nature areas in Southwest Michigan and collected a small piece of leaf tissue from 91 representative terrestrial plant species, most of which are invasive. We extracted DNA from the leaf samples, amplified four genomic loci (ITS, rbcL, matK, and trnH‐psbA) with PCR, and then purified and sequenced the PCR products. After careful examination of the sequencing data, we were able to identify reliable DNA barcode regions for most species and had an average PCR‐and‐sequencing success rate of 87.9%. We found that the species discrimination rate of a DNA barcode region is inversely related to the ease of PCR amplification and sequencing. Compared with rbcL and matK, ITS and trnH‐psbA have better species discrimination rates (80.6% and 63.2%, respectively). When ITS and trnH‐psbA are simultaneously used, the species discrimination rate increases to 97.1%. The high species/genus/family discrimination rates of DNA barcoding indicate that DNA barcoding can be successfully employed in TIPS identification. Further increases in the number of DNA barcode regions show little or no additional increases in the species discrimination rate, suggesting that dual‐barcode approaches (e.g., ITS + trnH‐psbA) might be an efficient and cost‐effective method in DNA‐based TIPS identification.
Close inspection of nucleotide sequences at the four DNA barcode regions among related species demonstrates that DNA barcoding is especially useful in identifying TIPS that are morphologically similar to other species.
- 
            Abstract Background: Modern plant breeding strategies rely on the intensive use of advanced genomic tools to expedite the development of improved crop varieties. Genomic DNA extraction from crop seeds eliminates the need to grow plants in contrast to fresh leaf tissue; however, it can still be a bottleneck due to the presence of stored compounds and the complexity of the matrix. The interaction of environmentally benign choline-based ionic liquids (ILs) with DNA offers an innovative approach to enhance the quality of extracted DNA from seeds. While prior IL-based plant DNA extraction workflows have primarily supported polymerase chain reaction (PCR) and quantitative PCR-based applications, their suitability for high-throughput sequencing (HTS) remained largely unexplored. This study explores the efficacy of an IL-assisted method for genomic DNA extraction from soybean (Glycine max) seeds, addressing the limited application of ILs in HTS. Results: The optimized DNA extraction method, utilizing 25% (w/v) choline formate, enabled the recovery of high-purity DNA with abundant fragment sizes > 20 kb, suitable for downstream applications including PCR, whole genome amplification (WGA), simple sequence repeat (SSR) amplification, and high-throughput Illumina sequencing. The IL-method was benchmarked against a silica-binding method using cetyltrimethylammonium bromide (CTAB) and sodium dodecyl sulfate (SDS) as lysis agents using a commercial plant DNA extraction kit in terms of DNA yield, purity, abundant DNA fragment size distribution, and integrity. In addition, DNA isolated from this method demonstrated successful PCR amplification of markers from both the nuclear and plastid genomes and yielded > 99% whole genome coverage with Illumina (PE150) sequencing reads. Conclusions: This is the first known instance of a whole genome sequence generated from DNA extracted with ILs.
These findings mark a significant milestone in establishing ILs as promising alternatives to conventional methods for seed DNA extraction, with potential utility in third generation (long-read) sequencing experiments.
- 
            ABSTRACT True fungi (Fungi) and fungus-like organisms (e.g. Mycetozoa, Oomycota) constitute the second largest group of organisms based on global richness estimates, with around 3 million predicted species. Compared to plants and animals, fungi have simple body plans with often morphologically and ecologically obscure structures. This poses challenges for accurate and precise identifications. Here we provide a conceptual framework for the identification of fungi, encouraging the approach of integrative (polyphasic) taxonomy for species delimitation, i.e. the combination of genealogy (phylogeny), phenotype (including autecology), and reproductive biology (when feasible). This allows objective evaluation of diagnostic characters, either phenotypic or molecular or both. Verification of identifications is crucial but often neglected. Because of clade-specific evolutionary histories, there is currently no single tool for the identification of fungi, although DNA barcoding using the internal transcribed spacer (ITS) remains a first diagnosis, particularly in metabarcoding studies. Secondary DNA barcodes are increasingly implemented for groups where ITS does not provide sufficient precision. Issues of pairwise sequence similarity-based identifications and OTU clustering are discussed, and multiple sequence alignment-based phylogenetic approaches with subsequent verification are recommended as more accurate alternatives. In metabarcoding approaches, the trade-off between speed and accuracy and precision of molecular identifications must be carefully considered. Intragenomic variation of the ITS and other barcoding markers should be properly documented, as phylotype diversity is not necessarily a proxy of species richness.
Important strategies to improve molecular identification of fungi are: (1) broadly document intraspecific and intragenomic variation of barcoding markers; (2) substantially expand sequence repositories, focusing on undersampled clades and missing taxa; (3) improve curation of sequence labels in primary repositories and substantially increase the number of sequences based on verified material; (4) link sequence data to digital information of voucher specimens including imagery. In parallel, technological improvements to genome sequencing offer promising alternatives to DNA barcoding in the future. Despite the prevalence of DNA-based fungal taxonomy, phenotype-based approaches remain an important strategy to catalog the global diversity of fungi and establish initial species hypotheses.
 An official website of the United States government
