Title: A CRISPR‐based strategy for targeted sequencing in biodiversity science
Abstract Many applications in molecular ecology require the ability to match specific DNA sequences from single‐ or mixed‐species samples with a diagnostic reference library. Widely used methods for DNA barcoding and metabarcoding employ PCR and amplicon sequencing to identify taxa based on target sequences, but the target‐specific enrichment capabilities of CRISPR‐Cas systems may offer advantages in some applications. We identified 54,837 CRISPR‐Cas guide RNAs that may be useful for enriching chloroplast DNA across phylogenetically diverse plant species. We tested a subset of 17 guide RNAs in vitro to enrich plant DNA strands ranging in size from diagnostic DNA barcodes of 1,428 bp to entire chloroplast genomes of 121,284 bp. We used an Oxford Nanopore sequencer to evaluate sequencing success based on both single‐ and mixed‐species samples, which yielded mean chloroplast sequence lengths of 2,530–11,367 bp, depending on the experiment. In comparison to mixed‐species experiments, single‐species experiments yielded more on‐target sequence reads and greater mean pairwise identity between contigs and the plant species' reference genomes. Nevertheless, these mixed‐species experiments yielded sufficient data to provide a ≥48‐fold increase in sequence length and better estimates of relative abundance for a commercially prepared mixture of plant species compared to DNA metabarcoding based on the chloroplast trnL‐P6 marker. Prior work developed CRISPR‐based enrichment protocols for long‐read sequencing, and our experiments pioneered their use for plant DNA barcoding and chloroplast assemblies that may have advantages over workflows that require PCR and short‐read sequencing. Future work would benefit from continuing to develop in vitro and in silico methods for CRISPR‐based analyses of mixed‐species samples, especially when the appropriate reference genomes for contig assembly cannot be known a priori.
Award ID(s):
2046797
PAR ID:
10482796
Publisher / Repository:
Wiley-Blackwell
Journal Name:
Molecular Ecology Resources
ISSN:
1755-098X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Biodiversity genomics research requires reliable organismal identification, which can be difficult based on morphology alone. DNA-based identification using DNA barcoding can provide confirmation of species identity and resolve taxonomic issues but is rarely used in studies generating reference genomes. Here, we describe the development and implementation of DNA barcoding for the Darwin Tree of Life Project (DToL), which aims to sequence and assemble high quality reference genomes for all eukaryotic species in Britain and Ireland. We present a standardised framework for DNA barcode sequencing and data interpretation that is then adapted for diverse organismal groups. DNA barcoding data from over 12,000 DToL specimens has identified up to 20% of samples requiring additional verification, with 2% of seed plants and 3.5% of animal specimens subsequently having their names changed. We also make recommendations for future developments using new sequencing approaches and streamlined bioinformatic approaches. 
  2. Dietary DNA metabarcoding enables researchers to identify and characterize trophic interactions with a high degree of taxonomic precision. It is also sensitive to sources of bias and contamination in the field and lab. One of the earliest and most common strategies for dealing with such sensitivities has been to filter resulting sequence data to remove low-abundance sequences before conducting ecological analyses based on the presence or absence of food taxa. Although this step is now often perceived to be both necessary and sufficient for cleaning up datasets, evidence to support this perception is lacking and more attention needs to be paid to the related risk of introducing other undesirable errors. Using computer simulations, we demonstrate that common strategies to remove low-abundance sequences can erroneously eliminate true dietary sequences in ways that impact downstream dietary inferences. Using real data from well-studied wildlife populations in Yellowstone National Park, we further show how these strategies can markedly alter the composition of individual dietary profiles in ways that scale-up to obscure ecological interpretations about dietary generalism, specialism, and niche partitioning. Although the practice of removing low-abundance sequences may continue to be a useful strategy to address a subset of research questions that focus on a subset of relatively abundant food resources, its continued widespread use risks generating misleading perceptions about the structure of trophic networks. Researchers working with dietary DNA metabarcoding data—or similar data such as environmental DNA, microbiomes, or pathobiomes—should be aware of potential drawbacks and consider alternative bioinformatic, experimental, and statistical solutions. We used fecal DNA metabarcoding to characterize the diets of bison and bighorn sheep in winter and summer. 
Our analyses are based on 35 samples (median per species per season = 10) analyzed using the P6 loop of the chloroplast trnL(UAA) intron together with publicly available plant reference data (Illumina sequence read data are available at NCBI (BioProject: PRJNA780500)). Obicut was used to trim reads with a minimum quality threshold of 30, and primers were removed from forward and reverse reads using cutadapt. All further sequence identifications were performed using obitools; forward and reverse sequences were aligned using the illuminapairedend command using a minimum alignment score of 40, and only joined sequences retained. We used the obiuniq command to group identical sequences and tally them within samples, enabling us to quantify the relative read abundance (RRA) of each sequence. Sequences that occurred ≤2 times overall or that were ≤8 bp were discarded. Sequences were considered to be likely PCR artifacts if they were highly similar to another sequence (1 bp difference) and had a much lower abundance (0.05%) in the majority of samples in which they occurred; we discarded these sequences using the obiclean command. Overall, we characterized 357 plant sequences and a subset of 355 sequences were retained in the dataset after rarefying samples to equal sequencing depth. We then applied relative read abundance thresholds from 0% to 5% to the fecal samples. We compared differences in the inferred dietary richness within and between species based on individual samples, based on average richness across samples, and based on the total richness of each population after accounting for differences in sample size. The readme file contains an explanation of each of the variables in the dataset. Information on the methodology can be found in the associated manuscript referenced above.  
  3. Abstract Because of the detrimental effects of terrestrial invasive plant species (TIPS) on native species, ecosystems, public health, and the economy, many countries have been actively looking for strategies to prevent the introduction and minimize the spread of TIPS. Fast and accurate detection of TIPS is essential to achieving these goals. Conventionally, invasive species monitoring has relied on morphological attributes. Recently, DNA‐based species identification (i.e., DNA barcoding) has become more attractive. To investigate whether DNA barcoding can aid in the detection and management of TIPS, we visited multiple nature areas in Southwest Michigan and collected a small piece of leaf tissue from 91 representative terrestrial plant species, most of which are invasive. We extracted DNA from the leaf samples, amplified four genomic loci (ITS, rbcL, matK, and trnH‐psbA) with PCR, and then purified and sequenced the PCR products. After careful examination of the sequencing data, we were able to identify reliable DNA barcode regions for most species and had an average PCR‐and‐sequencing success rate of 87.9%. We found that the species discrimination rate of a DNA barcode region is inversely related to the ease of PCR amplification and sequencing. Compared with rbcL and matK, ITS and trnH‐psbA have better species discrimination rates (80.6% and 63.2%, respectively). When ITS and trnH‐psbA are simultaneously used, the species discrimination rate increases to 97.1%. The high species/genus/family discrimination rates of DNA barcoding indicate that DNA barcoding can be successfully employed in TIPS identification. Further increases in the number of DNA barcode regions show little or no additional increases in the species discrimination rate, suggesting that dual‐barcode approaches (e.g., ITS + trnH‐psbA) might be an efficient and cost‐effective method in DNA‐based TIPS identification. 
Close inspection of nucleotide sequences at the four DNA barcode regions among related species demonstrates that DNA barcoding is especially useful in identifying TIPS that are morphologically similar to other species. 
  4. Abstract Background: Modern plant breeding strategies rely on the intensive use of advanced genomic tools to expedite the development of improved crop varieties. Genomic DNA extraction from crop seeds eliminates the need to grow plants in contrast to fresh leaf tissue; however, it can still be a bottleneck due to the presence of stored compounds and the complexity of the matrix. The interaction of environmentally benign choline-based ionic liquids (ILs) with DNA offers an innovative approach to enhance the quality of extracted DNA from seeds. While prior IL-based plant DNA extraction workflows have primarily supported polymerase chain reaction (PCR) and quantitative PCR-based applications, their suitability for high-throughput sequencing (HTS) remained largely unexplored. This study explores the efficacy of an IL-assisted method for genomic DNA extraction from soybean (Glycine max) seeds, addressing the limited application of ILs in HTS. Results: The optimized DNA extraction method, utilizing 25% (w/v) choline formate, enabled the recovery of high-purity DNA with abundant fragment sizes > 20 kb, suitable for downstream applications including PCR, whole genome amplification (WGA), simple sequence repeat (SSR) amplification, and high-throughput Illumina sequencing. The IL-method was benchmarked against a silica-binding method using cetyltrimethylammonium bromide (CTAB) and sodium dodecyl sulfate (SDS) as lysis agents using a commercial plant DNA extraction kit in terms of DNA yield, purity, abundant DNA fragment size distribution, and integrity. In addition, DNA isolated from this method demonstrated successful PCR amplification of markers from both the nuclear and plastid genomes and yielded > 99% whole genome coverage with Illumina (PE150) sequencing reads. Conclusions: This is the first known instance of a whole genome sequence generated from DNA extracted with ILs. 
These findings mark a significant milestone in establishing ILs as promising alternatives to conventional methods for seed DNA extraction, with potential utility in third generation (long-read) sequencing experiments. 
  5. ABSTRACT DNA metabarcoding of zooplankton biodiversity is used increasingly for monitoring global ocean ecosystems, requiring comparable data from different research laboratories and ocean regions. The MetaZooGene Intercalibration Experiment (MZG‐ICE) was designed to examine and analyse patterns of variation of DNA sequence data resulting from multi‐gene metabarcoding of 10 zooplankton samples carried out by 10 research groups affiliated with the Scientific Committee for Ocean Research (SCOR). Aliquots of DNA extracted from the 10 zooplankton samples were distributed to MZG‐ICE groups for metabarcoding of four gene regions: V1‐V2, V4 and V9 of nuclear 18S rRNA and mitochondrial COI. Molecular protocols and procedures were recommended; substitutions were allowed as necessary. Resulting data were uploaded to a common repository for centralised statistics and bioinformatics. Based on proportional sequence numbers for abundant phyla, overall patterns of variation were consistent across many—but not all—MZG‐ICE groups. V9 showed highest similarity, followed (in order) by V4, V1‐V2, and COI. Outlier data were hypothesised to result from the use of different PCR protocols and sequencing platforms, and possible contamination. MZG‐ICE results indicated that DNA metabarcoding data from different laboratories and research groups can provide reliable, accurate and valid descriptions of biodiversity of zooplankton throughout the ocean. Recommendations included: pre‐screening QA/QC of raw data, detailed records for laboratory protocols, reagents, and instrumentation, and centralised bioinformatics and multivariate statistics. In the absence of universal agreement on standardised protocols or best practices, intercalibration is the best way forward toward validation of DNA metabarcoding of zooplankton diversity for global ocean monitoring. 
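Item 2 above describes applying relative read abundance (RRA) thresholds from 0% to 5% to individual samples and comparing the resulting dietary richness. The sketch below illustrates that idea on a single sample; it is not the authors' code. The function names, the example read counts, and the choice to retain sequences whose RRA is greater than or equal to the threshold are all assumptions made for illustration.

```python
# Minimal sketch of RRA thresholding for one sample (illustrative only).
# All names and the example counts are hypothetical, not from the study.

def relative_read_abundance(counts):
    """Return each sequence's share of the sample's total reads."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {seq: n / total for seq, n in counts.items()}


def filter_by_rra(counts, threshold):
    """Keep sequences whose RRA is >= threshold (assumed comparison)."""
    rra = relative_read_abundance(counts)
    return {seq: n for seq, n in counts.items() if rra[seq] >= threshold}


def richness(counts, threshold):
    """Number of sequences remaining after the RRA filter."""
    return len(filter_by_rra(counts, threshold))


# Hypothetical read counts for one fecal sample (1,000 reads in total).
sample = {"seq_A": 900, "seq_B": 80, "seq_C": 15, "seq_D": 5}

for threshold in (0.0, 0.01, 0.05):
    print(f"threshold={threshold:.2f} richness={richness(sample, threshold)}")
```

In this toy sample, raising the threshold removes the rarest sequences first, which is the mechanism by which low-abundance filtering can eliminate true dietary items.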