

Title: Strategies for reducing per‐sample costs in target capture sequencing for phylogenomics and population genomics in plants

The reduced cost of high‐throughput sequencing and the development of gene sets with wide phylogenetic applicability have led to the rise of sequence capture methods as a plausible platform for both phylogenomics and population genomics in plants. An important consideration in large targeted sequencing projects is the per‐sample cost, which can be inflated when using off‐the‐shelf kits or reagents not purchased in bulk. Here, we discuss methods to reduce per‐sample costs in high‐throughput targeted sequencing projects. We review the minimal equipment and consumable requirements for targeted sequencing while comparing several alternatives to reduce bulk costs in DNA extraction, library preparation, target enrichment, and sequencing. We consider how each of the workflow alterations may be affected by DNA quality (e.g., fresh vs. herbarium tissue), genome size, and the phylogenetic scale of the project. We provide a cost calculator for researchers considering targeted sequencing to use when designing projects, and identify challenges for future development of low‐cost sequencing in non‐model plant systems.
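
As a rough illustration of the per‐sample arithmetic the abstract describes, the sketch below sums the four workflow stages it names (DNA extraction, library preparation, target enrichment, sequencing), dividing pooled costs across the samples that share them. This is not the authors' cost calculator. The class name, field names, and all dollar figures are hypothetical placeholders chosen only to show the calculation.

```python
from dataclasses import dataclass


@dataclass
class WorkflowCosts:
    """Hypothetical cost inputs; none of these values come from the paper."""
    extraction_per_sample: float      # DNA extraction reagents per sample
    library_prep_per_sample: float    # library preparation reagents per sample
    enrichment_per_reaction: float    # one target-enrichment (capture) reaction
    samples_per_enrichment: int       # libraries pooled into one capture reaction
    sequencing_per_run: float         # one sequencing run or lane
    samples_per_run: int              # libraries multiplexed on that run


def per_sample_cost(c: WorkflowCosts) -> float:
    """Sum per-sample costs, spreading pooled costs over the samples sharing them."""
    if c.samples_per_enrichment < 1 or c.samples_per_run < 1:
        raise ValueError("pool sizes must be at least 1")
    enrichment = c.enrichment_per_reaction / c.samples_per_enrichment
    sequencing = c.sequencing_per_run / c.samples_per_run
    return (c.extraction_per_sample + c.library_prep_per_sample
            + enrichment + sequencing)


# Placeholder figures only: compare pooled vs. unpooled enrichment.
pooled = WorkflowCosts(2.0, 10.0, 200.0, 20, 1500.0, 96)
unpooled = WorkflowCosts(2.0, 10.0, 200.0, 1, 1500.0, 96)
print(f"pooled:   {per_sample_cost(pooled):.2f}")
print(f"unpooled: {per_sample_cost(unpooled):.2f}")
```

With these placeholder inputs, the only difference between the two scenarios is how many libraries share one enrichment reaction, which is one way pooling can lower the per‐sample figure.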

 
Award ID(s):
1753800 1711391
NSF-PAR ID:
10455186
Author(s) / Creator(s):
 ;  ;  ;  ;  
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Applications in Plant Sciences
Volume:
8
Issue:
4
ISSN:
2168-0450
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Molecular ecologists seek to genotype hundreds to thousands of loci from hundreds to thousands of individuals at minimal cost per sample. Current methods, such as restriction‐site‐associated DNA sequencing (RADseq) and sequence capture, are constrained by costs associated with inefficient use of sequencing data and sample preparation. Here, we introduce RADcap, an approach that combines the major benefits of RADseq (low cost with specific start positions) with those of sequence capture (repeatable sequencing of specific loci) to significantly increase efficiency and reduce costs relative to current approaches. RADcap uses a new version of dual‐digest RADseq (3RAD) to identify candidate SNP loci for capture bait design and subsequently uses custom sequence capture baits to consistently enrich candidate SNP loci across many individuals. We combined this approach with a new library preparation method for identifying and removing PCR duplicates from 3RAD libraries, which allows researchers to process RADseq data using traditional pipelines, and we tested the RADcap method by genotyping sets of 96–384 Wisteria plants. Our results demonstrate that our RADcap method: (i) methodologically reduces (to <5%) and allows computational removal of PCR duplicate reads from data, (ii) achieves 80–90% reads on target in 11 of 12 enrichments, (iii) returns consistent coverage (≥4×) across >90% of individuals at up to 99.8% of the targeted loci, (iv) produces consistently high occupancy matrices of genotypes across hundreds of individuals and (v) costs significantly less than current approaches.

     
  2. Premise

    New sequencing technologies have facilitated genomic studies in green microalgae; however, extracting high‐quality DNA is often a bottleneck for long‐read sequencing.

    Methods and Results

    Here, we present a low‐cost, highly transferrable method for the extraction of high‐molecular‐weight (HMW), high‐purity DNA from microalgae. We first determined the effect of sample preparation on DNA quality using three homogenization methods: manual grinding using a mini‐pestle, automatic grinding using a vortex adapter, and grinding in liquid nitrogen. We demonstrated the versatility of grinding in liquid nitrogen followed by a modified cetyltrimethylammonium bromide (CTAB) extraction across a suite of aquatic‐ and desert‐evolved algal taxa. Finally, we tested the protocol's robustness by doubling the input material to increase yield, producing per sample up to 20 μg of high‐purity DNA longer than 21.2 kbp.

    Conclusions

    All homogenization methods produced DNA within acceptable parameters for purity, but only liquid nitrogen grinding resulted in HMW DNA. The optimization of cell lysis while minimizing DNA shearing is therefore crucial for the isolation of DNA for long‐read genomic sequencing because template DNA length strongly affects read output and length.

     
  3. Abstract

    Despite advances that allow DNA sequencing of old museum specimens, sequencing small‐bodied, historical specimens can be challenging and unreliable as many contain only small amounts of fragmented DNA. Dependable methods to sequence such specimens are especially critical if the specimens are unique. We attempt to sequence small‐bodied (3–6 mm) historical specimens (including nomenclatural types) of beetles that have been housed, dried, in museums for 58–159 years, and for which few or no suitable replacement specimens exist. To better understand ideal approaches of sample preparation and produce preparation guidelines, we compared different library preparation protocols using low amounts of input DNA (1–10 ng). We also explored low‐cost optimizations designed to improve library preparation efficiency and sequencing success of historical specimens with minimal DNA, such as enzymatic repair of DNA. We report successful sample preparation and sequencing for all historical specimens despite our low‐input DNA approach. We provide a list of guidelines related to DNA repair, bead handling, reducing adapter dimers and library amplification. We present these guidelines to facilitate more economical use of valuable DNA and enable more consistent results in projects that aim to sequence challenging, irreplaceable historical specimens.

     
  4. Premise

    The ability to sequence genome‐scale data from herbarium specimens would allow for the economical development of data sets with broad taxonomic and geographic sampling that would otherwise not be possible. Here, we evaluate the utility of a basic double‐digest restriction site–associated DNA sequencing (ddRADseq) protocol using DNAs from four genera extracted from both silica‐dried and herbarium tissue.

    Methods

    DNAs from Draba, Boechera, Solidago, and Ilex were processed with a ddRADseq protocol. The effects of DNA degradation, taxon, and specimen age were assessed.

    Results

    Although taxon, preservation method, and specimen age affected data recovery, large phylogenetically informative data sets were obtained from the majority of samples.

    Discussion

    These results suggest that herbarium samples can be incorporated into ddRADseq project designs, and that specimen age can be used as a rapid on‐site guide for sample choice. The detailed protocol we provide will allow users to pursue herbarium‐based ddRADseq projects that minimize the expenses associated with fieldwork and sample evaluation.

     
  5. Abstract

    The development of high‐throughput sequencing technologies is dramatically increasing the use of single nucleotide polymorphisms (SNPs) across the field of genetics, but most parentage studies of wild populations still rely on microsatellites. We developed a bioinformatic pipeline for identifying SNP panels that are informative for parentage analysis from restriction site‐associated DNA sequencing (RADseq) data. This pipeline includes options for analysis with or without a reference genome, and provides methods to maximize genotyping accuracy and select sets of unlinked loci that have high statistical power. We test this pipeline on small populations of Mexican gray wolf and bighorn sheep, for which parentage analyses are expected to be challenging due to low genetic diversity and the presence of many closely related individuals. We compare the results of parentage analysis across SNP panels generated with or without the use of a reference genome, and between SNPs and microsatellites. For Mexican gray wolf, we conducted parentage analyses for 30 pups from a single cohort where samples were available from 64% of possible mothers and 53% of possible fathers, and the accuracy of parentage assignments could be estimated because true identities of parents were known a priori based on field data. For bighorn sheep, we conducted maternity analyses for 39 lambs from five cohorts where 77% of possible mothers were sampled, but true identities of parents were unknown. Analyses with and without a reference genome produced SNP panels with ≥95% parentage assignment accuracy for Mexican gray wolf, outperforming microsatellites at 78% accuracy. Maternity assignments were completely consistent across all SNP panels for the bighorn sheep, and were 74.4% consistent with assignments from microsatellites. Accuracy and consistency of parentage analysis were not reduced when using as few as 284 SNPs for Mexican gray wolf and 142 SNPs for bighorn sheep, indicating our pipeline can be used to develop SNP genotyping assays for parentage analysis with relatively small numbers of loci.

     