

This content will become publicly available on August 1, 2024

Title: Library preparation method and DNA source influence endogenous DNA recovery from 100‐year‐old avian museum specimens
Abstract: Museum specimens collected prior to cryogenic tissue storage are increasingly being used as genetic resources, and though high‐throughput sequencing is becoming more cost‐efficient, whole genome sequencing (WGS) of historical DNA (hDNA) remains inefficient and costly due to its short fragment sizes and high loads of exogenous DNA, among other factors. It is also unclear how sequencing efficiency is influenced by DNA sources. We aimed to identify the most efficient method and DNA source for collecting WGS data from avian museum specimens. We analyzed low‐coverage WGS from 60 DNA libraries prepared from four American Robin (Turdus migratorius) and four Abyssinian Thrush (Turdus abyssinicus) specimens collected in the 1920s. We compared DNA source (toepad versus incision‐line skin clip) and three library preparation methods: (1) double‐stranded DNA (dsDNA), single tube (KAPA); (2) single‐stranded DNA (ssDNA), multi‐tube (IDT); and (3) ssDNA, single tube (Claret Bioscience). We found that the ssDNA, multi‐tube method resulted in significantly greater endogenous DNA content, average read length, and sequencing efficiency than the other tested methods. We also tested whether a predigestion step reduced exogenous DNA in libraries from one specimen per species and found promising results that warrant further study. The ~10% increase in average sequencing efficiency of the best‐performing method over a commonly implemented dsDNA library preparation method has the potential to significantly increase WGS coverage of hDNA from bird specimens. Future work should evaluate the threshold for specimen age at which these results hold and how the combination of library preparation method and DNA source influences WGS in other taxa.
Award ID(s):
1953688 1953796
NSF-PAR ID:
10446720
Journal Name:
Ecology and Evolution
Volume:
13
Issue:
8
ISSN:
2045-7758
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Background

    Viruses strongly influence microbial population dynamics and ecosystem functions. However, our ability to quantitatively evaluate those viral impacts is limited to the few cultivated viruses and double-stranded DNA (dsDNA) viral genomes captured in quantitative viral metagenomes (viromes). This leaves the ecology of non-dsDNA viruses nearly unknown, including single-stranded DNA (ssDNA) viruses that have been frequently observed in viromes, but not quantified due to amplification biases in sequencing library preparations (Multiple Displacement Amplification, Linker Amplification or Tagmentation).

    Methods

    Here we designed mock viral communities including both ssDNA and dsDNA viruses to evaluate the capability of a sequencing library preparation approach including an Adaptase step prior to Linker Amplification for quantitative amplification of both dsDNA and ssDNA templates. We then surveyed aquatic samples to provide first estimates of the abundance of ssDNA viruses.

    Results

    Mock community experiments confirmed the biased nature of existing library preparation methods for ssDNA templates (either largely enriched or selected against) and showed that the protocol using Adaptase plus Linker Amplification yielded viromes that were ±1.8-fold quantitative for ssDNA and dsDNA viruses. Application of this protocol to community virus DNA from three freshwater and three marine samples revealed that ssDNA viruses as a whole represent only a minor fraction (<5%) of DNA virus communities, though individual ssDNA genomes, both eukaryote-infecting Circular Rep-Encoding Single-Stranded DNA (CRESS-DNA) viruses and bacteriophages from the Microviridae family, can be among the most abundant viral genomes in a sample.

    Discussion

    Together these findings provide empirical data for a new virome library preparation protocol, and a first estimate of ssDNA virus abundance in aquatic systems.

     
  2. Next-generation sequencing (NGS) technologies have revolutionized phylogenomics by decreasing the cost and time required to generate sequence data from multiple markers or whole genomes. Further, the fragmented DNA of biological specimens collected decades ago can be sequenced with NGS, reducing the need for collecting fresh specimens. Sequence capture, also known as anchored hybrid enrichment, is a method to produce reduced representation libraries for NGS. The technique uses single-stranded oligonucleotide probes that hybridize with pre-selected regions of the genome that are sequenced via NGS, culminating in a dataset of numerous orthologous loci from multiple taxa. Phylogenetic analyses using these sequences have the potential to resolve deep and shallow phylogenetic relationships. Identifying the factors that affect sequence capture success could save time, money, and valuable specimens that might be destructively sampled despite low likelihood of sequencing success. We investigated the impacts of specimen age, preservation method, and DNA concentration on sequence capture (number of captured sequences and sequence quality) while accounting for taxonomy and extracted tissue type in a large-scale butterfly phylogenomics project. This project used two probe sets to extract 391 loci or a subset of 13 loci from over 6,000 butterfly specimens. We found that sequence capture is a resilient method capable of amplifying loci in samples of varying age (0–111 years), preservation method (alcohol, papered, pinned), and DNA concentration (0.020–316 ng/μl). Regression analyses demonstrate that sequence capture is positively correlated with DNA concentration. However, sequence capture and DNA concentration are negatively correlated with sample age and preservation method. Our findings suggest that sequence capture projects should prioritize the use of alcohol-preserved samples younger than 20 years old when available. In the absence of such specimens, dried samples of any age can yield sequence data, albeit with returns that diminish with increasing age.
  3. Abstract

    DNA sequencing technologies continue to advance the biological sciences, expanding opportunities for genomic studies of non‐model organisms for basic and applied questions. Despite these opportunities, many next generation sequencing protocols have been developed assuming a substantial quantity of high molecular weight DNA (>100 ng), which can be difficult to obtain for many study systems. In particular, the ability to sequence field‐collected specimens that exhibit varying levels of DNA degradation remains largely unexplored. In this study we investigate the influence of five traditional insect capture and curation methods on Double‐Digest Restriction Enzyme Associated DNA (ddRAD) sequencing success for three wild bee species. We sequenced a total of 105 specimens (7–13 specimens per species and treatment). We additionally investigated how different DNA quality metrics (including pre‐sequence concentration and contamination) predicted downstream sequencing success, and also compared two DNA extraction methods. We report successful library preparation for all specimens, with all treatments and extraction methods producing enough highly reliable loci for population genetic analyses. Although results varied between species, we found that specimens collected by net sampling directly into 100% EtOH, or by passive trapping followed by 100% EtOH storage before pinning, tended to produce higher‐quality ddRAD assemblies, likely as a result of rapid specimen desiccation. Surprisingly, we found that specimens preserved in propylene glycol during field sampling exhibited lower‐quality assemblies. We provide recommendations for each treatment, extraction method, and DNA quality assessment, and further encourage researchers to consider utilizing a wider variety of specimens for genomic analyses.

     
  4. Abstract

    Despite advances that allow DNA sequencing of old museum specimens, sequencing small‐bodied, historical specimens can be challenging and unreliable as many contain only small amounts of fragmented DNA. Dependable methods to sequence such specimens are especially critical if the specimens are unique. We attempt to sequence small‐bodied (3–6 mm) historical specimens (including nomenclatural types) of beetles that have been housed, dried, in museums for 58–159 years, and for which few or no suitable replacement specimens exist. To better understand ideal approaches of sample preparation and produce preparation guidelines, we compared different library preparation protocols using low amounts of input DNA (1–10 ng). We also explored low‐cost optimizations designed to improve library preparation efficiency and sequencing success of historical specimens with minimal DNA, such as enzymatic repair of DNA. We report successful sample preparation and sequencing for all historical specimens despite our low‐input DNA approach. We provide a list of guidelines related to DNA repair, bead handling, reducing adapter dimers and library amplification. We present these guidelines to facilitate more economical use of valuable DNA and enable more consistent results in projects that aim to sequence challenging, irreplaceable historical specimens.

     
  5. Abstract

    Molecular ecologists seek to genotype hundreds to thousands of loci from hundreds to thousands of individuals at minimal cost per sample. Current methods, such as restriction‐site‐associated DNA sequencing (RADseq) and sequence capture, are constrained by costs associated with inefficient use of sequencing data and sample preparation. Here, we introduce RADcap, an approach that combines the major benefits of RADseq (low cost with specific start positions) with those of sequence capture (repeatable sequencing of specific loci) to significantly increase efficiency and reduce costs relative to current approaches. RADcap uses a new version of dual‐digest RADseq (3RAD) to identify candidate SNP loci for capture bait design and subsequently uses custom sequence capture baits to consistently enrich candidate SNP loci across many individuals. We combined this approach with a new library preparation method for identifying and removing PCR duplicates from 3RAD libraries, which allows researchers to process RADseq data using traditional pipelines, and we tested the RADcap method by genotyping sets of 96–384 Wisteria plants. Our results demonstrate that our RADcap method: (i) methodologically reduces (to <5%) and allows computational removal of PCR duplicate reads from data, (ii) achieves 80–90% reads on target in 11 of 12 enrichments, (iii) returns consistent coverage (≥4×) across >90% of individuals at up to 99.8% of the targeted loci, (iv) produces consistently high occupancy matrices of genotypes across hundreds of individuals and (v) costs significantly less than current approaches.

     