Title: Predictors of sequence capture in a large-scale anchored phylogenomics project
Next-generation sequencing (NGS) technologies have revolutionized phylogenomics by decreasing the cost and time required to generate sequence data from multiple markers or whole genomes. Further, the fragmented DNA of biological specimens collected decades ago can be sequenced with NGS, reducing the need for collecting fresh specimens. Sequence capture, also known as anchored hybrid enrichment, is a method to produce reduced representation libraries for NGS sequencing. The technique uses single-stranded oligonucleotide probes that hybridize with pre-selected regions of the genome that are sequenced via NGS, culminating in a dataset of numerous orthologous loci from multiple taxa. Phylogenetic analyses using these sequences have the potential to resolve deep and shallow phylogenetic relationships. Identifying the factors that affect sequence capture success could save time, money, and valuable specimens that might be destructively sampled despite low likelihood of sequencing success. We investigated the impacts of specimen age, preservation method, and DNA concentration on sequence capture (number of captured sequences and sequence quality) while accounting for taxonomy and extracted tissue type in a large-scale butterfly phylogenomics project. This project used two probe sets to extract 391 loci or a subset of 13 loci from over 6,000 butterfly specimens. We found that sequence capture is a resilient method capable of amplifying loci in samples of varying age (0–111 years), preservation method (alcohol, papered, pinned), and DNA concentration (0.020–316 ng/μl). Regression analyses demonstrate that sequence capture is positively correlated with DNA concentration. However, sequence capture and DNA concentration are negatively correlated with sample age and preservation method. Our findings suggest that sequence capture projects should prioritize the use of alcohol-preserved samples younger than 20 years old when available. 
In the absence of such specimens, dried samples of any age can yield sequence data, albeit with returns that diminish with increasing age.
Journal Name:
Frontiers in Ecology and Evolution
Sponsoring Org:
National Science Foundation
  1. Abstract

    DNA sequencing technologies continue to advance the biological sciences, expanding opportunities for genomic studies of non‐model organisms for basic and applied questions. Despite these opportunities, many next generation sequencing protocols have been developed assuming a substantial quantity of high molecular weight DNA (>100 ng), which can be difficult to obtain for many study systems. In particular, the ability to sequence field‐collected specimens that exhibit varying levels of DNA degradation remains largely unexplored. In this study we investigate the influence of five traditional insect capture and curation methods on Double‐Digest Restriction Enzyme Associated DNA (ddRAD) sequencing success for three wild bee species. We sequenced a total of 105 specimens (7–13 specimens per species and treatment). We additionally investigated how different DNA quality metrics (including pre‐sequence concentration and contamination) predicted downstream sequencing success, and also compared two DNA extraction methods. We report successful library preparation for all specimens, with all treatments and extraction methods producing enough highly reliable loci for population genetic analyses. Although results varied between species, we found that specimens collected by net sampling directly into 100% EtOH, or by passive trapping followed by 100% EtOH storage before pinning, tended to produce higher quality ddRAD assemblies, likely as a result of rapid specimen desiccation. Surprisingly, we found that specimens preserved in propylene glycol during field sampling exhibited lower‐quality assemblies. We provide recommendations for each treatment, extraction method, and DNA quality assessment, and further encourage researchers to consider utilizing a wider variety of specimens for genomic analyses.

  2. Over the past decade, museum genomics studies have focused on obtaining DNA of sufficient quality and quantity for sequencing from fluid-preserved natural history specimens, primarily to be used in systematic studies. While these studies have opened windows to evolutionary and biodiversity knowledge of many species worldwide, published works often focus on the success of these DNA sequencing efforts, which is undoubtedly less common than obtaining minimal or sometimes no DNA or unusable sequence data from specimens in natural history collections. Here, we attempt to obtain and sequence DNA extracts from 115 fresh and 41 degraded samples of homalopsid snakes, as well as from two degraded samples of a poorly known snake, Hydrablabes periops. Hydrablabes has been suggested to belong to at least two different families (Natricidae and Homalopsidae) and with no fresh tissues known to be available, intractable museum specimens currently provide the only opportunity to determine this snake’s taxonomic affinity. Although our aim was to generate a target-capture dataset for these samples, to be included in a broader phylogenetic study, results were less than ideal due to large amounts of missing data, especially using the same downstream methods as with standard, high-quality samples. However, rather than discount results entirely, we used mapping methods with references and pseudoreferences, along with phylogenetic analyses, to maximize any usable molecular data from our sequencing efforts, identify the taxonomic affinity of H. periops, and compare sequencing success between fresh and degraded tissue samples. This resulted in largely complete mitochondrial genomes for five specimens and hundreds to thousands of nuclear loci (ultra-conserved loci, anchored-hybrid enrichment loci, and a variety of loci frequently used in squamate phylogenetic studies) from fluid-preserved snakes, including a specimen of H. periops from the Field Museum of Natural History collection. 
We combined our H. periops data with previously published genomic and Sanger-sequenced datasets to confirm the familial designation of this taxon, reject previous taxonomic hypotheses, and make biogeographic inferences for Hydrablabes. A second H. periops specimen, despite seemingly similar initial raw sequencing results and being put through the same protocols, resulted in little usable molecular data. We discuss the successes and failures of using different pipelines and methods to maximize the products from these data and provide expectations for others who are looking to use DNA sequencing efforts on specimens that likely have degraded DNA. Life Science Identifier (Hydrablabes periops): pub:F2AA44E2-D2EF-4747-972A-652C34C2C09D.
  3. Abstract

    Natural history collections play a crucial role in biodiversity research, and museum specimens are increasingly being incorporated into modern genetics‐based studies. Sequence capture methods have proven incredibly useful for phylogenomics, providing the additional ability to sequence historical museum specimens with highly degraded DNA, which until recently have been deemed less valuable for genetic work. The successful sequencing of ultraconserved elements (UCEs) from historical museum specimens has been demonstrated on multiple tissue types including dried bird skins, formalin‐fixed squamates and pinned insects. However, no study has thoroughly demonstrated this approach for historical ethanol‐preserved museum specimens. Alongside sequencing of “fresh” specimens preserved in >95% ethanol and stored at −80°C, we used extraction techniques specifically designed for degraded DNA coupled with sequence capture protocols to sequence UCEs from historical museum specimens preserved in 70%–80% ethanol and stored at room temperature, the standard for such ethanol‐preserved museum collections. Across 35 fresh and 15 historical museum samples of the arachnid order Opiliones, an average of 345 UCE loci were included in phylogenomic matrices, with museum samples ranging from six to 495 loci. We successfully demonstrate the inclusion of historical ethanol‐preserved museum specimens in modern sequence capture phylogenomic studies, show a high frequency of variant bases at the species and population levels, and from off‐target reads successfully recover multiple loci traditionally sequenced in multilocus studies including mitochondrial loci and nuclear rRNA loci. The methods detailed in this study will allow researchers to potentially acquire genetic data from millions of ethanol‐preserved museum specimens held in collections worldwide.

  4. Abstract

    The use of gDNAs isolated from museum specimens for high throughput sequencing, especially targeted sequencing in the context of phylogenetics, is a common practice. Yet little attention has focused on comparing the quality of DNA and the results of sequencing museum DNAs. Dragonflies and damselflies are ubiquitous in freshwater ecosystems and are commonly collected and preserved insects in museum collections, hence their use in this study. However, the history of odonate preservation across time and museums has resulted in wide variability in the success of viable DNA extraction, necessitating an assessment of their usefulness in genetic studies. Using Anchored Hybrid Enrichment probes, we sequenced DNA from samples at 2 museums, 48 from the American Museum of Natural History (AMNH) in NYC, USA and 46 from the Naturalis Biodiversity Center (RMNH) in Leiden, Netherlands, spanning global collection localities and a 120-year time span. We recovered at least 4 loci out of a >1,000-locus probe set for all samples, with the average capture being ~385 loci (539 loci on average when a clade of ambiguous taxa was omitted). Neither specimen age nor size was a good predictor of locus capture, but recapture rates differed significantly between museums. Samples from the AMNH had lower overall locus capture than those from the RMNH, perhaps due to differences in specimen storage over time.

  5. Charleston, Michael (Ed.)
    Abstract We present a 517-gene phylogenetic framework for the breadfruit genus Artocarpus (ca. 70 spp., Moraceae), making use of silica-dried leaves from recent fieldwork and herbarium specimens (some up to 106 years old) to achieve 96% taxon sampling. We explore issues relating to assembly, paralogous loci, partitions, and analysis method to reconstruct a phylogeny that is robust to variation in data and available tools. Although codon partitioning did not result in any substantial topological differences, the inclusion of flanking noncoding sequence in analyses significantly increased the resolution of gene trees. We also found that increasing the size of data sets increased convergence between analysis methods but did not reduce gene-tree conflict. We optimized the HybPiper targeted-enrichment sequence assembly pipeline for short sequences derived from degraded DNA extracted from museum specimens. Although the subgenera of Artocarpus were monophyletic, revision is required at finer scales, particularly with respect to widespread species. We expect our results to provide a basis for further studies in Artocarpus and provide guidelines for future analyses of data sets based on target enrichment data, particularly those using sequences from both fresh and museum material, counseling careful attention to the potential of off-target sequences to improve resolution. [Artocarpus; Moraceae; noncoding sequences; phylogenomics; target enrichment.] 
