Title: Predictors of sequence capture in a large-scale anchored phylogenomics project
Next-generation sequencing (NGS) technologies have revolutionized phylogenomics by decreasing the cost and time required to generate sequence data from multiple markers or whole genomes. Further, the fragmented DNA of biological specimens collected decades ago can be sequenced with NGS, reducing the need for collecting fresh specimens. Sequence capture, also known as anchored hybrid enrichment, is a method to produce reduced representation libraries for NGS. The technique uses single-stranded oligonucleotide probes that hybridize with pre-selected regions of the genome that are sequenced via NGS, culminating in a dataset of numerous orthologous loci from multiple taxa. Phylogenetic analyses using these sequences have the potential to resolve deep and shallow phylogenetic relationships. Identifying the factors that affect sequence capture success could save time, money, and valuable specimens that might be destructively sampled despite low likelihood of sequencing success. We investigated the impacts of specimen age, preservation method, and DNA concentration on sequence capture (number of captured sequences and sequence quality) while accounting for taxonomy and extracted tissue type in a large-scale butterfly phylogenomics project. This project used two probe sets to extract 391 loci or a subset of 13 loci from over 6,000 butterfly specimens. We found that sequence capture is a resilient method capable of amplifying loci in samples of varying age (0–111 years), preservation method (alcohol, papered, pinned), and DNA concentration (0.020–316 ng/μl). Regression analyses demonstrate that sequence capture is positively correlated with DNA concentration. However, sequence capture and DNA concentration are negatively correlated with sample age and preservation method. Our findings suggest that sequence capture projects should prioritize the use of alcohol-preserved samples younger than 20 years old when available. 
In the absence of such specimens, dried samples of any age can yield sequence data, albeit with returns that diminish with increasing age.
Award ID(s):
1920895
PAR ID:
10447834
Journal Name:
Frontiers in Ecology and Evolution
Volume:
10
ISSN:
2296-701X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Over the past decade, museum genomics studies have focused on obtaining DNA of sufficient quality and quantity for sequencing from fluid-preserved natural history specimens, primarily to be used in systematic studies. While these studies have opened windows to evolutionary and biodiversity knowledge of many species worldwide, published works often focus on the success of these DNA sequencing efforts, which is undoubtedly less common than obtaining minimal or sometimes no DNA or unusable sequence data from specimens in natural history collections. Here, we attempt to obtain and sequence DNA extracts from 115 fresh and 41 degraded samples of homalopsid snakes, as well as from two degraded samples of a poorly known snake, Hydrablabes periops. Hydrablabes has been suggested to belong to at least two different families (Natricidae and Homalopsidae) and with no fresh tissues known to be available, intractable museum specimens currently provide the only opportunity to determine this snake’s taxonomic affinity. Although our aim was to generate a target-capture dataset for these samples, to be included in a broader phylogenetic study, results were less than ideal due to large amounts of missing data, especially using the same downstream methods as with standard, high-quality samples. However, rather than discount results entirely, we used mapping methods with references and pseudoreferences, along with phylogenetic analyses, to maximize any usable molecular data from our sequencing efforts, identify the taxonomic affinity of H. periops, and compare sequencing success between fresh and degraded tissue samples. This resulted in largely complete mitochondrial genomes for five specimens and hundreds to thousands of nuclear loci (ultra-conserved loci, anchored-hybrid enrichment loci, and a variety of loci frequently used in squamate phylogenetic studies) from fluid-preserved snakes, including a specimen of H. periops from the Field Museum of Natural History collection. 
We combined our H. periops data with previously published genomic and Sanger-sequenced datasets to confirm the familial designation of this taxon, reject previous taxonomic hypotheses, and make biogeographic inferences for Hydrablabes. A second H. periops specimen, despite being seemingly similar for initial raw sequencing results and after being put through the same protocols, resulted in little usable molecular data. We discuss the successes and failures of using different pipelines and methods to maximize the products from these data and provide expectations for others who are looking to use DNA sequencing efforts on specimens that likely have degraded DNA. Life Science Identifier (Hydrablabes periops) urn:lsid:zoobank.org:pub:F2AA44E2-D2EF-4747-972A-652C34C2C09D.
  2. The use of gDNAs isolated from museum specimens for high throughput sequencing, especially targeted sequencing in the context of phylogenetics, is a common practice. Yet little attention has been paid to comparing the quality of DNA and the sequencing results obtained from museum DNAs. Dragonflies and damselflies are ubiquitous in freshwater ecosystems and are commonly collected and preserved in museum collections, hence their use in this study. However, the history of odonate preservation across time and museums has resulted in wide variability in the success of viable DNA extraction, necessitating an assessment of their usefulness in genetic studies. Using Anchored Hybrid Enrichment probes, we sequenced DNA from samples at 2 museums, 48 from the American Museum of Natural History (AMNH) in NYC, USA and 46 from the Naturalis Biodiversity Center (RMNH) in Leiden, Netherlands, spanning global collection localities and a 120-year time span. We recovered at least 4 loci out of a >1,000-locus probe set for all samples, with the average capture being ~385 loci (539 loci on average when a clade of ambiguous taxa was omitted). Neither specimen age nor size was a good predictor of locus capture, but recapture rates differed significantly between museums. Samples from the AMNH had lower overall locus capture than the RMNH, perhaps due to differences in specimen storage over time. 
  3. Phylogenetic datasets are now commonly generated using short-read sequencing technologies unhampered by degraded DNA, such as that often extracted from herbarium specimens. The compatibility of these methods with herbarium specimens has precipitated an increase in broad sampling of herbarium specimens for inclusion in phylogenetic studies. Understanding which sample characteristics are predictive of sequencing success can guide researchers in the selection of tissues and specimens most likely to yield good results. Multiple recent studies have considered the relationship between sample characteristics and DNA yield and sequence capture success. Here we report an analysis of the relationship between sample characteristics and sequencing success for nearly 8,000 herbarium specimens. This study, the largest of its kind, is also the first to include a measure of specimen quality (“greenness”) as a predictor of DNA sequencing success. We found that taxonomic group and source herbarium are strong predictors of both DNA yield and sequencing success and that the most important specimen characteristics for predicting success differ for DNA yield and sequencing: greenness was the strongest predictor of DNA yield, and age was the strongest predictor of proportion-on-target reads recovered. Surprisingly, the relationship between age and proportion-on-target reads is the inverse of expectations; older specimens performed slightly better in our capture-based protocols. We also found that DNA yield itself is not a strong predictor of sequencing success. Most literature on DNA sequencing from herbarium specimens considers specimen selection for optimal DNA extraction success, which we find to be an inappropriate metric for predicting success using next-generation sequencing technologies. 
  4. Multi‐locus sequence data are widely used in fungal systematic and taxonomic studies to delimit species and infer evolutionary relationships. We developed and assessed the efficacy of a multi‐locus pooled sequencing method using PacBio long‐read high‐throughput sequencing. Samples included fresh and dried voucher specimens, cultures and archival DNA extracts of Agaricomycetes with an emphasis on the order Cantharellales. Of the 283 specimens sequenced, 93.6% successfully amplified at one or more loci with a mean of 3.3 loci amplified. Our method recovered multiple sequence variants representing alleles of rDNA loci and single copy protein‐coding genes rpb1, rpb2 and tef1. Within‐sample genetic variation differed by locus and taxonomic group, with the greatest genetic divergence observed among sequence variants of rpb2 and tef1 from corticioid Cantharellales. Our method is a cost‐effective approach for generating accurate multi‐locus sequence data coupled with recovery of alleles from polymorphic samples and multi‐organism specimens. These results have important implications for understanding intra‐individual genomic variation among genetic loci commonly used in species delimitation of fungi. 
  5. Charleston, Michael (Ed.)
    We present a 517-gene phylogenetic framework for the breadfruit genus Artocarpus (ca. 70 spp., Moraceae), making use of silica-dried leaves from recent fieldwork and herbarium specimens (some up to 106 years old) to achieve 96% taxon sampling. We explore issues relating to assembly, paralogous loci, partitions, and analysis method to reconstruct a phylogeny that is robust to variation in data and available tools. Although codon partitioning did not result in any substantial topological differences, the inclusion of flanking noncoding sequence in analyses significantly increased the resolution of gene trees. We also found that increasing the size of data sets increased convergence between analysis methods but did not reduce gene-tree conflict. We optimized the HybPiper targeted-enrichment sequence assembly pipeline for short sequences derived from degraded DNA extracted from museum specimens. Although the subgenera of Artocarpus were monophyletic, revision is required at finer scales, particularly with respect to widespread species. We expect our results to provide a basis for further studies in Artocarpus and provide guidelines for future analyses of data sets based on target enrichment data, particularly those using sequences from both fresh and museum material, counseling careful attention to the potential of off-target sequences to improve resolution. [Artocarpus; Moraceae; noncoding sequences; phylogenomics; target enrichment.] 