Title: From e-voucher to genomic data: Preserving archive specimens as demonstrated with medically important mosquitoes (Diptera: Culicidae) and kissing bugs (Hemiptera: Reduviidae)
Scientific collections such as the U.S. National Museum (USNM) are critical to filling knowledge gaps in molecular systematics studies. The global taxonomic impediment has resulted in a reduction of expert taxonomists generating new collections of rare or understudied taxa, and these large historic collections may be the only reliable source of material for some taxa. Integrated systematics studies using both morphological examinations and DNA sequencing are often required for resolving many taxonomic issues, but because DNA methods often require partial or complete destruction of a sample, there are many factors to consider before implementing destructive sampling of specimens within scientific collections. We present a methodology for the use of archive specimens that includes two crucial phases: 1) thoroughly documenting specimens destined for destructive sampling, a process called electronic vouchering, and 2) the pipeline used for whole genome sequencing of archived specimens, from extraction of genomic DNA to assembly of putative genomes with basic annotation. The process is presented for eleven specimens from two different insect subfamilies of medical importance to humans: Anophelinae (Diptera: Culicidae; mosquitoes) and Triatominae (Hemiptera: Reduviidae; kissing bugs). Assembly of whole mitochondrial genome sequences of all eleven specimens along with the results of an ortholog search and BLAST against the NCBI nucleotide database are also presented.
Award ID(s):
1754376
NSF-PAR ID:
10274852
Editor(s):
Oliveira, Pedro L.
Journal Name:
PLOS ONE
Volume:
16
Issue:
2
ISSN:
1932-6203
Page Range / eLocation ID:
e0247068
Sponsoring Org:
National Science Foundation
More Like this
  1. Over the past decade, museum genomics studies have focused on obtaining DNA of sufficient quality and quantity for sequencing from fluid-preserved natural history specimens, primarily to be used in systematic studies. While these studies have opened windows to evolutionary and biodiversity knowledge of many species worldwide, published works often focus on the success of these DNA sequencing efforts, which is undoubtedly less common than obtaining minimal or sometimes no DNA or unusable sequence data from specimens in natural history collections. Here, we attempt to obtain and sequence DNA extracts from 115 fresh and 41 degraded samples of homalopsid snakes, as well as from two degraded samples of a poorly known snake, Hydrablabes periops. Hydrablabes has been suggested to belong to at least two different families (Natricidae and Homalopsidae) and with no fresh tissues known to be available, intractable museum specimens currently provide the only opportunity to determine this snake’s taxonomic affinity. Although our aim was to generate a target-capture dataset for these samples, to be included in a broader phylogenetic study, results were less than ideal due to large amounts of missing data, especially using the same downstream methods as with standard, high-quality samples. However, rather than discount results entirely, we used mapping methods with references and pseudoreferences, along with phylogenetic analyses, to maximize any usable molecular data from our sequencing efforts, identify the taxonomic affinity of H. periops, and compare sequencing success between fresh and degraded tissue samples. This resulted in largely complete mitochondrial genomes for five specimens and hundreds to thousands of nuclear loci (ultra-conserved loci, anchored-hybrid enrichment loci, and a variety of loci frequently used in squamate phylogenetic studies) from fluid-preserved snakes, including a specimen of H. periops from the Field Museum of Natural History collection.
We combined our H. periops data with previously published genomic and Sanger-sequenced datasets to confirm the familial designation of this taxon, reject previous taxonomic hypotheses, and make biogeographic inferences for Hydrablabes. A second H. periops specimen, despite being seemingly similar for initial raw sequencing results and after being put through the same protocols, resulted in little usable molecular data. We discuss the successes and failures of using different pipelines and methods to maximize the products from these data and provide expectations for others who are looking to use DNA sequencing efforts on specimens that likely have degraded DNA. Life Science Identifier (Hydrablabes periops) urn:lsid:zoobank.org:pub:F2AA44E2-D2EF-4747-972A-652C34C2C09D.
  2. Abstract

    Although they are a valuable source of specimens, insect natural history collections continue to be under‐utilized in molecular systematics, mostly due to difficulties in obtaining DNA sequences. Old specimens or specimens stored under suboptimal conditions are intractable for traditional Sanger sequencing. In this study we use an inexpensive hybrid capture with in‐house generated baits to retrieve commonly utilized ribosomal and mitochondrial loci from old museum specimens and combine them with a Sanger‐generated dataset comprising recently collected material. We focus on the Corixidea genus group (Schizopteridae), which comprises rarely collected, small (1–2 mm) and primarily tropical insects of which only c. 10–20% of the species have been described. A molecular phylogeny is needed to resolve relationships and revise the genus‐level classification to correctly place the c. 150 yet to be described species. Applying this approach, we constructed a dataset, containing 101 taxa, 11 of which were preserved in low‐percentage ethanol, 48 are dry and point‐mounted, and 40 are > 20 years old at DNA extraction. The obtained data proved sufficient for reconstructing a well‐supported phylogeny with c. 50% of the predicted diversity, and for the oldest successfully sequenced specimen (95 years) to be unambiguously placed in that phylogeny. We confirmed monophyly of the Corixidea genus group, showed paraphyly of the genus Corixidea, and recovered nine well‐supported clades within the group. Ancestral character states of selected morphological features were inferred and used to re‐examine primary homology hypotheses and inform an upcoming taxonomic revision.

     
  3. Springer, Mark (Ed.)
    Abstract Despite the increasing feasibility of sequencing whole genomes from diverse taxa, a persistent problem in phylogenomics is the selection of appropriate genetic markers or loci for a given taxonomic group or research question. In this review, we aim to streamline the decision-making process when selecting specific markers to use in phylogenomic studies by introducing commonly used types of genomic markers, their evolutionary characteristics, and their associated uses in phylogenomics. Specifically, we review the utilities of ultraconserved elements (including flanking regions), anchored hybrid enrichment loci, conserved nonexonic elements, untranslated regions, introns, exons, mitochondrial DNA, single nucleotide polymorphisms, and anonymous regions (nonspecific regions that are evenly or randomly distributed across the genome). These various genomic elements and regions differ in their substitution rates, likelihood of neutrality or of being strongly linked to loci under selection, and mode of inheritance, each of which are important considerations in phylogenomic reconstruction. These features may give each type of marker important advantages and disadvantages depending on the biological question, number of taxa sampled, evolutionary timescale, cost effectiveness, and analytical methods used. We provide a concise outline as a resource to efficiently consider key aspects of each type of genetic marker. There are many factors to consider when designing phylogenomic studies, and this review may serve as a primer when weighing options between multiple potential phylogenomic markers. 
  4. Abstract

    The field of plant genome sequencing has grown rapidly in the past 20 years, leading to increases in the quantity and quality of publicly available genomic resources. The growing wealth of genomic data from an increasingly diverse set of taxa provides unprecedented potential to better understand the genome biology and evolution of land plants. Here we provide a contemporary view of land plant genomics, including analyses on assembly quality, taxonomic distribution of sequenced species and national participation. We show that assembly quality has increased dramatically in recent years, that substantial taxonomic gaps exist and that the field has been dominated by affluent nations in the Global North and China, despite a wide geographic distribution of study species. We identify numerous disconnects between the native range of focal species and the national affiliation of the researchers studying them, which we argue are rooted in colonialism—both past and present. Luckily, falling sequencing costs, widening availability of analytical tools and an increasingly connected scientific community provide key opportunities to improve existing assemblies, fill sampling gaps and empower a more global plant genomics community.

     
  5. More than ever, ecologists seek to employ herbarium collections to estimate plant functional traits from the past and across biomes. However, many trait measurements are destructive, which may preclude their use on valuable specimens. Researchers increasingly use reflectance spectroscopy to estimate traits from fresh or ground leaves, and to delimit or identify taxa. Here, we extend this body of work to non-destructive measurements on pressed, intact leaves, like those in herbarium collections. Using 618 samples from 68 species, we used partial least-squares regression to build models linking pressed-leaf reflectance spectra to a broad suite of traits, including leaf mass per area (LMA), leaf dry matter content (LDMC), equivalent water thickness, carbon fractions, pigments, and twelve elements. We compared these models to those trained on fresh- or ground-leaf spectra of the same samples. The traits our pressed-leaf models could estimate best were LMA (R² = 0.932; %RMSE = 6.56), C (R² = 0.855; %RMSE = 9.03), and cellulose (R² = 0.803; %RMSE = 12.2), followed by water-related traits, certain nutrients (Ca, Mg, N, and P), other carbon fractions, and pigments (all R² = 0.514–0.790; %RMSE = 12.8–19.6). Remaining elements were predicted poorly (R² < 0.5, %RMSE > 20). For most chemical traits, pressed-leaf models performed better than fresh-leaf models, but worse than ground-leaf models. Pressed-leaf models were worse than fresh-leaf models for estimating LMA and LDMC, but better than ground-leaf models for LMA. Finally, in a subset of samples, we used partial least-squares discriminant analysis to classify specimens among 10 species with near-perfect accuracy (>97%) from pressed- and ground-leaf spectra, and slightly lower accuracy (>93%) from fresh-leaf spectra. These results show that applying spectroscopy to pressed leaves is a promising way to estimate leaf functional traits and identify species without destructive analysis.
Pressed-leaf spectra might combine advantages of fresh and ground leaves: like fresh leaves, they retain some of the spectral expression of leaf structure; but like ground leaves, they circumvent the masking effect of water absorption. Our study has far-reaching implications for capturing the wide range of functional and taxonomic information in the world’s preserved plant collections. 