Title: A novel exome probe set captures phototransduction genes across birds (Aves) enabling efficient analysis of vision evolution
The diversity of avian visual phenotypes provides a framework for studying mechanisms of trait diversification generally, and the evolution of vertebrate vision, specifically. Previous research has focused on opsins, but to fully understand visual adaptation, we must study the complete phototransduction cascade (PTC). Here, we developed a probe set that captures exonic regions of 46 genes representing the PTC and other light responses. For a subset of species, we directly compared gene capture between our probe set and low-coverage whole genome sequencing (WGS), and we discuss considerations for choosing between these methods. Finally, we developed a unique strategy to avoid chimeric assembly by using “decoy” reference sequences. We successfully captured an average of 64% of our targeted exome in 46 species across 14 orders using the probe set and had similar recovery using the WGS data. Compared to WGS or transcriptomes, our probe set: (1) reduces sequencing requirements by efficiently capturing vision genes, (2) employs a simpler bioinformatic pipeline by limiting required assembly and negating annotation, and (3) eliminates the need for fresh tissues, enabling researchers to leverage existing museum collections. We then utilized our vision exome data to identify positively selected genes in two evolutionary scenarios: evolution of night vision in nocturnal birds and evolution of high-speed vision specific to manakins (Pipridae). We found parallel positive selection of SLC24A1 in both scenarios, implicating the alteration of rod response kinetics, which could improve color discrimination in dim light conditions and/or facilitate higher temporal resolution.
Award ID(s):
1711026 1655683
PAR ID:
10311306
Journal Name:
Molecular Ecology Resources
ISSN:
1755-098X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Introduction: Bacteria are frequently isolated from surfaces in cleanrooms, where astromaterials are curated, at NASA’s Lyndon B. Johnson Space Center (JSC). Bacillus species are of particular interest because endospores can endure extreme conditions. Current monitoring programs at JSC rely on culturing microbes from swabs of surfaces followed by identification by 16S rRNA sequencing and the VITEK 2 Compact bacterial identification system. These methods have limited power to resolve Bacillus species. Whole genome sequencing (WGS) is the current standard for bacterial identification but is expensive and time-consuming. Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) provides a rapid, low-cost method of identifying bacterial isolates and has a higher resolution than 16S rRNA sequencing, particularly for Bacillus species; however, few studies have compared this method to WGS for identification of Bacillus species isolated from cleanrooms. Methods: To address this, we selected 15 isolates for analysis with WGS and MALDI-TOF MS. Hybrid next-generation (Illumina) and 3rd-generation (nanopore) sequencing were used to draft genomes. Mass spectra, generated with MALDI-TOF MS, were processed with custom scripts to identify clusters of closely related isolates. Results: MALDI-TOF MS and WGS identified 13/15 and 9/14 isolates at the species level, respectively, and clusters of species generated from MALDI-TOF MS showed good agreement, in terms of congruence of partitioning, with phylotypes generated with WGS. Pairs of strains that were >94% similar to each other, in terms of average amino acid identity (AAI) predicted by WGS, consistently showed cosine similarities of mass spectra >0.8. The only discordance was for a pair of isolates that were classified as Paenibacillus species. This pair showed relatively high similarity (0.85) in terms of MALDI-TOF MS but only 85% similarity in terms of AAI. In addition, some strains isolated from cleanrooms at the JSC appeared closely related to strains isolated from spacecraft assembly cleanrooms. Discussion: Since MALDI-TOF MS costs less than whole genome sequencing and offers a throughput of hundreds of isolates per hour, this approach appears to offer a cost-efficient option for identifying Bacillus species, and related microbes, isolated during routine monitoring of cleanrooms and similar built environments.
  2. Background: Modern sequencing technologies should make the assembly of the relatively small mitochondrial genomes an easy undertaking. However, few tools exist that address mitochondrial assembly directly. Results: As part of the Vertebrate Genomes Project (VGP) we develop mitoVGP, a fully automated pipeline for similarity-based identification of mitochondrial reads and de novo assembly of mitochondrial genomes that incorporates both long (> 10 kbp, PacBio or Nanopore) and short (100–300 bp, Illumina) reads. Our pipeline leads to successful complete mitogenome assemblies of 100 vertebrate species of the VGP. We observe that tissue type and library size selection have considerable impact on mitogenome sequencing and assembly. Comparing our assemblies to purportedly complete reference mitogenomes based on short-read sequencing, we identify errors, missing sequences, and incomplete genes in those references, particularly in repetitive regions. Our assemblies also identify novel gene region duplications. The presence of repeats and duplications in over half of the species herein assembled indicates that their occurrence is a principle of mitochondrial structure rather than an exception, shedding new light on mitochondrial genome evolution and organization. Conclusions: Our results indicate that even in the “simple” case of vertebrate mitogenomes the completeness of many currently available reference sequences can be further improved, and caution should be exercised before claiming the complete assembly of a mitogenome, particularly from short reads alone.
  3. Microorganisms are ubiquitous in the biosphere, playing a crucial role in both the biogeochemistry of the planet and human health. However, identifying these microorganisms and defining their function are challenging. Widely used approaches in comparative metagenomics, 16S amplicon sequencing and whole genome shotgun sequencing (WGS), have provided access to DNA sequencing analysis to identify microorganisms and evaluate diversity and abundance in various environments. However, advances in parallel high-throughput DNA sequencing in the past decade have introduced major hurdles, namely standardization of methods, data storage, reproducible interoperability of results, and data sharing. The National Ecological Observatory Network (NEON), established by the National Science Foundation, enables all researchers to address queries on a regional to continental scale around a variety of environmental challenges and provides high-quality, integrated, and standardized data from field sites across the U.S. As the amount of metagenomic data continues to grow, standardized procedures that allow results across projects to be assessed and compared are becoming increasingly important in the field of metagenomics. We demonstrate the feasibility of using publicly available NEON soil metagenomic sequencing datasets in combination with the open access Metagenomics Rapid Annotation using the Subsystem Technology (MG-RAST) server to illustrate advantages of WGS compared to 16S amplicon sequencing. Four WGS and four 16S amplicon sequence datasets were selected for comparison, all from surface soil samples that NEON investigators prepared using standardized protocols and collected at the same locations in Colorado between April and July 2014. The dominant bacterial phyla detected across samples agreed between sequencing methodologies. However, WGS yielded greater microbial resolution, increased accuracy, and allowed identification of more genera of bacteria, archaea, viruses, and eukaryota, and putative functional genes that would have gone undetected using 16S amplicon sequencing. NEON open data will be useful for future studies characterizing and quantifying complex ecological processes associated with changing aquatic and terrestrial ecosystems.
  4. Advances in whole-genome sequencing (WGS) promise to enable accurate and comprehensive structural variant (SV) discovery. Dissecting SVs from WGS data presents a substantial number of challenges, and a plethora of SV detection methods have been developed. Currently, evidence that investigators can use to select appropriate SV detection tools is lacking. In this article, we have evaluated the performance of SV detection tools on mouse and human WGS data using a comprehensive polymerase chain reaction-confirmed gold standard set of SVs and the genome-in-a-bottle variant set, respectively. In contrast to the previous benchmarking studies, our gold standard dataset included a complete set of SVs, allowing us to report both precision and sensitivity rates of the SV detection methods. Our study investigates the ability of the methods to detect deletions, thus providing an optimistic estimate of SV detection performance, as the SV detection methods that fail to detect deletions are likely to miss more complex SVs. We found that SV detection tools varied widely in their performance, with several methods providing a good balance between sensitivity and precision. Additionally, we have determined the SV callers best suited for low- and ultralow-pass sequencing data as well as for different deletion length categories.
  5. Despite many bioinformatic solutions for analyzing sequencing data, few options exist for targeted sequence retrieval from whole genomic sequencing (WGS) data with the ultimate goal of generating a phylogeny. Available tools especially struggle at deep phylogenetic levels and necessitate amino-acid space searches, which may increase rates of false positive results. Many tools are also difficult to install and may lack adequate user resources. Here, we describe a program that uses freely available similarity search tools to find homologs in assembled WGS data with unparalleled freedom to modify parameters. We evaluate its performance compared to other commonly used bioinformatics tools on two divergent insect species (>200 My) for which annotated genomes exist, and on one large set each of highly conserved and more variable loci. Our software is capable of retrieving orthologs from well-curated or unannotated, low or high depth shotgun, and target capture assemblies as well as or better than other software, as assessed by recovering the most genes with maximal coverage and with a low rate of false positives throughout all datasets. When assessing this combination of criteria, ALiBaSeq is frequently the best evaluated tool for gathering the most comprehensive and accurate phylogenetic alignments on all types of data tested. The software (implemented in Python), tutorials, and manual are freely available at https://github.com/AlexKnyshov/alibaseq .