Title: A novel exome probe set captures phototransduction genes across birds (Aves) enabling efficient analysis of vision evolution
The diversity of avian visual phenotypes provides a framework for studying mechanisms of trait diversification generally, and the evolution of vertebrate vision, specifically. Previous research has focused on opsins, but to fully understand visual adaptation, we must study the complete phototransduction cascade (PTC). Here, we developed a probe set that captures exonic regions of 46 genes representing the PTC and other light responses. For a subset of species, we directly compared gene capture between our probe set and low-coverage whole genome sequencing (WGS), and we discuss considerations for choosing between these methods. Finally, we developed a unique strategy to avoid chimeric assembly by using “decoy” reference sequences. We successfully captured an average of 64% of our targeted exome in 46 species across 14 orders using the probe set and had similar recovery using the WGS data. Compared to WGS or transcriptomes, our probe set: (1) reduces sequencing requirements by efficiently capturing vision genes, (2) employs a simpler bioinformatic pipeline by limiting required assembly and negating annotation, and (3) eliminates the need for fresh tissues, enabling researchers to leverage existing museum collections. We then utilized our vision exome data to identify positively selected genes in two evolutionary scenarios—evolution of night vision in nocturnal birds and evolution of high-speed vision specific to manakins (Pipridae). We found parallel positive selection of SLC24A1 in both scenarios, implicating the alteration of rod response kinetics, which could improve color discrimination in dim light conditions and/or facilitate higher temporal resolution.
Award ID(s):
1711026 1655683
NSF-PAR ID:
10311306
Journal Name:
Molecular Ecology Resources
ISSN:
1755-098X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Abstract Target enrichment (such as Hyb-Seq) is a well-established high throughput sequencing method that has been increasingly used for phylogenomic studies. Unfortunately, current widely used pipelines for analysis of target enrichment data do not have a vigorous procedure to remove paralogs in target enrichment data. In this study, we develop a pipeline we call Putative Paralogs Detection (PPD) to better address putative paralogs from enrichment data. The new pipeline is an add-on to the existing HybPiper pipeline, and the entire pipeline applies criteria in both sequence similarity and heterozygous sites at each locus in the identification of paralogs. Users may adjust the thresholds of sequence identity and heterozygous sites to identify and remove paralogs according to the level of phylogenetic divergence of their group of interest. The new pipeline also removes highly polymorphic sites attributed to errors in sequence assembly and gappy regions in the alignment. We demonstrated the value of the new pipeline using empirical data generated from Hyb-Seq and the Angiosperm 353 kit for two woody genera Castanea (Fagaceae, Fagales) and Hamamelis (Hamamelidaceae, Saxifragales). Comparisons of datasets showed that the PPD identified many more putative paralogs than the popular method HybPiper. Comparisons of tree topologies and divergence times showed evident differences between data from HybPiper and data from our new PPD pipeline. We further evaluated the accuracy and error rates of PPD by BLAST mapping of putative paralogous and orthologous sequences to a reference genome sequence of Castanea mollissima. Compared to HybPiper alone, PPD identified substantially more paralogous gene sequences that mapped to multiple regions of the reference genome (31 genes for PPD compared with 4 genes for HybPiper alone). 
In conjunction with HybPiper, paralogous genes identified by both pipelines can be removed resulting in the construction of more robust orthologous gene datasets for phylogenomic and divergence time analyses. Our study demonstrates the value of Hyb-Seq with data derived from the Angiosperm 353 probe set for elucidating species relationships within a genus, and argues for the importance of additional steps to filter paralogous genes and poorly aligned regions (e.g., as occur through assembly errors), such as our new PPD pipeline described in this study. 
  2. INTRODUCTION Transposable elements (TEs), repeat expansions, and repeat-mediated structural rearrangements play key roles in chromosome structure and species evolution, contribute to human genetic variation, and substantially influence human health through copy number variants, structural variants, insertions, deletions, and alterations to gene transcription and splicing. Despite their formative role in genome stability, repetitive regions have been relegated to gaps and collapsed regions in human genome reference GRCh38 owing to the technological limitations during its development. The lack of linear sequence in these regions, particularly in centromeres, resulted in the inability to fully explore the repeat content of the human genome in the context of both local and regional chromosomal environments. RATIONALE Long-read sequencing supported the complete, telomere-to-telomere (T2T) assembly of the pseudo-haploid human cell line CHM13. This resource affords a genome-scale assessment of all human repetitive sequences, including TEs and previously unknown repeats and satellites, both within and outside of gaps and collapsed regions. Additionally, a complete genome enables the opportunity to explore the epigenetic and transcriptional profiles of these elements that are fundamental to our understanding of chromosome structure, function, and evolution. Comparative analyses reveal modes of repeat divergence, evolution, and expansion or contraction with locus-level resolution. RESULTS We implemented a comprehensive repeat annotation workflow using previously known human repeats and de novo repeat modeling followed by manual curation, including assessing overlaps with gene annotations, segmental duplications, tandem repeats, and annotated repeats. Using this method, we developed an updated catalog of human repetitive sequences and refined previous repeat annotations. 
We discovered 43 previously unknown repeats and repeat variants and characterized 19 complex, composite repetitive structures, which often carry genes, across T2T-CHM13. Using precision nuclear run-on sequencing (PRO-seq) and CpG methylated sites generated from Oxford Nanopore Technologies long-read sequencing data, we assessed RNA polymerase engagement across retroelements genome-wide, revealing correlations between nascent transcription, sequence divergence, CpG density, and methylation. These analyses were extended to evaluate RNA polymerase occupancy for all repeats, including high-density satellite repeats that reside in previously inaccessible centromeric regions of all human chromosomes. Moreover, using both mapping-dependent and mapping-independent approaches across early developmental stages and a complete cell cycle time series, we found that engaged RNA polymerase across satellites is low; in contrast, TE transcription is abundant and serves as a boundary for changes in CpG methylation and centromere substructure. Together, these data reveal the dynamic relationship between transcriptionally active retroelement subclasses and DNA methylation, as well as potential mechanisms for the derivation and evolution of new repeat families and composite elements. Focusing on the emerging T2T-level assembly of the HG002 X chromosome, we reveal that a high level of repeat variation likely exists across the human population, including composite element copy numbers that affect gene copy number. Additionally, we highlight the impact of repeats on the structural diversity of the genome, revealing repeat expansions with extreme copy number differences between humans and primates while also providing high-confidence annotations of retroelement transduction events. 
CONCLUSION The comprehensive repeat annotations and updated repeat models described herein serve as a resource for expanding the compendium of human genome sequences and reveal the impact of specific repeats on the human genome. In developing this resource, we provide a methodological framework for assessing repeat variation within and between human genomes. The exhaustive assessment of the transcriptional landscape of repeats, at both the genome scale and locally, such as within centromeres, sets the stage for functional studies to disentangle the role transcription plays in the mechanisms essential for genome stability and chromosome segregation. Finally, our work demonstrates the need to increase efforts toward achieving T2T-level assemblies for nonhuman primates and other species to fully understand the complexity and impact of repeat-derived genomic innovations that define primate lineages, including humans. Telomere-to-telomere assembly of CHM13 supports repeat annotations and discoveries. The human reference T2T-CHM13 filled gaps and corrected collapsed regions (triangles) in GRCh38. Combining long read–based methylation calls, PRO-seq, and multilevel computational methods, we provide a compendium of human repeats, define retroelement expression and methylation profiles, and delineate locus-specific sites of nascent transcription genome-wide, including previously inaccessible centromeres. SINE, short interspersed element; SVA, SINE–variable number tandem repeat– Alu ; LINE, long interspersed element; LTR, long terminal repeat; TSS, transcription start site; pA, xxxxxxxxxxxxxxxx. 
  3. Abstract

    Avoiding extinction in a rapidly changing environment often relies on a species’ ability to quickly adapt in the face of extreme selective pressures. In Panamá, two closely related harlequin frog species (Atelopus varius and Atelopus zeteki) are threatened with extinction due to the fungal pathogen Batrachochytrium dendrobatidis (Bd). Once thought to be nearly extirpated from Panamá, A. varius have recently been rediscovered in multiple localities across their historical range; however, A. zeteki are possibly extinct in the wild. By leveraging a unique collection of 186 Atelopus tissue samples collected before and after the Bd outbreak in Panamá, we describe the genetics of persistence for these species on the brink of extinction. We sequenced the transcriptome and developed an exome‐capture assay to sequence the coding regions of the Atelopus genome. Using these genetic data, we evaluate the population genetic structure of historical A. varius and A. zeteki populations, describe changes in genetic diversity over time, assess the relationship between contemporary and historical individuals, and test the hypothesis that some A. varius populations have rapidly evolved to resist or tolerate Bd infection. We found a significant decrease in genetic diversity in contemporary (compared to historical) A. varius populations. We did not find strong evidence of directional allele frequency change or selection for Bd resistance genes, but we uncovered a set of candidate genes that warrant further study. Additionally, we found preliminary evidence of recent migration and gene flow in one of the largest persisting A. varius populations in Panamá, suggesting the potential for genetic rescue in this system. Finally, we propose that previous conservation units should be modified, as clear genetic breaks do not exist beyond the local population level. Our data lay the groundwork for genetically informed conservation and advance our understanding of how imperiled species might be rescued from extinction.

     
  4. Abstract

    Long‐read sequencing is driving a new reality for genome science in which highly contiguous assemblies can be produced efficiently with modest resources. Genome assemblies from long‐read sequences are particularly exciting for understanding the evolution of complex genomic regions that are often difficult to assemble. In this study, we utilized long‐read sequencing data to generate a high‐quality genome assembly for an Antarctic eelpout, Ophthalmolycus amberensis, the first for the globally distributed family Zoarcidae. We used this assembly to understand how O. amberensis has adapted to the harsh Southern Ocean and compared it to another group of Antarctic fishes: the notothenioids. We showed that selection has largely acted on different targets in eelpouts relative to notothenioids. However, we did find some overlap; in both groups, genes involved in membrane structure, thermal tolerance and vision have evidence of positive selection. We found evidence for historical shifts of transposable element activity in O. amberensis and other polar fishes, perhaps reflecting a response to environmental change. We were specifically interested in the evolution of two complex genomic loci known to underlie key adaptations to polar seas: haemoglobin and antifreeze proteins (AFPs). We observed unique evolution of the haemoglobin MN cluster in eelpouts and related fishes in the suborder Zoarcoidei relative to other Perciformes. For AFPs, we identified the first species in the suborder with no evidence of afpIII sequences (Cebidichthys violaceus) in the genomic region where they are found in all other Zoarcoidei, potentially reflecting a lineage‐specific loss of this cluster. Beyond polar fishes, our results highlight the power of long‐read sequencing to understand genome evolution.

     
  5. Salmonids are ideal models as many species follow a distinct developmental program from demersal eggs and a large yolk sac to hatching at an advanced developmental stage. Further, these economically important teleosts inhabit both marine and fresh waters and experience diverse light environments during their life histories. At a genome level, salmonids have undergone a salmonid-specific fourth whole genome duplication event (Ss4R) compared to other teleosts, which are already more genetically diverse than many non-teleost vertebrates. Thus, salmonids display phenotypically plastic visual systems that appear to be closely related to their anadromous migration patterns. This is most likely due to a complex interplay between their larger, more gene-rich genomes and broad spectrally enriched habitats; however, the molecular basis and functional consequences of such diversity are not fully understood. This study used advances in genome sequencing to identify the repertoire and genome organization of visual opsin genes (those primarily expressed in retinal photoreceptors) from six different salmonids [Atlantic salmon (Salmo salar), brown trout (Salmo trutta), Chinook salmon (Oncorhynchus tshawytscha), coho salmon (Oncorhynchus kisutch), rainbow trout (Oncorhynchus mykiss), and sockeye salmon (Oncorhynchus nerka)] compared to the northern pike (Esox lucius), a closely related non-salmonid species. Results identified multiple orthologues for all five visual opsin classes, except for the presence of a single short-wavelength-sensitive-2 opsin gene. Several visual opsin genes were not retained after the Ss4R duplication event, which is consistent with the concept of salmonid rediploidization. Developmentally, transcriptomic analyses of Atlantic salmon revealed differential expression within each opsin class, with two of the long-wavelength-sensitive opsins not being expressed before first feeding. Also, early opsin expression in the retina was located centrally, expanding dorsally and ventrally as eye development progressed, with rod opsin being the dominant visual opsin post-hatching. Modeling by spectral tuning analysis and atomistic molecular simulation predicted the greatest variation in the spectral peak of absorbance to be within the Rh2 class, with a ∼40 nm difference in λmax values between the four medium-wavelength-sensitive photopigments. Overall, it appears that opsin duplication and expression, and their respective spectral tuning profiles, evolved to maximize specialist color vision throughout an anadromous lifecycle, with some visual opsin genes being lost to tailor marine-based vision.