

Title: An evaluation of transcriptome‐based exon capture for frog phylogenomics across multiple scales of divergence (Class: Amphibia, Order: Anura)
Abstract

Custom sequence capture experiments are becoming an efficient approach for gathering large sets of orthologous markers in nonmodel organisms. Transcriptome‐based exon capture utilizes transcript sequences to design capture probes, typically using a reference genome to identify intron–exon boundaries to exclude shorter exons (<200 bp). Here, we test directly using transcript sequences for probe design, which are often composed of multiple exons of varying lengths. Using 1260 orthologous transcripts, we conducted sequence captures across multiple phylogenetic scales for frogs, including outgroups ~100 Myr divergent from the ingroup. We recovered a large phylogenomic data set consisting of sequence alignments for 1047 of the 1260 transcriptome‐based loci (~561 000 bp) and a large quantity of highly variable regions flanking the exons in transcripts (~70 000 bp), the latter improving substantially by only including ingroup species (~797 000 bp). We recovered both shorter (<100 bp) and longer exons (>200 bp), with no major reduction in coverage towards the ends of exons. We observed significant differences in the performance of blocking oligos for target enrichment and nontarget depletion during captures, and differences in PCR duplication rates resulting from the number of individuals pooled for capture reactions. We explicitly tested the effects of phylogenetic distance on capture sensitivity, specificity, and missing data, and provide a baseline estimate of expectations for these metrics based on a priori knowledge of nuclear pairwise differences among samples. We provide recommendations for transcriptome‐based exon capture design based on our results, provide cost estimates, and offer multiple pipelines for data assembly and analysis.

 
PAR ID:
10243953
Author(s) / Creator(s):
 ;  ;  
Publisher / Repository:
Wiley-Blackwell
Date Published:
Journal Name:
Molecular Ecology Resources
Volume:
16
Issue:
5
ISSN:
1755-098X
Page Range / eLocation ID:
p. 1069-1083
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Exon markers have a long history of use in phylogenetics of ray‐finned fishes, the most diverse clade of vertebrates with more than 35,000 species. As the number of published genomes increases, it has become easier to test exons and other genetic markers for signals of ancient duplication events and filter out paralogues that can mislead phylogenetic analysis. We present seven new probe sets for current target‐capture phylogenomic protocols that capture 1,104 exons explicitly filtered for paralogues using gene trees. These seven probe sets span the diversity of teleost fishes, including four sets that target five hyperdiverse percomorph clades which together comprise ca. 17,000 species (Carangaria, Ovalentaria, Eupercaria, and Syngnatharia + Pelagiaria combined). We additionally included probes to capture legacy nuclear exons and mitochondrial markers that have been commonly used in fish phylogenetics (despite some exons being flagged for paralogues) to facilitate integration of old and new molecular phylogenetic matrices. We tested these probes experimentally for 56 fish species (eight species per probe set) and merged new exon‐capture sequence data into an existing data matrix of 1,104 exons and 300 ray‐finned fish species. We provide an optimized bioinformatics pipeline to assemble exon capture data from raw reads to alignments for downstream analysis. We show that legacy loci with known paralogues are at risk of assembling duplicated sequences with target‐capture, but we also assembled many useful orthologous sequences that can be integrated with many PCR‐generated matrices. These probe sets are a valuable resource for advancing fish phylogenomics because targeted exons can easily be extracted from increasingly available whole genome and transcriptome data sets, and also may be integrated with existing PCR‐based exon and mitochondrial data.

     
  2. Abstract

    Exons within transcripts are traditionally classified as first, internal or last exons, each governed by different regulatory mechanisms. We recently described the widespread usage of ‘hybrid’ exons that serve as terminal or internal exons in different transcripts. Here, we employ an interpretable deep learning pipeline to dissect the sequence features governing the co-regulation of transcription initiation and splicing in hybrid exons. Using ENCODE data from human tissues, we identified 80 000 hybrid first-internal exons. These exons often possess a relaxed chromatin state, allowing transcription initiation within the gene body. Interestingly, transcription start sites of hybrid exons are typically centered at the 3′ splice site, suggesting tight coupling between splicing and transcription initiation. We identified two subcategories of hybrid exons: the majority resemble internal exons, maintaining strong 3′ splice sites, while a minority show enrichment in promoter elements, resembling first exons. Diving into the evolution of their sequences, we found that human hybrid exons with orthologous first exons in other species usually gained 3′ splice sites or whole exons upstream, while those with orthologous internal exons often gained promoter elements. Overall, our findings unveil the intricate regulatory landscape of hybrid exons and reveal stronger connections between transcription initiation and RNA splicing than previously acknowledged.

     
  3. Abstract

    Identification of genes underlying genomic signatures of natural selection is key to understanding adaptation to local conditions. We used targeted resequencing to identify SNP markers in 5321 candidate adaptive genes associated with known immunological, metabolic and growth functions in ovids and other ungulates. We selectively targeted 8161 exons in protein‐coding and nearby 5′ and 3′ untranslated regions of chosen candidate genes. Targeted sequences were taken from bighorn sheep (Ovis canadensis) exon capture data and directly from the domestic sheep genome (Ovis aries v. 3; oviAri3). The bighorn sheep sequences used in the Dall's sheep (Ovis dalli dalli) exon capture aligned to 2350 genes on the oviAri3 genome with an average of 2 exons each. We developed a microfluidic qPCR‐based SNP chip to genotype 476 Dall's sheep from locations across their range and test for patterns of selection. Using multiple corroborating approaches (lositan and bayescan), we detected 28 SNP loci potentially under selection. We additionally identified candidate loci significantly associated with latitude, longitude, precipitation and temperature, suggesting local environmental adaptation. The three methods demonstrated consistent support for natural selection on nine genes with immune and disease‐regulating functions (e.g. Ovar‐DRA, APC, BATF2, MAGEB18), cell regulation signalling pathways (e.g. KRIT1, PI3K, ORRC3), and respiratory health (CYSLTR1). Characterizing adaptive allele distributions from novel genetic techniques will facilitate investigation of the influence of environmental variation on local adaptation of a northern alpine ungulate throughout its range. This research demonstrated the utility of exon capture for gene‐targeted SNP discovery and subsequent SNP chip genotyping using low‐quality samples in a nonmodel species.

     
  4. Abstract

    Exome capture is an effective tool for surveying the genome for loci under selection. However, traditional methods require annotated genomic resources. Here, we present a method for creating cDNA probes from expressed mRNA, which are then used to enrich and capture genomic DNA for exon regions. This approach, called “EecSeq,” eliminates the need for costly probe design and synthesis. We tested EecSeq in the eastern oyster, Crassostrea virginica, using a controlled exposure experiment. Four adult oysters were heat shocked at 36°C for 1 hr along with four control oysters kept at 14°C. Stranded mRNA libraries were prepared for two individuals from each treatment and pooled. Half of the combined library was used for probe synthesis, and half was sequenced to evaluate capture efficiency. Genomic DNA was extracted from all individuals, enriched via captured probes, and sequenced directly. We found that EecSeq had an average capture sensitivity of 86.8% across all known exons and had over 99.4% sensitivity for exons with detectable levels of expression in the mRNA library. For all mapped reads, over 47.9% mapped to exons and 37.0% mapped to expressed targets, which is similar to previously published exon capture studies. EecSeq displayed relatively even coverage within exons (i.e., minor “edge effects”) and even coverage across exon GC content. We discovered 5,951 SNPs with a minimum average coverage of 80×, with 3,508 SNPs appearing in exonic regions. We show that EecSeq provides comparable, if not superior, specificity and capture efficiency compared to costly, traditional methods.

     
  5. Abstract

    Phylogenomic analysis of large genome-wide sequence data sets can resolve phylogenetic tree topologies for large species groups, help test the accuracy of and improve resolution for earlier multi-locus studies and reveal the level of agreement or concordance within partitions of the genome for various tree topologies. Here we used a target-capture approach to sequence 1088 single-copy exons for more than 200 labrid fishes together with more than 100 outgroup taxa to generate a new data-rich phylogeny for the family Labridae. Our time-calibrated phylogenetic analysis of exon-capture data pushes the root node age of the family Labridae back into the Cretaceous, to about 79 Ma. The monotypic Centrogenys vaigiensis and the order Uranoscopiformes (stargazers) are identified as the sister lineages of Labridae. The phylogenetic relationships among major labrid subfamilies and within these clades were largely congruent with prior analyses of select mitochondrial and nuclear datasets. However, the position of the tribe Cirrhilabrini (fairy and flame wrasses) showed discordance, resolving either as the sister to a crown julidine clade or alternatively sister to a group formed by the labrines, cheilines and scarines. Exploration of this pattern using multiple approaches leads to slightly higher support for this latter hypothesis, highlighting the importance of genome-level data sets for resolving short internodes at key phylogenetic positions in a large, economically important group of coral reef fishes. More broadly, we demonstrate how accounting for sources of biological variability from incomplete lineage sorting and exploring systematic error at conflicting nodes can aid in evaluating alternative phylogenetic hypotheses. [coral reefs; divergence time estimation; exon-capture; fossil calibration; incomplete lineage sorting.]

     