

Title: Expressed exome capture sequencing: A method for cost‐effective exome sequencing for all organisms
Abstract

Exome capture is an effective tool for surveying the genome for loci under selection. However, traditional methods require annotated genomic resources. Here, we present a method for creating cDNA probes from expressed mRNA, which are then used to enrich and capture genomic DNA for exon regions. This approach, called “EecSeq,” eliminates the need for costly probe design and synthesis. We tested EecSeq in the eastern oyster, Crassostrea virginica, using a controlled exposure experiment. Four adult oysters were heat shocked at 36°C for 1 hr along with four control oysters kept at 14°C. Stranded mRNA libraries were prepared for two individuals from each treatment and pooled. Half of the combined library was used for probe synthesis, and half was sequenced to evaluate capture efficiency. Genomic DNA was extracted from all individuals, enriched via captured probes, and sequenced directly. We found that EecSeq had an average capture sensitivity of 86.8% across all known exons and had over 99.4% sensitivity for exons with detectable levels of expression in the mRNA library. For all mapped reads, over 47.9% mapped to exons and 37.0% mapped to expressed targets, which is similar to previously published exon capture studies. EecSeq displayed relatively even coverage within exons (i.e., minor “edge effects”) and even coverage across exon GC content. We discovered 5,951 SNPs with a minimum average coverage of 80×, with 3,508 SNPs appearing in exonic regions. We show that EecSeq provides comparable, if not superior, specificity and capture efficiency compared to costly, traditional methods.
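The two headline metrics in the abstract can be illustrated with a minimal sketch. It assumes that capture sensitivity means the percentage of target bases covered by at least one read, and that the exon mapping figure is the percentage of mapped reads falling on targets; the abstract does not state these definitions, so both are assumptions here. The function names and the toy numbers are hypothetical and are not data from the study.

```python
from typing import Sequence


def capture_sensitivity(depths: Sequence[int], min_depth: int = 1) -> float:
    """Percentage of target bases covered by at least `min_depth` reads.

    Assumed definition; the abstract does not spell out its formula.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return 100.0 * covered / len(depths)


def on_target_rate(reads_on_target: int, mapped_reads: int) -> float:
    """Percentage of mapped reads that fall on target regions."""
    if mapped_reads <= 0:
        raise ValueError("mapped_reads must be positive")
    return 100.0 * reads_on_target / mapped_reads


# Toy per-base depths over a hypothetical 10 bp target (not real data).
toy_depths = [0, 3, 12, 0, 7, 25, 1, 0, 9, 4]
toy_sensitivity = capture_sensitivity(toy_depths)  # 7 of 10 bases covered
toy_on_target = on_target_rate(479, 1000)          # toy read counts

print(f"sensitivity: {toy_sensitivity:.1f}%")
print(f"on-target:   {toy_on_target:.1f}%")
```

In practice the per-base depths would come from a coverage tool run over the exon intervals; raising `min_depth` gives a stricter sensitivity estimate.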

 
Award ID(s):
1635423
PAR ID:
10060536
Author(s) / Creator(s):
 ;  
Publisher / Repository:
Wiley-Blackwell
Date Published:
Journal Name:
Molecular Ecology Resources
Volume:
18
Issue:
6
ISSN:
1755-098X
Page Range / eLocation ID:
p. 1209-1222
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Identification of genes underlying genomic signatures of natural selection is key to understanding adaptation to local conditions. We used targeted resequencing to identify SNP markers in 5321 candidate adaptive genes associated with known immunological, metabolic and growth functions in ovids and other ungulates. We selectively targeted 8161 exons in protein‐coding and nearby 5′ and 3′ untranslated regions of chosen candidate genes. Targeted sequences were taken from bighorn sheep (Ovis canadensis) exon capture data and directly from the domestic sheep genome (Ovis aries v. 3; oviAri3). The bighorn sheep sequences used in the Dall's sheep (Ovis dalli dalli) exon capture aligned to 2350 genes on the oviAri3 genome with an average of 2 exons each. We developed a microfluidic qPCR‐based SNP chip to genotype 476 Dall's sheep from locations across their range and test for patterns of selection. Using multiple corroborating approaches (lositan and bayescan), we detected 28 SNP loci potentially under selection. We additionally identified candidate loci significantly associated with latitude, longitude, precipitation and temperature, suggesting local environmental adaptation. The three methods demonstrated consistent support for natural selection on nine genes with immune and disease‐regulating functions (e.g. Ovar‐DRA, APC, BATF2, MAGEB18), cell regulation signalling pathways (e.g. KRIT1, PI3K, ORRC3), and respiratory health (CYSLTR1). Characterizing adaptive allele distributions from novel genetic techniques will facilitate investigation of the influence of environmental variation on local adaptation of a northern alpine ungulate throughout its range. This research demonstrated the utility of exon capture for gene‐targeted SNP discovery and subsequent SNP chip genotyping using low‐quality samples in a nonmodel species.

     
  2. Abstract

    Custom sequence capture experiments are becoming an efficient approach for gathering large sets of orthologous markers in nonmodel organisms. Transcriptome‐based exon capture utilizes transcript sequences to design capture probes, typically using a reference genome to identify intron–exon boundaries to exclude shorter exons (<200 bp). Here, we test directly using transcript sequences for probe design, which are often composed of multiple exons of varying lengths. Using 1260 orthologous transcripts, we conducted sequence captures across multiple phylogenetic scales for frogs, including outgroups ~100 Myr divergent from the ingroup. We recovered a large phylogenomic data set consisting of sequence alignments for 1047 of the 1260 transcriptome‐based loci (~561 000 bp) and a large quantity of highly variable regions flanking the exons in transcripts (~70 000 bp), the latter improving substantially by only including ingroup species (~797 000 bp). We recovered both shorter (<100 bp) and longer exons (>200 bp), with no major reduction in coverage towards the ends of exons. We observed significant differences in the performance of blocking oligos for target enrichment and nontarget depletion during captures, and differences in PCR duplication rates resulting from the number of individuals pooled for capture reactions. We explicitly tested the effects of phylogenetic distance on capture sensitivity, specificity, and missing data, and provide a baseline estimate of expectations for these metrics based on a priori knowledge of nuclear pairwise differences among samples. We provide recommendations for transcriptome‐based exon capture design based on our results, cost estimates and offer multiple pipelines for data assembly and analysis.

     
  3. Abstract

    Molecular ecologists seek to genotype hundreds to thousands of loci from hundreds to thousands of individuals at minimal cost per sample. Current methods, such as restriction‐site‐associated DNA sequencing (RADseq) and sequence capture, are constrained by costs associated with inefficient use of sequencing data and sample preparation. Here, we introduce RADcap, an approach that combines the major benefits of RADseq (low cost with specific start positions) with those of sequence capture (repeatable sequencing of specific loci) to significantly increase efficiency and reduce costs relative to current approaches. RADcap uses a new version of dual‐digest RADseq (3RAD) to identify candidate SNP loci for capture bait design and subsequently uses custom sequence capture baits to consistently enrich candidate SNP loci across many individuals. We combined this approach with a new library preparation method for identifying and removing PCR duplicates from 3RAD libraries, which allows researchers to process RADseq data using traditional pipelines, and we tested the RADcap method by genotyping sets of 96–384 Wisteria plants. Our results demonstrate that our RADcap method: (i) methodologically reduces (to <5%) and allows computational removal of PCR duplicate reads from data, (ii) achieves 80–90% reads on target in 11 of 12 enrichments, (iii) returns consistent coverage (≥4×) across >90% of individuals at up to 99.8% of the targeted loci, (iv) produces consistently high occupancy matrices of genotypes across hundreds of individuals and (v) costs significantly less than current approaches.

     
  4. Abstract

    The accelerating rate at which DNA sequence data are now generated by high‐throughput sequencing instruments provides both opportunities and challenges for population genetic and ecological investigations of animals and plants. We show here how the common practice of calling genotypes from a single SNP per sequenced region ignores substantial additional information in the phased short‐read sequences that are provided by these sequencing instruments. We target sequenced regions with multiple SNPs in kelp rockfish (Sebastes atrovirens) to determine “microhaplotypes” and then call these microhaplotypes as alleles at each locus. We then demonstrate how these multi‐allelic marker data from such loci dramatically increase power for relationship inference. The microhaplotype approach decreases false‐positive rates by several orders of magnitude, relative to calling bi‐allelic SNPs, for two challenging analytical procedures, full‐sibling and single parent–offspring pair identification. We also show how the identification of half‐sibling pairs requires so much data that physical linkage becomes a consideration, and that most published studies that attempt to do so are dramatically underpowered. The advent of phased short‐read DNA sequence data, in conjunction with emerging analytical tools for their analysis, promises to improve efficiency by reducing the number of loci necessary for a particular level of statistical confidence, thereby lowering the cost of data collection and reducing the degree of physical linkage amongst markers used for relationship estimation. Such advances will facilitate collaborative research and management for migratory and other widespread species.

     
  5. Abstract

    Next‐generation sequencing technologies now allow researchers of non‐model systems to perform genome‐based studies without the requirement of a (often unavailable) closely related genomic reference. We evaluated the role of restriction endonuclease (RE) selection in double‐digest restriction‐site‐associated DNA sequencing (ddRADseq) by generating reduced representation genome‐wide data using four different RE combinations. Our expectation was that RE selections targeting longer, more complex restriction sites would recover fewer loci than RE with shorter, less complex sites. We sequenced a diverse sample of non‐model arachnids, including five congeneric pairs of harvestmen (Opiliones) and four pairs of spiders (Araneae). Sample pairs consisted of either conspecifics or closely related congeneric taxa, and in total 26 sample pair analyses were tested. Sequence demultiplexing, read clustering and variant calling were performed in the pyRAD program. The 6‐base pair cutter EcoRI combined with methylated site‐specific 4‐base pair cutter MspI produced, on average, the greatest numbers of intra‐individual loci and shared loci per sample pair. As expected, the number of shared loci recovered for a sample pair covaried with the degree of genetic divergence, estimated with cytochrome oxidase I sequences, although this relationship was non‐linear. Our comparative results will prove useful in guiding protocol selection for ddRADseq experiments on many arachnid taxa where reference genomes, even from closely related species, are unavailable.

     