Premise: Understanding relationships among orchid species and populations is of critical importance for orchid conservation. Target sequence capture has become a standard method for extracting hundreds of orthologous loci for phylogenomics. The up‐front cost and time associated with designing bait sets make this method prohibitively expensive for many researchers. Therefore, we designed a target capture kit to reliably sequence hundreds of orthologous loci across orchid lineages. Methods: We designed an Orchidaceae target capture bait set for 963 single‐copy genes identified in published orchid genome sequences. The bait set was tested on 28 orchid species, with representatives of the subfamilies Cypripedioideae, Orchidoideae, and Epidendroideae. Results: Between 1,518,041 and 87,946,590 paired‐end 150‐base reads were generated for target‐enriched genomic libraries. We assembled an average of 812 genes per library for Epidendroideae species and a mean of 501 genes for species in the subfamilies Orchidoideae and Cypripedioideae. Furthermore, libraries had on average 107 of the 254 genes that are included in the Angiosperms353 bait set, allowing for direct comparison of studies using either bait set. Discussion: The Orchidaceae963 kit will enable greater accessibility and utility of next‐generation sequencing for orchid systematics, population genetics, and identification in the illegal orchid trade.
Calceolariaceae809: A bait set for targeted sequencing of nuclear loci
Abstract
Premise: The genus Calceolaria (Calceolariaceae) is emblematic of the Andes, is hypothesized to have originated as a recent, rapid radiation, and has important taxonomic needs. Additionally, the genus is a model for the study of specialized pollination systems, as its flowers are nectarless and many offer floral oils as a pollination reward collected by specialist bees. Despite their evolutionary and ecological significance, obtaining a resolved phylogeny for the group has proved difficult. To address this challenge, we present a new bait set for targeted sequencing of nuclear loci in Calceolariaceae and close relatives. Methods: We developed a bioinformatic workflow to use incomplete, low‐coverage genomes of 10 Calceolaria species to identify single‐copy loci suitable for phylogenetic studies and design baits for targeted sequencing. Results: Our approach resulted in the identification of 809 single‐copy loci (733 noncoding and 76 coding regions) and the development of 39,937 baits, which we validated in silico (10 specimens) and in vitro (29 Calceolariaceae and six outgroups). In both cases, the data allowed us to recover robust phylogenetic estimates. Discussion: Our results demonstrate the appropriateness of the bait set for sequencing recent and historic specimens of Calceolariaceae and close relatives, and open new doors for further investigation of the evolutionary history of this hyperdiverse genus.
- Award ID(s): 2050745
- PAR ID: 10526385
- Publisher / Repository: Wiley
- Date Published:
- Journal Name: Applications in Plant Sciences
- Volume: 11
- Issue: 6
- ISSN: 2168-0450
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Abstract
Motivation: Bait enrichment is a protocol that is becoming increasingly common, as it has been shown to successfully amplify regions of interest in metagenomic samples. In this method, a set of synthetic probes (‘baits’) is designed, manufactured and applied to fragmented metagenomic DNA. The probes bind to the fragmented DNA and any unbound DNA is rinsed away, leaving the bound fragments to be amplified for sequencing. Metsky et al. demonstrated that bait enrichment is capable of detecting a large number of human viral pathogens within metagenomic samples. Results: We formalize the problem of designing baits by defining the Minimum Bait Cover problem, show that the problem is NP-hard even under very restrictive assumptions, and design an efficient heuristic that takes advantage of succinct data structures. We refer to our method as Syotti. The running time of Syotti shows linear scaling in practice, running at least an order of magnitude faster than state-of-the-art methods, including the method of Metsky et al. At the same time, our method produces bait sets that are smaller than the ones produced by the competing methods, while also leaving fewer positions uncovered. Lastly, we show that Syotti requires only 25 min to design baits for a dataset comprising 3 billion nucleotides from 1000 related bacterial substrains, whereas the method of Metsky et al. shows clearly super-linear running time and fails to process even a subset of 17% of the data in 72 h. Availability and implementation: https://github.com/jnalanko/syotti. Supplementary information: Supplementary data are available at Bioinformatics online.
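To make the bait cover idea in the abstract above concrete, here is a minimal sketch of a naive greedy cover. It is not Syotti's implementation and uses no succinct data structures. It assumes, purely for illustration, that a bait "covers" a window when the Hamming distance between them is at most `max_mismatches`, and it ignores reverse complements. The function names and parameters are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def greedy_bait_cover(seqs, bait_len, max_mismatches):
    """Naive greedy heuristic (illustrative assumption, not Syotti itself).

    Scan each sequence left to right. Whenever an uncovered position is
    found, cut a new bait from the sequence at that position and mark every
    window, in every sequence, that the bait matches within the mismatch
    tolerance. Runs in quadratic time, so it only suits toy inputs.
    """
    if any(len(s) < bait_len for s in seqs):
        raise ValueError("every sequence must be at least bait_len long")

    covered = [[False] * len(s) for s in seqs]
    baits = []
    for si, s in enumerate(seqs):
        for pos in range(len(s)):
            if covered[si][pos]:
                continue
            # Shift the bait left near the end so it stays inside the sequence.
            start = min(pos, len(s) - bait_len)
            bait = s[start:start + bait_len]
            baits.append(bait)
            # Mark all windows that this bait matches approximately.
            for ti, t in enumerate(seqs):
                for w in range(len(t) - bait_len + 1):
                    if hamming(bait, t[w:w + bait_len]) <= max_mismatches:
                        for k in range(w, w + bait_len):
                            covered[ti][k] = True
    return baits
```

For example, `greedy_bait_cover(["AAAAAAAA", "AAAATAAA"], 4, 1)` needs a single bait because one mismatch is tolerated, whereas the same input with `max_mismatches=0` needs a second bait for the region containing the `T`.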
-
Abstract
Premise: Rubiaceae is among the most species‐rich plant families, as well as one of the most morphologically and geographically diverse. Currently available phylogenies have mostly relied on few genomic and plastid loci, as opposed to large‐scale genomic data. Target enrichment provides the ability to generate sequence data for hundreds to thousands of phylogenetically informative, single‐copy loci, which often leads to improved phylogenetic resolution at both shallow and deep taxonomic scales; however, a publicly accessible Rubiaceae‐specific probe set that allows for comparable phylogenetic inference across clades is lacking. Methods: Here, we use publicly accessible genomic resources to identify putatively single‐copy nuclear loci for target enrichment in two Rubiaceae groups: tribe Hillieae (Cinchonoideae) and tribal complex Palicoureeae+Psychotrieae (Rubioideae). We sequenced 2270 exonic regions corresponding to 1059 loci in our target clades and generated in silico target enrichment sequences for other Rubiaceae taxa using our designed probe set. To test the utility of our probe set for phylogenetic inference across Rubiaceae, we performed a coalescent‐aware phylogenetic analysis using a subset of 27 Rubiaceae taxa from 10 different tribes and three subfamilies, and one outgroup in Apocynaceae. Results: We recovered an average of 75% and 84% of targeted exons and loci, respectively, per Rubiaceae sample. Probes designed using genomic resources from a particular subfamily were most efficient at targeting sequences from taxa in that subfamily. The number of paralogs recovered during assembly varied for each clade. Phylogenetic inference of Rubiaceae with our target regions resolves relationships at various scales. Relationships are largely consistent with previous studies of relationships in the family with high support (≥0.98 local posterior probability) at nearly all nodes and evidence of gene tree discordance. Discussion: Our probe set, which we call Rubiaceae2270x, was effective for targeting loci in species across and even outside of Rubiaceae. This probe set will facilitate phylogenomic studies in Rubiaceae and advance systematics and macroevolutionary studies in the family.
-
Premise: Multiple transitions from insect to wind pollination are associated with polyploidy and unisexual flowers in Thalictrum (Ranunculaceae), yet the underlying genetics remains unknown. We generated a draft genome of Thalictrum thalictroides, a representative of a clade with ancestral floral traits (diploid, hermaphrodite, and insect pollinated) and a model for functional studies. Floral transcriptomes of T. thalictroides and of wind‐pollinated, andromonoecious T. hernandezii are presented as a resource to facilitate candidate gene discovery in flowers with different sexual and pollination systems. Methods: A draft genome of T. thalictroides and two floral transcriptomes of T. thalictroides and T. hernandezii were obtained from HiSeq 2000 Illumina sequencing and de novo assembly. Results: The T. thalictroides de novo draft genome assembly consisted of 44,860 contigs (N50 = 12,761 bp, 243 Mbp total length) and contained 84.5% conserved embryophyte single‐copy genes. Floral transcriptomes contained representatives of most eukaryotic core genes, and most of their genes formed orthogroups. Discussion: To validate the utility of these resources, potential candidate genes were identified for the different floral morphologies using stepwise data set comparisons. Single‐copy gene analysis and simple sequence repeat markers were also generated as a resource for population‐level and phylogenetic studies.
-
Premise: Phylogenetic relationships within major angiosperm clades are increasingly well resolved, but largely informed by plastid data. Areas of poor resolution persist within the Dipsacales, including placement of Heptacodium and Zabelia, and relationships within the Caprifolieae and Linnaeeae, hindering our interpretation of morphological evolution. Here, we sampled a significant number of nuclear loci using a Hyb‐Seq approach and used these data to infer the Dipsacales phylogeny and estimate divergence times. Methods: Sampling all major clades within the Dipsacales, we applied the Angiosperms353 probe set to 96 species. Data were filtered based on locus completeness and taxon recovery per locus, and trees were inferred using RAxML and ASTRAL. Plastid loci were assembled from off‐target reads, and 10 fossils were used to calibrate dated trees. Results: Varying numbers of targeted loci and off‐target plastomes were recovered from most taxa. Nuclear and plastid data confidently place Heptacodium with Caprifolieae, implying homoplasy in calyx morphology, ovary development, and fruit type. Placement of Zabelia, and relationships within the Caprifolieae and Linnaeeae, remain uncertain. Dipsacales diversification began earlier than suggested by previous angiosperm‐wide dating analyses, but many major splitting events date to the Eocene. Conclusions: The Angiosperms353 probe set facilitated the assembly of a large, single‐copy nuclear dataset for the Dipsacales. Nevertheless, many relationships remain unresolved, and resolution was poor for woody clades with low rates of molecular evolution. We favor expanding the Angiosperms353 probe set to include more variable loci and loci of special interest, such as developmental genes, within particular clades.