
Title: A two‐tier bioinformatic pipeline to develop probes for target capture of nuclear loci with applications in Melastomataceae

Putatively single‐copy nuclear (SCN) loci, which are identified using genomic resources of closely related species, are ideal for phylogenomic inference. However, suitable genomic resources are not available for many clades, including Melastomataceae. We introduce a versatile approach to identify SCN loci for clades with few genomic resources and use it to develop probes for target enrichment in the distantly related Memecylon and Tibouchina (Melastomataceae).


We present a two‐tiered pipeline. First, we identified putatively SCN loci using MarkerMiner and transcriptomes from distantly related species in Melastomataceae. Published loci and genes of functional significance were then added (384 total loci). Second, using HybPiper, we retrieved 689 homologous template sequences for these loci using genome‐skimming data from within the focal clades.


We sequenced 193 loci common to Memecylon and Tibouchina. Probes designed from 56 template sequences successfully targeted sequences in both clades. Probes designed from genome‐skimming data within a focal clade were more successful than probes designed from other sources.


Our pipeline successfully identified and targeted SCN loci in Memecylon and Tibouchina, enabling phylogenomic studies in both clades and potentially across Melastomataceae. This pipeline could be easily applied to other clades with few genomic resources.

Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Journal Name:
Applications in Plant Sciences
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Background

    In the past three decades, several studies have predominantly relied on a small sample of the plastome to infer deep phylogenetic relationships in the species-rich Melastomataceae. Here, we report the first full plastid sequences of this family, compare general features of the sampled plastomes to other sequenced Myrtales, and survey the plastomes for highly informative regions for phylogenetics.


    Genome skimming was performed for 16 species spread across the Melastomataceae. Plastomes were assembled, annotated and compared to eight sequenced plastids in the Myrtales. Phylogenetic inference was performed using Maximum Likelihood on six different data sets, where putative biases were taken into account. Summary statistics were generated for all introns and intergenic spacers with suitable size for polymerase chain reaction (PCR) amplification and used to rank the markers by phylogenetic information.


    The majority of the plastomes sampled are conserved in gene content and order, as well as in sequence length and GC content within plastid regions and sequence classes. Departures include the putative presence of rps16 and rpl2 pseudogenes in some plastomes. Phylogenetic analyses of the majority of the schemes analyzed resulted in the same topology with high values of bootstrap support. Although there is still uncertainty in some relationships, in the highest supported topologies only two nodes received bootstrap values lower than 95%.


    Melastomataceae plastomes are no exception to the general patterns observed in the genomic structure of land plant chloroplasts, being highly conserved and structurally similar to most other Myrtales. Although the full plastome phylogeny shares most of its clades with the previously widely used reduced data set, some changes are still observed and bootstrap support is higher. The plastome data set presented here is a step towards phylogenomic analyses in the Melastomataceae and will be a useful resource for future studies.

  2. Abstract

    High‐throughput sequencing of transcriptomes and targeted genomic regions is advancing our knowledge of the Tree of Life. Building phylogenies with regions of the genome requires 1‐to‐1 orthologue resources of genes and noncoding loci. One organismal group that has received little attention in this area is the Hemiptera, the fifth largest insect order represented by ~103,590 named species. Here, we present a set of 3,872 Hemiptera 1‐to‐1 orthogroups based on tree‐based orthology inference of eight Hemiptera species with publicly available genome sequences. We also estimate a set of 406 orthologous exons with similar mRNA splice sites that can be used for Sanger sequencing and develop enrichment probes for targeted genome sequencing for phylogenomic inference. We show this novel set of orthologues is informative at the protein, coding sequence and exon molecular levels and provides robust branch support in both gene tree–species tree methods and concatenated sequence phylogenies. In addition, we demonstrate the utility of these loci to resolve relationships in whiteflies, Bemisia tabaci, a large species complex with few phylogenomic resources. Last, we compare our Hemiptera phylogeny with previously published phylogenies and other orthologue databases, while providing suggestions on further improvement to this phylogenomic resource.

  3. Buerkle, Alex (Ed.)
    Inferences about past processes of adaptation and speciation require a gene-scale and genome-wide understanding of the evolutionary history of diverging taxa. In this study, we use genome-wide capture of nuclear gene sequences, plus skimming of organellar sequences, to investigate the phylogenomics of monkeyflowers in Mimulus section Erythranthe (27 accessions from seven species). Taxa within Erythranthe, particularly the parapatric and putatively sister species M. lewisii (bee-pollinated) and M. cardinalis (hummingbird-pollinated), have been a model system for investigating the ecological genetics of speciation and adaptation for over five decades. Across >8000 nuclear loci, multiple methods resolve a predominant species tree in which M. cardinalis groups with other hummingbird-pollinated taxa (37% of gene trees), rather than being sister to M. lewisii (32% of gene trees). We independently corroborate a single evolution of hummingbird pollination syndrome in Erythranthe by demonstrating functional redundancy in genetic complementation tests of floral traits in hybrids; together, these analyses overturn a textbook case of pollination-syndrome convergence. Strong asymmetries in allele sharing (Patterson’s D-statistic and related tests) indicate that gene tree discordance reflects ancient and recent introgression rather than incomplete lineage sorting. Consistent with abundant introgression blurring the history of divergence, low-recombination and adaptation-associated regions support the new species tree, while high-recombination regions generate phylogenetic evidence for sister status for M. lewisii and M. cardinalis. Population-level sampling of core taxa also revealed two instances of chloroplast capture, with Sierran M. lewisii and Southern Californian M. parishii each carrying organelle genomes nested within respective sympatric M. cardinalis clades. A recent organellar transfer from M. cardinalis, an outcrosser where selfish cytonuclear dynamics are more likely, may account for the unexpected cytoplasmic male sterility effects of selfer M. parishii organelles in hybrids with M. lewisii. Overall, our phylogenomic results reveal extensive reticulation throughout the evolutionary history of a classic monkeyflower radiation, suggesting that natural selection (re-)assembles and maintains species-diagnostic traits and barriers in the face of gene flow. Our findings further underline the challenges, even in reproductively isolated species, in distinguishing re-use of adaptive alleles from true convergence and emphasize the value of a phylogenomic framework for reconstructing the evolutionary genetics of adaptation and speciation.
  4. Abstract Background

    The Aldabra giant tortoise (Aldabrachelys gigantea) is one of only two giant tortoise species left in the world. The species is endemic to Aldabra Atoll in Seychelles and is listed as Vulnerable on the International Union for Conservation of Nature Red List (v2.3) due to its limited distribution and threats posed by climate change. Genomic resources for A. gigantea are lacking, hampering conservation efforts for both wild and ex situ populations. A high-quality genome would also open avenues to investigate the genetic basis of the species’ exceptionally long life span.


    We produced the first chromosome-level de novo genome assembly of A. gigantea using PacBio High-Fidelity sequencing and high-throughput chromosome conformation capture. We produced a 2.37-Gbp assembly with a scaffold N50 of 148.6 Mbp and a resolution into 26 chromosomes. RNA sequencing–assisted gene model prediction identified 23,953 protein-coding genes and 1.1 Gbp of repetitive sequences. Synteny analyses among turtle genomes revealed high levels of chromosomal collinearity even among distantly related taxa. To assess the utility of the high-quality assembly for species conservation, we performed a low-coverage resequencing of 30 individuals from wild populations and two zoo individuals. Our genome-wide population structure analyses detected genetic population structure in the wild and identified the most likely origin of the zoo-housed individuals. We further identified putatively deleterious mutations to be monitored.


    We establish a high-quality chromosome-level reference genome for A. gigantea and one of the most complete turtle genomes available. We show that low-coverage whole-genome resequencing, for which alignment to the reference genome is a necessity, is a powerful tool to assess the population structure of the wild population and reveal the geographic origins of ex situ individuals relevant for genetic diversity management and rewilding efforts.

  5. Premise

    Apocynaceae is the 10th largest flowering plant family and a focus for study of plant–insect interactions, especially as mediated by secondary metabolites. However, it has few genomic resources relative to its size. Target capture sequencing is a powerful approach for genome reduction that facilitates studies requiring data from the nuclear genome in non‐model taxa, such as Apocynaceae.


    Transcriptomes were used to design probes for targeted sequencing of putatively single‐copy nuclear genes across Apocynaceae. The sequences obtained were used to assess the success of the probe design, the intrageneric and intraspecific variation in the targeted genes, and the utility of the genes for inferring phylogeny.


    From 853 candidate nuclear genes, 835 were consistently recovered in single copy and were variable enough for phylogenomics. The inferred gene trees were useful for coalescent‐based species tree analysis, which showed all subfamilies of Apocynaceae as monophyletic, while also resolving relationships among species within the genus Apocynum. Intraspecific comparison of Elytropus chilensis individuals revealed numerous single‐nucleotide polymorphisms with potential for use in population‐level studies.


    Community use of this Hyb‐Seq probe set will facilitate and promote progress in the study of Apocynaceae across scales from population genomics to phylogenomics.
