

Title: Target Capture Methods Offer Insight into the Evolution of Rapidly Diverged Taxa and Resolve Allopolyploid Homeologs in the Fern Genus Polypodium s.s.
Abstract: Like many fern lineages comprising reticulate species complexes, Polypodium s.s. (Polypodiaceae) has a history shaped by rapid diversification, hybridization, and polyploidy that poses substantial challenges for phylogenetic inference with plastid and single-locus nuclear markers. Using target capture probes for 408 nuclear loci developed by the GoFlag project and a custom bioinformatic pipeline, SORTER, we constructed multi-locus nuclear datasets for diploid temperate and Mesoamerican species of Polypodium and five allotetraploid species belonging to the well-studied Polypodium vulgare complex. SORTER employs a clustering approach to separate putatively paralogous copies of targeted loci into orthologous matrices and haplotype phasing to infer allopolyploid haplotypes across loci, resulting in datasets amenable to both concatenated maximum likelihood and multi-species coalescent phylogenetic analyses. By comparing phylogenies derived from maximum likelihood and multi-species coalescent analyses of unphased and phased datasets, as well as evaluating discordance among gene trees and species trees, we recover support for incomplete lineage sorting within Polypodium s.s., novel relationships among diploid taxa of the Polypodium vulgare complex and its Mesoamerican sister clade, and the placement of several Polypodium species within other genera. Additionally, we were able to infer well-supported phylogenies that identified the hypothesized progenitors of the allotetraploid species, indicating that SORTER is an effective and accurate tool for reconstructing homeolog haplotypes of allopolyploids in fern taxa and other non-model organisms from target capture data.
Award ID(s):
1920858
NSF-PAR ID:
10414242
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Systematic Botany
Volume:
48
Issue:
1
ISSN:
0363-6445
Page Range / eLocation ID:
96 to 109
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    A phylogenomic analysis of the so far phylogenetically unresolved subfamily Bromelioideae (Bromeliaceae) was performed to infer species relationships as the basis for future taxonomic treatment, stabilization of generic concept, and further analyses of evolution and biogeography of the subfamily. A target‐enrichment approach was chosen, using the Angiosperms353 v.4 kit RNA‐baits and including 86 Bromelioideae species representing previously identified major evolutionary lineages. Phylogenetic analyses were based on 125 target nuclear loci, assembled off‐target plastome as well as mitogenome reads. A Bromelioideae phylogeny with a mostly well‐resolved backbone is provided based on nuclear (194 kbp), plastome (109 kbp), and mitogenome data (34 kbp). For the nuclear markers, a coalescent‐based analysis of single‐locus gene trees was performed as well as a supermatrix analysis of concatenated gene alignments. Nuclear and plastome datasets provide well‐resolved trees, which showed only minor topological incongruences. The mitogenome tree is not sufficiently resolved. A total of 26 well‐supported clades were identified. The genera Aechmea, Canistrum, Hohenbergia, Neoregelia, and Quesnelia were revealed polyphyletic. In core Bromelioideae, Acanthostachys is sister to the remainder. Among the 26 recognized clades, 12 correspond with currently employed taxonomic concepts. Hence, the presented phylogenetic framework will serve as an important basis for future taxonomic revisions as well as to better understand the evolutionary drivers and processes in this exciting subfamily.

     
  2. Premise

    Phylogenetic relationships within major angiosperm clades are increasingly well resolved, but largely informed by plastid data. Areas of poor resolution persist within the Dipsacales, including placement of Heptacodium and Zabelia, and relationships within the Caprifolieae and Linnaeeae, hindering our interpretation of morphological evolution. Here, we sampled a significant number of nuclear loci using a Hyb‐Seq approach and used these data to infer the Dipsacales phylogeny and estimate divergence times.

    Methods

    Sampling all major clades within the Dipsacales, we applied the Angiosperms353 probe set to 96 species. Data were filtered based on locus completeness and taxon recovery per locus, and trees were inferred using RAxML and ASTRAL. Plastid loci were assembled from off‐target reads, and 10 fossils were used to calibrate dated trees.

    Results

    Varying numbers of targeted loci and off‐target plastomes were recovered from most taxa. Nuclear and plastid data confidently place Heptacodium with Caprifolieae, implying homoplasy in calyx morphology, ovary development, and fruit type. Placement of Zabelia, and relationships within the Caprifolieae and Linnaeeae, remain uncertain. Dipsacales diversification began earlier than suggested by previous angiosperm‐wide dating analyses, but many major splitting events date to the Eocene.

    Conclusions

    The Angiosperms353 probe set facilitated the assembly of a large, single‐copy nuclear dataset for the Dipsacales. Nevertheless, many relationships remain unresolved, and resolution was poor for woody clades with low rates of molecular evolution. We favor expanding the Angiosperms353 probe set to include more variable loci and loci of special interest, such as developmental genes, within particular clades.

     
  3. Abstract

    Contamination of a genetic sample with DNA from one or more nontarget species is a continuing concern of molecular phylogenetic studies, both Sanger sequencing studies and next-generation sequencing studies. We developed an automated pipeline for identifying and excluding likely cross-contaminated loci based on the detection of bimodal distributions of patristic distances across gene trees. When contamination occurs between samples within a data set, a comparison between a contaminated sample and its contaminant taxon will yield bimodal distributions with one peak close to zero patristic distance. This new method does not rely on a priori knowledge of taxon relatedness nor does it determine the cause(s) of the contamination. Exclusion of putatively contaminated loci from a data set generated for the insect family Cicadidae showed that these sequences were affecting some topological patterns and branch supports, although the effects were sometimes subtle, with some contamination-influenced relationships exhibiting strong bootstrap support. Long tip branches and outlier values for one anchored phylogenomic pipeline statistic (AvgNHomologs) were correlated with the presence of contamination. While the anchored hybrid enrichment markers used here, which target hemipteroid taxa, proved effective in resolving deep and shallow level Cicadidae relationships in aggregate, individual markers contained inadequate phylogenetic signal, in part probably due to short length. The cleaned data set, consisting of 429 loci, from 90 genera representing 44 of 56 current Cicadidae tribes, supported three of the four sampled Cicadidae subfamilies in concatenated-matrix maximum likelihood (ML) and multispecies coalescent-based species tree analyses, with the fourth subfamily weakly supported in the ML trees. No well-supported patterns from previous family-level Sanger sequencing studies of Cicadidae phylogeny were contradicted. One taxon (Aragualna plenalinea) did not fall with its current subfamily in the genetic tree, and this genus and its tribe Aragualnini are reclassified to Tibicininae following morphological re-examination. Only subtle differences were observed in trees after the removal of loci for which divergent base frequencies were detected. Greater success may be achieved by increased taxon sampling and developing a probe set targeting a more recent common ancestor and longer loci. Searches for contamination are an essential step in phylogenomic analyses of all kinds and our pipeline is an effective solution. [Auchenorrhyncha; base-composition bias; Cicadidae; Cicadoidea; Hemiptera; phylogenetic conflict.]

     
  4. Abstract

    Next‐generation sequencing technologies (NGS) allow systematists to amass a wealth of genomic data from non‐model species for phylogenetic resolution at various temporal scales. However, phylogenetic inference for many lineages dominated by non‐model species has not yet benefited from NGS, which can complement Sanger sequencing studies. One such lineage, whose phylogenetic relationships remain uncertain, is the diverse, agriculturally important and charismatic Coreoidea (Hemiptera: Heteroptera). Given the lack of consensus on higher‐level relationships and the importance of a robust phylogeny for evolutionary hypothesis testing, we use a large data set comprised of hundreds of ultraconserved element (UCE) loci to infer the phylogeny of Coreoidea (excluding Stenocephalidae and Hyocephalidae), with emphasis on the families Coreidae and Alydidae. We generated three data sets by including alignments that contained loci sampled for at least 50%, 60%, or 70% of the total taxa, and inferred phylogeny using maximum likelihood and summary coalescent methods. Twenty‐six external morphological features used in relatively comprehensive phylogenetic analyses of coreoids were also re‐evaluated within our molecular phylogenetic framework. We recovered 439–970 loci per species (16%–36% of loci targeted) and combined this with previously generated UCE data for 12 taxa. All data sets, regardless of analytical approach, yielded topologically similar and strongly supported trees, with the exception of outgroup relationships and the position of Hydarinae. We recovered a monophyletic Coreoidea, with Rhopalidae highly supported as the sister group to Alydidae + Coreidae. Neither Alydidae nor Coreidae were monophyletic; the coreid subfamilies Hydarinae and Pseudophloeinae were recovered as more closely related to Alydidae than to other coreid subfamilies. Coreinae were paraphyletic with respect to Meropachyinae. Most morphological traits were homoplastic with several clades defined by few, if any, synapomorphies. Our results demonstrate the utility of phylogenomic approaches in generating robust hypotheses for taxa with long‐standing phylogenetic problems and highlight that novel insights may come from such approaches.

     
  5. Large systematic revisionary projects incorporating data for hundreds or thousands of taxa require an integrative approach, with a strong biodiversity-informatics core for efficient data management to facilitate research on the group. Our original biodiversity informatics platform, 3i (Internet-accessible Interactive Identification), combined a customized MS Access database backend with ASP-based web interfaces to support revisionary syntheses of several large genera of leafhoppers (Hemiptera: Auchenorrhyncha: Cicadellidae). More recently, for our National Science Foundation sponsored project, “GoLife: Collaborative Research: Integrative genealogy, ecology and phenomics of deltocephaline leafhoppers (Hemiptera: Cicadellidae), and their microbial associates”, we selected the new open-source platform TaxonWorks as the cyberinfrastructure. In the scope of the project, the original “3i World Auchenorrhyncha Database” was imported into TaxonWorks. At the present time, TaxonWorks has many tools to automatically import nomenclature, citations, and specimen-based collection data. At the time of the initial migration of the 3i database, many of those tools were still under development, and the complexity of the data in the database required a custom migration script, which is still probably the most efficient solution for importing datasets with a long development history. At the moment, the World Auchenorrhyncha Database comprehensively covers nomenclature of the group and includes data on 70 valid families, 6,816 valid genera, and 47,064 valid species, as well as synonymy and subsequent combinations (Fig. 1). In addition, many taxon records include the original citation, bibliography, type information, etymology, etc. The bibliography of the group includes 37,579 sources, about 1/3 of which are associated with PDF files. Species have distribution records, either derived from individual specimens or as country and state level asserted distribution, as well as biological associations indicating host plants, predators, and parasitoids. Observation matrices in TaxonWorks are designed to handle morphological data associated with taxa or specimens. The matrices may be used to automatically generate interactive identification keys and taxon descriptions. They can also be downloaded to be imported, for example, into Lucid builder, or to perform phylogenetic analysis using an external application. At the moment there are 36 matrices associated with the project. The observation matrix from the GoLife project covers 798 taxa by 210 descriptors (most of which are qualitative multi-state morphological descriptors) (Fig. 2). Illustrations are provided for 9,886 taxa and organized in the specialized image matrix, and can be used as a pictorial key for determination of species and taxa of a higher rank. For the phylogenetic analysis, a dataset was constructed for 730 terminal taxa and >160,000 nucleotide positions obtained using anchored hybrid enrichment of genomic DNA for a sample of leafhoppers from the subfamily Deltocephalinae and outgroups. The probe kit targets leafhopper genes, as well as some bacterial genes (endosymbionts and plant pathogens transmitted by leafhoppers). The maximum likelihood analyses of concatenated nucleotide and amino acid sequences as well as coalescent gene tree analysis yielded well-resolved phylogenetic trees (Cao et al. 2022). Raw sequence data have been uploaded to the Sequence Read Archive on GenBank. Occurrence and morphological data, as well as diagnostic images, for voucher specimens have been incorporated into TaxonWorks. Data in TaxonWorks can be exported in raw format, accessed via an application programming interface (API), or shared with external data aggregators such as Catalogue of Life, GBIF, and iDigBio.
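The fourth entry in the list above builds three data sets by keeping only loci sampled for at least 50%, 60%, or 70% of the total taxa. A minimal Python sketch of that kind of occupancy filter is given here for illustration. The function name, the dictionary layout (locus name mapped to taxon name mapped to aligned sequence), and the toy alignments are all assumptions made for this example and are not taken from the authors' pipeline.

```python
from typing import Dict

# Hypothetical layout: locus name -> {taxon name -> aligned sequence}
Alignments = Dict[str, Dict[str, str]]


def filter_by_occupancy(alignments: Alignments, total_taxa: int,
                        min_percent: int) -> Alignments:
    """Keep loci whose alignment has data for at least min_percent of taxa."""
    kept = {}
    for locus, seqs in alignments.items():
        # A taxon counts as sampled only if its sequence holds something
        # other than gap or missing-data characters.
        present = sum(1 for seq in seqs.values() if seq.strip("-?Nn"))
        # Integer comparison avoids floating-point rounding at the threshold.
        if present * 100 >= min_percent * total_taxa:
            kept[locus] = seqs
    return kept


# Toy data (assumed): 10 taxa, four loci sampled for 5, 6, 7, and 4 taxa.
taxa = [f"taxon{i}" for i in range(10)]


def toy_locus(n_present: int) -> Dict[str, str]:
    """Build a fake alignment where only the first n_present taxa have data."""
    return {t: ("ACGT" if i < n_present else "----") for i, t in enumerate(taxa)}


alignments = {
    "locusA": toy_locus(5),
    "locusB": toy_locus(6),
    "locusC": toy_locus(7),
    "locusD": toy_locus(4),
}

# One filtered data set per occupancy threshold, as in the 50/60/70% scheme.
datasets = {p: filter_by_occupancy(alignments, len(taxa), p) for p in (50, 60, 70)}

for percent, loci in datasets.items():
    print(percent, sorted(loci))
```

Stricter thresholds retain fewer loci but leave less missing data per locus, which is the trade-off that comparing several thresholds is meant to expose.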