skip to main content


Title: Phylogenomic Analysis of the Parrots of the World Distinguishes Artifactual from Biological Sources of Gene Tree Discordance
Abstract

Gene tree discordance is expected in phylogenomic trees and biological processes are often invoked to explain it. However, heterogeneous levels of phylogenetic signal among individuals within data sets may cause artifactual sources of topological discordance. We examined how the information content in tips and subclades impacts topological discordance in the parrots (Order: Psittaciformes), a diverse and highly threatened clade of nearly 400 species. Using ultraconserved elements from 96% of the clade’s species-level diversity, we estimated concatenated and species trees for 382 ingroup taxa. We found that discordance among tree topologies was most common at nodes dating between the late Miocene and Pliocene, and often at the taxonomic level of the genus. Accordingly, we used two metrics to characterize information content in tips and assess the degree to which conflict between trees was being driven by lower-quality samples. Most instances of topological conflict and nonmonophyletic genera in the species tree could be objectively identified using these metrics. For subclades still discordant after tip-based filtering, we used a machine learning approach to determine whether phylogenetic signal or noise was the more important predictor of metrics supporting the alternative topologies. We found that when signal favored one of the topologies, the noise was the most important variable in poorly performing models that favored the alternative topology. In sum, we show that artifactual sources of gene tree discordance, which are likely a common phenomenon in many data sets, can be distinguished from biological sources by quantifying the information content in each tip and modeling which factors support each topology. [Historical DNA; machine learning; museomics; Psittaciformes; species tree.]

 
more » « less
Award ID(s):
1655736 2029955
NSF-PAR ID:
10414545
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Systematic Biology
Volume:
72
Issue:
1
ISSN:
1063-5157
Page Range / eLocation ID:
p. 228-241
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Ruane, Sara (Ed.)
    Abstract Genome-scale data have the potential to clarify phylogenetic relationships across the tree of life but have also revealed extensive gene tree conflict. This seeming paradox, whereby larger data sets both increase statistical confidence and uncover significant discordance, suggests that understanding sources of conflict is important for accurate reconstruction of evolutionary history. We explore this paradox in squamate reptiles, the vertebrate clade comprising lizards, snakes, and amphisbaenians. We collected an average of 5103 loci for 91 species of squamates that span higher-level diversity within the clade, which we augmented with publicly available sequences for an additional 17 taxa. Using a locus-by-locus approach, we evaluated support for alternative topologies at 17 contentious nodes in the phylogeny. We identified shared properties of conflicting loci, finding that rate and compositional heterogeneity drives discordance between gene trees and species tree and that conflicting loci rarely overlap across contentious nodes. Finally, by comparing our tests of nodal conflict to previous phylogenomic studies, we confidently resolve 9 of the 17 problematic nodes. We suggest this locus-by-locus and node-by-node approach can build consensus on which topological resolutions remain uncertain in phylogenomic studies of other contentious groups. [Anchored hybrid enrichment (AHE); gene tree conflict; molecular evolution; phylogenomic concordance; target capture; ultraconserved elements (UCE).] 
    more » « less
  2. Ruane, Sara (Ed.)
    Abstract Some phylogenetic problems remain unresolved even when large amounts of sequence data are analyzed and methods that accommodate processes such as incomplete lineage sorting are employed. In addition to investigating biological sources of phylogenetic incongruence, it is also important to reduce noise in the phylogenomic dataset by using appropriate filtering approach that addresses gene tree estimation errors. We present the results of a case study in manakins, focusing on the very difficult clade comprising the genera Antilophia and Chiroxiphia. Previous studies suggest that Antilophia is nested within Chiroxiphia, though relationships among Antilophia+Chiroxiphia species have been highly unstable. We extracted more than 11,000 loci (ultra-conserved elements and introns) from whole genomes and conducted analyses using concatenation and multispecies coalescent methods. Topologies resulting from analyses using all loci differed depending on the data type and analytical method, with 2 clades (Antilophia+Chiroxiphia and Manacus+Pipra+Machaeopterus) in the manakin tree showing incongruent results. We hypothesized that gene trees that conflicted with a long coalescent branch (e.g., the branch uniting Antilophia+Chiroxiphia) might be enriched for cases of gene tree estimation error, so we conducted analyses that either constrained those gene trees to include monophyly of Antilophia+Chiroxiphia or excluded these loci. While constraining trees reduced some incongruence, excluding the trees led to completely congruent species trees, regardless of the data type or model of sequence evolution used. We found that a suite of gene metrics (most importantly the number of informative sites and likelihood of intralocus recombination) collectively explained the loci that resulted in non-monophyly of Antilophia+Chiroxiphia. We also found evidence for introgression that may have contributed to the discordant topologies we observe in Antilophia+Chiroxiphia and led to deviations from expectations given the multispecies coalescent model. Our study highlights the importance of identifying factors that can obscure phylogenetic signal when dealing with recalcitrant phylogenetic problems, such as gene tree estimation error, incomplete lineage sorting, and reticulation events. [Birds; c-gene; data type; gene estimation error; model fit; multispecies coalescent; phylogenomics; reticulation] 
    more » « less
  3. null (Ed.)
    Genome-scale data have greatly facilitated the resolution of recalcitrant nodes that Sanger-based datasets have been unable to resolve. However, phylogenomic studies continue to use traditional methods such as bootstrapping to estimate branch support; and high bootstrap values are still interpreted as providing strong support for the correct topology. Furthermore, relatively little attention has been given to assessing discordances between gene and species trees, and the underlying processes that produce phylogenetic conflict. We generated novel genomic datasets to characterize and determine the causes of discordance in Old World treefrogs (Family: Rhacophoridae)—a group that is fraught with conflicting and poorly supported topologies among major clades. Additionally, a suite of data filtering strategies and analytical methods were applied to assess their impact on phylogenetic inference. We showed that incomplete lineage sorting was detected at all nodes that exhibited high levels of discordance. Those nodes were also associated with extremely short internal branches. We also clearly demonstrate that bootstrap values do not reflect uncertainty or confidence for the correct topology and, hence, should not be used as a measure of branch support in phylogenomic datasets. Overall, we showed that phylogenetic discordances in Old World treefrogs resulted from incomplete lineage sorting and that species tree inference can be improved using a multi-faceted, total-evidence approach, which uses the most amount of data and considers results from different analytical methods and datasets. 
    more » « less
  4. Abstract

    To examine phylogenetic heterogeneity in turtle evolution, we collected thousands of high-confidence single-copy orthologs from 19 genome assemblies representative of extant turtle diversity and estimated a phylogeny with multispecies coalescent and concatenated partitioned methods. We also collected next-generation sequences from 26 turtle species and assembled millions of biallelic markers to reconstruct phylogenies based on annotated regions from the western painted turtle (Chrysemys picta bellii) genome (coding regions, introns, untranslated regions, intergenic, and others). We then measured gene tree-species tree discordance, as well as gene and site heterogeneity at each node in the inferred trees, and tested for temporal patterns in phylogenomic conflict across turtle evolution. We found strong and consistent support for all bifurcations in the inferred turtle species phylogenies. However, a number of genes, sites, and genomic features supported alternate relationships between turtle taxa. Our results suggest that gene tree-species tree discordance in these data sets is likely driven by population-level processes such as incomplete lineage sorting. We found very little effect of substitutional saturation on species tree topologies, and no clear phylogenetic patterns in codon usage bias and compositional heterogeneity. There was no correlation between gene and site concordance, node age, and DNA substitution rate across most annotated genomic regions. Our study demonstrates that heterogeneity is to be expected even in well-resolved clades such as turtles, and that future phylogenomic studies should aim to sample as much of the genome as possible in order to obtain accurate phylogenies for assessing conservation priorities in turtles. [Discordance; genomes; phylogeny; turtles.]

     
    more » « less
  5. Abstract

    Phylogenomic analysis of large genome-wide sequence data sets can resolve phylogenetic tree topologies for large species groups, help test the accuracy of and improve resolution for earlier multi-locus studies and reveal the level of agreement or concordance within partitions of the genome for various tree topologies. Here we used a target-capture approach to sequence 1088 single-copy exons for more than 200 labrid fishes together with more than 100 outgroup taxa to generate a new data-rich phylogeny for the family Labridae. Our time-calibrated phylogenetic analysis of exon-capture data pushes the root node age of the family Labridae back into the Cretaceous to about 79 Ma years ago. The monotypic Centrogenys vaigiensis, and the order Uranoscopiformes (stargazers) are identified as the sister lineages of Labridae. The phylogenetic relationships among major labrid subfamilies and within these clades were largely congruent with prior analyses of select mitochondrial and nuclear datasets. However, the position of the tribe Cirrhilabrini (fairy and flame wrasses) showed discordance, resolving either as the sister to a crown julidine clade or alternatively sister to a group formed by the labrines, cheilines and scarines. Exploration of this pattern using multiple approaches leads to slightly higher support for this latter hypothesis, highlighting the importance of genome-level data sets for resolving short internodes at key phylogenetic positions in a large, economically important groups of coral reef fishes. More broadly, we demonstrate how accounting for sources of biological variability from incomplete lineage sorting and exploring systematic error at conflicting nodes can aid in evaluating alternative phylogenetic hypotheses. [coral reefs; divergence time estimation; exon-capture; fossil calibration; incomplete lineage sorting.]

     
    more » « less