Title: Congruence and Conflict in the Higher-Level Phylogenetics of Squamate Reptiles: An Expanded Phylogenomic Perspective
Abstract Genome-scale data have the potential to clarify phylogenetic relationships across the tree of life but have also revealed extensive gene tree conflict. This seeming paradox, whereby larger data sets both increase statistical confidence and uncover significant discordance, suggests that understanding sources of conflict is important for accurate reconstruction of evolutionary history. We explore this paradox in squamate reptiles, the vertebrate clade comprising lizards, snakes, and amphisbaenians. We collected an average of 5103 loci for 91 species of squamates that span higher-level diversity within the clade, which we augmented with publicly available sequences for an additional 17 taxa. Using a locus-by-locus approach, we evaluated support for alternative topologies at 17 contentious nodes in the phylogeny. We identified shared properties of conflicting loci, finding that rate and compositional heterogeneity drives discordance between gene trees and species tree and that conflicting loci rarely overlap across contentious nodes. Finally, by comparing our tests of nodal conflict to previous phylogenomic studies, we confidently resolve 9 of the 17 problematic nodes. We suggest this locus-by-locus and node-by-node approach can build consensus on which topological resolutions remain uncertain in phylogenomic studies of other contentious groups. [Anchored hybrid enrichment (AHE); gene tree conflict; molecular evolution; phylogenomic concordance; target capture; ultraconserved elements (UCE).]
Editors:
Ruane, Sara
Award ID(s):
1754398
Publication Date:
NSF-PAR ID:
10250324
Journal Name:
Systematic Biology
Volume:
70
Issue:
3
Page Range or eLocation-ID:
542 to 557
ISSN:
1063-5157
Sponsoring Org:
National Science Foundation
More Like this
  1. Genome-scale data have greatly facilitated the resolution of recalcitrant nodes that Sanger-based datasets have been unable to resolve. However, phylogenomic studies continue to use traditional methods such as bootstrapping to estimate branch support; and high bootstrap values are still interpreted as providing strong support for the correct topology. Furthermore, relatively little attention has been given to assessing discordances between gene and species trees, and the underlying processes that produce phylogenetic conflict. We generated novel genomic datasets to characterize and determine the causes of discordance in Old World treefrogs (Family: Rhacophoridae)—a group that is fraught with conflicting and poorly supported topologies among major clades. Additionally, a suite of data filtering strategies and analytical methods were applied to assess their impact on phylogenetic inference. We showed that incomplete lineage sorting was detected at all nodes that exhibited high levels of discordance. Those nodes were also associated with extremely short internal branches. We also clearly demonstrate that bootstrap values do not reflect uncertainty or confidence for the correct topology and, hence, should not be used as a measure of branch support in phylogenomic datasets. Overall, we showed that phylogenetic discordances in Old World treefrogs resulted from incomplete lineage sorting and that species tree inference can be improved using a multi-faceted, total-evidence approach, which uses the most amount of data and considers results from different analytical methods and datasets.
  2. Abstract

    Gene tree discordance is expected in phylogenomic trees and biological processes are often invoked to explain it. However, heterogeneous levels of phylogenetic signal among individuals within data sets may cause artifactual sources of topological discordance. We examined how the information content in tips and subclades impacts topological discordance in the parrots (Order: Psittaciformes), a diverse and highly threatened clade of nearly 400 species. Using ultraconserved elements from 96% of the clade’s species-level diversity, we estimated concatenated and species trees for 382 ingroup taxa. We found that discordance among tree topologies was most common at nodes dating between the late Miocene and Pliocene, and often at the taxonomic level of the genus. Accordingly, we used two metrics to characterize information content in tips and assess the degree to which conflict between trees was being driven by lower-quality samples. Most instances of topological conflict and nonmonophyletic genera in the species tree could be objectively identified using these metrics. For subclades still discordant after tip-based filtering, we used a machine learning approach to determine whether phylogenetic signal or noise was the more important predictor of metrics supporting the alternative topologies. We found that when signal favored one of the topologies, the noise was the most important variable in poorly performing models that favored the alternative topology. In sum, we show that artifactual sources of gene tree discordance, which are likely a common phenomenon in many data sets, can be distinguished from biological sources by quantifying the information content in each tip and modeling which factors support each topology. [Historical DNA; machine learning; museomics; Psittaciformes; species tree.]
  3. Abstract

    To examine phylogenetic heterogeneity in turtle evolution, we collected thousands of high-confidence single-copy orthologs from 19 genome assemblies representative of extant turtle diversity and estimated a phylogeny with multispecies coalescent and concatenated partitioned methods. We also collected next-generation sequences from 26 turtle species and assembled millions of biallelic markers to reconstruct phylogenies based on annotated regions from the western painted turtle (Chrysemys picta bellii) genome (coding regions, introns, untranslated regions, intergenic, and others). We then measured gene tree-species tree discordance, as well as gene and site heterogeneity at each node in the inferred trees, and tested for temporal patterns in phylogenomic conflict across turtle evolution. We found strong and consistent support for all bifurcations in the inferred turtle species phylogenies. However, a number of genes, sites, and genomic features supported alternate relationships between turtle taxa. Our results suggest that gene tree-species tree discordance in these data sets is likely driven by population-level processes such as incomplete lineage sorting. We found very little effect of substitutional saturation on species tree topologies, and no clear phylogenetic patterns in codon usage bias and compositional heterogeneity. There was no correlation between gene and site concordance, node age, and DNA substitution rate across most annotated genomic regions. Our study demonstrates that heterogeneity is to be expected even in well-resolved clades such as turtles, and that future phylogenomic studies should aim to sample as much of the genome as possible in order to obtain accurate phylogenies for assessing conservation priorities in turtles. [Discordance; genomes; phylogeny; turtles.]
  4. Kubatko, Laura (Ed.)
    Abstract Many recent phylogenetic methods have focused on accurately inferring species trees when there is gene tree discordance due to incomplete lineage sorting (ILS). For almost all of these methods, and for phylogenetic methods in general, the data for each locus are assumed to consist of orthologous, single-copy sequences. Loci that are present in more than a single copy in any of the studied genomes are excluded from the data. These steps greatly reduce the number of loci available for analysis. The question we seek to answer in this study is: what happens if one runs such species tree inference methods on data where paralogy is present, in addition to or without ILS being present? Through simulation studies and analyses of two large biological data sets, we show that running such methods on data with paralogs can still provide accurate results. We use multiple different methods, some of which are based directly on the multispecies coalescent model, and some of which have been proven to be statistically consistent under it. We also treat the paralogous loci in multiple ways: from explicitly denoting them as paralogs, to randomly selecting one copy per species. In all cases, the inferred species trees are as accurate as equivalent analyses using single-copy orthologs. Our results have significant implications for the use of ILS-aware phylogenomic analyses, demonstrating that they do not have to be restricted to single-copy loci. This will greatly increase the amount of data that can be used for phylogenetic inference. [Gene duplication and loss; incomplete lineage sorting; multispecies coalescent; orthology; paralogy.]
  5. Jermiin, Lars (Ed.)
    Abstract Genomic data have only sometimes brought resolution to the tree of life. Large phylogenomic studies can reach conflicting conclusions about important relationships, with mutually exclusive hypotheses receiving strong support. Reconciling such differences requires a detailed understanding of how phylogenetic signal varies among data sets. Two complementary strategies for better understanding phylogenomic conflicts are to examine support on a locus-by-locus basis and use support values that capture a larger range of variation in phylogenetic information, such as likelihood ratios. Likelihood ratios can be calculated using either maximum or marginal likelihoods. Despite being conceptually similar, differences in how these ratios are calculated and interpreted have not been closely examined in phylogenomics. Here, we compare the behavior of maximum and marginal likelihood ratios when evaluating alternate resolutions of recalcitrant relationships among major squamate lineages. We find that these ratios are broadly correlated between loci, but the correlation is driven by extreme values. As a consequence, the proportion of loci that support a hypothesis can change depending on which ratio is used and whether smaller values are discarded. In addition, maximum likelihood ratios frequently exhibit identical support for alternate hypotheses, making conflict resolution a challenge. We find surprising support for a sister relationship between snakes and iguanians across four different phylogenomic data sets in contrast to previous empirical studies. [Bayes factors; likelihood ratios; marginal likelihood; maximum likelihood; phylogenomics; squamates.]