skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Comparing Likelihood Ratios to Understand Genome-Wide Variation in Phylogenetic Support
Abstract Genomic data have only sometimes brought resolution to the tree of life. Large phylogenomic studies can reach conflicting conclusions about important relationships, with mutually exclusive hypotheses receiving strong support. Reconciling such differences requires a detailed understanding of how phylogenetic signal varies among data sets. Two complementary strategies for better understanding phylogenomic conflicts are to examine support on a locus-by-locus basis and use support values that capture a larger range of variation in phylogenetic information, such as likelihood ratios. Likelihood ratios can be calculated using either maximum or marginal likelihoods. Despite being conceptually similar, differences in how these ratios are calculated and interpreted have not been closely examined in phylogenomics. Here, we compare the behavior of maximum and marginal likelihood ratios when evaluating alternate resolutions of recalcitrant relationships among major squamate lineages. We find that these ratios are broadly correlated between loci, but the correlation is driven by extreme values. As a consequence, the proportion of loci that support a hypothesis can change depending on which ratio is used and whether smaller values are discarded. In addition, maximum likelihood ratios frequently exhibit identical support for alternate hypotheses, making conflict resolution a challenge. We find surprising support for a sister relationship between snakes and iguanians across four different phylogenomic data sets in contrast to previous empirical studies. [Bayes factors; likelihood ratios; marginal likelihood; maximum likelihood; phylogenomics; squamates.]  more » « less
Award ID(s):
1950759 1950954
PAR ID:
10402540
Author(s) / Creator(s):
; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Systematic Biology
Volume:
71
Issue:
4
ISSN:
1063-5157
Format(s):
Medium: X Size: p. 973-985
Size(s):
p. 973-985
Sponsoring Org:
National Science Foundation
More Like this
  1. Ruane, Sara (Ed.)
    Abstract Genome-scale data have the potential to clarify phylogenetic relationships across the tree of life but have also revealed extensive gene tree conflict. This seeming paradox, whereby larger data sets both increase statistical confidence and uncover significant discordance, suggests that understanding sources of conflict is important for accurate reconstruction of evolutionary history. We explore this paradox in squamate reptiles, the vertebrate clade comprising lizards, snakes, and amphisbaenians. We collected an average of 5103 loci for 91 species of squamates that span higher-level diversity within the clade, which we augmented with publicly available sequences for an additional 17 taxa. Using a locus-by-locus approach, we evaluated support for alternative topologies at 17 contentious nodes in the phylogeny. We identified shared properties of conflicting loci, finding that rate and compositional heterogeneity drives discordance between gene trees and species tree and that conflicting loci rarely overlap across contentious nodes. Finally, by comparing our tests of nodal conflict to previous phylogenomic studies, we confidently resolve 9 of the 17 problematic nodes. We suggest this locus-by-locus and node-by-node approach can build consensus on which topological resolutions remain uncertain in phylogenomic studies of other contentious groups. [Anchored hybrid enrichment (AHE); gene tree conflict; molecular evolution; phylogenomic concordance; target capture; ultraconserved elements (UCE).] 
    more » « less
  2. PremiseCornales is an order of flowering plants containing ecologically and horticulturally important families, including Cornaceae (dogwoods) and Hydrangeaceae (hydrangeas), among others. While many relationships in Cornales are strongly supported by previous studies, some uncertainty remains with regards to the placement of Hydrostachyaceae and to relationships among families in Cornales and within Cornaceae. Here we analyzed hundreds of nuclear loci to test published phylogenetic hypotheses and estimated a robust species tree for Cornales. MethodsUsing the Angiosperms353 probe set and existing data sets, we generated phylogenomic data for 158 samples, representing all families in the Cornales, with intensive sampling in the Cornaceae. ResultsWe curated an average of 312 genes per sample, constructed maximum likelihood gene trees, and inferred a species tree using the summary approach implemented in ASTRAL‐III, a method statistically consistent with the multispecies coalescent model. ConclusionsThe species tree we constructed generally shows high support values and a high degree of concordance among individual nuclear gene trees. Relationships among families are largely congruent with previous molecular studies, except for the placement of the nyssoids and the Grubbiaceae‐Curtisiaceae clades. Furthermore, we were able to place Hydrostachyaceae within Cornales, and within Cornaceae, the monophyly of known morphogroups was well supported. However, patterns of gene tree discordance suggest potential ancient reticulation, gene flow, and/or ILS in the Hydrostachyaceae lineage and the early diversification ofCornus. Our findings reveal new insights into the diversification process across Cornales and demonstrate the utility of the Angiosperms353 probe set. 
    more » « less
  3. Yoshizawa, Kazunori (Ed.)
    Abstract The order Psocodea includes the two historically recognized groups Psocoptera (free-living bark lice) and Phthiraptera (parasitic lice) that were once considered separate orders. Psocodea is divided in three suborders: Trogiomorpha, Troctomorpha, and Psocomorpha, the latter being the largest within the free-living groups. Despite the increasing number of transcriptomes and whole genome sequence (WGS) data available for this group, the relationships among the six known infraorders within Psocomorpha remain unclear. Here, we evaluated the utility of a bait set designed specifically for parasitic lice belonging to suborder Troctomorpha to extract UCE loci from transcriptome and WGS data of 55 bark louse species and explored the phylogenetic relationships within Psocomorpha using these UCE loci markers. Taxon sampling was heavily focused on the families Lachesillidae and Elipsocidae, whose relationships have been problematic in prior phylogenetic studies. We successfully recovered a total of 2,622 UCE loci, with a 40% completeness matrix containing 2,081 UCE loci and an 80% completeness matrix containing 178 UCE loci. The average number of UCE loci recovered for the 55 species was 1,401. The WGS data sets produced a larger number of UCE loci (1,495) on average than the transcriptome data sets (972). Phylogenetic relationships reconstructed with Maximum Likelihood and coalescent-based analysis were concordant regarding the paraphyly of Lachesillidae and Elipsocidae. Branch support values were generally lower in analyses that used a fewer number of loci, even though they had higher matrix completeness. 
    more » « less
  4. Satta, Yoko (Ed.)
    Abstract Likelihood-based tests of phylogenetic trees are a foundation of modern systematics. Over the past decade, an enormous wealth and diversity of model-based approaches have been developed for phylogenetic inference of both gene trees and species trees. However, while many techniques exist for conducting formal likelihood-based tests of gene trees, such frameworks are comparatively underdeveloped and underutilized for testing species tree hypotheses. To date, widely used tests of tree topology are designed to assess the fit of classical models of molecular sequence data and individual gene trees and thus are not readily applicable to the problem of species tree inference. To address this issue, we derive several analogous likelihood-based approaches for testing topologies using modern species tree models and heuristic algorithms that use gene tree topologies as input for maximum likelihood estimation under the multispecies coalescent. For the purpose of comparing support for species trees, these tests leverage the statistical procedures of their original gene tree-based counterparts that have an extended history for testing phylogenetic hypotheses at a single locus. We discuss and demonstrate a number of applications, limitations, and important considerations of these tests using simulated and empirical phylogenomic data sets that include both bifurcating topologies and reticulate network models of species relationships. Finally, we introduce the open-source R package SpeciesTopoTestR (SpeciesTopology Tests in R) that includes a suite of functions for conducting formal likelihood-based tests of species topologies given a set of input gene tree topologies. 
    more » « less
  5. Pupko, Tal (Ed.)
    Abstract The clade Pancrustacea, comprising crustaceans and hexapods, is the most diverse group of animals on earth, containing over 80% of animal species and half of animal biomass. It has been the subject of several recent phylogenomic analyses, yet relationships within Pancrustacea show a notable lack of stability. Here, the phylogeny is estimated with expanded taxon sampling, particularly of malacostracans. We show small changes in taxon sampling have large impacts on phylogenetic estimation. By analyzing identical orthologs between two slightly different taxon sets, we show that the differences in the resulting topologies are due primarily to the effects of taxon sampling on the phylogenetic reconstruction method. We compare trees resulting from our phylogenomic analyses with those from the literature to explore the large tree space of pancrustacean phylogenetic hypotheses and find that statistical topology tests reject the previously published trees in favor of the maximum likelihood trees produced here. Our results reject several clades including Caridoida, Eucarida, Multicrustacea, Vericrustacea, and Syncarida. Notably, we find Copepoda nested within Allotriocarida with high support and recover a novel relationship between decapods, euphausiids, and syncarids that we refer to as the Syneucarida. With denser taxon sampling, we find Stomatopoda sister to this latter clade, which we collectively name Stomatocarida, dividing Malacostraca into three clades: Leptostraca, Peracarida, and Stomatocarida. A new Bayesian divergence time estimation is conducted using 13 vetted fossils. We review our results in the context of other pancrustacean phylogenetic hypotheses and highlight 15 key taxa to sample in future studies. 
    more » « less