skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Award ID contains: 1950759

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Abstract Variation in gene tree estimates is widely observed in empirical phylogenomic data and is often assumed to be the result of biological processes. However, a recent study using tetrapod mitochondrial genomes to control for biological sources of variation due to their haploid, uniparentally inherited, and non-recombining nature found that levels of discordance among mitochondrial gene trees were comparable to those found in studies that assume only biological sources of variation. Additionally, they found that several of the models of sequence evolution chosen to infer gene trees were doing an inadequate job of fitting the sequence data. These results indicated that significant amounts of gene tree discordance in empirical data may be due to poor fit of sequence evolution models and that more complex and biologically realistic models may be needed. To test how the fit of sequence evolution models relates to gene tree discordance, we analyzed the same mitochondrial data sets as the previous study using 2 additional, more complex models of sequence evolution that each include a different biologically realistic aspect of the evolutionary process: A covarion model to incorporate site-specific rate variation across lineages (heterotachy), and a partitioned model to incorporate variable evolutionary patterns by codon position. Our results show that both additional models fit the data better than the models used in the previous study, with the covarion being consistently and strongly preferred as tree size increases. However, even these more preferred models still inferred highly discordant mitochondrial gene trees, thus deepening the mystery around what we label the “Mito-Phylo Paradox” and leading us to ask whether the observed variation could, in fact, be biological in nature after all. 
    more » « less
  2. Pupko, Tal (Ed.)
    Abstract Poor fit between models of sequence or trait evolution and empirical data is known to cause biases and lead to spurious conclusions about evolutionary patterns and processes. Bayesian posterior prediction is a flexible and intuitive approach for detecting such cases of poor fit. However, the expected behavior of posterior predictive tests has never been characterized for evolutionary models, which is critical for their proper interpretation. Here, we show that the expected distribution of posterior predictive P-values is generally not uniform, in contrast to frequentist P-values used for hypothesis testing, and extreme posterior predictive P-values often provide more evidence of poor fit than typically appreciated. Posterior prediction assesses model adequacy under highly favorable circumstances, because the model is fitted to the data, which leads to expected distributions that are often concentrated around intermediate values. Nonuniform expected distributions of P-values do not pose a problem for the application of these tests, however, and posterior predictive P-values can be interpreted as the posterior probability that the fitted model would predict a dataset with a test statistic value as extreme as the value calculated from the observed data. 
    more » « less
  3. Abstract Genomic data have only sometimes brought resolution to the tree of life. Large phylogenomic studies can reach conflicting conclusions about important relationships, with mutually exclusive hypotheses receiving strong support. Reconciling such differences requires a detailed understanding of how phylogenetic signal varies among data sets. Two complementary strategies for better understanding phylogenomic conflicts are to examine support on a locus-by-locus basis and use support values that capture a larger range of variation in phylogenetic information, such as likelihood ratios. Likelihood ratios can be calculated using either maximum or marginal likelihoods. Despite being conceptually similar, differences in how these ratios are calculated and interpreted have not been closely examined in phylogenomics. Here, we compare the behavior of maximum and marginal likelihood ratios when evaluating alternate resolutions of recalcitrant relationships among major squamate lineages. We find that these ratios are broadly correlated between loci, but the correlation is driven by extreme values. As a consequence, the proportion of loci that support a hypothesis can change depending on which ratio is used and whether smaller values are discarded. In addition, maximum likelihood ratios frequently exhibit identical support for alternate hypotheses, making conflict resolution a challenge. We find surprising support for a sister relationship between snakes and iguanians across four different phylogenomic data sets in contrast to previous empirical studies. [Bayes factors; likelihood ratios; marginal likelihood; maximum likelihood; phylogenomics; squamates.] 
    more » « less
  4. Carstens, Bryan (Ed.)
    Abstract The scale of data sets used to infer phylogenies has grown dramatically in the last decades, providing researchers with an enormous amount of information with which to draw inferences about evolutionary history. However, standard approaches to assessing confidence in those inferences (e.g., nonparametric bootstrap proportions [BP] and Bayesian posterior probabilities [PPs]) are still deeply influenced by statistical procedures and frameworks that were developed when information was much more limited. These approaches largely quantify uncertainty caused by limited amounts of data, which is often vanishingly small with modern, genome-scale sequence data sets. As a consequence, today’s phylogenomic studies routinely report near-complete confidence in their inferences, even when different studies reach strongly conflicting conclusions and the sites and loci in a single data set contain much more heterogeneity than our methods assume or can accommodate. Therefore, we argue that BPs and marginal PPs of bipartitions have outlived their utility as the primary means of measuring phylogenetic support for modern phylogenomic data sets with large numbers of sites relative to the number of taxa. Continuing to rely on these measures will hinder progress towards understanding remaining sources of uncertainty in the most challenging portions of the Tree of Life. Instead, we encourage researchers to examine the ideas and methods presented in this special issue of Systematic Biology and to explore the area further in their own work. The papers in this special issue outline strategies for assessing confidence and uncertainty in phylogenomic data sets that move beyond stochastic error due to limited data and offer promise for more productive dialogue about the challenges that we face in reaching our shared goal of understanding the history of life on Earth.[Big data; gene tree variation; genomic era; statistical bias.] 
    more » « less