Title: Of traits and trees: probabilistic distances under continuous trait models for dissecting the interplay among phylogeny, model, and data
Abstract: Stochastic models of character trait evolution have become a cornerstone of evolutionary biology in an array of contexts. While probabilistic models have been used extensively for statistical inference, they have largely been ignored for the purpose of measuring distances between phylogeny-aware models. Recent contributions to the problem of phylogenetic distance computation have highlighted the importance of explicitly considering evolutionary model parameters and their impacts on molecular sequence data when quantifying dissimilarity between trees. By comparing two phylogenies in terms of their induced probability distributions that are functions of many model parameters, these distances can be more informative than traditional approaches that rely strictly on differences in topology or branch lengths alone. Currently, however, these approaches are designed for comparing models of nucleotide substitution and gene tree distributions, and thus, are unable to address other classes of traits and associated models that may be of interest to evolutionary biologists. Here we expand the principles of probabilistic phylogenetic distances to compute tree distances under models of continuous trait evolution along a phylogeny. By explicitly considering both the degree of relatedness among species and the evolutionary processes that collectively give rise to character traits, these distances provide a foundation for comparing models and their predictions, and for quantifying the impacts of assuming one phylogenetic background over another while studying the evolution of a particular trait. We demonstrate the properties of these approaches using theory, simulations, and several empirical datasets that highlight potential uses of probabilistic distances in many scenarios. We also introduce an open-source R package named PRDATR for easy application by the scientific community for computing phylogenetic distances under models of character trait evolution.
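To make the idea of comparing two phylogenies through their induced probability distributions concrete, here is a minimal sketch in Python. It rests on assumptions that the abstract does not confirm: a Brownian motion trait model (so tip values are multivariate normal with covariance equal to the rate times the shared branch lengths), equal root means under both trees, and the Hellinger distance as the probabilistic distance. The function names, the three-taxon trees, and the rate value are hypothetical illustrations and are not taken from the PRDATR package.

```python
import math


def det(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def bm_covariance(shared, sigma2):
    """Tip covariance under Brownian motion (assumed model):
    rate sigma2 times the matrix of shared branch lengths."""
    return [[sigma2 * x for x in row] for row in shared]


def hellinger_equal_means(cov_p, cov_q):
    """Hellinger distance between two multivariate normals that
    share the same mean vector (assumption of this sketch)."""
    n = len(cov_p)
    mid = [[(cov_p[i][j] + cov_q[i][j]) / 2.0 for j in range(n)]
           for i in range(n)]
    bc = (det(cov_p) ** 0.25) * (det(cov_q) ** 0.25) / math.sqrt(det(mid))
    return math.sqrt(max(0.0, 1.0 - bc))


# Hypothetical ultrametric trees on taxa (A, B, C), total height 1.0.
# Entry [i][j] is the branch length shared by taxa i and j from the root.
shared_ab = [[1.0, 0.6, 0.0], [0.6, 1.0, 0.0], [0.0, 0.0, 1.0]]  # ((A,B),C)
shared_bc = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.6], [0.0, 0.6, 1.0]]  # (A,(B,C))

cov_ab = bm_covariance(shared_ab, sigma2=1.0)
cov_bc = bm_covariance(shared_bc, sigma2=1.0)
print(hellinger_equal_means(cov_ab, cov_bc))
```

Because the distance is computed on the induced distributions, it responds to changes in the rate parameter as well as to changes in topology or branch lengths: comparing one tree against itself with two different values of `sigma2` also yields a nonzero distance.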
Award ID(s):
1949268 2001063
PAR ID:
10213824
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Systematic Biology
ISSN:
1063-5157
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The evolution of molecular and phenotypic traits is commonly modelled using Markov processes along a phylogeny. This phylogeny can be a tree, or a network if it includes reticulations, representing events such as hybridization or admixture. Computing the likelihood of data observed at the leaves is costly as the size and complexity of the phylogeny grows. Efficient algorithms exist for trees, but cannot be applied to networks. We show that a vast array of models for trait evolution along phylogenetic networks can be reformulated as graphical models, for which efficient belief propagation algorithms exist. We provide a brief review of belief propagation on general graphical models, then focus on linear Gaussian models for continuous traits. We show how belief propagation techniques can be applied for exact or approximate (but more scalable) likelihood and gradient calculations, and prove novel results for efficient parameter inference of some models. We highlight the possible fruitful interactions between graphical models and phylogenetic methods. For example, approximate likelihood approaches have the potential to greatly reduce computational costs for phylogenies with reticulations. This article is part of the theme issue ‘“A mathematical theory of evolution”: phylogenetic models dating back 100 years’. 
  2. Within-species trait variation may be the result of genetic variation, environmental variation, or measurement error, for example. In phylogenetic comparative studies, failing to account for within-species variation has many adverse effects, such as increased error in testing hypotheses about evolutionary correlations, biased estimates of evolutionary rates, and inaccurate inference of the mode of evolution. These adverse effects were demonstrated in studies that considered a tree-like underlying phylogeny. Comparative methods on phylogenetic networks are still in their infancy. The impact of within-species variation on network-based methods has not been studied. Here, we introduce a phylogenetic linear model in which the phylogeny can be a network to account for within-species variation in the continuous response trait assuming equal within-species variances across species. We show how inference based on the individual values can be reduced to a problem using species-level summaries, even when the within-species variance is estimated. Our method performs well under various simulation settings and is robust when within-species variances are unequal across species. When phenotypic (within-species) correlations differ from evolutionary (between-species) correlations, estimates of evolutionary coefficients are pulled towards the phenotypic coefficients for all methods we tested. Also, evolutionary rates are either underestimated or overestimated, depending on the mismatch between phenotypic and evolutionary relationships. We applied our method to morphological and geographical data from Polemonium. We find a strong negative correlation of leaflet size with elevation, despite a positive correlation within species. Our method can explore the role of gene flow in trait evolution by comparing the fit of a network to that of a tree. 
We find marginal evidence for leaflet size being affected by gene flow and support for previous observations on the challenges of using individual continuous traits to infer inheritance weights at reticulations. Our method is freely available in the Julia package PhyloNetworks. 
  3. Phylogenetic comparative methods have long been a mainstay of evolutionary biology, allowing for the study of trait evolution across species while accounting for their common ancestry. These analyses typically assume a single, bifurcating phylogenetic tree describing the shared history among species. However, modern phylogenomic analyses have shown that genomes are often composed of mosaic histories that can disagree both with the species tree and with each other—so-called discordant gene trees. These gene trees describe shared histories that are not captured by the species tree, and therefore that are unaccounted for in classic comparative approaches. The application of standard comparative methods to species histories containing discordance leads to incorrect inferences about the timing, direction, and rate of evolution. Here, we develop two approaches for incorporating gene tree histories into comparative methods: one that constructs an updated phylogenetic variance–covariance matrix from gene trees, and another that applies Felsenstein's pruning algorithm over a set of gene trees to calculate trait histories and likelihoods. Using simulation, we demonstrate that our approaches generate much more accurate estimates of tree-wide rates of trait evolution than standard methods. We apply our methods to two clades of the wild tomato genus Solanum with varying rates of discordance, demonstrating the contribution of gene tree discordance to variation in a set of floral traits. Our approaches have the potential to be applied to a broad range of classic inference problems in phylogenetics, including ancestral state reconstruction and the inference of lineage-specific rate shifts.
  4. Abstract Traits that have arisen multiple times yet still remain rare present a curious paradox. A number of these rare traits show a distinct tippy pattern, where they appear widely dispersed across a phylogeny, are associated with short branches and differ between recently diverged sister species. This phylogenetic pattern has classically been attributed to the trait being an evolutionary dead end, where the trait arises due to some short‐term evolutionary advantage, but it ultimately leads species to extinction. While the higher extinction rate associated with a dead end trait could produce such a tippy pattern, a similar pattern could appear if lineages with the trait speciated slower than other lineages, or if the trait was lost more often than it was gained. In this study, we quantify the degree of tippiness of red flowers in the tomato family, Solanaceae, and investigate the macroevolutionary processes that could explain the sparse phylogenetic distribution of this trait. Using a suite of metrics, we confirm that red‐flowered lineages are significantly overdispersed across the tree and form smaller clades than expected under a null model. Next, we fit 22 alternative models using HiSSE (Hidden State Speciation and Extinction), which accommodates asymmetries in speciation, extinction and transition rates that depend on observed and unobserved (hidden) character states. Results of the model fitting indicated significant variation in diversification rates across the family, which is best explained by the inclusion of hidden states. Our best fitting model differs between the maximum clade credibility tree and when incorporating phylogenetic uncertainty, suggesting that the extreme tippiness and rarity of red Solanaceae flowers makes it difficult to distinguish among different underlying processes. However, both of the best models strongly support a bias towards the loss of red flowers. 
The best fitting HiSSE model when incorporating phylogenetic uncertainty lends some support to the hypothesis that lineages with red flowers exhibit reduced diversification rates due to elevated extinction rates. Future studies employing simulations or targeting population‐level processes may allow us to determine whether red flowers in Solanaceae or other angiosperm clades are rare and tippy due to a combination of processes, or asymmetrical transitions alone.
  5. Zhou, Xuming (Ed.)
    Abstract Comparative genomics approaches seek to associate molecular evolution with the evolution of phenotypes across a phylogeny. Many of these methods lack the ability to analyze non-ordinal categorical traits with more than two categories. To address this limitation, we introduce an expansion to RERconverge that associates shifts in evolutionary rates with the convergent evolution of categorical traits. The categorical RERconverge expansion includes methods for performing categorical ancestral state reconstruction, statistical tests for associating relative evolutionary rates with categorical variables, and a new method for performing phylogeny-aware permutations, “permulations”, on categorical traits. We demonstrate our new method on a three-category diet phenotype, and we compare its performance to binary RERconverge analyses and two existing methods for comparative genomic analyses of categorical traits: phylogenetic simulations and a phylogenetic signal based method. We present an analysis of how the categorical permulations scale with the number of species and the number of categories included in the analysis. Our results show that our new categorical method outperforms phylogenetic simulations at identifying genes and enriched pathways significantly associated with the diet phenotypes and that the categorical ancestral state reconstruction drives an improvement in our ability to capture diet-related enriched pathways compared to binary RERconverge when implemented without user input on phenotype evolution. The categorical expansion to RERconverge will provide a strong foundation for applying the comparative method to categorical traits on larger data sets with more species and more complex trait evolution than have previously been analyzed. 