Title: A variable-rate quantitative trait evolution model using penalized-likelihood
In recent years it has become increasingly popular to use phylogenetic comparative methods to investigate heterogeneity in the rate or process of quantitative trait evolution across the branches or clades of a phylogenetic tree. Here, I present a new method for modeling variability in the rate of evolution of a continuously-valued character trait on a reconstructed phylogeny. The underlying model of evolution is stochastic diffusion (Brownian motion), but in which the instantaneous diffusion rate (σ²) also evolves by Brownian motion on a logarithmic scale. Unfortunately, it’s not possible to simultaneously estimate the rates of evolution along each edge of the tree and the rate of evolution of σ² itself using Maximum Likelihood. As such, I propose a penalized-likelihood method in which the penalty term is equal to the log-transformed probability density of the rates under a Brownian model, multiplied by a ‘smoothing’ coefficient, λ, selected by the user. λ determines the magnitude of penalty that’s applied to rate variation between edges. Lower values of λ penalize rate variation relatively little; whereas larger λ values result in minimal rate variation among edges of the tree in the fitted model, eventually converging on a single value of σ² for all of the branches of the tree. In addition to presenting this model here, I have also implemented it as part of my phytools R package in the function multirateBM. Using different values of the penalty coefficient, λ, I fit the model to simulated data with: Brownian rate variation among edges (the model assumption); uncorrelated rate variation; rate changes that occur in discrete places on the tree; and no rate variation at all among the branches of the phylogeny. I then compare the estimated values of σ² to their known true values. In addition, I use the method to analyze a simple empirical dataset of body mass evolution in mammals. Finally, I discuss the relationship between the method of this article and other models from the phylogenetic comparative methods and finance literature, as well as some applications and limitations of the approach.
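
The penalized-likelihood objective described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the phytools implementation, whose internals may differ: the three-taxon toy tree, the trait values, all function and variable names, and the simplification that each edge evolves at the rate of its descendant node are hypothetical, and the rate of evolution of σ² itself (sig2_rate) is simply fixed at 1. The trait term is the restricted Brownian log-likelihood from independent contrasts, computed after stretching each branch by its own rate; the penalty term is the Brownian log-density of the log-rates, weighted by λ.

```python
import math

# Hypothetical toy tree ((A:1,B:1)N1:1,C:2)root, stored as node -> (parent, branch length).
EDGES = {"N1": ("root", 1.0), "A": ("N1", 1.0), "B": ("N1", 1.0), "C": ("root", 2.0)}
CHILDREN = {"root": ["N1", "C"], "N1": ["A", "B"]}


def normal_logpdf(x, var):
    """Log-density of a zero-mean normal with variance `var`, evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + x * x / var)


def trait_loglik(traits, log_rates):
    """Restricted Brownian log-likelihood of the tip traits (independent contrasts).

    Assumption for illustration: each edge evolves at the rate of its
    descendant node, so its effective length is t * exp(log_rate).
    """
    def prune(node):
        parent, t = EDGES.get(node, (None, 0.0))
        v = t * math.exp(log_rates[node]) if parent is not None else 0.0
        if node not in CHILDREN:  # tip: observed value, no contrast yet
            return traits[node], v, 0.0
        (x1, v1, l1), (x2, v2, l2) = (prune(c) for c in CHILDREN[node])
        # Contrast between the two daughters has variance v1 + v2.
        ll = l1 + l2 + normal_logpdf(x1 - x2, v1 + v2)
        # Weighted-average node value, plus extra variance on the branch above.
        x = (x1 / v1 + x2 / v2) / (1.0 / v1 + 1.0 / v2)
        return x, v + v1 * v2 / (v1 + v2), ll

    return prune("root")[2]


def rate_logdensity(log_rates, sig2_rate):
    """Log-density of the node log-rates under Brownian motion, given the root value."""
    return sum(
        normal_logpdf(log_rates[node] - log_rates[parent], sig2_rate * t)
        for node, (parent, t) in EDGES.items()
    )


def penalized_loglik(traits, log_rates, lam, sig2_rate=1.0):
    """Trait log-likelihood plus lambda times the log-density of the rates."""
    return trait_loglik(traits, log_rates) + lam * rate_logdensity(log_rates, sig2_rate)


# Made-up trait values and log-rates, for illustration only.
traits = {"A": 1.2, "B": 0.9, "C": -0.4}
log_rates = {"root": 0.0, "N1": 0.1, "A": 0.5, "B": -0.2, "C": 0.0}
print(penalized_loglik(traits, log_rates, lam=1.0))
```

Because the penalty is largest when neighboring log-rates are equal, raising lam in this sketch pulls the maximizing rates toward a single shared value, while a small lam leaves them free to vary, which matches the behavior of λ described in the abstract.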
Award ID(s): 1759940
NSF-PAR ID: 10391204
Author(s) / Creator(s):
Date Published:
Journal Name: PeerJ
Volume: 9
ISSN: 2167-8359
Page Range / eLocation ID: e11997
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract

    Understanding phenotypic disparity across the tree of life requires identifying where and when evolutionary rates change on phylogeny. A primary methodological challenge in macroevolution is therefore to develop methods for accurate inference of among-lineage variation in rates of phenotypic evolution. Here, we describe a method for inferring among-lineage evolutionary rate heterogeneity in both continuous and discrete traits. The method assumes that the present-day distribution of a trait is shaped by a variable-rate process arising from a mixture of constant-rate processes and uses a single-pass tree traversal algorithm to estimate branch-specific evolutionary rates. By employing dynamic programming optimization techniques and approximate maximum likelihood estimators where appropriate, our method permits rapid exploration of the tempo and mode of phenotypic evolution. Simulations indicate that the method reconstructs rates of trait evolution with high accuracy. Application of the method to data sets on squamate reptile reproduction and turtle body size recovers patterns of rate heterogeneity identified by previous studies but with computational costs reduced by many orders of magnitude. Our results expand the set of tools available for detecting macroevolutionary rate heterogeneity and point to the utility of fast, approximate methods for studying large-scale biodiversity dynamics. [Brownian motion; continuous characters; discrete characters; macroevolution; Markov process; rate heterogeneity.]

     
  2. Buerkle, Alex (Ed.)
    It is now understood that introgression can serve as a powerful evolutionary force, providing genetic variation that can shape the course of trait evolution. Introgression also induces a shared evolutionary history that is not captured by the species phylogeny, potentially complicating evolutionary analyses that use a species tree. Such analyses are often carried out on gene expression data across species, where the measurement of thousands of trait values allows for powerful inferences while controlling for shared phylogeny. Here, we present a Brownian motion model for quantitative trait evolution under the multispecies network coalescent framework, demonstrating that introgression can generate apparently convergent patterns of evolution when averaged across thousands of quantitative traits. We test our theoretical predictions using whole-transcriptome expression data from ovules in the wild tomato genus Solanum. Examining two sub-clades that both have evidence for post-speciation introgression, but that differ substantially in its magnitude, we find patterns of evolution that are consistent with histories of introgression in both the sign and magnitude of ovule gene expression. Additionally, in the sub-clade with a higher rate of introgression, we observe a correlation between local gene tree topology and expression similarity, implicating a role for introgressed cis-regulatory variation in generating these broad-scale patterns. Our results reveal a general role for introgression in shaping patterns of variation across many thousands of quantitative traits, and provide a framework for testing for these effects using simple model-informed predictions.
  3. Abstract

    Traits that have arisen multiple times yet still remain rare present a curious paradox. A number of these rare traits show a distinct tippy pattern, where they appear widely dispersed across a phylogeny, are associated with short branches and differ between recently diverged sister species. This phylogenetic pattern has classically been attributed to the trait being an evolutionary dead end, where the trait arises due to some short‐term evolutionary advantage, but it ultimately leads species to extinction. While the higher extinction rate associated with a dead end trait could produce such a tippy pattern, a similar pattern could appear if lineages with the trait speciated slower than other lineages, or if the trait was lost more often than it was gained. In this study, we quantify the degree of tippiness of red flowers in the tomato family, Solanaceae, and investigate the macroevolutionary processes that could explain the sparse phylogenetic distribution of this trait. Using a suite of metrics, we confirm that red‐flowered lineages are significantly overdispersed across the tree and form smaller clades than expected under a null model. Next, we fit 22 alternative models using HiSSE (Hidden State Speciation and Extinction), which accommodates asymmetries in speciation, extinction and transition rates that depend on observed and unobserved (hidden) character states. Results of the model fitting indicated significant variation in diversification rates across the family, which is best explained by the inclusion of hidden states. Our best fitting model differs between the maximum clade credibility tree and when incorporating phylogenetic uncertainty, suggesting that the extreme tippiness and rarity of red Solanaceae flowers makes it difficult to distinguish among different underlying processes. However, both of the best models strongly support a bias towards the loss of red flowers. The best fitting HiSSE model when incorporating phylogenetic uncertainty lends some support to the hypothesis that lineages with red flowers exhibit reduced diversification rates due to elevated extinction rates. Future studies employing simulations or targeting population‐level processes may allow us to determine whether red flowers in Solanaceae or other angiosperm clades are rare and tippy due to a combination of processes, or asymmetrical transitions alone.

     
  4. Abstract

    How the microbiome interacts with hosts across evolutionary time is poorly understood. Data sets including many host species are required to conduct comparative analyses. Here, we analyzed 142 intestinal microbiome samples from 92 birds belonging to 74 species from Equatorial Guinea, using the 16S rRNA gene. Using four definitions for microbial taxonomic units (97% OTU, 99% OTU, 99% OTU with singletons removed, ASV), we conducted alpha and beta diversity analyses. We found that raw abundances and diversity varied between the data sets but relative patterns were largely consistent across data sets. Host taxonomy, diet and locality were significantly associated with microbiomes, at generally similar levels using three distance metrics. Phylogenetic comparative methods assessed the evolutionary relationship between the microbiome as a trait of a host species and the underlying bird phylogeny. Using multiple ways of defining “microbiome traits”, we found that a neutral Brownian motion model did not explain variation in microbiomes. Instead, we found that a White Noise model (indicating little phylogenetic signal) was most likely. There was some support for the Ornstein‐Uhlenbeck model (that invokes selection), but the level of support was similar to that of a White Noise simulation, further supporting the White Noise model as the best explanation for the evolution of the microbiome as a trait of avian hosts. Our study demonstrated that both environment and evolution play a role in the gut microbiome and the relationship does not follow a neutral model; these biological results are qualitatively robust to analytical choices.

     
  5. Abstract

    Phylogenetic studies of geographic range evolution are increasingly using statistical model selection methods to choose among variants of the dispersal‐extinction‐cladogenesis (DEC) model, especially between DEC and DEC+J, a variant that emphasizes “jump dispersal,” or founder‐event speciation, as a type of cladogenetic range inheritance scenario. Unfortunately, DEC+J is a poor model of founder‐event speciation, and statistical comparisons of its likelihood with DEC are inappropriate. DEC and DEC+J share a conceptual flaw: cladogenetic events of range inheritance at ancestral nodes, unlike anagenetic events of dispersal and local extinction along branches, are not modelled as being probabilistic with respect to time. Ignoring this probability factor artificially inflates the contribution of cladogenetic events to the likelihood, and leads to underestimates of anagenetic, time‐dependent range evolution. The flaw is exacerbated in DEC+J because not only is jump dispersal allowed, expanding the set of cladogenetic events, its probability relative to non‐jump events is assigned a free parameter, j, that when maximized precludes the possibility of non‐jump events at ancestral nodes. DEC+J thus parameterizes the mode of speciation, but like DEC, it does not parameterize the rate of speciation. This inconsistency has undesirable consequences, such as a greater tendency towards degenerate inferences in which the data are explained entirely by cladogenetic events (at which point branch lengths become irrelevant, with estimated anagenetic rates of 0). Inferences with DEC+J can in some cases depart dramatically from intuition, e.g. when highly unparsimonious numbers of jump dispersal events are required solely because j is maximized. Statistical comparison with DEC is inappropriate because a higher DEC+J likelihood does not reflect a more close approximation of the “true” model of range evolution, which surely must include time‐dependent processes; instead, it is simply due to more weight being allocated (via j) to jump dispersal events whose time‐dependent probabilities are ignored. In testing hypotheses about the geographic mode of speciation, jump dispersal can and should instead be modelled using existing frameworks for state‐dependent lineage diversification in continuous time, taking appropriate cautions against Type I errors associated with such methods. For simple inference of ancestral ranges on a fixed phylogeny, a DEC‐based model may be defensible if statistical model selection is not used to justify the choice, and it is understood that inferences about cladogenetic range inheritance lack any relation to time, normally a fundamental axis of evolutionary models.

     