Title: Reducing the Biases in False Correlations Between Discrete Characters
Abstract The correlation between two characters is often interpreted as evidence that there exists a significant and biologically important relationship between them. However, Maddison and FitzJohn (The unsolved challenge to phylogenetic correlation tests for categorical characters. Syst. Biol. 2015;64:127–136) recently pointed out that evidence of correlated evolution between two categorical characters is often spurious, particularly when the dependent relationship stems from a single replicate deep in time. Here we show that there may, in fact, be a statistical solution to the problem posed by Maddison and FitzJohn, naturally embedded within the expanded model space afforded by the hidden Markov model (HMM) framework. We demonstrate that the problem of single unreplicated evolutionary events manifests itself as rate heterogeneity within our models and that this is the source of the false correlation. Therefore, we argue that this problem is better understood as model misspecification rather than as a failure of comparative methods to account for phylogenetic pseudoreplication. We use HMMs to develop a multirate independent model which, when implemented, drastically reduces support for correlation. The problem itself extends beyond categorical character evolution, but we believe that the practical solution presented here may lend itself to future extensions in other areas of comparative biology. [Macroevolution; model adequacy; phylogenetic comparative methods; rate heterogeneity].
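The comparison between independent and correlated models in this abstract is usually framed in terms of rate matrices over the four joint states of two binary characters. The Python sketch below uses NumPy to build an independent 4×4 matrix. It then builds an HMM-style matrix in which each hidden rate class carries its own independent rates.

The sketch makes three assumptions:
- It uses the common (X, Y) joint-state coding.
- It disallows simultaneous changes in both characters.
- It uses a single shared switching rate between hidden classes.

This is one plausible parameterization of a multirate independent model, not the authors' implementation. The function names and rate values are hypothetical, and fitting the matrices to a tree is not shown.

```python
import numpy as np

# Joint states of two binary characters, ordered as (X, Y).
STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def independent_q(qx01, qx10, qy01, qy10):
    """Hypothetical helper: 4x4 rate matrix in which X's rates ignore Y's
    state and vice versa. Simultaneous changes in both are disallowed."""
    q = np.zeros((4, 4))
    for i, (xi, yi) in enumerate(STATES):
        for j, (xj, yj) in enumerate(STATES):
            if yi == yj and xi != xj:        # X changes, Y fixed
                q[i, j] = qx01 if xj == 1 else qx10
            elif xi == xj and yi != yj:      # Y changes, X fixed
                q[i, j] = qy01 if yj == 1 else qy10
    np.fill_diagonal(q, -q.sum(axis=1))      # rows of a rate matrix sum to 0
    return q


def is_independent(q):
    """True if each character's rates do not depend on the other's state."""
    return bool(np.isclose(q[0, 2], q[1, 3]) and np.isclose(q[2, 0], q[3, 1])
                and np.isclose(q[0, 1], q[2, 3]) and np.isclose(q[1, 0], q[3, 2]))


def multirate_independent_q(class_rates, switch_rate):
    """Hypothetical HMM rate matrix: one independent 4x4 block per hidden
    rate class; switching classes leaves the observed (X, Y) state unchanged."""
    k = len(class_rates)
    q = np.zeros((4 * k, 4 * k))
    for c, rates in enumerate(class_rates):
        block = independent_q(*rates)
        np.fill_diagonal(block, 0.0)         # keep off-diagonal rates only
        q[4 * c:4 * c + 4, 4 * c:4 * c + 4] = block
    for a in range(k):
        for b in range(k):
            if a != b:
                for s in range(4):           # same observed state, new class
                    q[4 * a + s, 4 * b + s] = switch_rate
    np.fill_diagonal(q, -q.sum(axis=1))
    return q


# Hypothetical rates: a slow class and a fast class, each internally independent.
Q = multirate_independent_q([(0.1, 0.1, 0.1, 0.1), (2.0, 0.5, 2.0, 0.5)], 0.05)
print(Q.shape)  # (8, 8)
print(all(is_independent(Q[4 * c:4 * c + 4, 4 * c:4 * c + 4]) for c in range(2)))  # True
```

Every hidden class is internally independent. A clade that shifts into the faster class can therefore raise the rates of both characters without any term linking X's rates to Y's state. Under a single-rate model, that same heterogeneity tends to be picked up by the extra parameters of the dependent (correlated) model. This is the intuition the abstract describes.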
Award ID(s):
1916558
PAR ID:
10465466
Author(s) / Creator(s):
;
Editor(s):
Smith, Stacey
Date Published:
Journal Name:
Systematic Biology
Volume:
72
Issue:
2
ISSN:
1063-5157
Page Range / eLocation ID:
476 to 488
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Within-species trait variation may be the result of genetic variation, environmental variation, or measurement error, for example. In phylogenetic comparative studies, failing to account for within-species variation has many adverse effects, such as increased error in testing hypotheses about evolutionary correlations, biased estimates of evolutionary rates, and inaccurate inference of the mode of evolution. These adverse effects were demonstrated in studies that considered a tree-like underlying phylogeny. Comparative methods on phylogenetic networks are still in their infancy. The impact of within-species variation on network-based methods has not been studied. Here, we introduce a phylogenetic linear model in which the phylogeny can be a network to account for within-species variation in the continuous response trait assuming equal within-species variances across species. We show how inference based on the individual values can be reduced to a problem using species-level summaries, even when the within-species variance is estimated. Our method performs well under various simulation settings and is robust when within-species variances are unequal across species. When phenotypic (within-species) correlations differ from evolutionary (between-species) correlations, estimates of evolutionary coefficients are pulled towards the phenotypic coefficients for all methods we tested. Also, evolutionary rates are either underestimated or overestimated, depending on the mismatch between phenotypic and evolutionary relationships. We applied our method to morphological and geographical data from Polemonium. We find a strong negative correlation of leaflet size with elevation, despite a positive correlation within species. Our method can explore the role of gene flow in trait evolution by comparing the fit of a network to that of a tree. 
We find marginal evidence for leaflet size being affected by gene flow and support for previous observations on the challenges of using individual continuous traits to infer inheritance weights at reticulations. Our method is freely available in the Julia package PhyloNetworks. 
  2. In recent years it has become increasingly popular to use phylogenetic comparative methods to investigate heterogeneity in the rate or process of quantitative trait evolution across the branches or clades of a phylogenetic tree. Here, I present a new method for modeling variability in the rate of evolution of a continuously-valued character trait on a reconstructed phylogeny. The underlying model of evolution is stochastic diffusion (Brownian motion), but one in which the instantaneous diffusion rate (σ²) also evolves by Brownian motion on a logarithmic scale. Unfortunately, it’s not possible to simultaneously estimate the rates of evolution along each edge of the tree and the rate of evolution of σ² itself using maximum likelihood. As such, I propose a penalized-likelihood method in which the penalty term is equal to the log-transformed probability density of the rates under a Brownian model, multiplied by a ‘smoothing’ coefficient, λ, selected by the user. λ determines the magnitude of penalty that’s applied to rate variation between edges. Lower values of λ penalize rate variation relatively little, whereas larger λ values result in minimal rate variation among edges of the tree in the fitted model, eventually converging on a single value of σ² for all of the branches of the tree. In addition to presenting this model here, I have also implemented it as part of my phytools R package in the function multirateBM. Using different values of the penalty coefficient, λ, I fit the model to simulated data with: Brownian rate variation among edges (the model assumption); uncorrelated rate variation; rate changes that occur in discrete places on the tree; and no rate variation at all among the branches of the phylogeny. I then compare the estimated values of σ² to their known true values. In addition, I use the method to analyze a simple empirical dataset of body mass evolution in mammals. 
Finally, I discuss the relationship between the method of this article and other models from the phylogenetic comparative methods and finance literature, as well as some applications and limitations of the approach. 
  3. Abstract Many hypotheses in the field of phylogenetic comparative biology involve specific changes in the rate or process of trait evolution. This is particularly true of approaches designed to connect macroevolutionary pattern to microevolutionary process. We present a method designed to test whether the rate of evolution of a discrete character has changed in one or more clades, lineages, or time periods. This method differs from other related approaches (such as the ‘covarion’ model) in that the ‘regimes’ in which the rate or process is postulated to have changed are specified a priori by the user, rather than inferred from the data. Similarly, it differs from methods designed to model a correlation between two binary traits in that the regimes mapped onto the tree are fixed. We apply our method to investigate the rate of dewlap color and/or caudal vertebra number evolution in Caribbean and mainland clades of the diverse lizard genus Anolis. We find little evidence to support any difference in the evolutionary process between mainland and island evolution for either character. We also examine the statistical properties of the method more generally and show that it has acceptable type I error, parameter estimation, and power. Finally, we discuss some general issues of frequentist hypothesis testing and model adequacy, as well as the relationship of our method to existing models of heterogeneity in the rate of discrete character evolution on phylogenies. 
  4. Numerous questions in phylogenetic comparative biology revolve around the correlated evolution of two or more phenotypic traits on a phylogeny. In many cases, it may be sufficient to assume a constant value for the evolutionary correlation between characters across all the clades and branches of the tree. Under other circumstances, however, it is desirable or necessary to account for the possibility that the evolutionary correlation differs through time or in different sections of the phylogeny. Here, we present a method designed to fit a hierarchical series of models for heterogeneity in the evolutionary rates and correlation of two quantitative traits on a phylogenetic tree. We apply the method to two datasets: one for different attributes of the buccal morphology in sunfishes (Centrarchidae); and a second for overall body length and relative body depth in rock- and non-rock-dwelling South American iguanian lizards. We also examine the performance of the method for parameter estimation and model selection using a small set of numerical simulations. 
  5. Zhou, Xuming (Ed.)
    Abstract Comparative genomics approaches seek to associate molecular evolution with the evolution of phenotypes across a phylogeny. Many of these methods lack the ability to analyze non-ordinal categorical traits with more than two categories. To address this limitation, we introduce an expansion to RERconverge that associates shifts in evolutionary rates with the convergent evolution of categorical traits. The categorical RERconverge expansion includes methods for performing categorical ancestral state reconstruction, statistical tests for associating relative evolutionary rates with categorical variables, and a new method for performing phylogeny-aware permutations, “permulations”, on categorical traits. We demonstrate our new method on a three-category diet phenotype, and we compare its performance to binary RERconverge analyses and two existing methods for comparative genomic analyses of categorical traits: phylogenetic simulations and a phylogenetic signal based method. We present an analysis of how the categorical permulations scale with the number of species and the number of categories included in the analysis. Our results show that our new categorical method outperforms phylogenetic simulations at identifying genes and enriched pathways significantly associated with the diet phenotypes and that the categorical ancestral state reconstruction drives an improvement in our ability to capture diet-related enriched pathways compared to binary RERconverge when implemented without user input on phenotype evolution. The categorical expansion to RERconverge will provide a strong foundation for applying the comparative method to categorical traits on larger data sets with more species and more complex trait evolution than have previously been analyzed. 
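The first related abstract states that inference from individual trait values can be reduced to species-level summaries when within-species variances are equal. The Python sketch below computes such summaries under that assumption:
- species means,
- sample sizes,
- the pooled within-species variance with N − k degrees of freedom (N individuals, k species).

It is a generic illustration of the reduction and assumes that predictors are measured at the species level. The phylogenetic network part of the model is omitted, and the function name is hypothetical. The sketch does not reproduce the PhyloNetworks implementation.

```python
import numpy as np


def species_summaries(values, species):
    """Hypothetical helper: reduce individual trait values to species means,
    sample sizes, and a pooled within-species variance, assuming that
    variance is equal across species."""
    values = np.asarray(values, dtype=float)
    species = np.asarray(species)
    labels = sorted(set(species.tolist()))
    means, sizes, ss_within = {}, {}, 0.0
    for s in labels:
        v = values[species == s]
        means[s] = float(v.mean())
        sizes[s] = int(v.size)
        ss_within += float(((v - v.mean()) ** 2).sum())  # within-species SS
    df = values.size - len(labels)   # N individuals minus k species
    pooled_var = ss_within / df if df > 0 else float("nan")
    return means, sizes, pooled_var


means, sizes, pooled = species_summaries([1, 2, 3, 10, 12], ["a", "a", "a", "b", "b"])
print(means, sizes, pooled)
```

For this toy input, the species means are 2.0 and 11.0 and the sample sizes are 3 and 2. The pooled within-species variance is 4/3, from a sum of squares of 4 on 3 degrees of freedom. With such summaries, each species mean carries extra sampling variance σ²_w/n_i on top of its phylogenetic covariance, where σ²_w is the within-species variance and n_i the sample size. The within-species sum of squares informs the estimate of σ²_w separately.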
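The fourth related abstract concerns heterogeneity in the rates and evolutionary correlation of two quantitative traits. Under bivariate Brownian motion, both quantities are encoded in a 2×2 evolutionary rate matrix. The correlation equals the covariance divided by the square root of the product of the two rates.

The Python sketch below builds such a matrix from two rates and a correlation, then recovers the correlation. The regime names and values are invented for illustration. The function names are hypothetical rather than taken from any published package.

```python
import numpy as np


def build_rate_matrix(sig2_x, sig2_y, r):
    """Hypothetical helper: 2x2 evolutionary rate matrix for bivariate
    Brownian motion from two trait rates and their correlation r."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    cov = r * np.sqrt(sig2_x * sig2_y)
    return np.array([[sig2_x, cov], [cov, sig2_y]])


def evolutionary_correlation(rate_matrix):
    """Recover r as covariance / sqrt(product of the two rates)."""
    m = np.asarray(rate_matrix, dtype=float)
    return float(m[0, 1] / np.sqrt(m[0, 0] * m[1, 1]))


# Invented regimes: rates differ while the correlation is shared, which is
# one kind of constrained model a hierarchy of this sort might compare.
regimes = {
    "regime_a": build_rate_matrix(1.0, 4.0, 0.3),
    "regime_b": build_rate_matrix(0.5, 0.5, 0.3),
}
for name, m in regimes.items():
    print(name, round(evolutionary_correlation(m), 3))
```

Parameterizing each regime by rates and a correlation makes it simple to hold either quantity in common across regimes while letting the other vary.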