
Title: Robust Phylogenetic Regression

Modern comparative biology owes much to phylogenetic regression. At its conception, this technique sparked a revolution that armed biologists with phylogenetic comparative methods (PCMs) for disentangling evolutionary correlations from those arising from hierarchical phylogenetic relationships. Over the past few decades, the phylogenetic regression framework has become a paradigm of modern comparative biology that has been widely embraced as a remedy for shared ancestry. However, recent evidence has cast doubt on the efficacy of phylogenetic regression, and PCMs more generally, with the suggestion that many of these methods fail to provide an adequate defense against unreplicated evolution, the primary justification for using them in the first place. Importantly, some of the most compelling examples of biological innovation in nature result from abrupt lineage-specific evolutionary shifts, which current regression models are largely ill-equipped to deal with. Here we explore a solution to this problem by applying robust linear regression to comparative trait data. We formally introduce robust phylogenetic regression to the PCM toolkit with linear estimators that are less sensitive to model violations than the standard least-squares estimator, while still retaining high power to detect true trait associations. Our analyses also highlight an ingenuity of the original algorithm for phylogenetic regression based on independent contrasts, whereby robust estimators are particularly effective. Collectively, we find that robust estimators hold promise for improving tests of trait associations and offer a path forward in scenarios where classical approaches may fail. Our study joins recent arguments for increased vigilance against unreplicated evolution and a better understanding of evolutionary model performance in challenging yet biologically important settings.
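
To make the contrast between least-squares and robust estimation concrete, here is a minimal sketch, not the authors' implementation. It assumes standardized independent contrasts have already been computed for two traits (the values below are hypothetical) and fits a regression through the origin, as is conventional for contrasts. The Huber M-estimator is used only as one familiar example of a robust estimator; the abstract does not say which estimators the study evaluates, and the function names and tuning constant are illustrative choices.

```python
from statistics import median


def ols_slope_origin(x, y):
    """Least-squares slope for a regression forced through the origin."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def huber_slope_origin(x, y, k=1.345, max_iter=200, tol=1e-10):
    """Huber M-estimate of the slope, fitted by iteratively reweighted least squares."""
    slope = ols_slope_origin(x, y)  # start from the classical estimate
    for _ in range(max_iter):
        resid = [b - slope * a for a, b in zip(x, y)]
        # Robust residual scale: median absolute residual, rescaled to be
        # comparable to a standard deviation under normality.
        scale = median(abs(r) for r in resid) / 0.6745 or 1e-12
        # Huber weights: 1 inside the threshold, shrinking with |residual| outside it.
        weights = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in resid]
        num = sum(w * a * b for w, a, b in zip(weights, x, y))
        den = sum(w * a * a for w, a in zip(weights, x))
        new_slope = num / den
        converged = abs(new_slope - slope) < tol
        slope = new_slope
        if converged:
            break
    return slope


# Hypothetical contrasts: seven follow y = 0.5 * x, one reflects an abrupt shift.
x_contrasts = [1, 2, 3, 4, 5, -1, -2, -3]
y_contrasts = [0.5, 1.0, 1.5, 2.0, -10.0, -0.5, -1.0, -1.5]

print("least squares:", ols_slope_origin(x_contrasts, y_contrasts))
print("Huber:        ", huber_slope_origin(x_contrasts, y_contrasts))
```

With these made-up values, the single aberrant contrast is enough to pull the least-squares slope below zero, while the Huber estimate stays close to the slope of 0.5 shared by the remaining contrasts. That is the kind of sensitivity to lineage-specific shifts the abstract describes.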

Publisher / Repository:
Oxford University Press
Journal Name:
Systematic Biology
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Traits underlie organismal responses to their environment and are essential to predict community responses to environmental conditions under global change. Species differ in life‐history traits, morphometrics, diet type, reproductive characteristics and habitat utilization.

    Trait associations are widely analysed using phylogenetic comparative methods (PCM) to account for correlations among related species. Similarly, traits are measured for some but not all species, and missing continuous traits (e.g. growth rate) can be imputed using ‘phylogenetic trait imputation’ (PTI), based on evolutionary relatedness and trait covariance. However, PTI has not been available for categorical traits, and estimating covariance among traits without ecological constraints risks inferring implausible evolutionary mechanisms.

    Here, we extend previous PCM and PTI methods by (1) specifying covariance among traits as a structural equation model (SEM), and (2) incorporating associations among both continuous and categorical traits. Fitting a SEM replaces the covariance among traits with a set of linear path coefficients specifying potential evolutionary mechanisms. Estimated parameters then represent regression slopes (i.e. the average change in trait Y given an exogenous change in trait X) that can be used to calculate both direct effects (X impacts Y) and indirect effects (X impacts Z and Z impacts Y).

    We demonstrate phylogenetic structural‐equation mixed‐trait imputation using 33 variables representing life history, reproductive, morphological, and behavioural traits for all >32,000 described fishes worldwide. SEM coefficients suggest that a one degree Celsius increase in habitat temperature is associated with an average 3.5% increase in natural mortality (including a 1.4% indirect impact that acts via temperature effects on the growth coefficient), and an average 3.0% decrease in fecundity (via indirect impacts on maximum age and length). Cross‐validation indicates that the model explains 54%–89% of variance for withheld measurements of continuous traits and has an area under the receiver operating characteristic curve of 0.86–0.99 for categorical traits.

    We use imputed traits to classify all fishes into life‐history types, and confirm a phylogenetic signal in three dominant life‐history strategies in fishes. PTI using phylogenetic SEMs ensures that estimated parameters are interpretable as regression slopes, such that the inferred evolutionary relationships can be compared with long‐term evolutionary and rearing experiments.

  2. Smith, Stacey (Ed.)
    Abstract The correlation between two characters is often interpreted as evidence that there exists a significant and biologically important relationship between them. However, Maddison and FitzJohn (in The unsolved challenge to phylogenetic correlation tests for categorical characters. Syst. Biol. 2015;64:127–136) recently pointed out that evidence of correlated evolution between two categorical characters is often spurious, particularly when the dependent relationship stems from a single replicate deep in time. Here we show that there may, in fact, be a statistical solution to the problem posed by Maddison and FitzJohn naturally embedded within the expanded model space afforded by the hidden Markov model (HMM) framework. We demonstrate that the problem of single unreplicated evolutionary events manifests itself as rate heterogeneity within our models and that this is the source of the false correlation. Therefore, we argue that this problem is better understood as model misspecification rather than a failure of comparative methods to account for phylogenetic pseudoreplication. We utilize HMMs to develop a multirate independent model which, when implemented, drastically reduces support for correlation. The problem itself extends beyond categorical character evolution, but we believe that the practical solution presented here may lend itself to future extensions in other areas of comparative biology. [Macroevolution; model adequacy; phylogenetic comparative methods; rate heterogeneity].
  3. Abstract

    Hybridization has long been recognized as a fundamental evolutionary process in plants but, until recently, our understanding of its phylogenetic distribution and biological significance across deep evolutionary scales has been largely obscure. Over the past decade, genomic and phylogenomic datasets have revealed, perhaps not surprisingly, that hybridization, often associated with polyploidy, has been common throughout the evolutionary history of plants, particularly in various lineages of flowering plants. However, phylogenomic studies have also highlighted the challenges of disentangling signals of ancient hybridization from other sources of genomic conflict (in particular, incomplete lineage sorting). Here, we provide a critical review of ancient hybridization in vascular plants, outlining well‐documented cases of ancient hybridization across plant phylogeny, as well as the challenges unique to documenting ancient versus recent hybridization. We provide a definition for ancient hybridization, which, to our knowledge, has not been explicitly attempted before. Further documenting the extent of deep reticulation in plants should remain an important research focus, especially because published examples likely represent the tip of the iceberg in terms of the total extent of ancient hybridization. However, future research should increasingly explore the macroevolutionary significance of this process, in terms of its impact on evolutionary trajectories (e.g. how does hybridization influence trait evolution or the generation of biodiversity over long time scales?), as well as how life history and ecological factors shape, or have shaped, the frequency of hybridization across geologic time and plant phylogeny. Finally, we consider the implications of ubiquitous ancient hybridization for how we conceptualize, analyze, and classify plant phylogeny. 
    Networks, as opposed to bifurcating trees, are more accurate representations of evolutionary history in many cases, although our ability to infer, visualize, and use networks for comparative analyses is highly limited. Developing improved methods for the generation, visualization, and use of networks represents a critical future direction for plant biology. Current classification systems also do not generally allow for the recognition of reticulate lineages, and our classifications themselves are largely based on evidence from the chloroplast genome. Updating plant classification to better reflect nuclear phylogenies, as well as considering whether and how to recognize hybridization in classification systems, will represent an important challenge for the plant systematics community.

  4. Abstract

    Aim

    Closely related species tend to resemble each other in their morphology and ecology because of shared ancestry. When exploring correlations between species traits, therefore, species cannot be treated as statistically independent. Phylogenetic comparative methods (PCMs) attempt to correct statistically for this shared evolutionary history. Almost all such approaches, however, assume that correlations between traits are constant across the tips of the tree, which we refer to as phylogenetic stationarity. We suggest that this assumption of phylogenetic stationarity might be often violated and that relationships between species traits might evolve alongside clades, for example, owing to the effects of unmeasured traits or other latent variables. Specific examples range from shifts in allometric scaling relationships between clades (e.g., basal metabolic rate and body mass in endotherms, and tree diameter and biomass in trees) to the differing relationship between leaf mass per area and shade tolerance in deciduous versus evergreen trees and shrubs.


    Here, we introduce an exploratory modelling framework, phylogenetically weighted regression (PWR), which extends the geographically weighted regression (GWR) used in spatial studies to allow non‐stationarity in model parameters across a phylogenetic tree. We demonstrate our approach using empirical data on flowering time and seed mass from a well‐studied plant community in southeastern Sweden. Our model reveals strong, diverging trends across the phylogeny, including changes in the sign of the relationship between clades.

    Main conclusions

    By allowing for phylogenetic non‐stationarity, we are able to detect shifting relationships among species traits that would be obscured in traditional PCMs; thus, we suggest that PWR might be an important exploratory tool in the search for key missing variables in comparative analyses.

  5. Abstract

    Comparative biologists have typically used one or more of the following methods to assist in evaluating the proposed functional and performance significance of individual traits: comparative phylogenetic analysis, direct interspecific comparison among species, genetic modification, experimental alteration of morphology (for example by surgically modifying traits), and ecological manipulation where individual organisms are transplanted to a different environment. But comparing organisms as the endpoints of an evolutionary process involves the ceteris paribus assumption: that all traits other than the one(s) of interest are held constant. In a properly controlled experimental study, only the variable of interest changes among the groups being compared. The theme of this paper is that the use of robotic or mechanical models offers an additional tool in comparative biology that helps to minimize the effect of uncontrolled variables by allowing direct manipulation of the trait of interest against a constant background. The structure and movement pattern of mechanical devices can be altered in ways not possible in studies of living animals, facilitating testing hypotheses of the functional and performance significance of individual traits. Robotic models of organismal design are particularly useful in three arenas: (1) controlling variation to allow modification only of the trait of interest, (2) the direct measurement of energetic costs of individual traits, and (3) quantification of the performance landscape. Obtaining data in these three areas is extremely difficult through the study of living organisms alone, and the use of robotic models can reveal unexpected effects. Controlling for all variables except for the length of a swimming flexible object reveals substantial non-linear effects that vary with stiffness. 
    Quantification of the swimming performance surface reveals that there are two peaks with comparable efficiency, greatly complicating the inference of performance from morphology alone. Organisms and their ecological interactions are complex, and dissecting this complexity to understand the effects of individual traits is a grand challenge in ecology and evolutionary biology. Robotics has great promise as a “comparative method,” allowing better-controlled comparative studies to analyze the many interacting elements that make up complex behaviors, ecological interactions, and evolutionary histories.
