Efficient Inference of Macrophylogenies: Insights from the Avian Tree of Life

Zhao, Min (ORCID:0000-0002-2416-8778); Thom, Gregory (ORCID:0000-0001-6200-0565); Faircloth, Brant C; Andersen, Michael J; Barker, F Keith; Benz, Brett W; Braun, Michael J; Bravo, Gustavo A; Brumfield, Robb T (ORCID:0000-0003-2307-0688); Chesser, R Terry; Derryberry, Elizabeth P; Glenn, Travis C; Harvey, Michael G (ORCID:0000-0001-8050-6068); Hosner, Peter A; Imfeld, Tyler S; Joseph, Leo (ORCID:0000-0001-7564-1978); Manthey, Joseph D; McCormack, John E; McCullough, Jenna M; Moyle, Robert G; Oliveros, Carl H; White Carreiro, Noor D (ORCID:0000-0002-9510-3744); Winker, Kevin (ORCID:0000-0002-8985-8104); Field, Daniel J (ORCID:0000-0002-1786-0352); Ksepka, Daniel T (ORCID:0000-0003-3020-6803); Braun, Edward L; Kimball, Rebecca T (ORCID:0000-0001-5449-5481); Smith, Brian Tilston

doi:10.1093/sysbio/syaf080

Abstract

The exponential growth of molecular sequence data over the past decade has enabled the construction of numerous clade-specific phylogenies encompassing hundreds or thousands of taxa. These independent studies often include overlapping data, presenting a unique opportunity to build macrophylogenies (phylogenies sampling >1000 taxa) for entire classes across the Tree of Life. However, the inference of large trees remains constrained by logistical, computational, and methodological challenges. The Avian Tree of Life provides an ideal model for evaluating strategies to robustly infer macrophylogenies from intersecting data sets derived from smaller studies. In this study, we leveraged a comprehensive resource of sequence capture data sets to evaluate the phylogenetic accuracy and computational costs of four methodological approaches: (1) supermatrix approaches using concatenation, including the “fast” maximum likelihood (ML) methods, (2) filtering data sets to reduce heterogeneity, (3) supertree estimation based on published phylogenomic trees, and (4) a “divide-and-conquer” strategy, wherein smaller ML trees were estimated and subsequently combined using a supertree approach. Additionally, we examined the impact of these methods on divergence time estimation using a data set that includes newly vetted fossil calibrations for the Avian Tree of Life. Our findings highlight the advantages of recently developed fast tree search approaches initiated with parsimony starting trees, which offer a reasonable compromise between computational efficiency and phylogenetic accuracy, facilitating inference of macrophylogenies.
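As a rough illustration of the supermatrix idea in approach (1), the sketch below joins per-locus alignments from separate data sets end to end and pads taxa that are missing from a locus with gap characters, so that overlapping but incomplete studies can be combined into one matrix. The function name `build_supermatrix`, the dictionary-based input, and the use of `-` for missing data are assumptions for illustration only; this is not the pipeline used in the study, which relied on dedicated alignment and tree-inference software.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: each locus maps taxon name -> aligned sequence.
Alignment = Dict[str, str]


def build_supermatrix(
    loci: List[Alignment], missing: str = "-"
) -> Tuple[Dict[str, str], List[Tuple[int, int]]]:
    """Concatenate per-locus alignments into a single supermatrix (illustrative sketch).

    Taxa absent from a locus are padded with `missing` so every row has equal length.
    Also returns the half-open (start, end) column range of each locus, which could
    serve as a simple partition scheme for a partitioned ML analysis.
    """
    # Union of all taxa sampled in any locus, in a stable (sorted) order.
    taxa = sorted({taxon for locus in loci for taxon in locus})
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[Tuple[int, int]] = []
    start = 0
    for locus in loci:
        lengths = {len(seq) for seq in locus.values()}
        if len(lengths) != 1:
            raise ValueError("each locus must contain aligned sequences of equal length")
        width = lengths.pop()
        for taxon in taxa:
            # Missing taxa contribute a block of gap characters for this locus.
            rows[taxon].append(locus.get(taxon, missing * width))
        partitions.append((start, start + width))
        start += width
    return {taxon: "".join(parts) for taxon, parts in rows.items()}, partitions


if __name__ == "__main__":
    loci = [
        {"Gallus": "ACGT", "Anas": "ACGA"},
        {"Gallus": "TTG", "Passer": "TCG"},
    ]
    matrix, parts = build_supermatrix(loci)
    for taxon, seq in matrix.items():
        print(taxon, seq)
    print(parts)
```

Running the example prints `Anas ACGA---`, `Gallus ACGTTTG`, and `Passer ----TCG`, followed by `[(0, 4), (4, 7)]`. The gap-padded cells show why supermatrices assembled from intersecting studies can carry substantial missing data, one motivation for comparing them against filtering, supertree, and divide-and-conquer strategies.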
