It has long been appreciated that analyses of genomic data (e.g., whole genome sequencing or sequence capture) have the potential to reveal the tree of life, but it remains challenging to move from sequence data to a clear understanding of evolutionary history, in part due to the computational challenges of phylogenetic estimation using genome-scale data. Supertree methods solve that challenge because they facilitate a divide-and-conquer approach for large-scale phylogeny inference by integrating smaller subtrees in a computationally efficient manner. Here, we combined information from sequence capture and whole-genome phylogenies using supertree methods. However, the available phylogenomic trees had limited overlap so we used taxon-rich (but not phylogenomic) megaphylogenies to weave them together. This allowed us to construct a phylogenomic supertree, with support values, that included 707 bird species (~7% of avian species diversity). We estimated branch lengths using mitochondrial sequence data and we used these branch lengths to estimate divergence times. Our time-calibrated supertree supports radiation of all three major avian clades (Palaeognathae, Galloanseres, and Neoaves) near the Cretaceous-Paleogene (K-Pg) boundary. The approach we used will permit the continued addition of taxa to this supertree as new phylogenomic data are published, and it could be applied to other taxa as well.
more »
« less
An Evaluation of Phylogenetic Workflows in Viral Molecular Epidemiology
The use of viral sequence data to inform public health intervention has become increasingly common in the realm of epidemiology. Such methods typically utilize multiple sequence alignments and phylogenies estimated from the sequence data. Like all estimation techniques, they are error prone, yet the impacts of such imperfections on downstream epidemiological inferences are poorly understood. To address this, we executed multiple commonly used viral phylogenetic analysis workflows on simulated viral sequence data, modeling Human Immunodeficiency Virus (HIV), Hepatitis C Virus (HCV), and Ebolavirus, and we computed multiple methods of accuracy, motivated by transmission-clustering techniques. For multiple sequence alignment, MAFFT consistently outperformed MUSCLE and Clustal Omega, in both accuracy and runtime. For phylogenetic inference, FastTree 2, IQ-TREE, RAxML-NG, and PhyML had similar topological accuracies, but branch lengths and pairwise distances were consistently most accurate in phylogenies inferred by RAxML-NG. However, FastTree 2 was the fastest, by orders of magnitude, and when the other tools were used to optimize branch lengths along a fixed FastTree 2 topology, the resulting phylogenies had accuracies that were indistinguishable from their original counterparts, but with a fraction of the runtime.
more »
« less
- Award ID(s):
- 2028040
- PAR ID:
- 10323858
- Date Published:
- Journal Name:
- Viruses
- Volume:
- 14
- Issue:
- 4
- ISSN:
- 1999-4915
- Page Range / eLocation ID:
- 774
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Xia, Xuhua (Ed.)Abstract Phylogenetic trees inferred from sequence data often have branch lengths measured in the expected number of substitutions and therefore, do not have divergence times estimated. These trees give an incomplete view of evolutionary histories since many applications of phylogenies require time trees. Many methods have been developed to convert the inferred branch lengths from substitution unit to time unit using calibration points, but none is universally accepted as they are challenged in both scalability and accuracy under complex models. Here, we introduce a new method that formulates dating as a non-convex optimization problem where the variance of log-transformed rate multipliers are minimized across the tree. On simulated and real data, we show that our method, wLogDate, is often more accurate than alternatives and is more robust to various model assumptions.more » « less
-
Abstract Maximum likelihood (ML) phylogenetic inference is widely used in phylogenomics. As heuristic searches most likely find suboptimal trees, it is recommended to conduct multiple (e.g., 10) tree searches in phylogenetic analyses. However, beyond its positive role, how and to what extent multiple tree searches aid ML phylogenetic inference remains poorly explored. Here, we found that a random starting tree was not as effective as the BioNJ and parsimony starting trees in inferring the ML gene tree and that RAxML-NG and PhyML were less sensitive to different starting trees than IQ-TREE. We then examined the effect of the number of tree searches on ML tree inference with IQ-TREE and RAxML-NG, by running 100 tree searches on 19,414 gene alignments from 15 animal, plant, and fungal phylogenomic datasets. We found that the number of tree searches substantially impacted the recovery of the best-of-100 ML gene tree topology among 100 searches for a given ML program. In addition, all of the concatenation-based trees were topologically identical if the number of tree searches was ≥10. Quartet-based ASTRAL trees inferred from 1 to 80 tree searches differed topologically from those inferred from 100 tree searches for 6/15 phylogenomic datasets. Finally, our simulations showed that gene alignments with lower difficulty scores had a higher chance of finding the best-of-100 gene tree topology and were more likely to yield the correct trees.more » « less
-
Abstract Ancestral sequence reconstruction (ASR) is a powerful tool to study the evolution of proteins and thus gain deep insight into the relationships among protein sequence, structure, and function. A major barrier to its broad use is the complexity of the task: it requires multiple software packages, complex file manipulations, and expert phylogenetic knowledge. Here we introducetopiary, a software pipeline that aims to overcome this barrier. To use topiary, users prepare a spreadsheet with a handful of sequences. Topiary then: (1) Infers the taxonomic scope for the ASR study and finds relevant sequences by BLAST; (2) Does taxonomically informed sequence quality control and redundancy reduction; (3) Constructs a multiple sequence alignment; (4) Generates a maximum‐likelihood gene tree; (5) Reconciles the gene tree to the species tree; (6) Reconstructs ancestral amino acid sequences; and (7) Determines branch supports. The pipeline returns annotated evolutionary trees, spreadsheets with sequences, and graphical summaries of ancestor quality. This is achieved by integrating modern phylogenetics software (Muscle5, RAxML‐NG, GeneRax, and PastML) with online databases (NCBI and the Open Tree of Life). In this paper, we introduce non‐expert readers to the steps required for ASR, describe the specific design choices made intopiary, provide a detailed protocol for users, and then validate the pipeline using datasets from a broad collection of protein families. Topiary is freely available for download:https://github.com/harmslab/topiary.more » « less
-
Abstract Recently discovered amber-preserved fossil Cicadellidae exhibit combinations of morphological traits not observed in the modern fauna and have the potential to shed new light on the evolution of this highly diverse family. To place the fossils explicitly within a phylogenetic context, representatives of five extinct genera from Cretaceous Myanmar amber, and one from Eocene Baltic amber were incorporated into a matrix comprising 229 discrete morphological characters and representatives of all modern subfamilies. Phylogenetic analyses yielded well resolved and largely congruent estimates that support the monophyly of most previously recognized cicadellid subfamilies and indicate that the treehoppers are derived from a lineage of Cicadellidae. Instability in the morphology-based phylogenies is mainly confined to deep internal splits that received low branch support in one or more analyses and also were not consistently resolved by recent phylogenomic analyses. Placement of fossil taxa is mostly stable across analyses. Three new Cretaceous leafhopper genera, Burmotettix gen. nov., Kachinella gen nov., and Viraktamathus gen. nov., consistently form a monophyletic group distinct from extant leafhopper subfamilies and are placed in Burmotettiginae subfam. nov. Extinct Cretaceous fossils previously placed in Ledrinae and Signoretiinae are recovered as sister to modern representatives of these groups. Eomegophthalmus Dietrich and Gonçalves from Baltic amber consistently groups with a lineage comprising treehoppers, Megophthalminae, Ulopinae, and Eurymelinae but its position is unstable. Overall, the morphology-based phylogenetic estimates agree with recent phylogenies based on molecular data alone suggesting that morphological traits recently used to diagnose subfamilies are generally informative of phylogenetic relationships within this group.more » « less
An official website of the United States government

