Title: Phylogenomic Analyses of 2,786 Genes in 158 Lineages Support a Root of the Eukaryotic Tree of Life between Opisthokonts and All Other Lineages
Abstract Advances in phylogenomics and high-throughput sequencing have allowed the reconstruction of deep phylogenetic relationships in the evolution of eukaryotes. Yet, the root of the eukaryotic tree of life remains elusive. The most popular hypothesis in textbooks and reviews is a root between Unikonta (Opisthokonta + Amoebozoa) and Bikonta (all other eukaryotes), which emerged from analyses of a single-gene fusion. Subsequent, highly cited studies based on concatenation of genes supported this hypothesis with some variations or proposed a root within Excavata. However, concatenation of genes does not consider phylogenetically-informative events like gene duplications and losses. A recent study using gene tree parsimony (GTP) suggested the root lies between Opisthokonta and all other eukaryotes, but only including 59 taxa and 20 genes. Here we use GTP with a duplication-loss model in a gene-rich and taxon-rich dataset (i.e., 2,786 gene families from two sets of 155 and 158 diverse eukaryotic lineages) to assess the root, and we iterate each analysis 100 times to quantify tree space uncertainty. We also contrasted our results and discarded alternative hypotheses from the literature using GTP and the likelihood-based method SpeciesRax. Our estimates suggest a root between Fungi or Opisthokonta and all other eukaryotes; but based on further analysis of genome size, we propose that the root between Opisthokonta and all other eukaryotes is the most likely.
Award ID(s):
1924570
PAR ID:
10356835
Editor(s):
Phadke, Sujal
Journal Name:
Genome Biology and Evolution
Volume:
14
Issue:
8
ISSN:
1759-6653
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Archibald, John (Ed.)
    Abstract Epigenetic processes in eukaryotes play important roles through regulation of gene expression, chromatin structure, and genome rearrangements. The roles of chromatin modification (e.g., DNA methylation and histone modification) and non-protein-coding RNAs have been well studied in animals and plants. With the exception of a few model organisms (e.g., Saccharomyces and Plasmodium), much less is known about epigenetic toolkits across the remainder of the eukaryotic tree of life. Even with limited data, previous work suggested the existence of an ancient epigenetic toolkit in the last eukaryotic common ancestor. We use PhyloToL, our taxon-rich phylogenomic pipeline, to detect homologs of epigenetic genes and evaluate their macroevolutionary patterns among eukaryotes. In addition to data from GenBank, we increase taxon sampling from understudied clades of SAR (Stramenopila, Alveolata, and Rhizaria) and Amoebozoa by adding new single-cell transcriptomes from ciliates, foraminifera, and testate amoebae. We focus on 118 gene families, 94 involved in chromatin modification and 24 involved in non-protein-coding RNA processes based on the epigenetics literature. Our results indicate 1) the presence of a large number of epigenetic gene families in the last eukaryotic common ancestor; 2) differential conservation among major eukaryotic clades, with a notable paucity of genes within Excavata; and 3) punctate distribution of epigenetic gene families between species consistent with rapid evolution leading to gene loss. Together these data demonstrate the power of taxon-rich phylogenomic studies for illuminating evolutionary patterns at scales of >1 billion years of evolution and suggest that macroevolutionary phenomena, such as genome conflict, have shaped the evolution of the eukaryotic epigenetic toolkit. 
  2. Premise of the Study

    This investigation establishes the first DNA‐sequence‐based phylogenetic hypothesis of species relationships in the coca family (Erythroxylaceae) and presents its implications for the intrageneric taxonomy and neotropical biogeography of Erythroxylum. We also identify the closest wild relatives and evolutionary relationships of the cultivated coca taxa.

    Methods

    We focused our phylogenomic inference on the largest taxonomic section in the genus Erythroxylum (Archerythroxylum O.E.Schulz) using concatenation and gene tree reconciliation methods from hybridization‐based target capture of 427 genes.

    Key Results

    We show that neotropical Erythroxylum are monophyletic within the paleotropical lineages, yet Archerythroxylum and all of the other taxonomic sections from which we sampled multiple species lack monophyly. We mapped phytogeographic states onto the tree and found some concordance between these regions and clades. The wild species E. gracilipes and E. cataractarum are most closely related to the cultivated E. coca and E. novogranatense, but relationships within this “coca” clade remain equivocal.

    Conclusions

    Our results point to the difficulty of morphology‐based intrageneric classification in this clade and highlight the importance of integrative taxonomy in future systematic revisions. We can confidently identify E. gracilipes and E. cataractarum as the closest wild relatives of the coca taxa, but understanding the domestication history of this crop will require more thorough phylogeographic analysis.

     
  3. Abstract

    Gene‐tree‐inference error can cause species‐tree‐inference artefacts in summary phylogenomic coalescent analyses. Here we integrate two ways of accommodating these inference errors: collapsing arbitrarily or dubiously resolved gene‐tree branches, and subsampling gene trees based on their pairwise congruence. We tested the effect of collapsing gene‐tree branches with 0% approximate‐likelihood‐ratio‐test (SH‐like aLRT) support in likelihood analyses and strict consensus trees for parsimony, and then subsampled those partially resolved trees based on congruence measures that do not penalize polytomies. For this purpose we developed a new TNT script for congruence sorting (congsort), and used it to calculate topological incongruence for eight phylogenomic datasets using three distance measures: standard Robinson–Foulds (RF) distances; overall success of resolution (OSR), which is based on counting both matching and contradicting clades; and RF contradictions, which only counts contradictory clades. As expected, we found that gene‐tree incongruence was often concentrated in clades that are arbitrarily or dubiously resolved and that there was greater congruence between the partially collapsed gene trees and the coalescent and concatenation topologies inferred from those genes. Coalescent branch lengths typically increased as the most incongruent gene trees were excluded, although branch supports typically did not. We investigated two successful and complementary approaches to prioritizing genes for investigation of alignment or homology errors. Coalescent‐tree clades that contradicted concatenation‐tree clades were generally less robust to gene‐tree subsampling than congruent clades. Our preferred approach to collapsing likelihood gene‐tree clades (0% SH‐like aLRT support) and subsampling those trees (OSR) generally outperformed competing approaches for a large fungal dataset with respect to branch lengths, support and congruence. We recommend widespread application of this approach (and strict consensus trees for parsimony‐based analyses) for improving quantification of gene‐tree congruence/conflict, estimating coalescent branch lengths, testing robustness of coalescent analyses to gene‐tree‐estimation error, and improving topological robustness of summary coalescent analyses. This approach is quick and easy to implement, even for huge datasets.

     
  4. Animals use geomagnetic fields for navigational cues, yet the sensory mechanism underlying magnetic perception remains poorly understood. One idea is that geomagnetic fields are physically transduced by magnetite crystals contained inside specialized receptor cells, but evidence for intracellular, biogenic magnetite in eukaryotes is scant. Certain bacteria produce magnetite crystals inside intracellular compartments, representing the most ancient form of biomineralization known and having evolved prior to emergence of the crown group of eukaryotes, raising the question of whether magnetite biomineralization in eukaryotes and prokaryotes might share a common evolutionary history. Here, we discover that salmonid olfactory epithelium contains magnetite crystals arranged in compact clusters and determine that genes differentially expressed in magnetic olfactory cells, contrasted to nonmagnetic olfactory cells, share ancestry with an ancient prokaryote magnetite biomineralization system, consistent with exaptation for use in eukaryotic magnetoreception. We also show that 11 prokaryote biomineralization genes are universally present among a diverse set of eukaryote taxa and that nine of those genes are present within the Asgard clade of archaea Lokiarchaeota that affiliates with eukaryotes in phylogenomic analysis. Consistent with deep homology, we present an evolutionary genetics hypothesis for magnetite formation among eukaryotes to motivate convergent approaches for examining magnetite-based magnetoreception, molecular origins of matrix-associated biomineralization processes, and eukaryogenesis. 
  5. Abstract

    The Calyptratae, one of the most species‐rich fly clades, only originated and diversified after the Cretaceous–Palaeogene extinction event and yet exhibit high species diversity and a diverse array of life history strategies including predation, phytophagy, saprophagy, haematophagy and parasitism. We present the first phylogenomic analysis of calyptrate relationships. The analysis is based on 40 species representing all calyptrate families and on nucleotide and amino acid data for 1456 single‐copy protein‐coding genes obtained from shotgun sequencing of transcriptomes. Topologies are overall well resolved, robust and largely congruent across trees obtained with different approaches (maximum parsimony, maximum likelihood, coalescent‐based species tree, four‐cluster likelihood mapping). Many nodes have 100% bootstrap and jackknife support, but the true support varies by more than one order of magnitude [Bremer support from 3 to 3427; random addition concatenation analysis (RADICAL) gene concatenation size from 10 to 1456]. Analyses of a Dayhoff‐6 recoded amino acid dataset also support the robustness of many clades. The backbone topology Hippoboscoidea+(Fanniidae+(Muscidae+((Anthomyiidae–Scathophagidae)+Oestroidea))) is strongly supported and most families are monophyletic (exceptions: Anthomyiidae and Calliphoridae). The monotypic Ulurumyiidae is either alone or together with Mesembrinellidae as the sister group to the rest of Oestroidea. The Sarcophagidae are sister to Mystacinobiidae+Oestridae. Polleniinae emerge as sister group to Tachinidae and the monophyly of the clade Calliphorinae+Luciliinae is well supported, but the phylogenomic data cannot confidently place the remaining blowfly subfamilies (Helicoboscinae, Ameniinae, Chrysomyinae). Compared to hypotheses from the Sanger sequencing era, many clades within the muscoid grade are congruent but now have much higher support. Within much of Oestroidea, Sanger era and phylogenomic data struggle equally with regard to finding well‐supported hypotheses.

     