Title: Phylogenomic Analyses of 2,786 Genes in 158 Lineages Support a Root of the Eukaryotic Tree of Life between Opisthokonts and All Other Lineages
Abstract Advances in phylogenomics and high-throughput sequencing have allowed the reconstruction of deep phylogenetic relationships in the evolution of eukaryotes. Yet, the root of the eukaryotic tree of life remains elusive. The most popular hypothesis in textbooks and reviews is a root between Unikonta (Opisthokonta + Amoebozoa) and Bikonta (all other eukaryotes), which emerged from analyses of a single-gene fusion. Subsequent, highly cited studies based on concatenation of genes supported this hypothesis with some variations or proposed a root within Excavata. However, concatenation of genes does not consider phylogenetically informative events like gene duplications and losses. A recent study using gene tree parsimony (GTP) suggested the root lies between Opisthokonta and all other eukaryotes, but it included only 59 taxa and 20 genes. Here we use GTP with a duplication-loss model in a gene-rich and taxon-rich dataset (i.e., 2,786 gene families from two sets of 155 and 158 diverse eukaryotic lineages) to assess the root, and we iterate each analysis 100 times to quantify tree space uncertainty. We also contrasted our results and discarded alternative hypotheses from the literature using GTP and the likelihood-based method SpeciesRax. Our estimates suggest a root between Fungi or Opisthokonta and all other eukaryotes, but based on further analysis of genome size, we propose that the root between Opisthokonta and all other eukaryotes is the most likely.
Award ID(s):
1924570
NSF-PAR ID:
10356835
Editor(s):
Phadke, Sujal
Journal Name:
Genome Biology and Evolution
Volume:
14
Issue:
8
ISSN:
1759-6653
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Premise of the Study

    This investigation establishes the first DNA‐sequence‐based phylogenetic hypothesis of species relationships in the coca family (Erythroxylaceae) and presents its implications for the intrageneric taxonomy and neotropical biogeography of Erythroxylum. We also identify the closest wild relatives and evolutionary relationships of the cultivated coca taxa.

    Methods

    We focused our phylogenomic inference on the largest taxonomic section in the genus Erythroxylum (Archerythroxylum O.E.Schulz) using concatenation and gene tree reconciliation methods from hybridization‐based target capture of 427 genes.

    Key Results

    We show that neotropical Erythroxylum are monophyletic within the paleotropical lineages, yet Archerythroxylum and all of the other taxonomic sections from which we sampled multiple species lack monophyly. We mapped phytogeographic states onto the tree and found some concordance between these regions and clades. The wild species E. gracilipes and E. cataractarum are most closely related to the cultivated E. coca and E. novogranatense, but relationships within this “coca” clade remain equivocal.

    Conclusions

    Our results point to the difficulty of morphology‐based intrageneric classification in this clade and highlight the importance of integrative taxonomy in future systematic revisions. We can confidently identify E. gracilipes and E. cataractarum as the closest wild relatives of the coca taxa, but understanding the domestication history of this crop will require more thorough phylogeographic analysis.

     
  2. Archibald, John (Ed.)
    Abstract Epigenetic processes in eukaryotes play important roles through regulation of gene expression, chromatin structure, and genome rearrangements. The roles of chromatin modification (e.g., DNA methylation and histone modification) and non-protein-coding RNAs have been well studied in animals and plants. With the exception of a few model organisms (e.g., Saccharomyces and Plasmodium), much less is known about epigenetic toolkits across the remainder of the eukaryotic tree of life. Even with limited data, previous work suggested the existence of an ancient epigenetic toolkit in the last eukaryotic common ancestor. We use PhyloToL, our taxon-rich phylogenomic pipeline, to detect homologs of epigenetic genes and evaluate their macroevolutionary patterns among eukaryotes. In addition to data from GenBank, we increase taxon sampling from understudied clades of SAR (Stramenopila, Alveolata, and Rhizaria) and Amoebozoa by adding new single-cell transcriptomes from ciliates, foraminifera, and testate amoebae. We focus on 118 gene families, 94 involved in chromatin modification and 24 involved in non-protein-coding RNA processes based on the epigenetics literature. Our results indicate 1) the presence of a large number of epigenetic gene families in the last eukaryotic common ancestor; 2) differential conservation among major eukaryotic clades, with a notable paucity of genes within Excavata; and 3) punctate distribution of epigenetic gene families between species consistent with rapid evolution leading to gene loss. Together these data demonstrate the power of taxon-rich phylogenomic studies for illuminating evolutionary patterns at scales of >1 billion years of evolution and suggest that macroevolutionary phenomena, such as genome conflict, have shaped the evolution of the eukaryotic epigenetic toolkit. 
  3. Komeili, Arash (Ed.)
    ABSTRACT Histone proteins are found across diverse lineages of Archaea, many of which package DNA and form chromatin. However, previous research has led to the hypothesis that the histone-like proteins of high-salt-adapted archaea, or halophiles, function differently. The sole histone protein encoded by the model halophilic species Halobacterium salinarum, HpyA, is nonessential and expressed at levels too low to enable genome-wide DNA packaging. Instead, HpyA mediates the transcriptional response to salt stress. Here we compare the features of genome-wide binding of HpyA to those of HstA, the sole histone of another model halophile, Haloferax volcanii. hstA, like hpyA, is a nonessential gene. To better understand HpyA and HstA functions, protein-DNA binding data (chromatin immunoprecipitation sequencing [ChIP-seq]) of these halophilic histones are compared to publicly available ChIP-seq data from DNA binding proteins across all domains of life, including transcription factors (TFs), nucleoid-associated proteins (NAPs), and histones. These analyses demonstrate that HpyA and HstA bind the genome infrequently in discrete regions, which is similar to TFs but unlike NAPs, which bind a much larger genomic fraction. However, unlike TFs that typically bind in intergenic regions, HpyA and HstA binding sites are located in both coding and intergenic regions. The genome-wide dinucleotide periodicity known to facilitate histone binding was undetectable in the genomes of both species. Instead, TF-like and histone-like binding sequence preferences were detected for HstA and HpyA, respectively. Taken together, these data suggest that halophilic archaeal histones are unlikely to facilitate genome-wide chromatin formation and that their function defies categorization as a TF, NAP, or histone.
    IMPORTANCE Most cells in eukaryotic species—from yeast to humans—possess histone proteins that pack and unpack DNA in response to environmental cues. These essential proteins regulate genes necessary for important cellular processes, including development and stress protection. Although the histone fold domain originated in the domain of life Archaea, the function of archaeal histone-like proteins is not well understood relative to those of eukaryotes. We recently discovered that, unlike histones of eukaryotes, histones in hypersaline-adapted archaeal species do not package DNA and can act as transcription factors (TFs) to regulate stress response gene expression. However, the function of histones across species of hypersaline-adapted archaea still remains unclear. Here, we compare hypersaline histone function to a variety of DNA binding proteins across the tree of life, revealing histone-like behavior in some respects and specific transcriptional regulatory function in others.
  4. Abstract

    Prokaryotic genomes are often considered to be mosaics of genes that do not necessarily share the same evolutionary history due to widespread horizontal gene transfers (HGTs). Consequently, representing evolutionary relationships of prokaryotes as bifurcating trees has long been controversial. However, studies reporting conflicts among gene trees derived from phylogenomic data sets have shown that these conflicts can be the result of artifacts or evolutionary processes other than HGT, such as incomplete lineage sorting, low phylogenetic signal, and systematic errors due to substitution model misspecification. Here, we present the results of an extensive exploration of phylogenetic conflicts in the cyanobacterial order Nostocales, for which previous studies have inferred strongly supported conflicting relationships when using different concatenated phylogenomic data sets. We found that most of these conflicts are concentrated in deep clusters of short internodes of the Nostocales phylogeny, where the great majority of individual genes have low resolving power. We then inferred phylogenetic networks to detect HGT events while also accounting for incomplete lineage sorting. Our results indicate that most conflicts among gene trees are likely due to incomplete lineage sorting linked to an ancient rapid radiation, rather than to HGTs. Moreover, the short internodes of this radiation fit the expectations of the anomaly zone, i.e., a region of the tree parameter space where a species tree is discordant with its most likely gene tree. We demonstrated that concatenation of different sets of loci can recover up to 17 distinct and well-supported relationships within the putative anomaly zone of Nostocales, corresponding to the observed conflicts among well-supported trees based on concatenated data sets from previous studies. Our findings highlight the important role of rapid radiations as a potential cause of strongly conflicting phylogenetic relationships when using phylogenomic data sets of bacteria. We propose that polytomies may be the most appropriate phylogenetic representation of these rapid radiations that are part of anomaly zones, especially when all possible genomic markers have been considered to infer these phylogenies. [Anomaly zone; bacteria; horizontal gene transfer; incomplete lineage sorting; Nostocales; phylogenomic conflict; rapid radiation; Rhizonema.]

     
  5. Abstract

    The Calyptratae, one of the most species‐rich fly clades, only originated and diversified after the Cretaceous–Palaeogene extinction event and yet exhibit high species diversity and a diverse array of life history strategies including predation, phytophagy, saprophagy, haematophagy and parasitism. We present the first phylogenomic analysis of calyptrate relationships. The analysis is based on 40 species representing all calyptrate families and on nucleotide and amino acid data for 1456 single‐copy protein‐coding genes obtained from shotgun sequencing of transcriptomes. Topologies are overall well resolved, robust and largely congruent across trees obtained with different approaches (maximum parsimony, maximum likelihood, coalescent‐based species tree, four‐cluster likelihood mapping). Many nodes have 100% bootstrap and jackknife support, but the true support varies by more than one order of magnitude [Bremer support from 3 to 3427; random addition concatenation analysis (RADICAL) gene concatenation size from 10 to 1456]. Analyses of a Dayhoff‐6 recoded amino acid dataset also support the robustness of many clades. The backbone topology Hippoboscoidea+(Fanniidae+(Muscidae+((Anthomyiidae–Scathophagidae)+Oestroidea))) is strongly supported and most families are monophyletic (exceptions: Anthomyiidae and Calliphoridae). The monotypic Ulurumyiidae is either alone or together with Mesembrinellidae as the sister group to the rest of Oestroidea. The Sarcophagidae are sister to Mystacinobiidae+Oestridae. Polleniinae emerge as sister group to Tachinidae and the monophyly of the clade Calliphorinae+Luciliinae is well supported, but the phylogenomic data cannot confidently place the remaining blowfly subfamilies (Helicoboscinae, Ameniinae, Chrysomyinae). Compared to hypotheses from the Sanger sequencing era, many clades within the muscoid grade are congruent but now have much higher support. Within much of Oestroidea, Sanger era and phylogenomic data struggle equally with regard to finding well‐supported hypotheses.

     