skip to main content


Title: One thousand plant transcriptomes and the phylogenomics of green plants
Green plants (Viridiplantae) include around 450,000–500,000 species of great diversity and have important roles in terrestrial and aquatic ecosystems. Here, as part of the One Thousand Plant Transcriptomes Initiative, we sequenced the vegetative transcriptomes of 1,124 species that span the diversity of plants in a broad sense (Archaeplastida), including green plants (Viridiplantae), glaucophytes (Glaucophyta) and red algae (Rhodophyta). Our analysis provides a robust phylogenomic framework for examining the evolution of green plants. Most inferred species relationships are well supported across multiple species tree and supermatrix analyses, but discordance among plastid and nuclear gene trees at a few important nodes highlights the complexity of plant genome evolution, including polyploidy, periods of rapid speciation, and extinction. Incomplete sorting of ancestral variation, polyploidization and massive expansions of gene families punctuate the evolutionary history of green plants. Notably, we find that large expansions of gene families preceded the origins of green plants, land plants and vascular plants, whereas whole-genome duplications are inferred to have occurred repeatedly throughout the evolution of flowering plants and ferns. The increasing availability of high-quality plant genome sequences and advances in functional genomics are enabling research on genome evolution across the green tree of life.  more » « less
Award ID(s):
1831428 1737898
NSF-PAR ID:
10126145
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; more » ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; « less
Date Published:
Journal Name:
Nature
Volume:
574
Issue:
7780
ISSN:
0028-0836
Page Range / eLocation ID:
679 - 685
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Gaut, Brandon (Ed.)
    Abstract As the closest extant sister group to seed plants, ferns are an important reference point to study the origin and evolution of plant genes and traits. One bottleneck to the use of ferns in phylogenetic and genetic studies is the fact that genome-level sequence information of this group is limited, due to the extreme genome sizes of most ferns. Ceratopteris richardii (hereafter Ceratopteris) has been widely used as a model system for ferns. In this study, we generated a transcriptome of Ceratopteris, through the de novo assembly of the RNA-seq data from 17 sequencing libraries that are derived from two sexual types of gametophytes and five different sporophyte tissues. The Ceratopteris transcriptome, together with 38 genomes and transcriptomes from other species across the Viridiplantae, were used to uncover the evolutionary dynamics of orthogroups (predicted gene families using OrthoFinder) within the euphyllophytes and identify proteins associated with the major shifts in plant morphology and physiology that occurred in the last common ancestors of euphyllophytes, ferns, and seed plants. Furthermore, this resource was used to identify and classify the GRAS domain transcriptional regulators of many developmental processes in plants. Through the phylogenetic analysis within each of the 15 GRAS orthogroups, we uncovered which GRAS family members are conserved or have diversified in ferns and seed plants. Taken together, the transcriptome database and analyses reported here provide an important platform for exploring the evolution of gene families in land plants and for studying gene function in seed-free vascular plants. 
    more » « less
  2. Ferns are the second largest clade of vascular plants with over 10,000 species, yet the generation of genomic resources for the group has lagged behind other major clades of plants. Transcriptomic data have proven to be a powerful tool to assess phylogenetic relationships, using thousands of markers that are largely conserved across the genome, and without the need to sequence entire genomes. We assembled the largest nuclear phylogenetic dataset for ferns to date, including 2884 single-copy nuclear loci from 247 transcriptomes (242 ferns, five outgroups), and investigated phylogenetic relationships across the fern tree, the placement of whole genome duplications (WGDs), and gene retention patterns following WGDs. We generated a well-supported phylogeny of ferns and identified several regions of the fern phylogeny that demonstrate high levels of gene tree–species tree conflict, which largely correspond to areas of the phylogeny that have been difficult to resolve. Using a combination of approaches, we identified 27 WGDs across the phylogeny, including 18 large-scale events (involving more than one sampled taxon) and nine small-scale events (involving only one sampled taxon). Most inferred WGDs occur within single lineages (e.g., orders, families) rather than on the backbone of the phylogeny, although two inferred events are shared by leptosporangiate ferns (excluding Osmundales) and Polypodiales (excluding Lindsaeineae and Saccolomatineae), clades which correspond to the majority of fern diversity. We further examined how retained duplicates following WGDs compared across independent events and found that functions of retained genes were largely convergent, with processes involved in binding, responses to stimuli, and certain organelles over-represented in paralogs while processes involved in transport, organelles derived from endosymbiotic events, and signaling were under-represented. To date, our study is the most comprehensive investigation of the nuclear fern phylogeny, though several avenues for future research remain unexplored. 
    more » « less
  3. Archibald, John (Ed.)
    Abstract Epigenetic processes in eukaryotes play important roles through regulation of gene expression, chromatin structure, and genome rearrangements. The roles of chromatin modification (e.g., DNA methylation and histone modification) and non-protein-coding RNAs have been well studied in animals and plants. With the exception of a few model organisms (e.g., Saccharomyces and Plasmodium), much less is known about epigenetic toolkits across the remainder of the eukaryotic tree of life. Even with limited data, previous work suggested the existence of an ancient epigenetic toolkit in the last eukaryotic common ancestor. We use PhyloToL, our taxon-rich phylogenomic pipeline, to detect homologs of epigenetic genes and evaluate their macroevolutionary patterns among eukaryotes. In addition to data from GenBank, we increase taxon sampling from understudied clades of SAR (Stramenopila, Alveolata, and Rhizaria) and Amoebozoa by adding new single-cell transcriptomes from ciliates, foraminifera, and testate amoebae. We focus on 118 gene families, 94 involved in chromatin modification and 24 involved in non-protein-coding RNA processes based on the epigenetics literature. Our results indicate 1) the presence of a large number of epigenetic gene families in the last eukaryotic common ancestor; 2) differential conservation among major eukaryotic clades, with a notable paucity of genes within Excavata; and 3) punctate distribution of epigenetic gene families between species consistent with rapid evolution leading to gene loss. Together these data demonstrate the power of taxon-rich phylogenomic studies for illuminating evolutionary patterns at scales of >1 billion years of evolution and suggest that macroevolutionary phenomena, such as genome conflict, have shaped the evolution of the eukaryotic epigenetic toolkit. 
    more » « less
  4. The colonization of land by plants generated opportunities for the rise of new heterotrophic life forms, including humankind. A unique event underpinned this massive change to earth ecosystems—the advent of eukaryotic green algae. Today, an abundant marine green algal group, the prasinophytes, alongside prasinodermophytes and nonmarine chlorophyte algae, is facilitating insights into plant developments. Genome-level data allow identification of conserved proteins and protein families with extensive modifications, losses, or gains and expansion patterns that connect to niche specialization and diversification. Here, we contextualize attributes according to Viridiplantae evolutionary relationships, starting with orthologous protein families, and then focusing on key elements with marked differentiation, resulting in patchy distributions across green algae and plants. We place attention on peptidoglycan biosynthesis, important for plastid division and walls; phytochrome photosensors that are master regulators in plants; and carbohydrate-active enzymes, essential to all manner of carbohydratebiotransformations. Together with advances in algal model systems, these areas are ripe for discovering molecular roles and innovations within and across plant and algal lineages. 
    more » « less
  5. Abstract

    Several protein families participate in the biogenesis and function of small RNAs (sRNAs) in plants. Those with primary roles include Dicer-like (DCL), RNA-dependent RNA polymerase (RDR), and Argonaute (AGO) proteins. Protein families such as double-stranded RNA-binding (DRB), SERRATE (SE), and SUPPRESSION OF SILENCING 3 (SGS3) act as partners of DCL or RDR proteins. Here, we present curated annotations and phylogenetic analyses of seven sRNA pathway protein families performed on 196 species in the Viridiplantae (aka green plants) lineage. Our results suggest that the RDR3 proteins emerged earlier than RDR1/2/6. RDR6 is found in filamentous green algae and all land plants, suggesting that the evolution of RDR6 proteins coincides with the evolution of phased small interfering RNAs (siRNAs). We traced the origin of the 24-nt reproductive phased siRNA-associated DCL5 protein back to the American sweet flag (Acorus americanus), the earliest diverged, extant monocot species. Our analyses of AGOs identified multiple duplication events of AGO genes that were lost, retained, or further duplicated in subgroups, indicating that the evolution of AGOs is complex in monocots. The results also refine the evolution of several clades of AGO proteins, such as AGO4, AGO6, AGO17, and AGO18. Analyses of nuclear localization signal sequences and catalytic triads of AGO proteins shed light on the regulatory roles of diverse AGOs. Collectively, this work generates a curated and evolutionarily coherent annotation for gene families involved in plant sRNA biogenesis/function and provides insights into the evolution of major sRNA pathways.

     
    more » « less