

Title: Green plant genomes: What we know in an era of rapidly expanding opportunities
Green plants play a fundamental role in ecosystems, human health, and agriculture. As de novo genomes are generated for all known eukaryotic species, as advocated by the Earth BioGenome Project, increasing the genomic information available for green land plants is essential. However, setting standards for the generation and storage of the complex set of genomes that characterize the green lineage of life is a major challenge for plant scientists. Such standards will need to accommodate the immense variation in green plant genome size, transposable element content, and structural complexity, while enabling research into the molecular and evolutionary processes that have produced this enormous genomic variation. Here we provide an overview and assessment of the current state of knowledge of green plant genomes. To date, fewer than 300 complete chromosome-scale genome assemblies representing fewer than 900 species have been generated across the estimated 450,000 to 500,000 species in the green plant clade. These genomes range in size from 12 Mb to 27.6 Gb and are biased toward agricultural crops, leaving large branches of the green tree of life untouched by genomic-scale sequencing. Locating suitable tissue samples for most plant species, especially taxa from extreme environments, remains one of the biggest hurdles to expanding our genomic inventory. Furthermore, plant genome annotation is currently undergoing intensive improvement. We hope that this fresh overview will help in the development of genomic quality standards for a cohesive and meaningful synthesis of green plant genomes as we scale up for the future.
Award ID(s): 1943371
NSF-PAR ID: 10323074
Journal Name: Proceedings of the National Academy of Sciences
Volume: 119
Issue: 4
ISSN: 0027-8424
Sponsoring Org: National Science Foundation
More Like this
  1. Societal Impact Statement Summary

    Plant genomes exhibit spectacular diversity in size, composition, and complexity, and although we suspect that this diversity is related to the equally spectacular diversity of plant form and function, this link is still poorly understood. Plant genomes carry signatures of evolutionary history, whole‐genome duplication, population processes, and more, and we are just learning how to read this historical information from appropriate genetic markers. But plant genomes are not merely chroniclers of past evolutionary change: they are dynamic, evolving entities in their own right, driving changes in plant chemistry, morphology, ecology, and more. Here, we describe how plant genomes have been harnessed for studies of plant phylogeny and diversification, with examples spanning all green plants, a clade of nearly half a million species covering nearly a billion years of evolutionary time. Then, focusing on angiosperms, we suggest how whole‐genome duplication (polyploidy) has driven, and continues to drive, major innovations in morphology, stress response, and more. Together, these perspectives will begin to reveal how genomic change can lead to novelty and diversity at the organismal level. Finally, we review how little we actually know about plant genomes, given that assembled genome sequences exist for fewer than 1% of all plant species, a major shortcoming as we seek to meet the societal challenges of food security, the need for new medicines, and the conservation of species in response to climate change.

     
    more » « less
  2. The F-box proteins function as substrate receptors that determine the specificity of Skp1-Cul1-F-box ubiquitin ligases. Genomic studies have revealed that the F-box gene superfamily is large and varies widely in size across plant species. Our previous studies suggested that the plant F-box gene superfamily is under genomic drift evolution promoted by epigenomic programming. However, how the size of the superfamily drifts across plant genomes is currently unknown. Through a large-scale genomic and phylogenetic comparison of the F-box gene superfamily covering 110 green plants and one red algal species, I discovered four distinct groups of plant F-box genes with diverse evolutionary processes. While the members in Clusters 1 and 2 are species/lineage-specific, those in Clusters 3 and 4 are present in over 46 plant genomes. Statistical modeling suggests that F-box genes from the former two groups are skewed toward fewer species and more paralogs compared to those of the latter two groups, whose presence frequency and sizes in plant genomes follow a random statistical model. The enrichment of known Arabidopsis F-box genes in Clusters 3 and 4, along with comprehensive biochemical evidence showing that Arabidopsis members in Cluster 4 interact with Arabidopsis Skp1-like 1 (ASK1), demonstrates an over-representation of active F-box genes in these two groups. Collectively, I propose purifying and dosage balancing selection models to explain the lineage/species-specific duplications and expansions of F-box genes in plant genomes. The purifying selection model suggests that most, if not all, lineage/species-specific F-box genes are detrimental and are thus kept at low frequencies in plant genomes.
  3. Abstract

    Ferns are notorious for possessing large genomes and numerous chromosomes. Despite decades of speculation, the processes underlying the expansive genomes of ferns are unclear, largely due to the absence of a sequenced homosporous fern genome. The lack of this crucial resource has not only hindered investigations of the evolutionary processes responsible for the unusual genome characteristics of homosporous ferns, but has also impeded synthesis of genome evolution across land plants. Here, we used the model fern species Ceratopteris richardii to address the processes (e.g., polyploidy, spread of repeat elements) by which the large genomes and high chromosome numbers typical of homosporous ferns may have evolved and been maintained. We directly compared repeat compositions in species spanning the green plant tree of life and a diversity of genome sizes, as well as both short- and long-read-based assemblies of Ceratopteris. Based on both genomic and cytogenetic data, we found evidence consistent with a single ancient polyploidy event in the evolutionary history of Ceratopteris, along with repeat proportions similar to those found in large flowering plant genomes. This study provides a major stepping-stone in the understanding of land plant evolutionary genomics by providing the first homosporous fern reference genome, as well as insights into the processes underlying the formation of these massive genomes.

     
    more » « less
  4. The arbuscular mycorrhizal fungi (AMFs) are obligate root symbionts in the subphylum Glomeromycotina that can benefit land plants by increasing their soil nutrient uptake in exchange for photosynthetically fixed carbon sources. To date, annotated genome data from representatives of the AMF orders Glomerales, Diversisporales, and Archaeosporales have shown that these organisms have large, highly repetitive genomes and lack genes for producing sugars and fatty acids. This led to the hypothesis that the most recent common ancestor (MRCA) of Glomeromycotina was fully dependent on plants for nutrition. Here, we aimed to further test this hypothesis by obtaining annotated genome data from a member of the early-diverging order Paraglomerales (Paraglomus occultum). Genome analyses showed that this species carries a 39.6 Mb genome with considerably fewer genes and repeats than most AMF relatives with annotated genomes. Consistent with phylogenies based on ribosomal genes, our phylogenetic analyses place P. occultum as the earliest-diverging branch within Glomeromycotina. Overall, our analyses support the view that the MRCA of Glomeromycotina carried hallmarks of obligate plant biotrophy. The small genome size and content of P. occultum could either reflect adaptive reductive processes affecting some early AMF lineages, or indicate that the high gene and repeat family diversity thought to drive AMF adaptability to host and environmental change was not an ancestral feature of these prominent plant symbionts.
  5. Green plants (Viridiplantae) include around 450,000–500,000 species of great diversity and have important roles in terrestrial and aquatic ecosystems. Here, as part of the One Thousand Plant Transcriptomes Initiative, we sequenced the vegetative transcriptomes of 1,124 species that span the diversity of plants in a broad sense (Archaeplastida), including green plants (Viridiplantae), glaucophytes (Glaucophyta) and red algae (Rhodophyta). Our analysis provides a robust phylogenomic framework for examining the evolution of green plants. Most inferred species relationships are well supported across multiple species tree and supermatrix analyses, but discordance among plastid and nuclear gene trees at a few important nodes highlights the complexity of plant genome evolution, including polyploidy, periods of rapid speciation, and extinction. Incomplete sorting of ancestral variation, polyploidization and massive expansions of gene families punctuate the evolutionary history of green plants. Notably, we find that large expansions of gene families preceded the origins of green plants, land plants and vascular plants, whereas whole-genome duplications are inferred to have occurred repeatedly throughout the evolution of flowering plants and ferns. The increasing availability of high-quality plant genome sequences and advances in functional genomics are enabling research on genome evolution across the green tree of life. 