Title: Diverse Evolution in 111 Plant Genomes Reveals Purifying and Dosage Balancing Selection Models for F-Box Genes
The F-box proteins function as substrate receptors to determine the specificity of Skp1-Cul1-F-box ubiquitin ligases. Genomic studies revealed large and diverse sizes of the F-box gene superfamily across plant species. Our previous studies suggested that the plant F-box gene superfamily is under genomic drift evolution promoted by epigenomic programming. However, how the size of the superfamily drifts across plant genomes is currently unknown. Through a large-scale genomic and phylogenetic comparison of the F-box gene superfamily covering 110 green plants and one red algal species, I discovered four distinct groups of plant F-box genes with diverse evolutionary processes. While the members in Clusters 1 and 2 are species/lineage-specific, those in Clusters 3 and 4 are present in over 46 plant genomes. Statistical modeling suggests that F-box genes from the former two groups are skewed toward fewer species and more paralogs compared to those of the latter two groups whose presence frequency and sizes in plant genomes follow a random statistical model. The enrichment of known Arabidopsis F-box genes in Clusters 3 and 4, along with comprehensive biochemical evidence showing that Arabidopsis members in Cluster 4 interact with the Arabidopsis Skp1-like 1 (ASK1), demonstrates over-representation of active F-box genes in these two groups. Collectively, I propose purifying and dosage balancing selection models to explain the lineage/species-specific duplications and expansions of F-box genes in plant genomes. The purifying selection model suggests that most, if not all, lineage/species-specific F-box genes are detrimental and are thus kept at low frequencies in plant genomes.
Award ID(s):
1750361
NSF-PAR ID:
10221394
Author(s) / Creator(s):
Date Published:
Journal Name:
International Journal of Molecular Sciences
Volume:
22
Issue:
2
ISSN:
1422-0067
Page Range / eLocation ID:
871
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Protein degradation through the Ubiquitin (Ub)-26S Proteasome System (UPS) is a major gene expression regulatory pathway in plants. In this pathway, the 76-amino acid Ub proteins are covalently linked onto a large array of UPS substrates with the help of three enzymes (E1 activating, E2 conjugating, and E3 ligating enzymes) and direct them for turnover in the 26S proteasome complex. The S-phase Kinase-associated Protein 1 (Skp1), CUL1, F-box (FBX) protein (SCF) complexes have been identified as the largest E3 ligase group in plants due to the dramatic expansion in the number of FBX genes in plant genomes. Since it is the FBX proteins that recognize and determine the specificity of SCF substrates, much effort has been devoted to characterizing their genomic, physiological, and biochemical roles in the past two decades of functional genomic studies. However, the sheer size and high sequence diversity of the FBX gene family demand new approaches to uncover unknown functions. In this work, we first identified 82 known FBX members that have been functionally characterized to date in Arabidopsis thaliana. Through comparing the genomic structure, evolutionary selection, expression patterns, domain compositions, and functional activities between known and unknown FBX gene members, we developed a neural network machine learning approach to predict whether an unknown FBX member is likely functionally active in Arabidopsis, thereby facilitating its future functional characterization.
  2. The contemporary capacity of genome sequence analysis significantly lags behind the rapidly evolving sequencing technologies. Retrieving biologically meaningful information from an ever-increasing amount of genome data would be significantly beneficial for functional genomic studies. For example, the duplication, organization, evolution, and function of superfamily genes are arguably important in many aspects of life. However, the incompleteness of annotations in many sequenced genomes often results in biased conclusions in comparative genomic studies of superfamilies. Here, we present a Perl program, called Closing Target Trimming (CTT), for automatically identifying most, if not all, members of a gene family in any sequenced genome on the CentOS 7 platform. To benefit a broader application on other operating systems, we also created a Docker application package, CTTdocker. Our test data on the F-box gene superfamily showed gene-finding accuracies of 78.2% and 79% in two well-annotated plant genomes, Arabidopsis thaliana and rice, respectively. To further demonstrate the effectiveness of this program, we ran it through 18 plant genomes and five non-plant genomes to compare the expansion of the F-box and the BTB superfamilies. The program discovered that on average 12.7% and 9.3% of the total F-box and BTB members, respectively, are new loci in plant genomes, while it only found a small number of new members in vertebrate genomes. Therefore, different evolutionary and regulatory mechanisms of cullin-RING ubiquitin ligases may be present in plants and animals. We also annotated and compared the Pkinase family members across a wide range of organisms, including 10 fungi, 10 metazoa, 10 vertebrates, and 10 additional plants, which were randomly selected from the Ensembl database. Our CTT annotation recovered on average 14% more loci, including pseudogenes, of the Pkinase superfamily in these 40 genomes, demonstrating its robust replicability and scalability in annotating superfamily members in any genome.
  3. Protein ubiquitylation is a post-translational modification (PTM) process that covalently modifies a protein substrate with either mono-ubiquitin moieties or poly-ubiquitin chains often at the lysine residues. In Arabidopsis, bioinformatic predictions have suggested that over 5% of its proteome constitutes the protein ubiquitylation system. Despite advancements in functional genomic studies in plants, only a small fraction of this bioinformatically predicted system has been functionally characterized. To expand our understanding about the regulatory function of protein ubiquitylation to that rivalling several other major systems, such as transcription regulation and epigenetics, I describe the status, issues, and new approaches of protein ubiquitylation studies in plant biology. I summarize the methods utilized in defining the ubiquitylation machinery by bioinformatics, identifying ubiquitylation substrates by proteomics, and characterizing the ubiquitin E3 ligase-substrate pathways by functional genomics. Based on the functional and evolutionary analyses of the F-box gene superfamily, I propose a deleterious duplication model for the large expansion of this family in plant genomes. Given this model, I present new perspectives of future functional genomic studies on the plant ubiquitylation system to focus on core and active groups of ubiquitin E3 ligase genes.
  4. Genome sequencing has uncovered tremendous sequence variation within and between species. In plants, in addition to large variations in genome size, a great deal of sequence polymorphism is also evident in several large multi-gene families, including those involved in the ubiquitin-26S proteasome protein degradation system. However, the biological function of this sequence variation is not yet clear. In this work, we explicitly demonstrated a single origin of retroposed Arabidopsis Skp1-Like (ASK) genes using an improved phylogenetic analysis. Taking advantage of the 1,001 genomes project, we here provide several lines of polymorphism evidence showing both adaptive and degenerative evolutionary processes in ASK genes. Yeast two-hybrid quantitative interaction assays further suggested that recent neutral changes in the ASK2 coding sequence weakened its interactions with some F-box proteins. The trend that highly polymorphic upstream regions of ASK1 yield high levels of expression implied negative expression regulation of ASK1 by an as-yet-unknown transcriptional suppression mechanism, which may contribute to the polymorphic roles of Skp1-CUL1-F-box complexes. Taken together, this study provides new evolutionary evidence to guide future functional genomic studies of SCF-mediated protein ubiquitylation.
  5. Genome amplification and sequence divergence provide raw materials to allow organismal adaptation. This is exemplified by the large expansion of the ubiquitin-26S proteasome system (UPS) in land plants, which primarily rely on intracellular signaling and biochemical metabolism to combat biotic and abiotic stresses. While a handful of functional genomic studies have demonstrated the adaptive role of the UPS in plant growth and development, the functions of many UPS members remain unknown. In this work, we applied a comparative genomic study to address the functional divergence of the UPS at a systematic level. We first used a closing-target-trimming annotation approach to identify most, if not all, UPS members in six species from each of two evolutionarily distant plant families, Brassicaceae and Poaceae. To reduce age-related errors, the two groups of species were selected based on their similar chronological order of speciation. Through size comparison, chronological expansion inference, evolutionary selection analyses, duplication mechanism prediction, and functional domain enrichment assays, we discovered significant diversities within the UPS, particularly between members from its three largest ubiquitin ligase gene families, the F-box (FBX), the Really Interesting New Gene (RING), and the Bric-a-Brac/Tramtrack/Broad Complex (BTB) families, between Brassicaceae and Poaceae. Uncovering independent Arabidopsis and Oryza genus–specific subclades of the 26S proteasome subunits from a comprehensive phylogenetic analysis further supported a diversifying evolutionary model of the UPS in these two genera, confirming its role in plant adaptation.