Title: Closing Target Trimming and CTTdocker programs for discovering hidden superfamily loci in genomes
The contemporary capacity of genome sequence analysis significantly lags behind the rapidly evolving sequencing technologies. Retrieving biologically meaningful information from an ever-increasing amount of genome data would be significantly beneficial for functional genomic studies. For example, the duplication, organization, evolution, and function of superfamily genes are arguably important in many aspects of life. However, the incompleteness of annotations in many sequenced genomes often results in biased conclusions in comparative genomic studies of superfamilies. Here, we present a Perl program, called Closing Target Trimming (CTT), for automatically identifying most, if not all, members of a gene family in any sequenced genome on the CentOS 7 platform. To support broader use on other operating systems, we also created a Docker application package, CTTdocker. Our test data on the F-box gene superfamily showed 78.2% and 79% gene-finding accuracies in two well-annotated plant genomes, Arabidopsis thaliana and rice, respectively. To further demonstrate the effectiveness of this program, we ran it on 18 plant genomes and five non-plant genomes to compare the expansion of the F-box and the BTB superfamilies. The program discovered that on average 12.7% and 9.3% of the total F-box and BTB members, respectively, are new loci in plant genomes, while it found only a small number of new members in vertebrate genomes. Therefore, different evolutionary and regulatory mechanisms of cullin-RING ubiquitin ligases may be present in plants and animals. We also annotated and compared the Pkinase family members across a wide range of organisms, including 10 fungi, 10 metazoa, 10 vertebrates, and 10 additional plants, which were randomly selected from the Ensembl database. Our CTT annotation recovered on average 14% more loci, including pseudogenes, of the Pkinase superfamily in these 40 genomes, demonstrating its robust replicability and scalability in annotating superfamily members in any genome.
Award ID(s):
1750361
NSF-PAR ID:
10101155
Author(s) / Creator(s):
;
Date Published:
Journal Name:
PloS one
Volume:
14
ISSN:
1932-6203
Page Range / eLocation ID:
e0209468
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The F-box proteins function as substrate receptors to determine the specificity of Skp1-Cul1-F-box ubiquitin ligases. Genomic studies revealed large and diverse sizes of the F-box gene superfamily across plant species. Our previous studies suggested that the plant F-box gene superfamily is under genomic drift evolution promoted by epigenomic programming. However, how the size of the superfamily drifts across plant genomes is currently unknown. Through a large-scale genomic and phylogenetic comparison of the F-box gene superfamily covering 110 green plants and one red algal species, I discovered four distinct groups of plant F-box genes with diverse evolutionary processes. While the members in Clusters 1 and 2 are species/lineage-specific, those in Clusters 3 and 4 are present in over 46 plant genomes. Statistical modeling suggests that F-box genes from the former two groups are skewed toward fewer species and more paralogs compared to those of the latter two groups whose presence frequency and sizes in plant genomes follow a random statistical model. The enrichment of known Arabidopsis F-box genes in Clusters 3 and 4, along with comprehensive biochemical evidence showing that Arabidopsis members in Cluster 4 interact with the Arabidopsis Skp1-like 1 (ASK1), demonstrates over-representation of active F-box genes in these two groups. Collectively, I propose purifying and dosage balancing selection models to explain the lineage/species-specific duplications and expansions of F-box genes in plant genomes. The purifying selection model suggests that most, if not all, lineage/species-specific F-box genes are detrimental and are thus kept at low frequencies in plant genomes. 
  2. Summary

    Expansins comprise a superfamily of plant cell wall loosening proteins that can be divided into four individual families (EXPA, EXPB, EXLA and EXLB). Aside from inferred roles in a variety of plant growth and developmental traits, little is known regarding the function of specific expansin clades, of which there are at least 16 in flowering plants (angiosperms); however, there is evidence to suggest that some expansins have cell-specific functions, in root hair and pollen tube development, for example. Recently, two duckweed genomes have been sequenced (Spirodela polyrhiza strains 7498 and 9509), revealing significantly reduced superfamily sizes. We hypothesized that there would be a correlation between expansin loss and morphological reductions seen among highly adapted aquatic species. To test this hypothesis, we characterized the expansin superfamilies of the greater duckweed Spirodela, the marine eelgrass Zostera marina and the bladderwort Utricularia gibba. We discovered rampant expansin gene and clade loss among the three, including a complete absence of the EXLB family and EXPA-VII. The most convincing correlation between morphological reduction and expansin loss was seen for Utricularia and Spirodela, which both lack root hairs and the root hair expansin clade EXPA-X. Contrary to the pattern observed in other species, four Utricularia expansins failed to branch within any clade, suggesting that they may be the result of neofunctionalization. Last, an expansin clade previously discovered only in eudicots was identified in Spirodela, allowing us to conclude that the last common ancestor of monocots and eudicots contained a minimum of 17 expansins.

     
  3. Abstract
    Background: The western flower thrips, Frankliniella occidentalis (Pergande), is a globally invasive pest and plant virus vector on a wide array of food, fiber, and ornamental crops. The underlying genetic mechanisms of the processes governing thrips pest and vector biology, feeding behaviors, ecology, and insecticide resistance are largely unknown. To address this gap, we present the F. occidentalis draft genome assembly and official gene set. Results: We report on the first genome sequence for any member of the insect order Thysanoptera. Benchmarking Universal Single-Copy Ortholog (BUSCO) assessments of the genome assembly (size = 415.8 Mb, scaffold N50 = 948.9 kb) revealed a relatively complete and well-annotated assembly in comparison to other insect genomes. The genome is unusually GC-rich (50%) compared to other insect genomes to date. The official gene set (OGS v1.0) contains 16,859 genes, of which ~10% were manually verified and corrected by our consortium. We focused on manual annotation, phylogenetic, and expression evidence analyses for gene sets centered on primary themes in the life histories and activities of plant-colonizing insects. Highlights include the following: (1) divergent clades and large expansions in genes associated with environmental sensing (chemosensory receptors) and detoxification (CYP4, CYP6, and CCE enzymes) of substances encountered in agricultural environments; (2) a comprehensive set of salivary gland genes supported by enriched expression; (3) apparent absence of members of the IMD innate immune defense pathway; and (4) developmental- and sex-specific expression analyses of genes associated with progression from larvae to adulthood through neometaboly, a distinct form of maturation differing from either incomplete or complete metamorphosis in the Insecta. Conclusions: Analysis of the F. occidentalis genome offers insights into the polyphagous behavior of this insect pest that finds, colonizes, and survives on a widely diverse array of plants. The genomic resources presented here enable a more complete analysis of insect evolution and biology, providing a missing taxon for contemporary insect genomics-based analyses. Our study also offers a genomic benchmark for molecular and evolutionary investigations of other Thysanoptera species.
  4. Genome amplification and sequence divergence provide raw materials to allow organismal adaptation. This is exemplified by the large expansion of the ubiquitin-26S proteasome system (UPS) in land plants, which primarily rely on intracellular signaling and biochemical metabolism to combat biotic and abiotic stresses. While a handful of functional genomic studies have demonstrated the adaptive role of the UPS in plant growth and development, many UPS members remain unknown. In this work, we applied a comparative genomic study to address the functional divergence of the UPS at a systematic level. We first used a closing-target-trimming annotation approach to identify most, if not all, UPS members in six species from each of two evolutionarily distant plant families, Brassicaceae and Poaceae. To reduce age-related errors, the two groups of species were selected based on their similar chronological order of speciation. Through size comparison, chronological expansion inference, evolutionary selection analyses, duplication mechanism prediction, and functional domain enrichment assays, we discovered significant diversities within the UPS between Brassicaceae and Poaceae, particularly among members of its three largest ubiquitin ligase gene families: the F-box (FBX), the Really Interesting New Gene (RING), and the Bric-a-Brac/Tramtrack/Broad Complex (BTB) families. Uncovering independent Arabidopsis and Oryza genus-specific subclades of the 26S proteasome subunits from a comprehensive phylogenetic analysis further supported a diversifying evolutionary model of the UPS in these two genera, confirming its role in plant adaptation.
  5. The first eukaryotic genome sequenced was that of Saccharomyces cerevisiae, as reported in 1996, but it was more than 10 years before any of the zygomycete fungi, which are the early-diverging terrestrial fungi currently placed in the phyla Mucoromycota and Zoopagomycota, were sequenced. The genome for Rhizopus delemar was completed in 2008; currently, more than 1000 zygomycete genomes have been sequenced. Genomic data from these early-diverging terrestrial fungi revealed deep phylogenetic separation of the two major clades: the primarily plant-associated saprotrophic and mycorrhizal Mucoromycota versus the primarily mycoparasitic or animal-associated parasites and commensals in the Zoopagomycota. Genomic studies provide many valuable insights into how these fungi evolved in response to the challenges of living on land, including adaptations to sensing light and gravity, development of hyphal growth, and co-existence with the first terrestrial plants. Genome sequence data have facilitated studies of genome architecture, including a history of genome duplications and horizontal gene transfer events, distribution and organization of mating type loci, rDNA genes and transposable elements, methylation processes, and genes useful for various industrial applications. Pathogenicity genes and specialized secondary metabolites have also been detected in soil saprobes and pathogenic fungi. Novel endosymbiotic bacteria and viruses have been discovered during several zygomycete genome projects. Overall, genomic information has helped to resolve a plethora of research questions, from the placement of zygomycetes on the evolutionary tree of life and in natural ecosystems to applied biotechnological and medical questions.

     