skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Clade-specific genes and the evolutionary origin of novelty; new tools in the toolkit
Clade-specific (a.k.a. lineage-specific) genes are very common and found at all taxonomic levels and in all clades examined. They can arise by duplication of previously existing genes, which can involve partial truncations or combinations with other protein domains or regulatory sequences. They can also evolve de novo from non-coding sequences, leading to potentially truly novel protein domains. Finally, since clade-specific genes are generally defined by lack of sequence homology with other proteins, they can also arise by sequence evolution that is rapid enough that previous sequence homology can no longer be detected. In such cases, where the rapid evolution is followed by constraint, we consider them to be ontologically non-novel but likely novel at a functional level. In general, clade-specific genes have received less attention from biologists but there are increasing numbers of fascinating examples of their roles in important traits. Here we review some selected recent examples, and argue that attention to clade-specific genes is an important corrective to the focus on the conserved developmental regulatory toolkit that has been the habit of evo-devo as a field. Finally, we discuss questions that arise about the evolution of clade-specific genes, and how these might be addressed by future studies. We highlight the hy- pothesis that clade-specific genes are more likely to be involved in synapomorphies that arose in the stem group where they appeared, compared to other genes.  more » « less
Award ID(s):
1656558
PAR ID:
10340216
Author(s) / Creator(s):
Editor(s):
John Davey; Lisa Nagy; Elizabeth Jockusch; Julia Bowsher
Date Published:
Journal Name:
Seminars in cell developmental biology
ISSN:
1084-9521
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Understanding how regulatory mechanisms evolve is critical for understanding the processes that give rise to novel phenotypes. Snake venom systems represent a valuable and tractable model for testing hypotheses related to the evolution of novel regulatory networks, yet the regulatory mechanisms underlying venom production remain poorly understood. Here, we use functional genomics approaches to investigate venom regulatory architecture in the prairie rattlesnake and identify cis -regulatory sequences (enhancers and promoters), trans -regulatory transcription factors, and integrated signaling cascades involved in the regulation of snake venom genes. We find evidence that two conserved vertebrate pathways, the extracellular signal-regulated kinase and unfolded protein response pathways, were co-opted to regulate snake venom. In one large venom gene family (snake venom serine proteases), this co-option was likely facilitated by the activity of transposable elements. Patterns of snake venom gene enhancer conservation, in some cases spanning 50 million yr of lineage divergence, highlight early origins and subsequent lineage-specific adaptations that have accompanied the evolution of venom regulatory architecture. We also identify features of chromatin structure involved in venom regulation, including topologically associated domains and CTCF loops that underscore the potential importance of novel chromatin structure to coevolve when duplicated genes evolve new regulatory control. Our findings provide a model for understanding how novel regulatory systems may evolve through a combination of genomic processes, including tandem duplication of genes and regulatory sequences, cis -regulatory sequence seeding by transposable elements, and diverse transcriptional regulatory proteins controlled by a co-opted regulatory cascade. 
    more » « less
  2. Wittkopp, Patricia (Ed.)
    Abstract Genes involved in spermatogenesis tend to evolve rapidly, but we lack a clear understanding of how protein sequences and patterns of gene expression evolve across this complex developmental process. We used fluorescence-activated cell sorting (FACS) to generate expression data for early (meiotic) and late (postmeiotic) cell types across 13 inbred strains of mice (Mus) spanning ∼7 My of evolution. We used these comparative developmental data to investigate the evolution of lineage-specific expression, protein-coding sequences, and expression levels. We found increased lineage specificity and more rapid protein-coding and expression divergence during late spermatogenesis, suggesting that signatures of rapid testis molecular evolution are punctuated across sperm development. Despite strong overall developmental parallels in these components of molecular evolution, protein and expression divergences were only weakly correlated across genes. We detected more rapid protein evolution on the X chromosome relative to the autosomes, whereas X-linked gene expression tended to be relatively more conserved likely reflecting chromosome-specific regulatory constraints. Using allele-specific FACS expression data from crosses between four strains, we found that the relative contributions of different regulatory mechanisms also differed between cell types. Genes showing cis-regulatory changes were more common late in spermatogenesis, and tended to be associated with larger differences in expression levels and greater expression divergence between species. In contrast, genes with trans-acting changes were more common early and tended to be more conserved across species. Our findings advance understanding of gene evolution across spermatogenesis and underscore the fundamental importance of developmental context in molecular evolutionary studies. 
    more » « less
  3. Rosenberg, Michael (Ed.)
    Abstract Sequence annotation is fundamental for studying the evolution of protein families, particularly when working with nonmodel species. Given the rapid, ever-increasing number of species receiving high-quality genome sequencing, accurate domain modeling that is representative of species diversity is crucial for understanding protein family sequence evolution and their inferred function(s). Here, we describe a bioinformatic tool called Taxon-Informed Adjustment of Markov Model Attributes (TIAMMAt) which revises domain profile hidden Markov models (HMMs) by incorporating homologous domain sequences from underrepresented and nonmodel species. Using innate immunity pathways as a case study, we show that revising profile HMM parameters to directly account for variation in homologs among underrepresented species provides valuable insight into the evolution of protein families. Following adjustment by TIAMMAt, domain profile HMMs exhibit changes in their per-site amino acid state emission probabilities and insertion/deletion probabilities while maintaining the overall structure of the consensus sequence. Our results show that domain revision can heavily impact evolutionary interpretations for some families (i.e., NLR’s NACHT domain), whereas impact on other domains (e.g., rel homology domain and interferon regulatory factor domains) is minimal due to high levels of sequence conservation across the sampled phylogenetic depth (i.e., Metazoa). Importantly, TIAMMAt revises target domain models to reflect homologous sequence variation using the taxonomic distribution under consideration by the user. TIAMMAt’s flexibility to revise any subset of the Pfam database using a user-defined taxonomic pool will make it a valuable tool for future protein evolution studies, particularly when incorporating (or focusing) on nonmodel species. 
    more » « less
  4. null (Ed.)
    Abstract Key message Arabidopsis pollen transcriptome analysis revealed new intergenic transcripts of unknown function, many of which are long non-coding RNAs, that may function in pollen-specific processes, including the heat stress response. Abstract The male gametophyte is the most heat sensitive of all plant tissues. In recent years, long noncoding RNAs (lncRNAs) have emerged as important components of cellular regulatory networks involved in most biological processes, including response to stress. While examining RNAseq datasets of developing and germinating Arabidopsis thaliana pollen exposed to heat stress (HS), we identified 66 novel and 246 recently annotated intergenic expressed loci (XLOCs) of unknown function, with the majority encoding lncRNAs. Comparison with HS in cauline leaves and other RNAseq experiments indicated that 74% of the 312 XLOCs are pollen-specific, and at least 42% are HS-responsive. Phylogenetic analysis revealed that 96% of the genes evolved recently in Brassicaceae . We found that 50 genes are putative targets of microRNAs and that 30% of the XLOCs contain small open reading frames (ORFs) with homology to protein sequences. Finally, RNAseq of ribosome-protected RNA fragments together with predictions of periodic footprint of the ribosome P-sites indicated that 23 of these ORFs are likely to be translated. Our findings indicate that many of the 312 unknown genes might be functional and play a significant role in pollen biology, including the HS response. 
    more » « less
  5. Abstract The origin of new genes has long been a central interest of evolutionary biologists. However, their novelty means that they evade reconstruction by the classical tools of evolutionary modelling. This evasion of deep ancestral investigation necessitates intensive study of model species within well‐sampled, recently diversified, clades. One such clade is the model genusNeurospora, members of which lack recent gene duplications. SeveralNeurosporaspecies are comprehensively characterized organisms apt for studying the evolution of lineage‐specific genes (LSGs). Using gene synteny, we documented that 78% ofNeurosporaLSG clusters are located adjacent to the telomeres featuring extensive tracts of non‐coding DNA and duplicated genes. Here, we report several instances of LSGs that are likely from regional rearrangements and potentially from gene rebirth. To broadly investigate the functions of LSGs, we assembled transcriptomics data from 68 experimental data points and identified co‐regulatory modules using Weighted Gene Correlation Network Analysis, revealing that LSGs are widely but peripherally involved in known regulatory machinery for diverse functions. The ancestral status of the LSGmas‐1, a gene with roles in cell‐wall integrity and cellular sensitivity to antifungal toxins, was investigated in detail alongside its genomic neighbours, indicating that it arose from an ancient lysophospholipase precursor that is ubiquitous in lineages of the Sordariomycetes. Our discoveries illuminate a “rummage region” in theN. crassagenome that enables the formation of new genes and functions to arise via gene duplication and relocation, followed by fast mutation and recombination facilitated by sequence repeats and unconstrained non‐coding sequences. 
    more » « less