Previous evolutionary models of duplicate gene evolution have overlooked the pivotal role of genome architecture. Here, we show that proximity-based regulatory recruitment by distally duplicated genes is an efficient mechanism for modulating tissue-specific production of preexisting proteins. By leveraging genomic asymmetries, we performed a coexpression analysis on Drosophila melanogaster tissue data to show the generality of enhancer capture-divergence (ECD) as a significant evolutionary driver of asymmetric, distally duplicated genes. We use the recently evolved gene HP6/Umbrea as an example of the ECD process. By assaying genome-wide chromosomal conformations in multiple Drosophila species, we show that HP6/Umbrea was inserted near a preexisting, long-distance three-dimensional genomic interaction. We then use this data to identify a newly found enhancer (FLEE1), buried within the coding region of the highly conserved, essential gene MFS18, that likely neofunctionalized HP6/Umbrea. Last, we demonstrate ancestral transcriptional coregulation of HP6/Umbrea’s future insertion site, illustrating how enhancer capture provides a highly evolvable, one-step solution to Ohno’s dilemma.
Origins of lineage‐specific elements via gene duplication, relocation, and regional rearrangement in Neurospora crassa
Abstract The origin of new genes has long been a central interest of evolutionary biologists. However, their novelty means that they evade reconstruction by the classical tools of evolutionary modelling. This evasion of deep ancestral investigation necessitates intensive study of model species within well‐sampled, recently diversified, clades. One such clade is the model genus Neurospora, members of which lack recent gene duplications. Several Neurospora species are comprehensively characterized organisms apt for studying the evolution of lineage‐specific genes (LSGs). Using gene synteny, we documented that 78% of Neurospora LSG clusters are located adjacent to the telomeres featuring extensive tracts of non‐coding DNA and duplicated genes. Here, we report several instances of LSGs that are likely from regional rearrangements and potentially from gene rebirth. To broadly investigate the functions of LSGs, we assembled transcriptomics data from 68 experimental data points and identified co‐regulatory modules using Weighted Gene Correlation Network Analysis, revealing that LSGs are widely but peripherally involved in known regulatory machinery for diverse functions. The ancestral status of the LSG mas‐1, a gene with roles in cell‐wall integrity and cellular sensitivity to antifungal toxins, was investigated in detail alongside its genomic neighbours, indicating that it arose from an ancient lysophospholipase precursor that is ubiquitous in lineages of the Sordariomycetes. Our discoveries illuminate a “rummage region” in the N. crassa genome that enables the formation of new genes and functions to arise via gene duplication and relocation, followed by fast mutation and recombination facilitated by sequence repeats and unconstrained non‐coding sequences.
- PAR ID: 10559425
- Publisher / Repository: Wiley-Blackwell
- Date Published:
- Journal Name: Molecular Ecology
- Volume: 33
- Issue: 24
- ISSN: 0962-1083
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Abstract Advances in genome sequencing and annotation have eased the difficulty of identifying new gene sequences. Predicting the functions of these newly identified genes remains challenging. Genes descended from a common ancestral sequence are likely to have common functions. As a result, homology is widely used for gene function prediction. This means functional annotation errors also propagate from one species to another. Several approaches based on machine learning classification algorithms were evaluated for their ability to accurately predict gene function from non‐homology gene features. Among the eight supervised classification algorithms evaluated, random‐forest‐based prediction consistently provided the most accurate gene function prediction. Non‐homology‐based functional annotation provides complementary strengths to homology‐based annotation, with higher average performance in Biological Process GO terms, the domain where homology‐based functional annotation performs the worst, and weaker performance in Molecular Function GO terms, the domain where the accuracy of homology‐based functional annotation is highest. GO prediction models trained with homology‐based annotations were able to successfully predict annotations from a manually curated “gold standard” GO annotation set. Non‐homology‐based functional annotation based on machine learning may ultimately prove useful both as a method to assign predicted functions to orphan genes which lack functionally characterized homologs, and to identify and correct functional annotation errors which were propagated through homology‐based functional annotations.
- Abstract The evolutionary direction of gonochorism and hermaphroditism is an intriguing mystery to be solved. The special transient hermaphroditic stage makes the little yellow croaker (Larimichthys polyactis) an appealing model for studying hermaphrodite formation. However, the origin and evolutionary relationship between L. polyactis and Larimichthys crocea, the most famous commercial fish species in East Asia, remain unclear. Here, we report the sequence of the L. polyactis genome, which we found is ~706 Mb long (contig N50 = 1.21 Mb and scaffold N50 = 4.52 Mb) and contains 25,233 protein‐coding genes. Phylogenomic analysis suggested that L. polyactis diverged from the common ancestor, L. crocea, approximately 25.4 million years ago. Our high‐quality genome assembly enabled comparative genomic analysis, which revealed several within‐chromosome rearrangements and translocations, without major chromosome fission or fusion events between the two species. The dmrt1 gene was identified as the male‐specific gene in L. polyactis. Transcriptome analysis showed that the expression of dmrt1 and its upstream regulatory gene (rnf183) were both sexually dimorphic. Rnf183, unlike its two paralogues rnf223 and rnf225, is only present in Larimichthys and Lates but not in other teleost species, suggesting that it originated from lineage‐specific duplication or was lost in other teleosts. Phylogenetic analysis shows that the hermaphrodite stage in male L. polyactis may be explained by the sequence evolution of dmrt1. Decoding the L. polyactis genome not only provides insight into the genetic underpinnings of hermaphrodite evolution, but also provides valuable information for enhancing fish aquaculture.
- Abstract The discovery of cancer driver mutations is a fundamental goal in cancer research. While many cancer driver mutations have been discovered in the protein-coding genome, research into potential cancer drivers in the non-coding regions has shown limited success so far. Here, we present a novel comprehensive framework, Dr.Nod, for detection of non-coding cis-regulatory candidate driver mutations that are associated with dysregulated gene expression using tissue-matched enhancer-gene annotations. Applying the framework to data from over 1500 tumours across eight tissues revealed a 4.4-fold enrichment of candidate driver mutations in regulatory regions of known cancer driver genes. An overarching conclusion that emerges is that the non-coding driver mutations contribute to cancer by significantly altering transcription factor binding sites, leading to upregulation of tissue-matched oncogenes and down-regulation of tumour-suppressor genes. Interestingly, more than half of the detected cancer-promoting non-coding regulatory driver mutations are over 20 kb distant from the cancer-associated genes they regulate. Our results show the importance of tissue-matched enhancer-gene maps, functional impact of mutations, and complex background mutagenesis model for the prediction of non-coding regulatory drivers. In conclusion, our study demonstrates that non-coding mutations in enhancers play a previously underappreciated role in cancer and dysregulation of clinically relevant target genes.
- Abstract The phyla Nitrospirota and Nitrospinota have received significant research attention due to their unique nitrogen metabolisms important to biogeochemical and industrial processes. These phyla are common inhabitants of marine and terrestrial subsurface environments and contain members capable of diverse physiologies in addition to nitrite oxidation and complete ammonia oxidation. Here, we use phylogenomics and gene-based analysis with ancestral state reconstruction and gene-tree–species-tree reconciliation methods to investigate the life histories of these two phyla. We find that basal clades of both phyla primarily inhabit marine and terrestrial subsurface environments. The genomes of basal clades in both phyla appear smaller and more densely coded than the later-branching clades. The extant basal clades of both phyla share many traits inferred to be present in their respective common ancestors, including hydrogen, one-carbon, and sulfur-based metabolisms. Later-branching groups, namely the more frequently studied classes Nitrospiria and Nitrospinia, are both characterized by genome expansions driven by either de novo origination or laterally transferred genes that encode functions expanding their metabolic repertoire. These expansions include gene clusters that perform the unique nitrogen metabolisms that both phyla are most well known for. Our analyses support replicated evolutionary histories of these two bacterial phyla, with modern subsurface environments representing a genomic repository for the coding potential of ancestral metabolic traits.