Abstract Gene duplication is a fundamental part of evolutionary innovation. While single-gene duplications frequently exhibit asymmetric evolutionary rates between paralogs, the extent to which this applies to multi-gene duplications remains unclear. In this study, we investigate the role of genetic context in shaping evolutionary divergence within multi-gene duplications, leveraging microsynteny to differentiate source and target copies. Using a dataset of 193 mammalian genome assemblies and a bird outgroup, we systematically analyze patterns of sequence divergence between duplicated genes and reference orthologs. We find that target copies, those relocated to new genomic environments, exhibit elevated evolutionary rates compared to source copies in the ancestral location. This asymmetry is influenced by the distance between copies and the size of the target copy. We also demonstrate that the polarization of rate asymmetry in paralogs, the “choice” of the slowly evolving copy, is biased towards collective, block-wise polarization in multi-gene duplications. Our findings highlight the importance of genetic context in modulating post-duplication divergence, where differences in cis-regulatory elements and co-expressed gene clusters between source and target copies may be responsible. This study presents a large-scale test of asymmetric evolution in multi-gene duplications, offering new insight into how genome architecture shapes functional diversification of paralogs.
Significance statement After a gene is duplicated, reduced selective constraints can lead the two copies to rapidly diverge, with one copy often evolving faster and occasionally gaining a new function. We quantify the influence of genetic context in choosing which copy of a duplicated gene has an elevated substitution rate.
In a representative dataset of 193 mammalian genomes, we found strong evidence that gene copies pasted into new genomic locations tend to evolve faster than the corresponding copies in ancestral locations, suggesting an important role for the regulatory environment. The asymmetry in evolutionary rates of duplicated genes persists even for very large multigenic duplications, up to the scale of megabases, indicating that regulatory interactions frequently reach farther than previously thought.
This content will become publicly available on October 5, 2026
Influence of cis-regulatory elements on expression divergence in human segmental duplications
Human-specific segmental duplications (HSDs) contain millions of base pairs of sequence unique to the human genome, including genes that shape neurodevelopment. Despite their young age (<6 million years), HSD genes exhibit widespread regulatory divergence, with paralog-specific expression patterns documented across a variety of tissues and cell types. Using long-read expression and epigenomic data, we show that human-specific paralogs tend to have lower activity than the shared, ancestral ones. To systematically characterize the cis-regulatory elements (CREs) within HSDs and understand patterns of regulatory change in recently evolved gene families, we conducted a massively parallel reporter assay of 7,160 human duplicated and chimpanzee orthologous sequences in lymphoblastoid (GM12878) and neuroblastoma (SH-SY5Y) cell lines. A large proportion (14–24%) of sequences exhibited differential activity relative to the chimpanzee ortholog (or between human paralogs), mostly with small fold-differences. When measured activity levels were combined across all assayed sequences, predicted differences in cis-regulatory activity correlated with mRNA levels in SH-SY5Y. Differentially active CREs that may contribute to paralog-specific expression patterns, and thereby to human-specific traits, were validated for CHRFAM7A, HYDIN2, and SRGAP2C. While we find some changes in CRE activity shared between duplicate paralogs that likely drive regulatory divergence in gene expression, consideration of non-shared sequences adjacent to duplications suggests a larger role for altered genome positional effects. In all, this work suggests that functional divergence of duplicated CREs contributes moderately to regulatory divergence of HSD genes and uncovers enhancers that are candidate drivers of human-specific regulatory patterns.
- Award ID(s): 2145885
- PAR ID: 10654854
- Publisher / Repository: bioRxiv
- Date Published:
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Transcriptional divergence of duplicated genes after whole genome duplication (WGD) has been described in many plant lineages and is often associated with subgenome dominance, a genome-wide mechanism. However, it is unknown what underlies the transcriptional divergence of duplicated genes in polyploid species that lack subgenome dominance. Soybean is a paleotetraploid with a WGD that occurred 5 to 13 Mya. Approximately 50% of the duplicated genes retained from this WGD exhibit transcriptional divergence. We developed accessible chromatin region (ACR) datasets from leaf, flower, and seed tissues using MNase-hypersensitivity sequencing. We validated enhancer function of several ACRs associated with known genes using CRISPR/Cas9-mediated genome editing. The ACR datasets were used to examine and correlate the transcriptional patterns of 17,111 pairs of duplicated genes in different tissues. We demonstrate that ACR dynamics are correlated with divergence of both expression level and tissue specificity of individual gene pairs. Gain or loss of flanking ACRs and mutation of cis-regulatory elements (CREs) within the ACRs can change the balance of the expression level and/or tissue specificity of the duplicated genes. Analysis of DNA sequences associated with ACRs revealed that the extensive sequence rearrangement after the WGD reshaped the CRE landscape, which appears to play a key role in the transcriptional divergence of duplicated genes in soybean. This may represent a general mechanism for transcriptional divergence of duplicated genes in polyploids that lack subgenome dominance.
-
Kopp, Artyom (Ed.) Animal traits develop through the expression and action of numerous regulatory and realizator genes that comprise a gene regulatory network (GRN). For each GRN, its underlying patterns of gene expression are controlled by cis-regulatory elements (CREs) that bind activating and repressing transcription factors. These interactions drive cell-type and developmental stage-specific transcriptional activation or repression. Most GRNs remain incompletely mapped, and a major barrier to this daunting task is CRE identification. Here, we used an in silico method to identify predicted CREs (pCREs) that comprise the GRN which governs sex-specific pigmentation of Drosophila melanogaster. Through in vivo assays, we demonstrate that many pCREs activate expression in the correct cell-type and developmental stage. We employed genome editing to demonstrate that two CREs control the pupal abdomen expression of trithorax, whose function is required for the dimorphic phenotype. Surprisingly, trithorax had no detectable effect on this GRN’s key trans-regulators, but shapes the sex-specific expression of two realizator genes. Comparison of sequences orthologous to these CREs supports an evolutionary scenario where these trithorax CREs predated the origin of the dimorphic trait. Collectively, this study demonstrates how in silico approaches can shed novel insights on the GRN basis for a trait’s development and evolution.
-
ABSTRACT The posterior end of the follicular epithelium is patterned by midline (MID) and its paralog H15, the Drosophila homologs of the mammalian Tbx20 transcription factor. We have previously identified two cis-regulatory modules (CRMs) that recapitulate the endogenous pattern of mid in the follicular epithelium. Here, using CRISPR/Cas9 genome editing, we demonstrate redundant activity of these mid CRMs. Although the deletion of either CRM alone generated marginal change in mid expression, the deletion of both CRMs reduced expression by 60%. Unexpectedly, the deletion of the 5′ proximal CRM of mid eliminated H15 expression. Interestingly, expression of these paralogs in other tissues remained unaffected in the CRM deletion backgrounds. These results suggest that the paralogs are regulated by a shared CRM that coordinates gene expression during posterior fate determination. The consistent overlapping expression of mid and H15 in various tissues may indicate that the paralogs could also be under shared regulation by other CRMs in these tissues.
-
Abstract Cis-regulatory elements (CREs) control gene expression, orchestrating tissue identity, developmental timing and stimulus responses, which collectively define the thousands of unique cell types in the body1–3. While there is great potential for strategically incorporating CREs in therapeutic or biotechnology applications that require tissue specificity, there is no guarantee that an optimal CRE for these intended purposes has arisen naturally. Here we present a platform to engineer and validate synthetic CREs capable of driving gene expression with programmed cell-type specificity. We take advantage of innovations in deep neural network modelling of CRE activity across three cell types, efficient in silico optimization and massively parallel reporter assays to design and empirically test thousands of CREs4–8. Through large-scale in vitro validation, we show that synthetic sequences are more effective at driving cell-type-specific expression in three cell lines compared with natural sequences from the human genome and achieve specificity in analogous tissues when tested in vivo. Synthetic sequences exhibit distinct motif vocabulary associated with activity in the on-target cell type and a simultaneous reduction in the activity of off-target cells. Together, we provide a generalizable framework to prospectively engineer CREs from massively parallel reporter assay models and demonstrate the required literacy to write fit-for-purpose regulatory code.
