This content will become publicly available on April 4, 2026

Title: Quantifying the influence of genetic context on duplicated mammalian genes
Abstract Gene duplication is a fundamental part of evolutionary innovation. While single-gene duplications frequently exhibit asymmetric evolutionary rates between paralogs, the extent to which this applies to multi-gene duplications remains unclear. In this study, we investigate the role of genetic context in shaping evolutionary divergence within multi-gene duplications, leveraging microsynteny to differentiate source and target copies. Using a dataset of 193 mammalian genome assemblies and a bird outgroup, we systematically analyze patterns of sequence divergence between duplicated genes and reference orthologs. We find that target copies, those relocated to new genomic environments, exhibit elevated evolutionary rates compared to source copies in the ancestral location. This asymmetry is influenced by the distance between copies and the size of the target copy. We also demonstrate that the polarization of rate asymmetry in paralogs, the “choice” of the slowly evolving copy, is biased towards collective, block-wise polarization in multi-gene duplications. Our findings highlight the importance of genetic context in modulating post-duplication divergence, where differences in cis-regulatory elements and co-expressed gene clusters between source and target copies may be responsible. This study presents a large-scale test of asymmetric evolution in multi-gene duplications, offering new insight into how genome architecture shapes functional diversification of paralogs.
Significance statement: After a gene is duplicated, reduced selective constraints can lead the two copies to rapidly diverge, with one copy often evolving faster and occasionally gaining a new function. We quantify the influence of genetic context in choosing which copy of a duplicated gene has an elevated substitution rate. In a representative dataset of 193 mammalian genomes, we found strong evidence that gene copies pasted into new genomic locations tend to evolve faster than the corresponding copies in ancestral locations, suggesting an important role for the regulatory environment. The asymmetry in evolutionary rates of duplicated genes persists even for very large multigenic duplications, up to the scale of megabases, indicating that regulatory interactions frequently reach farther than previously thought.
Award ID(s):
2019745
PAR ID:
10595434
Author(s) / Creator(s):
; ;
Publisher / Repository:
bioRxiv
Date Published:
Format(s):
Medium: X
Institution:
bioRxiv
Sponsoring Org:
National Science Foundation
More Like this
  1. Human-specific segmental duplications (HSDs) contain millions of base pairs of sequence unique to the human genome, including genes that shape neurodevelopment. Despite their young age (<6 million years), HSD genes exhibit widespread regulatory divergence, with paralog-specific expression patterns documented across a variety of tissues and cell types. Using long-read expression and epigenomic data, we show that human-specific paralogs tend to have lower activity than the shared, ancestral ones. To systematically characterize the cis-regulatory elements (CREs) within HSDs and understand patterns of regulatory change in recently evolved gene families, we conducted a massively parallel reporter assay of 7,160 human duplicated and chimpanzee orthologous sequences in lymphoblastoid (GM12878) and neuroblastoma (SH-SY5Y) cell lines. A large proportion (14–24%) of sequences exhibited differential activity relative to the chimpanzee ortholog (or between human paralogs), mostly with small fold-differences. Combining measured activity levels across all assayed sequences, predicted differences in cis-regulatory activity correlated with mRNA levels in SH-SY5Y. Differentially active CREs were validated for CHRFAM7A, HYDIN2, and SRGAP2C that may contribute to paralog-specific expression patterns and thereby to human-specific traits. While we find some changes in CRE activity shared between duplicate paralogs likely driving regulatory divergence in gene expression, consideration of non-shared adjacent sequences to duplications suggests a larger role for altered genome positional effects. In all, this work suggests that functional divergence of duplicated CREs contributes moderately to regulatory divergence of HSD genes and uncovers enhancers that are candidate drivers of human-specific regulatory patterns. 
  2. Synopsis Gene duplicates, or paralogs, serve as a major source of new genetic material and comprise seeds for evolutionary innovation. While originally thought to be quickly lost or nonfunctionalized following duplication, now a vast number of paralogs are known to be retained in a functional state. Daughter paralogs can provide robustness through redundancy, specialize via sub-functionalization, or neo-functionalize to play new roles. Indeed, the duplication and divergence of developmental genes have played a monumental role in the evolution of animal forms (e.g., Hox genes). Still, despite their prevalence and evolutionary importance, the precise detection of gene duplicates in newly sequenced genomes remains technically challenging and often overlooked. This presents an especially pertinent problem for evolutionary developmental biology, where hypothesis testing requires accurate detection of changes in gene expression and function, often in nontraditional model species. Frequently, these analyses rely on molecular reagents designed within coding sequences that may be highly similar in recently duplicated paralogs, leading to cross-reactivity and spurious results. Thus, care is needed to avoid erroneously assigning diverged functions of paralogs to a single gene, and potentially misinterpreting evolutionary history. This perspective aims to overview the prevalence and importance of paralogs and to shed light on the difficulty of their detection and analysis while offering potential solutions. 
  3. Abstract A signaling complex comprising members of the LORELEI (LRE)-LIKE GPI-anchored protein (LLG) and Catharanthus roseus RECEPTOR-LIKE KINASE 1-LIKE (CrRLK1L) families perceive RAPID ALKALINIZATION FACTOR (RALF) peptides and regulate growth, reproduction, immunity, and stress responses in Arabidopsis (Arabidopsis thaliana). Genes encoding these proteins are members of multigene families in most angiosperms and could generate thousands of signaling complex variants. However, the links between expansion of these gene families and the functional diversification of this critical signaling complex as well as the evolutionary factors underlying the maintenance of gene duplicates remain unknown. Here, we investigated LLG gene family evolution by sampling land plant genomes and explored the function and expression of angiosperm LLGs. We found that LLG diversity within major land plant lineages is primarily due to lineage-specific duplication events, and that these duplications occurred both early in the history of these lineages and more recently. Our complementation and expression analyses showed that expression divergence (i.e. regulatory subfunctionalization), rather than functional divergence, explains the retention of LLG paralogs. Interestingly, all but one monocot and all eudicot species examined had an LLG copy with preferential expression in male reproductive tissues, while the other duplicate copies showed highest levels of expression in female or vegetative tissues. The single LLG copy in Amborella trichopoda is expressed vastly higher in male compared to in female reproductive or vegetative tissues. We propose that expression divergence plays an important role in retention of LLG duplicates in angiosperms. 
  4. Abstract Gene duplication is a source of evolutionary novelty. DNA methylation may play a role in the evolution of duplicate genes (paralogs) through its association with gene expression. While this relationship has been examined to varying extents in a few individual species, the generalizability of these results at either a broad phylogenetic scale with species of differing duplication histories or across a population remains unknown. We applied a comparative epigenomic approach to 43 angiosperm species across the phylogeny and a population of 928 Arabidopsis (Arabidopsis thaliana) accessions, examining the association of DNA methylation with paralog evolution. Genic DNA methylation was differentially associated with duplication type, the age of duplication, sequence evolution, and gene expression. Whole-genome duplicates were typically enriched for CG-only gene body methylated or unmethylated genes, while single-gene duplications were typically enriched for non-CG methylated or unmethylated genes. Non-CG methylation, in particular, was a characteristic of more recent single-gene duplicates. Core angiosperm gene families were differentiated into those which preferentially retain paralogs and “duplication-resistant” families, which convergently reverted to singletons following duplication. Duplication-resistant families that still have paralogous copies were, uncharacteristically for core angiosperm genes, enriched for non-CG methylation. Non-CG methylated paralogs had higher rates of sequence evolution, higher frequency of presence–absence variation, and more limited expression. This suggests that silencing by non-CG methylation may be important to maintaining dosage following duplication and be a precursor to fractionation. Our results indicate that genic methylation marks differing evolutionary trajectories and fates between paralogous genes and have a role in maintaining dosage following duplication. 
  5. Abstract Duplicated genes provide the opportunity for evolutionary novelty and adaptive divergence. In many cases, having more gene copies increases gene expression, which might facilitate adaptation to stressful or novel environments. Conversely, overexpression or misexpression of duplicated genes can be detrimental and subject to negative selection. In this scenario, newly duplicate genes may evade purifying selection if they are epigenetically silenced, at least temporarily, leading them to persist in populations as copy number variations (CNVs). In animals and plants, younger gene duplicates tend to have higher levels of DNA methylation and lower levels of gene expression, suggesting epigenetic regulation could promote the retention of gene duplications via expression repression or silencing. Here, we test the hypothesis that DNA methylation variation coincides with young duplicate genes that are segregating as CNVs in six populations of the three‐spined stickleback that span a salinity gradient from 4 to 30 PSU. Using reduced‐representation bisulfite sequencing, we found DNA methylation and CNV differentiation outliers rarely overlapped. Whereas lineage‐specific genes and young duplicates were found to be highly methylated, just two gene CNVs showed a significant association between promoter methylation level and copy number, suggesting that DNA methylation might not interact with CNVs in our dataset. If most new duplications are regulated for dosage by epigenetic mechanisms, our results do not support a strong contribution from DNA methylation soon after duplication. Instead, our results are consistent with a preference to duplicate genes that are already highly methylated. 