Synopsis Gene duplicates, or paralogs, serve as a major source of new genetic material and comprise seeds for evolutionary innovation. While originally thought to be quickly lost or nonfunctionalized following duplication, now a vast number of paralogs are known to be retained in a functional state. Daughter paralogs can provide robustness through redundancy, specialize via sub-functionalization, or neo-functionalize to play new roles. Indeed, the duplication and divergence of developmental genes have played a monumental role in the evolution of animal forms (e.g., Hox genes). Still, despite their prevalence and evolutionary importance, the precise detection of gene duplicates in newly sequenced genomes remains technically challenging and often overlooked. This presents an especially pertinent problem for evolutionary developmental biology, where hypothesis testing requires accurate detection of changes in gene expression and function, often in nontraditional model species. Frequently, these analyses rely on molecular reagents designed within coding sequences that may be highly similar in recently duplicated paralogs, leading to cross-reactivity and spurious results. Thus, care is needed to avoid erroneously assigning diverged functions of paralogs to a single gene, and potentially misinterpreting evolutionary history. This perspective aims to overview the prevalence and importance of paralogs and to shed light on the difficulty of their detection and analysis while offering potential solutions. 
                    This content will become publicly available on April 4, 2026
                            
                            Quantifying the influence of genetic context on duplicated mammalian genes
                        
                    
    
            Abstract Gene duplication is a fundamental part of evolutionary innovation. While single-gene duplications frequently exhibit asymmetric evolutionary rates between paralogs, the extent to which this applies to multi-gene duplications remains unclear. In this study, we investigate the role of genetic context in shaping evolutionary divergence within multi-gene duplications, leveraging microsynteny to differentiate source and target copies. Using a dataset of 193 mammalian genome assemblies and a bird outgroup, we systematically analyze patterns of sequence divergence between duplicated genes and reference orthologs. We find that target copies, those relocated to new genomic environments, exhibit elevated evolutionary rates compared to source copies in the ancestral location. This asymmetry is influenced by the distance between copies and the size of the target copy. We also demonstrate that the polarization of rate asymmetry in paralogs, the “choice” of the slowly evolving copy, is biased towards collective, block-wise polarization in multi-gene duplications. Our findings highlight the importance of genetic context in modulating post-duplication divergence, where differences in cis-regulatory elements and co-expressed gene clusters between source and target copies may be responsible. This study presents a large-scale test of asymmetric evolution in multi-gene duplications, offering new insight into how genome architecture shapes functional diversification of paralogs.
            Significance statement After a gene is duplicated, reduced selective constraints can lead the two copies to rapidly diverge, with one copy often evolving faster and occasionally gaining a new function. We quantify the influence of genetic context in choosing which copy of a duplicated gene has an elevated substitution rate. In a representative dataset of 193 mammalian genomes, we found strong evidence that gene copies pasted into new genomic locations tend to evolve faster than the corresponding copies in ancestral locations, suggesting an important role for the regulatory environment. The asymmetry in evolutionary rates of duplicated genes persists even for very large multigenic duplications, up to the scale of megabases, indicating that regulatory interactions frequently reach farther than previously thought.
        
    
- Award ID(s): 2019745
- PAR ID: 10595434
- Publisher / Repository: bioRxiv
- Date Published:
- Format(s): Medium: X
- Institution: bioRxiv
- Sponsoring Org: National Science Foundation
More Like this
- 
            Abstract A signaling complex comprising members of the LORELEI (LRE)-LIKE GPI-anchored protein (LLG) and Catharanthus roseus RECEPTOR-LIKE KINASE 1-LIKE (CrRLK1L) families perceives RAPID ALKALINIZATION FACTOR (RALF) peptides and regulates growth, reproduction, immunity, and stress responses in Arabidopsis (Arabidopsis thaliana). Genes encoding these proteins are members of multigene families in most angiosperms and could generate thousands of signaling complex variants. However, the links between expansion of these gene families and the functional diversification of this critical signaling complex, as well as the evolutionary factors underlying the maintenance of gene duplicates, remain unknown. Here, we investigated LLG gene family evolution by sampling land plant genomes and explored the function and expression of angiosperm LLGs. We found that LLG diversity within major land plant lineages is primarily due to lineage-specific duplication events, and that these duplications occurred both early in the history of these lineages and more recently. Our complementation and expression analyses showed that expression divergence (i.e. regulatory subfunctionalization), rather than functional divergence, explains the retention of LLG paralogs. Interestingly, all but one monocot and all eudicot species examined had an LLG copy with preferential expression in male reproductive tissues, while the other duplicate copies showed highest levels of expression in female or vegetative tissues. The single LLG copy in Amborella trichopoda is expressed at far higher levels in male than in female reproductive or vegetative tissues. We propose that expression divergence plays an important role in retention of LLG duplicates in angiosperms.
- 
            Abstract Duplicated genes provide the opportunity for evolutionary novelty and adaptive divergence. In many cases, having more gene copies increases gene expression, which might facilitate adaptation to stressful or novel environments. Conversely, overexpression or misexpression of duplicated genes can be detrimental and subject to negative selection. In this scenario, newly duplicated genes may evade purifying selection if they are epigenetically silenced, at least temporarily, leading them to persist in populations as copy number variations (CNVs). In animals and plants, younger gene duplicates tend to have higher levels of DNA methylation and lower levels of gene expression, suggesting epigenetic regulation could promote the retention of gene duplications via expression repression or silencing. Here, we test the hypothesis that DNA methylation variation coincides with young duplicate genes that are segregating as CNVs in six populations of the three‐spined stickleback that span a salinity gradient from 4 to 30 PSU. Using reduced‐representation bisulfite sequencing, we found DNA methylation and CNV differentiation outliers rarely overlapped. Whereas lineage‐specific genes and young duplicates were found to be highly methylated, just two gene CNVs showed a significant association between promoter methylation level and copy number, suggesting that DNA methylation might not interact with CNVs in our dataset. If most new duplications are regulated for dosage by epigenetic mechanisms, our results do not support a strong contribution from DNA methylation soon after duplication. Instead, our results are consistent with a preference to duplicate genes that are already highly methylated.
- 
            Abstract Gene duplication is a source of evolutionary novelty. DNA methylation may play a role in the evolution of duplicate genes (paralogs) through its association with gene expression. While this relationship has been examined to varying extents in a few individual species, the generalizability of these results at either a broad phylogenetic scale with species of differing duplication histories or across a population remains unknown. We applied a comparative epigenomic approach to 43 angiosperm species across the phylogeny and a population of 928 Arabidopsis (Arabidopsis thaliana) accessions, examining the association of DNA methylation with paralog evolution. Genic DNA methylation was differentially associated with duplication type, the age of duplication, sequence evolution, and gene expression. Whole-genome duplicates were typically enriched for CG-only gene body methylated or unmethylated genes, while single-gene duplications were typically enriched for non-CG methylated or unmethylated genes. Non-CG methylation, in particular, was a characteristic of more recent single-gene duplicates. Core angiosperm gene families were differentiated into those which preferentially retain paralogs and “duplication-resistant” families, which convergently reverted to singletons following duplication. Duplication-resistant families that still have paralogous copies were, uncharacteristically for core angiosperm genes, enriched for non-CG methylation. Non-CG methylated paralogs had higher rates of sequence evolution, higher frequency of presence–absence variation, and more limited expression. This suggests that silencing by non-CG methylation may be important to maintaining dosage following duplication and be a precursor to fractionation. Our results indicate that genic methylation marks differing evolutionary trajectories and fates between paralogous genes and has a role in maintaining dosage following duplication.
- 
            Abstract The adoption of agriculture triggered a rapid shift towards starch-rich diets in human populations1. Amylase genes facilitate starch digestion, and increased amylase copy number has been observed in some modern human populations with high-starch intake2, although evidence of recent selection is lacking3,4. Here, using 94 long-read haplotype-resolved assemblies and short-read data from approximately 5,600 contemporary and ancient humans, we resolve the diversity and evolutionary history of structural variation at the amylase locus. We find that amylase genes have higher copy numbers in agricultural populations than in fishing, hunting and pastoral populations. We identify 28 distinct amylase structural architectures and demonstrate that nearly identical structures have arisen recurrently on different haplotype backgrounds throughout recent human history. AMY1 and AMY2A genes each underwent multiple duplication/deletion events with mutation rates up to more than 10,000-fold the single-nucleotide polymorphism mutation rate, whereas AMY2B gene duplications share a single origin. Using a pangenome-based approach, we infer structural haplotypes across thousands of humans identifying extensively duplicated haplotypes at higher frequency in modern agricultural populations. Leveraging 533 ancient human genomes, we find that duplication-containing haplotypes (with more gene copies than the ancestral haplotype) have rapidly increased in frequency over the past 12,000 years in West Eurasians, suggestive of positive selection. Together, our study highlights the potential effects of the agricultural revolution on human genomes and the importance of structural variation in human adaptation.
 An official website of the United States government
