

Title: Predictive models of genetic redundancy in Arabidopsis thaliana
Abstract Genetic redundancy refers to a situation where an individual with a loss-of-function mutation in one gene (single mutant) does not show an apparent phenotype until one or more paralogs are also knocked out (double/higher-order mutant). Previous studies have identified some characteristics common among redundant gene pairs, but a predictive model of genetic redundancy incorporating a wide variety of features derived from accumulating omics and mutant phenotype data has yet to be established. In addition, the relative importance of these features for genetic redundancy remains largely unclear. Here, we establish machine learning models that predict whether a gene pair in the model plant Arabidopsis thaliana is likely to be redundant, based on six feature categories: functional annotations, evolutionary conservation (including duplication patterns and mechanisms), epigenetic marks, protein properties (including post-translational modifications), gene expression, and gene network properties. The definition of redundancy, the data transformations, the feature subsets, and the machine learning algorithms used all significantly affected model performance on held-out test phenotype data. Among the most important features for predicting a gene pair as redundant were having one or more paralogs from recent duplication events, annotation as a transcription factor, downregulation under stress conditions, and similar expression patterns under stress conditions. We also explored potential reasons for mispredictions and the limitations of our study. This genetic redundancy model sheds light on characteristics that may contribute to the long-term maintenance of paralogs and will ultimately allow more targeted generation of functionally informative double mutants, advancing functional genomic studies.
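The abstract frames redundancy prediction as binary classification of gene pairs, scored on held-out phenotype data. The sketch below illustrates that setup only; it is not the authors' pipeline. It assumes logistic regression as a stand-in classifier (the abstract does not name the algorithms compared) and uses three hypothetical features loosely modeled on the top-ranked ones above: a recent-duplication flag, a transcription-factor flag, and the Pearson correlation of the two genes' stress expression profiles. All data values and the helper names `pearson`, `train_logistic`, and `predict` are invented for illustration.

```python
import math
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # Constant profiles have no defined correlation; treat them as 0.
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X: List[List[float]], y: List[int],
                   lr: float = 0.5, epochs: int = 20000) -> Tuple[List[float], float]:
    """Fit logistic regression by full-batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the per-pair log loss is (predicted prob - label) * features.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: List[float], b: float, x: Sequence[float]) -> int:
    """Return 1 ("redundant") if the predicted probability is at least 0.5."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


# Hypothetical toy gene pairs, NOT data from the study.
# Features: [recent duplicate (0/1), transcription factor (0/1), stress expression correlation]
train_X = [[1, 1, 0.9], [1, 0, 0.8], [1, 1, 0.7], [0, 1, 0.6],
           [0, 0, 0.1], [0, 0, -0.2], [1, 0, 0.0], [0, 1, -0.1]]
train_y = [1, 1, 1, 1, 0, 0, 0, 0]

# Hold-out pairs are never used for fitting, only for evaluation.
test_X = [[1, 1, 0.85], [0, 0, -0.3]]
test_y = [1, 0]

w, b = train_logistic(train_X, train_y)
holdout_preds = [predict(w, b, x) for x in test_X]
holdout_accuracy = sum(p == t for p, t in zip(holdout_preds, test_y)) / len(test_y)
```

In practice the correlation feature for a pair would be computed with `pearson` from the two genes' measured stress profiles before being placed in the feature vector. Keeping the hold-out pairs out of `train_logistic` mirrors the abstract's point that performance should be judged on data the model has not seen.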
Award ID(s): 1655630, 1655386
PAR ID: 10226049
Editor(s): de Meaux, Juliette
Journal Name: Molecular Biology and Evolution
ISSN: 0737-4038
Sponsoring Org: National Science Foundation
More Like This
  1. ABSTRACT Cryptic genetic variants exert minimal or no phenotypic effects alone but have long been hypothesized to form a vast, hidden reservoir of genetic diversity that drives trait evolvability through epistatic interactions. This classical theory has been reinvigorated by pan-genome sequencing, which has revealed pervasive variation within gene families and regulatory networks, including extensive cis-regulatory changes, gene duplication, and divergence between paralogs. Nevertheless, empirical testing of cryptic variation’s capacity to fuel phenotypic diversification has been hindered by intractable genetics, limited allelic diversity, and inadequate phenotypic resolution. Here, guided by natural and engineered cis-regulatory cryptic variants in a recently evolved paralogous gene pair, we identified an additional pair of redundant trans regulators, establishing a regulatory network that controls tomato inflorescence architecture. By combining coding mutations with a cis-regulatory allelic series in populations segregating for all four network genes, we systematically constructed a collection of 216 genotypes spanning the full spectrum of inflorescence complexity and quantified branching in over 27,000 inflorescences. Analysis of the resulting high-resolution genotype-phenotype map revealed a layer of dose-dependent interactions within paralog pairs that enhances branching, culminating in strong, synergistic effects. However, we also uncovered an unexpected layer of antagonism between paralog pairs, where accumulating mutations in one pair progressively diminished the effects of mutations in the other. Our results demonstrate how gene regulatory network architecture and complex dosage effects from paralog diversification converge to shape phenotypic space under a hierarchical model of epistatic interactions. 
Given the prevalence of paralog evolution in genomes, we propose that paralogous cryptic variation within regulatory networks elicits hierarchies of epistatic interactions, catalyzing bursts of phenotypic change.
    Keywords: cryptic mutations, paralogs, redundancy, cis-regulatory, tomato, inflorescence, gene regulatory network, modeling, epistasis
  2. Synopsis Gene duplicates, or paralogs, serve as a major source of new genetic material and comprise seeds for evolutionary innovation. While paralogs were originally thought to be quickly lost or nonfunctionalized following duplication, a vast number are now known to be retained in a functional state. Daughter paralogs can provide robustness through redundancy, specialize via subfunctionalization, or neofunctionalize to play new roles. Indeed, the duplication and divergence of developmental genes have played a monumental role in the evolution of animal forms (e.g., Hox genes). Still, despite their prevalence and evolutionary importance, the precise detection of gene duplicates in newly sequenced genomes remains technically challenging and is often overlooked. This presents an especially pertinent problem for evolutionary developmental biology, where hypothesis testing requires accurate detection of changes in gene expression and function, often in nontraditional model species. Frequently, these analyses rely on molecular reagents designed within coding sequences that may be highly similar in recently duplicated paralogs, leading to cross-reactivity and spurious results. Thus, care is needed to avoid erroneously assigning the diverged functions of paralogs to a single gene and potentially misinterpreting evolutionary history. This perspective aims to give an overview of the prevalence and importance of paralogs and to shed light on the difficulty of their detection and analysis while offering potential solutions.
  3. Summary Gene duplication is a powerful source of biological innovation giving rise to paralogous genes that undergo diverse fates. Redundancy between paralogous genes is an intriguing outcome of duplicate gene evolution, and its maintenance over evolutionary time has long been considered a paradox. Redundancy can also be dubbed ‘a geneticist's nightmare’: It hinders the predictability of genome editing outcomes and limits our ability to link genotypes to phenotypes. Genetic studies in yeast and plants have suggested that the ability of ancient redundant duplicates to compensate for dosage perturbations resulting from a loss of function depends on the reprogramming of gene expression, a phenomenon known as active compensation. Starting from considerations on the stoichiometric constraints that drive the evolutionary stability of redundancy, this review aims to provide insights into the mechanisms of active compensation between duplicates that could be targeted for breaking paralog dependencies – the next frontier in plant functional studies. 
  4. Abstract
    Motivation: Gene deletion is traditionally thought of as a nonadaptive process that removes functional redundancy from genomes, such that it generally receives less attention than duplication in evolutionary turnover studies. Yet, mounting evidence suggests that deletion may promote adaptation via the "less-is-more" evolutionary hypothesis, as it often targets genes harboring unique sequences, expression profiles, and molecular functions. Hence, predicting the relative prevalence of redundant and unique functions among genes targeted by deletion, as well as the parameters underlying their evolution, can shed light on the role of gene deletion in adaptation.
    Results: Here, we present CLOUDe, a suite of machine learning methods for predicting evolutionary targets of gene deletion events from expression data. Specifically, CLOUDe models expression evolution as an Ornstein–Uhlenbeck process, and uses multi-layer neural network, extreme gradient boosting, random forest, and support vector machine architectures to predict whether deleted genes are "redundant" or "unique", as well as several parameters underlying their evolution. We show that CLOUDe boasts high power and accuracy in differentiating between classes, and high accuracy and precision in estimating evolutionary parameters, with optimal performance achieved by its neural network architecture. Application of CLOUDe to empirical data from Drosophila suggests that deletion primarily targets genes with unique functions, with further analysis showing these functions to be enriched for protein deubiquitination. Thus, CLOUDe represents a key advance in learning about the role of gene deletion in functional evolution and adaptation.
    Availability and implementation: CLOUDe is freely available on GitHub (https://github.com/anddssan/CLOUDe).
  5. Zhang, Jianzhi (Ed.)
    Abstract The amplification and diversification of genes into large multi-gene families often mark key evolutionary innovations, but this process often creates genetic redundancy that hinders functional investigations. When the model budding yeast Saccharomyces cerevisiae transitions to anaerobic growth conditions, the cell massively induces the expression of seven serine/threonine-rich anaerobically induced cell wall mannoproteins (anCWMPs): TIP1, TIR1, TIR2, TIR3, TIR4, DAN1, and DAN4. Here, we show that these genes likely derive evolutionarily from a single ancestral anCWMP locus, which was duplicated and translocated to new genomic contexts several times both prior to and following the budding yeast whole genome duplication (WGD) event. Based on synteny and their phylogeny, we separate the anCWMPs into four gene subfamilies. To resolve prior inconclusive genetic investigations of these genes, we constructed a set of combinatorial deletion mutants to determine their contributions toward anaerobic growth in S. cerevisiae. We found that two genes, TIR1 and TIR3, were together necessary and sufficient for the anCWMP contribution to anaerobic growth. Overexpressing either gene alone was insufficient for anaerobic growth, implying that they encode non-overlapping functional roles in the cell during anaerobic growth. We infer from the phylogeny of the anCWMP genes that these two important genes derive from an ancient duplication that predates the WGD event, whereas the TIR1 subfamily experienced gene family amplification after the WGD event. Taken together, the genetic and molecular evidence suggests that one key anCWMP gene duplication event, several auxiliary gene duplication events, and functional divergence underpin the evolution of anaerobic growth in budding yeasts.
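Item 4 above describes CLOUDe as modeling expression evolution with an Ornstein–Uhlenbeck (OU) process. As a rough illustration of that model only (this is not CLOUDe's implementation, and the function names `simulate_ou` and `stationary_variance` are hypothetical), the sketch below simulates dX = alpha * (theta - X) dt + sigma dW, where theta is an optimal expression level, alpha the strength of the pull toward it, and sigma the magnitude of random drift. It uses the exact Gaussian transition of the OU process rather than an Euler step, so accuracy does not depend on dt being small.

```python
import math
import random
from typing import List


def simulate_ou(x0: float, theta: float, alpha: float, sigma: float,
                dt: float, n_steps: int, seed: int = 0) -> List[float]:
    """Simulate an OU path on a regular time grid using its exact transition.

    Given X(t), X(t + dt) is Gaussian with mean theta + (X(t) - theta) * exp(-alpha * dt)
    and variance sigma^2 * (1 - exp(-2 * alpha * dt)) / (2 * alpha).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive for a mean-reverting OU process")
    rng = random.Random(seed)  # seeded so runs are reproducible
    decay = math.exp(-alpha * dt)
    step_sd = sigma * math.sqrt((1.0 - decay ** 2) / (2.0 * alpha))
    path = [x0]
    x = x0
    for _ in range(n_steps):
        x = theta + (x - theta) * decay + step_sd * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def stationary_variance(alpha: float, sigma: float) -> float:
    """Long-run variance of the OU process: sigma^2 / (2 * alpha)."""
    return sigma ** 2 / (2.0 * alpha)
```

With sigma = 0 the path decays deterministically toward theta; with sigma > 0 a long run fluctuates around theta with a sample variance close to `stationary_variance(alpha, sigma)`. A larger alpha holds expression more tightly near its optimum, which is the kind of parameter a method such as CLOUDe is described as estimating.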