Abstract Allele-specific expression quantification from RNA-seq reads provides opportunities to study the control of gene regulatory networks by cis-acting and trans-acting genetic variants. Many existing methods perform single-gene, single-SNP association analysis to identify expression quantitative trait loci (eQTLs) and then place the eQTLs against known gene networks for functional interpretation. Instead, we view eQTL data as capturing the effects of perturbation of the gene regulatory system by a large number of genetic variants, and we reconstruct a gene network perturbed by eQTLs. We introduce a statistical framework called CiTruss for simultaneously learning a gene network and the cis-acting and trans-acting eQTLs that perturb this network, given population allele-specific expression and SNP data. CiTruss uses a multi-level conditional Gaussian graphical model to model trans-acting eQTLs perturbing the expression of both alleles in the gene network at the top level and cis-acting eQTLs perturbing the expression of each allele at the bottom level. We derive a transformation of this model that allows efficient learning for large-scale human data. Our analysis of the GTEx and LG×SM advanced intercross line mouse data for multiple tissue types with CiTruss provides new insights into the genetics of gene regulation. CiTruss revealed that gene networks consist of local subnetworks over proximally located genes and global subnetworks over genes scattered across the genome, and that several aspects of gene regulation by eQTLs, such as the impact of genetic diversity, pleiotropy, tissue-specific gene regulation, and local and long-range linkage disequilibrium among eQTLs, can be explained through these local and global subnetworks.
G4mer: An RNA language model for transcriptome-wide identification of G-quadruplexes and disease variants from population-scale genetic data
ABSTRACT RNA G-quadruplexes (rG4s) are key regulatory elements in gene expression, yet the effects of genetic variants on rG4 formation remain underexplored. Here, we introduce G4mer, an RNA language model that predicts rG4 formation and evaluates the effects of genetic variants across the transcriptome. G4mer significantly improves accuracy over existing methods, highlighting sequence length and flanking motifs as important rG4 features. Applying G4mer to 5’ untranslated region (UTR) variations, we identify variants in breast cancer-associated genes that alter rG4 formation and validate their impact on structure and gene expression. These results demonstrate the potential of integrating computational models with experimental approaches to study rG4 function, especially in diseases where non-coding variants are often overlooked. To support broader applications, G4mer is available as both a web tool and a downloadable model.
- Award ID(s): 2400135
- PAR ID: 10616237
- Publisher / Repository: bioRxiv
- Date Published:
- Format(s): Medium: X
- Institution: bioRxiv
- Sponsoring Org: National Science Foundation
More Like this
Abstract Phenotypic variation in organism-level traits has been studied in Caenorhabditis elegans wild strains, but the impacts of differences in gene expression and the underlying regulatory mechanisms are largely unknown. Here, we use natural variation in gene expression to connect genetic variants to differences in organismal-level traits, including drug and toxicant responses. We perform transcriptomic analyses on 207 genetically distinct C. elegans wild strains to study natural regulatory variation of gene expression. Using this massive dataset, we perform genome-wide association mappings to investigate the genetic basis underlying gene expression variation and reveal complex genetic architectures. We find a large collection of hotspots enriched for expression quantitative trait loci across the genome. We further use mediation analysis to understand how gene expression variation could underlie organism-level phenotypic variation for a variety of complex traits. These results reveal the natural diversity in gene expression and possible regulatory mechanisms in this keystone model organism, highlighting the promise of using gene expression variation to understand how phenotypic diversity is generated.
Lasky, Jesse R. (Ed.) Gene expression can be influenced by genetic variants that are closely linked to the expressed gene (cis eQTLs) and variants in other parts of the genome (trans eQTLs). We created a multiparental mapping population by sampling genotypes from a single natural population of Mimulus guttatus and scored gene expression in the leaves of 1,588 plants. We find that nearly every measured gene exhibits cis regulatory variation (91% have FDR < 0.05). cis eQTLs are usually allelic series with three or more functionally distinct alleles. The cis locus explains about two thirds of the standing genetic variance (on average) but varies among genes and tends to be greatest when there is high indel variation in the upstream regulatory region and high nucleotide diversity in the coding sequence. Despite mapping over 10,000 trans eQTL / affected gene pairs, most of the genetic variance generated by trans-acting loci remains unexplained. This implies a large reservoir of trans-acting genes with subtle or diffuse effects. Mapped trans eQTLs show lower allelic diversity but much higher genetic dominance than cis eQTLs. Several analyses also indicate that trans eQTLs make a substantial contribution to the genetic correlations in expression among different genes. They may thus be essential determinants of “gene expression modules,” which has important implications for the evolution of gene expression and how it is studied by geneticists.
Abstract Viruses persist in nature owing to their extreme genetic heterogeneity and large population sizes, which enable them to evade host immune defenses, escape anti-viral drugs, and adapt to new hosts. The persistence of viruses is challenging to study because mutations affect multiple virus genes, interactions among genes in their impacts on virus growth are seldom known, and measures of viral fitness have yet to be standardized. To address these challenges, we employed a data-driven computational model of cell infection by a virus. The infection model accounted for the kinetics of viral gene expression, functional gene-gene interactions, genome replication, and allocation of host cellular resources to produce progeny of vesicular stomatitis virus (VSV), a prototype RNA virus. We used this model to computationally probe how interactions among genes carrying up to 11 deleterious mutations affect different measures of virus fitness: single-cycle growth yields and multi-cycle rates of infection spread. Individual mutations were implemented by perturbing biophysical parameters associated with individual gene functions of the wild-type model. Our analysis revealed synergistic epistasis among deleterious mutations in their effects on virus yield, so adverse effects of single deleterious mutations were amplified by interaction. For the same mutations, multi-cycle infection spread indicated weak or negligible epistasis, where single mutations act alone in their effects on infection spread. These results were robust to simulation under high and low host resource environments. Our work highlights how different types and magnitudes of epistasis can arise for genetically identical virus variants, depending on the fitness measure. More broadly, gene-gene interactions can differently affect how viruses grow and spread.
Hoffmann, Federico (Ed.) Abstract There is great interest in exploring epigenetic modifications as drivers of adaptive organismal responses to environmental change. Extending this hypothesis to populations, epigenetically driven plasticity could influence phenotypic changes across environments. The canonical model posits that epigenetic modifications alter gene regulation and subsequently impact phenotypes. We first discuss origins of epigenetic variation in nature, which may arise from genetic variation, spontaneous epimutations, epigenetic drift, or variation in epigenetic capacitors. We then review and synthesize literature addressing three facets of the aforementioned model: (i) causal effects of epigenetic modifications on phenotypic plasticity at the organismal level, (ii) divergence of epigenetic patterns in natural populations distributed across environmental gradients, and (iii) the relationship between environmentally induced epigenetic changes and gene expression at the molecular level. We focus on DNA methylation, the most extensively studied epigenetic modification. We find support for environmentally associated epigenetic structure in populations and selection on stable epigenetic variants, and that inhibition of epigenetic enzymes frequently bears causal effects on plasticity. However, there are pervasive confounding issues in the literature. Effects of chromatin-modifying enzymes on phenotype may be independent of epigenetic marks, alternatively resulting from functions and protein interactions extrinsic of epigenetics. Associations between environmentally induced changes in DNA methylation and expression are strong in plants and mammals but notably absent in invertebrates and nonmammalian vertebrates. Given these challenges, we describe emerging approaches to better investigate how epigenetic modifications affect gene regulation, phenotypic plasticity, and divergence among populations.

