Title: G4mer: An RNA language model for transcriptome-wide identification of G-quadruplexes and disease variants from population-scale genetic data
ABSTRACT RNA G-quadruplexes (rG4s) are key regulatory elements in gene expression, yet the effects of genetic variants on rG4 formation remain underexplored. Here, we introduce G4mer, an RNA language model that predicts rG4 formation and evaluates the effects of genetic variants across the transcriptome. G4mer significantly improves accuracy over existing methods, highlighting sequence length and flanking motifs as important rG4 features. Applying G4mer to 5’ untranslated region (UTR) variations, we identify variants in breast cancer-associated genes that alter rG4 formation and validate their impact on structure and gene expression. These results demonstrate the potential of integrating computational models with experimental approaches to study rG4 function, especially in diseases where non-coding variants are often overlooked. To support broader applications, G4mer is available as both a web tool and a downloadable model.
Award ID(s):
2400135
PAR ID:
10616237
Publisher / Repository:
bioRxiv
Format(s):
Medium: X
Institution:
bioRxiv
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Allele-specific expression quantification from RNA-seq reads provides opportunities to study the control of gene regulatory networks by cis-acting and trans-acting genetic variants. Many existing methods performed a single-gene and single-SNP association analysis to identify expression quantitative trait loci (eQTLs), and placed the eQTLs against known gene networks for functional interpretation. Instead, we view eQTL data as a capture of the effects of perturbation of gene regulatory system by a large number of genetic variants and reconstruct a gene network perturbed by eQTLs. We introduce a statistical framework called CiTruss for simultaneously learning a gene network and cis-acting and trans-acting eQTLs that perturb this network, given population allele-specific expression and SNP data. CiTruss uses a multi-level conditional Gaussian graphical model to model trans-acting eQTLs perturbing the expression of both alleles in gene network at the top level and cis-acting eQTLs perturbing the expression of each allele at the bottom level. We derive a transformation of this model that allows efficient learning for large-scale human data. Our analysis of the GTEx and LG×SM advanced intercross line mouse data for multiple tissue types with CiTruss provides new insights into genetics of gene regulation. CiTruss revealed that gene networks consist of local subnetworks over proximally located genes and global subnetworks over genes scattered across genome, and that several aspects of gene regulation by eQTLs such as the impact of genetic diversity, pleiotropy, tissue-specific gene regulation, and local and long-range linkage disequilibrium among eQTLs can be explained through these local and global subnetworks.
  2. Abstract Background: Genetic and epigenetic perturbation of cis-regulatory sequences can shift patterns of gene expression and result in novel phenotypes. Phased genome assemblies now enable the local dissection of linkages between cis-regulatory sequences, including their epigenetic state, and allele-specific gene expression to further characterize gene regulation and resulting phenotypes in heterozygous genomes. Results: We assembled a locally phased genome for a mandarin hybrid named ‘Fairchild’ to explore the molecular signatures of allele-specific gene expression. With local genome phasing, genes with allele-specific expression were paired with haplotype-specific chromatin states, including levels of chromatin accessibility, histone modifications, and DNA methylation. We found that 30% of variation in allele-specific expression could be attributed to haplotype-associated factors, with allelic levels of chromatin accessibility and three histone modifications in gene bodies having the most influence. Structural variants in promoter regions were also associated with allele-specific expression, including specific enrichments of hAT and MULE-MuDR DNA transposon sequences. Integration of haplotype-resolved genetic and epigenetic landscapes with high-throughput phenotypic analysis of fruit traits in a panel of 154 accessions with mandarin and pummelo ancestry revealed that trait-associated variants were enriched in regions of open chromatin. Mining of trait-associated variants uncovered a Gypsy retrotransposon insertion in a gene that regulates potassium transport and may contribute to the reduction in fruit size that is observed in mandarins. Conclusions: Using a locally phased assembly of a heterozygous cultivar of citrus, we dissected the interplay between genetic variants and molecular phenotypes to reveal cis-regulatory sequences with potential functional effects on phenotypes relevant for genetic improvement.
  3. Barbash, Daniel (Ed.)
    Abstract To understand the relative importance of cis and trans effects on regulation, we crossed multi-parent recombinant-inbred-lines (RILs) to a common tester and measured allele-specific gene expression in the offspring. Testing the difference in allelic imbalance between two RIL x Tester crosses is a test of cis or trans depending on the RIL alleles compared. The study design also makes it possible to separate two sources of trans variation, genetic and environmental, detected via interactions with cis effects. We demonstrate the effectiveness of this approach in a long-read RNA-seq experiment in female abdominal tissue at two time points in Drosophila melanogaster. Among the 40% of all loci that show evidence of genetic variation in cis, trans effects due to environment are detectable in 31% of loci and trans effects due to genetic background in 19%, with little overlap in sources of trans variation. The genes identified in this study are associated with genes previously reported to exhibit genetic variation in gene expression. Eleven genes in a QTL for thermotolerance, previously shown to differ in expression based on temperature, have evidence for regulation of gene expression regardless of the environment, including the cuticular protein Cpr67B, suggesting a functional role for standing variation in gene expression. This study provides a blueprint for identifying regulatory variation in gene expression, as the tester design maximizes cis variation and enables the efficient assessment of all pairs of RIL alleles relative to the tester, a much smaller study compared to the pairwise direct assessment.
  4. Abstract Phenotypic variation in organism-level traits has been studied in Caenorhabditis elegans wild strains, but the impacts of differences in gene expression and the underlying regulatory mechanisms are largely unknown. Here, we use natural variation in gene expression to connect genetic variants to differences in organismal-level traits, including drug and toxicant responses. We perform transcriptomic analyses on 207 genetically distinct C. elegans wild strains to study natural regulatory variation of gene expression. Using this massive dataset, we perform genome-wide association mappings to investigate the genetic basis underlying gene expression variation and reveal complex genetic architectures. We find a large collection of hotspots enriched for expression quantitative trait loci across the genome. We further use mediation analysis to understand how gene expression variation could underlie organism-level phenotypic variation for a variety of complex traits. These results reveal the natural diversity in gene expression and possible regulatory mechanisms in this keystone model organism, highlighting the promise of using gene expression variation to understand how phenotypic diversity is generated.
  5. Lasky, Jesse R. (Ed.)
    Gene expression can be influenced by genetic variants that are closely linked to the expressed gene (cis eQTLs) and variants in other parts of the genome (trans eQTLs). We created a multiparental mapping population by sampling genotypes from a single natural population of Mimulus guttatus and scored gene expression in the leaves of 1,588 plants. We find that nearly every measured gene exhibits cis regulatory variation (91% have FDR < 0.05). cis eQTLs are usually allelic series with three or more functionally distinct alleles. The cis locus explains about two-thirds of the standing genetic variance (on average) but varies among genes and tends to be greatest when there is high indel variation in the upstream regulatory region and high nucleotide diversity in the coding sequence. Despite mapping over 10,000 trans eQTL / affected gene pairs, most of the genetic variance generated by trans-acting loci remains unexplained. This implies a large reservoir of trans-acting genes with subtle or diffuse effects. Mapped trans eQTLs show lower allelic diversity but much higher genetic dominance than cis eQTLs. Several analyses also indicate that trans eQTLs make a substantial contribution to the genetic correlations in expression among different genes. They may thus be essential determinants of “gene expression modules,” which has important implications for the evolution of gene expression and how it is studied by geneticists.