
Title: Mixed-Layer Deep Modeling of Genotypes and Cross-Tissue Expression Uncovers Trans-eQTLs
Motivation: Modeling the genetics of gene expression has been effective at highlighting cis-eQTLs, variants that control nearby transcripts. Yet incorporating long-range effects has been hampered by unfavorable statistical considerations. At the other end, expression alone has been modeled across tissues by decomposition into contributing factors, without any connection to genetics. Results: We develop MIxed-Layer Analysis of Genetics and Expression (MILAGE), a model that combines direct effects of cis-SNPs on nearby transcripts with trans-effects that control global factors of expression in a tissue-specific pattern. We develop a judicious initialization of the model, followed by gradient descent learning. We present a GPU-based implementation of the learner to make computation feasible in this otherwise intractably large parameter space. We show that the model explains > 59% of test-set variation in GTEx data. The inferred genetically regulated factors are consistent with expected tissue similarity.
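The abstract does not give MILAGE's equations, so what follows is only a minimal sketch of one possible reading of "direct cis effects plus trans effects acting through tissue-specific global factors" as a forward model, together with a fraction-of-variance-explained score like the one quoted for the test set. Everything here is an assumption: the hypothetical names `predict_expression` and `variance_explained`, a linear single-tissue form (expression = genotypes x cis effects + genotype-driven factors x tissue loadings), and per-gene centering of the total variance. The initialization, gradient-descent training, and GPU implementation are not reproduced.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (n x m) and b (m x p)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def predict_expression(genotypes: Matrix, cis: Matrix, trans: Matrix,
                       loadings: Matrix) -> Matrix:
    """Hypothetical forward model for one tissue (individuals x genes).

    genotypes: individuals x SNPs, allele dosages 0/1/2
    cis:       SNPs x genes, direct effects (assumed zero outside a cis window)
    trans:     SNPs x factors, genetic control of global expression factors
    loadings:  factors x genes, assumed tissue-specific factor loadings
    """
    direct = matmul(genotypes, cis)            # cis layer
    factors = matmul(genotypes, trans)         # genetically regulated factors
    via_factors = matmul(factors, loadings)    # trans layer through the factors
    return [[d + f for d, f in zip(row_d, row_f)]
            for row_d, row_f in zip(direct, via_factors)]


def variance_explained(observed: Matrix, predicted: Matrix) -> float:
    """1 - SSE / SST, with SST measured around each gene's (column's) mean."""
    n_rows, n_cols = len(observed), len(observed[0])
    sse = sst = 0.0
    for j in range(n_cols):
        col_mean = sum(observed[i][j] for i in range(n_rows)) / n_rows
        for i in range(n_rows):
            sse += (observed[i][j] - predicted[i][j]) ** 2
            sst += (observed[i][j] - col_mean) ** 2
    return 1.0 - sse / sst
```

In this toy setting, a score of 0.59 would mean the held-out residual sum of squares is 41% of the per-gene total sum of squares. Factoring the trans layer through a small number of shared factors, rather than fitting every SNP-gene pair, is what keeps the trans parameter count manageable in this sketch.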
Journal Name: RECOMB Satellite on Genetics
Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Metabolic syndrome (MetSyn) is a cluster of dysregulated metabolic conditions that occur together to increase the risk for cardiometabolic disorders such as type 2 diabetes (T2D). One key condition associated with MetSyn, abdominal obesity, is measured by computing the ratio of waist-to-hip circumference adjusted for body-mass index (WHRadjBMI). WHRadjBMI and T2D are complex traits with genetic and environmental components, which has enabled genome-wide association studies (GWAS) to identify hundreds of loci associated with both. Statistical genetics analyses of these GWAS have predicted that WHRadjBMI is a strong causal risk factor for T2D and that these traits share genetic architecture at many loci. To date, no variants have been described that are simultaneously associated with protection from T2D and with increased abdominal obesity. Here, we used colocalization analysis to identify genetic variants with a shared association for T2D and abdominal obesity. This analysis revealed five loci with discordant effects on T2D and abdominal obesity. The alleles of the lead genetic variants in these loci that were protective against T2D were also associated with increased abdominal obesity. We further used publicly available expression, epigenomic, and genetic regulatory data to predict the effector genes (eGenes) and functional tissues at the 2p21, 5q21.1, and 19q13.11 loci. We also computed the correlation of subcutaneous adipose tissue (SAT) expression of the predicted eGenes with metabolic phenotypes and adipogenesis. We propose a model to resolve the discordant effects at the 5q21.1 locus. We find that gypsy retrotransposon integrase 1 (GIN1), diphosphoinositol pentakisphosphate kinase 2 (PPIP5K2), and peptidylglycine alpha-amidating monooxygenase (PAM) are the likely causal eGenes at the 5q21.1 locus.
Taken together, these results are the first to describe a potential mechanism through which a genetic variant can confer increased abdominal obesity but protection from T2D. Understanding precisely which genetic variants confer increased risk for MetSyn, and how, will build the basic science needed to design novel therapeutics for metabolic syndrome.
  2. Transcription is controlled by interactions of cis-acting DNA elements with diffusible trans-acting factors. Changes in cis or trans factors can drive expression divergence within and between species, and their relative prevalence can reveal the evolutionary history and pressures that drive expression variation. Previous work delineating the mode of expression divergence in animals has largely used whole-body expression measurements in a single condition. Because cis-acting elements often drive expression in a subset of cell types or conditions, these measurements may not capture the complete contribution of cis-acting changes. Here, we quantify the mode of expression divergence in the Drosophila fat body, the primary immune organ, in several conditions, using two geographically distinct lines of D. melanogaster and their F1 hybrids. We measured expression in the absence of infection and in infections with Gram-negative S. marcescens or Gram-positive E. faecalis bacteria, which trigger the two primary signaling pathways of the Drosophila innate immune response. The mode of expression divergence strongly depends on the condition, with trans-acting effects dominating in response to Gram-negative infection and cis-acting effects dominating in Gram-positive and preinfection conditions. Expression divergence in several receptor proteins may underlie the infection-specific trans effects. Before infection, when the fat body has a metabolic role, there are many compensatory effects: changes in cis and trans that counteract each other to maintain expression levels. This work shows that within a single tissue, the mode of expression divergence varies between conditions, and suggests that these differences reflect the diverse evolutionary histories of host–pathogen interactions.
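The reasoning behind an F1-hybrid design can be illustrated with a short sketch. Both parental alleles in an F1 hybrid share the same trans environment, so the allelic log ratio measured in the hybrid reflects cis divergence, and whatever remains of the parental log ratio is attributed to trans. The function `classify_divergence`, the fixed `threshold` on |log2 ratio|, and the category labels (loosely following those commonly used in F1-hybrid studies) are hypothetical simplifications; real analyses test read counts statistically rather than applying a hard cutoff, and this is not the authors' pipeline.

```python
import math
from dataclasses import dataclass


@dataclass
class Divergence:
    cis: float    # log2 allelic ratio within the F1 hybrid
    trans: float  # parental log2 ratio minus the cis component
    mode: str


def classify_divergence(parent1: float, parent2: float,
                        hybrid_allele1: float, hybrid_allele2: float,
                        threshold: float = 0.5) -> Divergence:
    """Split parental expression divergence into cis and trans parts (sketch)."""
    parental = math.log2(parent1 / parent2)
    cis = math.log2(hybrid_allele1 / hybrid_allele2)  # shared trans background
    trans = parental - cis
    sig_cis = abs(cis) >= threshold
    sig_trans = abs(trans) >= threshold
    sig_parental = abs(parental) >= threshold
    if not sig_cis and not sig_trans:
        mode = "conserved"
    elif sig_cis and not sig_trans:
        mode = "cis only"
    elif sig_trans and not sig_cis:
        mode = "trans only"
    elif cis * trans > 0:
        mode = "cis + trans (reinforcing)"
    elif not sig_parental:
        # Opposing cis and trans changes that cancel: expression is maintained.
        mode = "compensatory"
    else:
        mode = "cis x trans (opposing)"
    return Divergence(cis, trans, mode)
```

For example, equal parental expression (100 vs. 100) with a 4:1 allelic imbalance in the hybrid (200 vs. 50) gives cis = 2 and trans = -2, the compensatory pattern the abstract describes for the preinfection fat body.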
  3. Summary

    The study of expression Quantitative Trait Loci (eQTL) is an important problem in genomics and biomedicine. While detection (testing) of eQTL associations has been widely studied, less work has been devoted to estimating eQTL effect sizes. To reduce false positives, detection methods frequently rely on linear modeling of rank-normalized or log-transformed gene expression data. Unfortunately, these approaches do not correspond to the simplest model of eQTL action and thus yield estimates of eQTL association that can be uninterpretable and inaccurate. In this article, we propose a new, log-of-linear model for eQTL action, termed ACME, that captures allelic contributions to cis-acting eQTLs in an additive fashion, yielding effect size estimates that correspond to a biologically coherent model of cis-eQTLs. We describe a non-linear least-squares algorithm to fit the model by maximum likelihood and obtain corresponding p-values. We perform a careful investigation of the model using a combination of simulated data and data from the Genotype-Tissue Expression (GTEx) project. Our results reveal little evidence for dominance effects, a parsimonious result that accords with a simple biological model for allele-specific expression and supports use of the ACME model. We show that Type I error is well controlled under our approach in a realistic setting, so that rank-based normalizations are unnecessary. Furthermore, we show that such normalizations can be detrimental to power and estimation accuracy under the proposed model. We then show, through effect size analyses of whole-genome cis-eQTLs in the GTEx data, that using standard normalizations instead of ACME noticeably affects the ranking and sign of estimates.
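Here is a minimal sketch of fitting a log-of-linear model by nonlinear least squares. It assumes the simplest form consistent with the description: log expression y_i = log(b0 + b1 * g_i) + noise, where g_i in {0, 1, 2} counts alternative alleles, so allelic contributions add on the linear scale before the log is taken. The function name `fit_log_of_linear`, the Gauss-Newton solver with step halving, and the omission of covariates and p-values are illustrative assumptions, not the ACME software. With Gaussian noise, minimizing squared error is the maximum-likelihood fit.

```python
import math
from typing import Sequence, Tuple


def fit_log_of_linear(genotype: Sequence[int], log_expr: Sequence[float],
                      b0: float = 1.0, b1: float = 0.0,
                      max_iter: int = 100, tol: float = 1e-12) -> Tuple[float, float]:
    """Gauss-Newton fit of log_expr ~ log(b0 + b1 * genotype).

    Starting values must give b0 + b1 * g > 0 for every observed genotype.
    """
    def sse(c0: float, c1: float) -> float:
        total = 0.0
        for g, y in zip(genotype, log_expr):
            mu = c0 + c1 * g
            if mu <= 0:
                return math.inf  # outside the valid region of the log link
            total += (y - math.log(mu)) ** 2
        return total

    current = sse(b0, b1)
    for _ in range(max_iter):
        # Normal equations (J^T J) d = J^T r, with J = d log(mu) / d(b0, b1).
        a00 = a01 = a11 = r0 = r1 = 0.0
        for g, y in zip(genotype, log_expr):
            mu = b0 + b1 * g
            resid = y - math.log(mu)
            j0, j1 = 1.0 / mu, g / mu
            a00 += j0 * j0
            a01 += j0 * j1
            a11 += j1 * j1
            r0 += j0 * resid
            r1 += j1 * resid
        det = a00 * a11 - a01 * a01
        if det == 0.0:
            break  # genotype has no variation; b1 is not identifiable
        d0 = (a11 * r0 - a01 * r1) / det
        d1 = (a00 * r1 - a01 * r0) / det
        # Halve the step until the squared error does not increase.
        step = 1.0
        while step > 1e-8:
            candidate = sse(b0 + step * d0, b1 + step * d1)
            if candidate <= current:
                break
            step /= 2.0
        else:
            break  # no improving step found: treat as converged
        b0, b1 = b0 + step * d0, b1 + step * d1
        current = candidate
        if abs(step * d0) + abs(step * d1) < tol:
            break
    return b0, b1
```

Under these assumptions, a scale-free effect size is b1 / b0: the change in expression per alternative allele relative to the reference-homozygous baseline. Because the model is additive before the log, a heterozygote sits halfway between the two homozygotes on the linear scale. That is the no-dominance picture the abstract reports as well supported.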

  4. Robinson, Peter (Ed.)
    Motivation

    Identifying cis-acting genetic variants associated with gene expression levels, an analysis commonly referred to as expression quantitative trait locus (eQTL) mapping, is an important first step toward understanding the genetic determinants of gene expression variation. Successful eQTL mapping requires effective control of confounding factors. A common method for controlling confounding effects in eQTL mapping studies is probabilistic estimation of expression residuals (PEER) analysis. PEER analysis extracts PEER factors to serve as surrogates for confounding factors, which are then included in the subsequent eQTL mapping analysis. However, determining the optimal number of PEER factors to use for eQTL mapping is computationally challenging. In particular, the standard approach examines one candidate number at a time and chooses the number that optimizes eQTL discovery. Unfortunately, this approach requires many repeated eQTL mapping runs that are computationally expensive, restricting its use in the large-scale eQTL mapping studies being collected today.


    Here, we present a simple and computationally scalable alternative, Effect size Correlation for COnfounding determination (ECCO), to determine the optimal number of PEER factors for eQTL mapping studies. Instead of performing repeated eQTL mapping, ECCO jointly applies differential expression analysis and Mendelian randomization analysis, leading to substantial computational savings. In simulations and real data applications, we show that ECCO identifies a number of PEER factors similar to that chosen by the standard approach, but is two orders of magnitude faster. The computational scalability of ECCO allows for optimized eQTL discovery across 48 GTEx tissues for the first time, yielding an overall 5.89% power gain in the number of eQTL-harboring genes (eGenes) discovered compared with the previous GTEx recommendation, which does not attempt to determine a tissue-specific optimal number of PEER factors.

    Availability and implementation

    Our method is implemented in the ECCO software, which, along with its GTEx mapping results, is freely available at All R scripts used in this study are also available at this site.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

  5. Storz, Gisela (Ed.)
    ABSTRACT Mutations in regulatory mechanisms that control gene expression contribute to phenotypic diversity and thus facilitate the adaptation of microbes and other organisms to new niches. Comparative genomics can be used to infer rewiring of regulatory architecture from large-effect mutations, such as the loss or acquisition of transcription factors, but may be insufficient to identify small changes in the noncoding, intergenic DNA sequences of regulatory elements that drive phenotypic divergence. In human-derived Vibrio cholerae, the response to distinct chemical cues triggers production of multiple transcription factors that can regulate the type VI secretion system (T6), a broadly distributed weapon for interbacterial competition. However, the signaling network remains poorly understood because, to date, no regulatory element has been identified for the major T6 locus. Here we identify a conserved cis-acting single nucleotide polymorphism (SNP) controlling T6 transcription and activity. Sequence alignment of the T6 regulatory region from diverse V. cholerae strains revealed conservation of the SNP, which we rewired to interconvert V. cholerae T6 activity between chitin-inducible and constitutive states. This study supports a model of pathogen evolution through a noncoding cis-regulatory mutation and preexisting, active transcription factors, which confers different fitness advantages on tightly regulated strains inside a human host and on unfettered strains adapted to environmental niches. IMPORTANCE Organisms sense external cues with regulatory circuits that trigger the production of transcription factors, which bind specific DNA sequences at promoters (“cis” regulatory elements) to activate target genes. Mutations of transcription factors or their regulatory elements create phenotypic diversity, allowing exploitation of new niches. The waterborne pathogen Vibrio cholerae encodes the type VI secretion system “nanoweapon” to kill competitor cells when activated.
Despite the identification of several transcription factors, no regulatory element had previously been identified in the promoter of the major type VI locus. Combining phenotypic, genetic, and genomic analyses of diverse V. cholerae strains, we discovered a single nucleotide polymorphism in the type VI promoter that switches its killing activity between a constitutive state, beneficial outside hosts, and an inducible state that keeps it constrained in a host. Our results support a role for noncoding DNA in the adaptation of this pathogen.