skip to main content


Title: Applying meta-analysis to genotype-tissue expression data from multiple tissues to identify eQTLs and increase the number of eGenes
Abstract Motivation

There is recent interest in using gene expression data to contextualize findings from traditional genome-wide association studies (GWAS). Conditioned on a tissue, expression quantitative trait loci (eQTLs) are genetic variants associated with gene expression, and eGenes are genes whose expression levels are associated with genetic variants. eQTLs and eGenes provide great supporting evidence for GWAS hits and important insights into the regulatory pathways involved in many diseases. When a significant variant or a candidate gene identified by GWAS is also an eQTL or eGene, there is strong evidence to further study this variant or gene. Multi-tissue gene expression datasets like the Gene Tissue Expression (GTEx) data are used to find eQTLs and eGenes. Unfortunately, these datasets often have small sample sizes in some tissues. For this reason, there have been many meta-analysis methods designed to combine gene expression data across many tissues to increase power for finding eQTLs and eGenes. However, these existing techniques are not scalable to datasets containing many tissues, like the GTEx data. Furthermore, these methods ignore a biological insight that the same variant may be associated with the same gene across similar tissues.

Results

We introduce a meta-analysis model that addresses these problems in existing methods. We focus on the problem of finding eGenes in gene expression data from many tissues, and show that our model is better than other types of meta-analyses.

Availability and Implementation

Source code is at https://github.com/datduong/RECOV.

Supplementary information

Supplementary data are available at Bioinformatics online.

 
more » « less
PAR ID:
10413320
Author(s) / Creator(s):
; ; ; ; ; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Bioinformatics
Volume:
33
Issue:
14
ISSN:
1367-4803
Page Range / eLocation ID:
p. i67-i74
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Background

    Neuropsychiatric disorders afflict a large portion of the global population and constitute a significant source of disability worldwide. Although Genome-wide Association Studies (GWAS) have identified many disorder-associated variants, the underlying regulatory mechanisms linking them to disorders remain elusive, especially those involving distant genomic elements. Expression quantitative trait loci (eQTLs) constitute a powerful means of providing this missing link. However, most eQTL studies in human brains have focused exclusively on cis-eQTLs, which link variants to nearby genes (i.e., those within 1 Mb of a variant). A complete understanding of disease etiology requires a clearer understanding of trans-regulatory mechanisms, which, in turn, entails a detailed analysis of the relationships between variants and expression changes in distant genes.

    Methods

    By leveraging large datasets from the PsychENCODE consortium, we conducted a genome-wide survey of trans-eQTLs in the human dorsolateral prefrontal cortex. We also performed colocalization and mediation analyses to identify mediators in trans-regulation and use trans-eQTLs to link GWAS loci to schizophrenia risk genes.

    Results

    We identified ~80,000 candidate trans-eQTLs (at FDR<0.25) that influence the expression of ~10K target genes (i.e., “trans-eGenes”). We found that many variants associated with these candidate trans-eQTLs overlap with known cis-eQTLs. Moreover, for >60% of these variants (by colocalization), the cis-eQTL’s target gene acts as a mediator for the trans-eQTL SNP's effect on the trans-eGene, highlighting examples of cis-mediation as essential for trans-regulation. Furthermore, many of these colocalized variants fall into a discernable pattern wherein cis-eQTL’s target is a transcription factor or RNA-binding protein, which, in turn, targets the gene associated with the candidate trans-eQTL. Finally, we show that trans-regulatory mechanisms provide valuable insights into psychiatric disorders: beyond what had been possible using only cis-eQTLs, we link an additional 23 GWAS loci and 90 risk genes (using colocalization between candidate trans-eQTLs and schizophrenia GWAS loci).

    Conclusions

    We demonstrate that the transcriptional architecture of the human brain is orchestrated by both cis- and trans-regulatory variants and found that trans-eQTLs provide insights into brain-disease biology.

     
    more » « less
  2. Robinson, Peter (Ed.)
    Abstract Motivation

    Identifying cis-acting genetic variants associated with gene expression levels—an analysis commonly referred to as expression quantitative trait loci (eQTLs) mapping—is an important first step toward understanding the genetic determinant of gene expression variation. Successful eQTL mapping requires effective control of confounding factors. A common method for confounding effects control in eQTL mapping studies is the probabilistic estimation of expression residual (PEER) analysis. PEER analysis extracts PEER factors to serve as surrogates for confounding factors, which is further included in the subsequent eQTL mapping analysis. However, it is computationally challenging to determine the optimal number of PEER factors used for eQTL mapping. In particular, the standard approach to determine the optimal number of PEER factors examines one number at a time and chooses a number that optimizes eQTLs discovery. Unfortunately, this standard approach involves multiple repetitive eQTL mapping procedures that are computationally expensive, restricting its use in large-scale eQTL mapping studies that being collected today.

    Results

    Here, we present a simple and computationally scalable alternative, Effect size Correlation for COnfounding determination (ECCO), to determine the optimal number of PEER factors used for eQTL mapping studies. Instead of performing repetitive eQTL mapping, ECCO jointly applies differential expression analysis and Mendelian randomization analysis, leading to substantial computational savings. In simulations and real data applications, we show that ECCO identifies a similar number of PEER factors required for eQTL mapping analysis as the standard approach but is two orders of magnitude faster. The computational scalability of ECCO allows for optimized eQTL discovery across 48 GTEx tissues for the first time, yielding an overall 5.89% power gain on the number of eQTL harboring genes (eGenes) discovered as compared to the previous GTEx recommendation that does not attempt to determine tissue-specific optimal number of PEER factors.

    Availabilityand implementation

    Our method is implemented in the ECCO software, which, along with its GTEx mapping results, is freely available at www.xzlab.org/software.html. All R scripts used in this study are also available at this site.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
    more » « less
  3. Abstract Background

    Although genome-wide association studies (GWAS) have successfully located various genetic variants susceptible to Alzheimer’s Disease (AD), it is still unclear how specific variants interact with genes and tissues to elucidate pathologies associated with AD. Summary-data-based Mendelian Randomization (SMR) addresses this problem through an instrumental variable approach that integrates data from independent GWAS and expression quantitative trait locus (eQTL) studies in order to infer a causal effect of gene expression on a trait.

    Results

    Our study employed the SMR approach to integrate a set of meta-analytic cis-eQTL information from the Genotype-Tissue Expression (GTEx), CommonMind Consortium (CMC), and Religious Orders Study and Rush Memory and Aging Project (ROS/MAP) consortiums with three sets of meta-analysis AD GWAS results.

    Conclusions

    Our analysis identified twelve total gene probes (associated with twelve distinct genes) with a significant association with AD. Four of these genes survived a test of pleiotropy from linkage (the HEIDI test).Three of these genes – RP11-385F7.1, PRSS36, and AC012146.7 – have not yet been reported differentially expressed in the brain in the context of AD, and thus are the novel findings warranting further investigation.

     
    more » « less
  4. Abstract Motivation

    Many variants identified by genome-wide association studies (GWAS) have been found to affect multiple traits, either directly or through shared pathways. There is currently a wealth of GWAS data collected in numerous phenotypes, and analyzing multiple traits at once can increase power to detect shared variant effects. However, traditional meta-analysis methods are not suitable for combining studies on different traits. When applied to dissimilar studies, these meta-analysis methods can be underpowered compared to univariate analysis. The degree to which traits share variant effects is often not known, and the vast majority of GWAS meta-analysis only consider one trait at a time.

    Results

    Here, we present a flexible method for finding associated variants from GWAS summary statistics for multiple traits. Our method estimates the degree of shared effects between traits from the data. Using simulations, we show that our method properly controls the false positive rate and increases power when an effect is present in a subset of traits. We then apply our method to the North Finland Birth Cohort and UK Biobank datasets using a variety of metabolic traits and discover novel loci.

    Availability and implementation

    Our source code is available at https://github.com/lgai/CONFIT.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
    more » « less
  5. Abstract Background

    Uncovering the functional relevance underlying verbal declarative memory (VDM) genome-wide association study (GWAS) results may facilitate the development of interventions to reduce age-related memory decline and dementia.

    Methods

    We performed multi-omics and pathway enrichment analyses of paragraph (PAR-dr) and word list (WL-dr) delayed recall GWAS from 29,076 older non-demented individuals of European descent. We assessed the relationship between single-variant associations and expression quantitative trait loci (eQTLs) in 44 tissues and methylation quantitative trait loci (meQTLs) in the hippocampus. We determined the relationship between gene associations and transcript levels in 53 tissues, annotation as immune genes, and regulation by transcription factors (TFs) and microRNAs. To identify significant pathways, gene set enrichment was tested in each cohort and meta-analyzed across cohorts. Analyses of differential expression in brain tissues were conducted for pathway component genes.

    Results

    The single-variant associations of VDM showed significant linkage disequilibrium (LD) with eQTLs across all tissues and meQTLs within the hippocampus. Stronger WL-dr gene associations correlated with reduced expression in four brain tissues, including the hippocampus. More robust PAR-dr and/or WL-dr gene associations were intricately linked with immunity and were influenced by 31 TFs and 2 microRNAs. Six pathways, including type I diabetes, exhibited significant associations with both PAR-dr and WL-dr. These pathways included fifteen MHC genes intricately linked to VDM performance, showing diverse expression patterns based on cognitive status in brain tissues.

    Conclusions

    VDM genetic associations influence expression regulation via eQTLs and meQTLs. The involvement of TFs, microRNAs, MHC genes, and immune-related pathways contributes to VDM performance in older individuals.

     
    more » « less