Understanding the relationship between genotypes and phenotypes is crucial for advancing personalized medicine. Expression quantitative trait loci (eQTL) mapping plays a significant role by correlating genetic variants to gene expression levels. Despite the progress made by large-scale projects, eQTL mapping still faces challenges in statistical power and privacy concerns. Multi-site studies can increase sample sizes but are hindered by privacy issues. We present privateQTL, a novel framework leveraging secure multi-party computation for secure and federated eQTL mapping. When tested in a real-world scenario with data from different studies, privateQTL outperformed meta-analysis by accurately correcting for covariates and batch effect and retaining higher accuracy and precision for both eGene-eVariant mapping and effect size estimation. In addition, privateQTL is modular and scalable, making it adaptable for other molecular phenotypes and large-scale studies. Our results indicate that privateQTL is a practical solution for privacy-preserving collaborative eQTL mapping.
more »
« less
Selection-adjusted inference: an application to confidence intervals for cis-eQTL effect sizes
Summary The goal of expression quantitative trait loci (eQTL) studies is to identify the genetic variants that influence the expression levels of the genes in an organism. High throughput technology has made such studies possible: in a given tissue sample, it enables us to quantify the expression levels of approximately 20 000 genes and to record the alleles present at millions of genetic polymorphisms. While obtaining this data is relatively cheap once a specimen is at hand, obtaining human tissue remains a costly endeavor: eQTL studies continue to be based on relatively small sample sizes, with this limitation particularly serious for tissues as brain, liver, etc.—often the organs of most immediate medical relevance. Given the high-dimensional nature of these datasets and the large number of hypotheses tested, the scientific community has adopted early on multiplicity adjustment procedures. These testing procedures primarily control the false discoveries rate for the identification of genetic variants with influence on the expression levels. In contrast, a problem that has not received much attention to date is that of providing estimates of the effect sizes associated with these variants, in a way that accounts for the considerable amount of selection. Yet, given the difficulty of procuring additional samples, this challenge is of practical importance. We illustrate in this work how the recently developed conditional inference approach can be deployed to obtain confidence intervals for the eQTL effect sizes with reliable coverage. The procedure we propose is based on a randomized hierarchical strategy with a 2-fold contribution: (1) it reflects the selection steps typically adopted in state of the art investigations and (2) it introduces the use of randomness instead of data-splitting to maximize the use of available data. Analysis of the GTEx Liver dataset (v6) suggests that naively obtained confidence intervals would likely not cover the true values of effect sizes and that the number of local genetic polymorphisms influencing the expression level of genes might be underestimated.
more »
« less
- Award ID(s):
- 1712800
- PAR ID:
- 10107866
- Date Published:
- Journal Name:
- Biostatistics
- ISSN:
- 1465-4644
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Robinson, Peter (Ed.)Abstract MotivationIdentifying cis-acting genetic variants associated with gene expression levels—an analysis commonly referred to as expression quantitative trait loci (eQTLs) mapping—is an important first step toward understanding the genetic determinant of gene expression variation. Successful eQTL mapping requires effective control of confounding factors. A common method for confounding effects control in eQTL mapping studies is the probabilistic estimation of expression residual (PEER) analysis. PEER analysis extracts PEER factors to serve as surrogates for confounding factors, which is further included in the subsequent eQTL mapping analysis. However, it is computationally challenging to determine the optimal number of PEER factors used for eQTL mapping. In particular, the standard approach to determine the optimal number of PEER factors examines one number at a time and chooses a number that optimizes eQTLs discovery. Unfortunately, this standard approach involves multiple repetitive eQTL mapping procedures that are computationally expensive, restricting its use in large-scale eQTL mapping studies that being collected today. ResultsHere, we present a simple and computationally scalable alternative, Effect size Correlation for COnfounding determination (ECCO), to determine the optimal number of PEER factors used for eQTL mapping studies. Instead of performing repetitive eQTL mapping, ECCO jointly applies differential expression analysis and Mendelian randomization analysis, leading to substantial computational savings. In simulations and real data applications, we show that ECCO identifies a similar number of PEER factors required for eQTL mapping analysis as the standard approach but is two orders of magnitude faster. The computational scalability of ECCO allows for optimized eQTL discovery across 48 GTEx tissues for the first time, yielding an overall 5.89% power gain on the number of eQTL harboring genes (eGenes) discovered as compared to the previous GTEx recommendation that does not attempt to determine tissue-specific optimal number of PEER factors. Availabilityand implementationOur method is implemented in the ECCO software, which, along with its GTEx mapping results, is freely available at www.xzlab.org/software.html. All R scripts used in this study are also available at this site. Supplementary informationSupplementary data are available at Bioinformatics online.more » « less
-
Abstract Background Large-scale genome-wide association studies have successfully identified many genetic variants significantly associated with Alzheimer’s disease (AD), such as rs429358, rs11038106, rs723804, rs13591776, and more. The next key step is to understand the function of these SNPs and the downstream biology through which they exert the effect on the development of AD. However, this remains a challenging task due to the tissue-specific nature of transcriptomic and proteomic data and the limited availability of brain tissue.In this paper, instead of using coupled transcriptomic data, we performed an integrative analysis of existing GWAS findings and expression quantitative trait loci (eQTL) results from AD-related brain regions to estimate the transcriptomic alterations in AD brain. Results We used summary-based mendelian randomization method along with heterogeneity in dependent instruments method and were able to identify 32 genes with potential altered levels in temporal cortex region. Among these, 10 of them were further validated using real gene expression data collected from temporal cortex region, and 19 SNPs from NECTIN and TOMM40 genes were found associated with multiple temporal cortex imaging phenotype. Conclusion Significant pathways from enriched gene networks included neutrophil degranulation, Cell surface interactions at the vascular wall, and Regulation of TP53 activity which are still relatively under explored in Alzheimer’s Disease while also encouraging a necessity to bind further trans-eQTL effects into this integrative analysis.more » « less
-
Abstract BackgroundAlthough genome-wide association studies (GWAS) have successfully located various genetic variants susceptible to Alzheimer’s Disease (AD), it is still unclear how specific variants interact with genes and tissues to elucidate pathologies associated with AD. Summary-data-based Mendelian Randomization (SMR) addresses this problem through an instrumental variable approach that integrates data from independent GWAS and expression quantitative trait locus (eQTL) studies in order to infer a causal effect of gene expression on a trait. ResultsOur study employed the SMR approach to integrate a set of meta-analytic cis-eQTL information from the Genotype-Tissue Expression (GTEx), CommonMind Consortium (CMC), and Religious Orders Study and Rush Memory and Aging Project (ROS/MAP) consortiums with three sets of meta-analysis AD GWAS results. ConclusionsOur analysis identified twelve total gene probes (associated with twelve distinct genes) with a significant association with AD. Four of these genes survived a test of pleiotropy from linkage (the HEIDI test).Three of these genes – RP11-385F7.1, PRSS36, and AC012146.7 – have not yet been reported differentially expressed in the brain in the context of AD, and thus are the novel findings warranting further investigation.more » « less
-
Trans-Acting Genotypes Associated with mRNA Expression Affect Metabolic and Thermal Tolerance TraitsBetancourt, Andrea (Ed.)Abstract Evolutionary processes driving physiological trait variation depend on the underlying genomic mechanisms. Evolution of these mechanisms depends on the genetic complexity (involving many genes) and how gene expression impacting the traits is converted to phenotype. Yet, genomic mechanisms that impact physiological traits are diverse and context dependent (e.g., vary by environment and tissues), making them difficult to discern. We examine the relationships between genotype, mRNA expression, and physiological traits to discern the genetic complexity and whether the gene expression affecting the physiological traits is primarily cis- or trans-acting. We use low-coverage whole genome sequencing and heart- or brain-specific mRNA expression to identify polymorphisms directly associated with physiological traits and expressed quantitative trait loci (eQTL) indirectly associated with variation in six temperature specific physiological traits (standard metabolic rate, thermal tolerance, and four substrate specific cardiac metabolic rates). Focusing on a select set of mRNAs belonging to co-expression modules that explain up to 82% of temperature specific traits, we identified hundreds of significant eQTL for mRNA whose expression affects physiological traits. Surprisingly, most eQTL (97.4% for heart and 96.7% for brain) were trans-acting. This could be due to higher effect size of trans- versus cis-acting eQTL for mRNAs that are central to co-expression modules. That is, we may have enhanced the identification of trans-acting factors by looking for single nucleotide polymorphisms associated with mRNAs in co-expression modules that broadly influence gene expression patterns. Overall, these data indicate that the genomic mechanism driving physiological variation across environments is driven by trans-acting heart- or brain-specific mRNA expression.more » « less
An official website of the United States government

