This content will become publicly available on February 12, 2026

Title: Secure and federated quantitative trait loci mapping with privateQTL
Understanding the relationship between genotypes and phenotypes is crucial for advancing personalized medicine. Expression quantitative trait loci (eQTL) mapping plays a significant role by correlating genetic variants with gene expression levels. Despite the progress made by large-scale projects, eQTL mapping still faces challenges in statistical power and privacy. Multi-site studies can increase sample sizes but are hindered by privacy issues. We present privateQTL, a novel framework leveraging secure multi-party computation for secure and federated eQTL mapping. When tested in a real-world scenario with data from different studies, privateQTL outperformed meta-analysis by accurately correcting for covariates and batch effects and retaining higher accuracy and precision for both eGene-eVariant mapping and effect size estimation. In addition, privateQTL is modular and scalable, making it adaptable for other molecular phenotypes and large-scale studies. Our results indicate that privateQTL is a practical solution for privacy-preserving collaborative eQTL mapping.
Award ID(s):
2247352
PAR ID:
10635929
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
Cell Press
Date Published:
Journal Name:
Cell Genomics
Volume:
5
Issue:
2
ISSN:
2666-979X
Page Range / eLocation ID:
100769
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Robinson, Peter (Ed.)
    Abstract Motivation: Identifying cis-acting genetic variants associated with gene expression levels, an analysis commonly referred to as expression quantitative trait loci (eQTL) mapping, is an important first step toward understanding the genetic determinants of gene expression variation. Successful eQTL mapping requires effective control of confounding factors. A common method for controlling confounding effects in eQTL mapping studies is the probabilistic estimation of expression residual (PEER) analysis. PEER analysis extracts PEER factors to serve as surrogates for confounding factors, which are then included in the subsequent eQTL mapping analysis. However, it is computationally challenging to determine the optimal number of PEER factors to use for eQTL mapping. In particular, the standard approach examines one candidate number at a time and chooses the number that optimizes eQTL discovery. Unfortunately, this standard approach involves multiple repetitive eQTL mapping procedures that are computationally expensive, restricting its use in the large-scale eQTL mapping studies that are being collected today. Results: Here, we present a simple and computationally scalable alternative, Effect size Correlation for COnfounding determination (ECCO), to determine the optimal number of PEER factors used for eQTL mapping studies. Instead of performing repetitive eQTL mapping, ECCO jointly applies differential expression analysis and Mendelian randomization analysis, leading to substantial computational savings. In simulations and real data applications, we show that ECCO identifies a similar number of PEER factors required for eQTL mapping analysis as the standard approach but is two orders of magnitude faster. The computational scalability of ECCO allows for optimized eQTL discovery across 48 GTEx tissues for the first time, yielding an overall 5.89% power gain in the number of eQTL-harboring genes (eGenes) discovered as compared to the previous GTEx recommendation, which does not attempt to determine the tissue-specific optimal number of PEER factors. Availability and implementation: Our method is implemented in the ECCO software, which, along with its GTEx mapping results, is freely available at www.xzlab.org/software.html. All R scripts used in this study are also available at this site. Supplementary information: Supplementary data are available at Bioinformatics online.
  2. Abstract Genome‐wide expression quantitative trait loci (eQTL) mapping explores the relationship between gene expression and DNA variants, such as single‐nucleotide polymorphisms (SNPs), to understand the genetic basis of human diseases. Due to the large number of genes and SNPs that need to be assessed, current methods for eQTL mapping often suffer from low detection power, especially for identifying trans‐eQTLs. In this paper, we propose the idea of performing SNP ranking based on the higher criticism (HC) statistic, a summary statistic developed in large‐scale signal detection. We illustrate how the HC‐based SNP ranking can effectively prioritize eQTL signals over noise, greatly reduce the burden of joint modeling, and improve the power for eQTL mapping. Numerical results in simulation studies demonstrate the superior performance of our method compared to existing methods. The proposed method is also evaluated in HapMap eQTL data analysis and the results are compared to a database of known eQTLs.
  3. Summary The goal of expression quantitative trait loci (eQTL) studies is to identify the genetic variants that influence the expression levels of the genes in an organism. High-throughput technology has made such studies possible: in a given tissue sample, it enables us to quantify the expression levels of approximately 20 000 genes and to record the alleles present at millions of genetic polymorphisms. While obtaining these data is relatively cheap once a specimen is at hand, obtaining human tissue remains a costly endeavor: eQTL studies continue to be based on relatively small sample sizes, with this limitation particularly serious for tissues such as brain and liver, which are often the organs of most immediate medical relevance. Given the high-dimensional nature of these datasets and the large number of hypotheses tested, the scientific community adopted multiplicity adjustment procedures early on. These testing procedures primarily control the false discovery rate for the identification of genetic variants that influence expression levels. In contrast, a problem that has not received much attention to date is that of providing estimates of the effect sizes associated with these variants, in a way that accounts for the considerable amount of selection. Yet, given the difficulty of procuring additional samples, this challenge is of practical importance. We illustrate in this work how the recently developed conditional inference approach can be deployed to obtain confidence intervals for the eQTL effect sizes with reliable coverage. The procedure we propose is based on a randomized hierarchical strategy with a 2-fold contribution: (1) it reflects the selection steps typically adopted in state-of-the-art investigations and (2) it introduces the use of randomness instead of data-splitting to maximize the use of available data. Analysis of the GTEx Liver dataset (v6) suggests that naively obtained confidence intervals would likely not cover the true values of effect sizes and that the number of local genetic polymorphisms influencing the expression level of genes might be underestimated.
  4. Background: Coronary artery disease (CAD) is the leading cause of death worldwide. Recent meta-analyses of genome-wide association studies have identified over 175 loci associated with CAD. The majority of these loci are in noncoding regions and are predicted to regulate gene expression. Given that vascular smooth muscle cells (SMCs) play critical roles in the development and progression of CAD, we aimed to identify the subset of the CAD loci associated with the regulation of transcription in distinct SMC phenotypes. Methods: We measured gene expression in SMCs isolated from the ascending aortas of 151 heart transplant donors of various genetic ancestries in quiescent or proliferative conditions and calculated the association of their expression and splicing with ~6.3 million imputed single-nucleotide polymorphism markers across the genome. Results: We identified 4910 expression quantitative trait loci (eQTLs) and 4412 splicing quantitative trait loci (sQTLs), representing regions of the genome associated with transcript abundance and splicing. A total of 3660 eQTLs had not been observed in the publicly available Genotype-Tissue Expression dataset. Further, 29 and 880 eQTLs were SMC-specific and sex-biased, respectively. We made these results available for public query on a user-friendly website. To identify the effector transcript(s) regulated by CAD loci, we used 4 distinct colocalization approaches. We identified 84 eQTLs and 164 sQTLs that colocalized with CAD loci, highlighting the importance of genetic regulation of mRNA splicing as a molecular mechanism for CAD genetic risk. Notably, 20% and 35% of the eQTLs were unique to quiescent or proliferative SMCs, respectively. One CAD locus colocalized with a sex-specific eQTL (TERF2IP), and another locus colocalized with an SMC-specific eQTL (ALKBH8). The most significantly associated CAD locus, 9p21, was an sQTL for the long noncoding RNA CDKN2B-AS1, also known as ANRIL, in proliferative SMCs. Conclusions: Collectively, our results provide evidence for the molecular mechanisms of genetic susceptibility to CAD in distinct SMC phenotypes.
  5. Birchler, J (Ed.)
    Abstract Bidirectional flow of information shapes the outcome of host–pathogen interactions and depends on the genetics of each organism. Recent work has begun to use co-transcriptomic studies to shed light on this bidirectional flow, but it is unclear how plastic the co-transcriptome is in response to genetic variation in both the host and pathogen. To study co-transcriptome plasticity, we conducted transcriptomics using natural genetic variation in the pathogen, Botrytis cinerea, and large-effect genetic variation abolishing defense signaling pathways within the host, Arabidopsis thaliana. We show that genetic variation in the pathogen has a greater influence on the co-transcriptome than mutations that abolish defense signaling pathways in the host. Genome-wide association mapping using the pathogen's genetic variation and both organisms' transcriptomes allowed an assessment of how the pathogen modulates plasticity in response to the host. This showed that the differences in both organisms' responses were linked to trans-expression quantitative trait loci (eQTL) hotspots within the pathogen's genome. These hotspots control gene sets in either the host or pathogen and show differential allele sensitivity to the host's genetic variation rather than qualitative host specificity. Interestingly, nearly all the trans-eQTL hotspots were unique to the host or pathogen transcriptomes. In this system of differential plasticity, the pathogen mediates the shift in the co-transcriptome more than the host.