Neuropsychiatric disorders afflict a large portion of the global population and constitute a significant source of disability worldwide. Although Genome-wide Association Studies (GWAS) have identified many disorder-associated variants, the underlying regulatory mechanisms linking them to disorders remain elusive, especially those involving distant genomic elements. Expression quantitative trait loci (eQTLs) constitute a powerful means of providing this missing link. However, most eQTL studies in human brains have focused exclusively on cis-eQTLs, which link variants to nearby genes (i.e., those within 1 Mb of a variant). A complete understanding of disease etiology requires a clearer understanding of trans-regulatory mechanisms, which, in turn, entails a detailed analysis of the relationships between variants and expression changes in distant genes.
By leveraging large datasets from the PsychENCODE consortium, we conducted a genome-wide survey of trans-eQTLs in the human dorsolateral prefrontal cortex. We also performed colocalization and mediation analyses to identify mediators in trans-regulation and use trans-eQTLs to link GWAS loci to schizophrenia risk genes.
We identified ~80,000 candidate trans-eQTLs (at FDR<0.25) that influence the expression of ~10K target genes (i.e., “trans-eGenes”). We found that many variants associated with these candidate trans-eQTLs overlap with known cis-eQTLs. Moreover, for >60% of these variants (by colocalization), the cis-eQTL’s target gene acts as a mediator for the trans-eQTL SNP's effect on the trans-eGene, highlighting examples of cis-mediation as essential for trans-regulation. Furthermore, many of these colocalized variants fall into a discernable pattern wherein cis-eQTL’s target is a transcription factor or RNA-binding protein, which, in turn, targets the gene associated with the candidate trans-eQTL. Finally, we show that trans-regulatory mechanisms provide valuable insights into psychiatric disorders: beyond what had been possible using only cis-eQTLs, we link an additional 23 GWAS loci and 90 risk genes (using colocalization between candidate trans-eQTLs and schizophrenia GWAS loci).
We demonstrate that the transcriptional architecture of the human brain is orchestrated by both cis- and trans-regulatory variants and found that trans-eQTLs provide insights into brain-disease biology.
- NSF-PAR ID:
- Publisher / Repository:
- Springer Science + Business Media
- Date Published:
- Journal Name:
- Genome Medicine
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
INTRODUCTION Genome-wide association studies (GWASs) have identified thousands of human genetic variants associated with diverse diseases and traits, and most of these variants map to noncoding loci with unknown target genes and function. Current approaches to understand which GWAS loci harbor causal variants and to map these noncoding regulators to target genes suffer from low throughput. With newer multiancestry GWASs from individuals of diverse ancestries, there is a pressing and growing need to scale experimental assays to connect GWAS variants with molecular mechanisms. Here, we combined biobank-scale GWASs, massively parallel CRISPR screens, and single-cell sequencing to discover target genes of noncoding variants for blood trait loci with systematic targeting and inhibition of noncoding GWAS loci with single-cell sequencing (STING-seq). RATIONALE Blood traits are highly polygenic, and GWASs have identified thousands of noncoding loci that map to candidate cis -regulatory elements (CREs). By combining CRE-silencing CRISPR perturbations and single-cell readouts, we targeted hundreds of GWAS loci in a single assay, revealing target genes in cis and in trans . For select CREs that regulate target genes, we performed direct variant insertion. Although silencing the CRE can identify the target gene, direct variant insertion can identify magnitude and direction of effect on gene expression for the GWAS variant. In select cases in which the target gene was a transcription factor or microRNA, we also investigated the gene-regulatory networks altered upon CRE perturbation and how these networks differ across blood cell types. RESULTS We inhibited candidate CREs from fine-mapped blood trait GWAS variants (from ~750,000 individual of diverse ancestries) in human erythroid progenitors. In total, we targeted 543 variants (254 loci) mapping to candidate CREs, generating multimodal single-cell data including transcriptome, direct CRISPR gRNA capture, and cell surface proteins. We identified target genes in cis (within 500 kb) for 134 CREs. In most cases, we found that the target gene was the closest gene and that specific enhancer-associated biochemical hallmarks (H3K27ac and accessible chromatin) are essential for CRE function. Using multiple perturbations at the same locus, we were able to distinguished between causal variants from noncausal variants in linkage disequilibrium. For a subset of validated CREs, we also inserted specific GWAS variants using base-editing STING-seq (beeSTING-seq) and quantified the effect size and direction of GWAS variants on gene expression. Given our transcriptome-wide data, we examined dosage effects in cis and trans in cases in which the cis target is a transcription factor or microRNA. We found that trans target genes are also enriched for GWAS loci, and identified gene clusters within trans gene networks with distinct biological functions and expression patterns in primary human blood cells. CONCLUSION In this work, we investigated noncoding GWAS variants at scale, identifying target genes in single cells. These methods can help to address the variant-to-function challenges that are a barrier for translation of GWAS findings (e.g., drug targets for diseases with a genetic basis) and greatly expand our ability to understand mechanisms underlying GWAS loci. Identifying causal variants and their target genes with STING-seq. Uncovering causal variants and their target genes or function are a major challenge for GWASs. STING-seq combines perturbation of noncoding loci with multimodal single-cell sequencing to profile hundreds of GWAS loci in parallel. This approach can identify target genes in cis and trans , measure dosage effects, and decipher gene-regulatory networks.more » « less
There is recent interest in using gene expression data to contextualize findings from traditional genome-wide association studies (GWAS). Conditioned on a tissue, expression quantitative trait loci (eQTLs) are genetic variants associated with gene expression, and eGenes are genes whose expression levels are associated with genetic variants. eQTLs and eGenes provide great supporting evidence for GWAS hits and important insights into the regulatory pathways involved in many diseases. When a significant variant or a candidate gene identified by GWAS is also an eQTL or eGene, there is strong evidence to further study this variant or gene. Multi-tissue gene expression datasets like the Gene Tissue Expression (GTEx) data are used to find eQTLs and eGenes. Unfortunately, these datasets often have small sample sizes in some tissues. For this reason, there have been many meta-analysis methods designed to combine gene expression data across many tissues to increase power for finding eQTLs and eGenes. However, these existing techniques are not scalable to datasets containing many tissues, like the GTEx data. Furthermore, these methods ignore a biological insight that the same variant may be associated with the same gene across similar tissues.
We introduce a meta-analysis model that addresses these problems in existing methods. We focus on the problem of finding eGenes in gene expression data from many tissues, and show that our model is better than other types of meta-analyses.
Availability and Implementation
Source code is at https://github.com/datduong/RECOV.
Supplementary data are available at Bioinformatics online.
The MHC region is highly associated with autoimmune and infectious diseases. Here we conduct an in-depth interrogation of associations between genetic variation, gene expression and disease. We create a comprehensive map of regulatory variation in the MHC region using WGS from 419 individuals to call eight-digit HLA types and RNA-seq data from matched iPSCs. Building on this regulatory map, we explored GWAS signals for 4083 traits, detecting colocalization for 180 disease loci with eQTLs. We show that eQTL analyses taking HLA type haplotypes into account have substantially greater power compared with only using single variants. We examined the association between the 8.1 ancestral haplotype and delayed colonization in Cystic Fibrosis, postulating that downregulation of RNF5 expression is the likely causal mechanism. Our study provides insights into the genetic architecture of the MHC region and pinpoints disease associations that are due to differential expression of HLA genes and non-HLA genes.more » « less
Genome‐wide association studies (GWAS) have successfully identified thousands of genetic variants contributing to disease and other phenotypes. However, significant obstacles hamper our ability to elucidate causal variants, identify genes affected by causal variants, and characterize the mechanisms by which genotypes influence phenotypes. The increasing availability of genome‐wide functional annotation data is providing unique opportunities to incorporate prior information into the analysis of GWAS to better understand the impact of variants on disease etiology. Although there have been many advances in incorporating prior information into prioritization of trait‐associated variants in GWAS, functional annotation data have played a secondary role in the joint analysis of GWAS and molecular (i.e., expression) quantitative trait loci (eQTL) data in assessing evidence for association. To address this, we develop a novel mediation framework,
iFunMed, to integrate GWAS and eQTL data with the utilization of publicly available functional annotation data. iFunMedextends the scope of standard mediation analysis by incorporating information from multiple genetic variants at a time and leveraging variant‐level summary statistics. Data‐driven computational experiments convey how informative annotations improve single‐nucleotide polymorphism (SNP) selection performance while emphasizing robustness of iFunMedto noninformative annotations. Application to Framingham Heart Study data indicates that iFunMedis able to boost detection of SNPs with mediation effects that can be attributed to regulatory mechanisms.
Genotypes are strongly associated with disease phenotypes, particularly in brain disorders. However, the molecular and cellular mechanisms behind this association remain elusive. With emerging multimodal data for these mechanisms, machine learning methods can be applied for phenotype prediction at different scales, but due to the black-box nature of machine learning, integrating these modalities and interpreting biological mechanisms can be challenging. Additionally, the partial availability of these multimodal data presents a challenge in developing these predictive models.
To address these challenges, we developed DeepGAMI, an interpretable neural network model to improve genotype–phenotype prediction from multimodal data. DeepGAMI leverages functional genomic information, such as eQTLs and gene regulation, to guide neural network connections. Additionally, it includes an auxiliary learning layer for cross-modal imputation allowing the imputation of latent features of missing modalities and thus predicting phenotypes from a single modality. Finally, DeepGAMI uses integrated gradient to prioritize multimodal features for various phenotypes.
We applied DeepGAMI to several multimodal datasets including genotype and bulk and cell-type gene expression data in brain diseases, and gene expression and electrophysiology data of mouse neuronal cells. Using cross-validation and independent validation, DeepGAMI outperformed existing methods for classifying disease types, and cellular and clinical phenotypes, even using single modalities (e.g., AUC score of 0.79 for Schizophrenia and 0.73 for cognitive impairment in Alzheimer’s disease).
We demonstrated that DeepGAMI improves phenotype prediction and prioritizes phenotypic features and networks in multiple multimodal datasets in complex brains and brain diseases. Also, it prioritized disease-associated variants, genes, and regulatory networks linked to different phenotypes, providing novel insights into the interpretation of gene regulatory mechanisms. DeepGAMI is open-source and available for general use.