Title: Joint disease-specificity at the regulatory base-pair level
Abstract

Given the pleiotropic nature of coding sequences and that many loci exhibit multiple disease associations, it is within non-coding sequence that disease-specificity likely exists. Here, we focus on joint disorders, finding that, among replicated loci, GDF5 exhibits over twenty distinct associations, and we identify causal variants for two of its strongest associations, hip dysplasia and knee osteoarthritis. By mapping regulatory regions in joint chondrocytes, we pinpoint two variants (rs4911178; rs6060369), on the same risk haplotype, which reside in anatomical site-specific enhancers. We show that both variants have clinical relevance, impacting disease by altering morphology. By modeling each variant in humanized mice, we observe a joint-specific response, correlating with GDF5 expression. Thus, we uncouple separate regulatory variants on a common risk haplotype that cause joint-specific disease. By broadening our perspective, we finally find that patterns of modularity at GDF5 are also found at over three-quarters of loci with multiple GWAS disease associations.

 
NSF-PAR ID:
10271802
Publisher / Repository:
Nature Publishing Group
Journal Name:
Nature Communications
Volume:
12
Issue:
1
ISSN:
2041-1723
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. INTRODUCTION: Thousands of genetic variants have been associated with human diseases and traits through genome-wide association studies (GWASs). Translating these discoveries into improved therapeutics requires discerning which variants among hundreds of candidates are causally related to disease risk. To date, only a handful of causal variants have been confirmed. Here, we leverage 100 million years of mammalian evolution to address this major challenge. RATIONALE: We compared genomes from hundreds of mammals and identified bases with unusually few variants (evolutionarily constrained). Constraint is a measure of functional importance that is agnostic to cell type or developmental stage. It can be applied to investigate any heritable disease or trait and is complementary to resources using cell type– and time point–specific functional assays like Encyclopedia of DNA Elements (ENCODE) and Genotype-Tissue Expression (GTEx). RESULTS: Using constraint calculated across placental mammals, 3.3% of bases in the human genome are significantly constrained, including 57.6% of coding bases. Most constrained bases (80.7%) are noncoding. Common variants (allele frequency ≥ 5%) and low-frequency variants (0.5% ≤ allele frequency < 5%) are depleted for constrained bases (1.85% versus 3.26% expected by chance, P < 2.2 × 10⁻³⁰⁸). Pathogenic ClinVar variants are more constrained than benign variants (P < 2.2 × 10⁻¹⁶). The most constrained common variants are more enriched for disease single-nucleotide polymorphism (SNP)–heritability in 63 independent GWASs. The enrichment of SNP-heritability in constrained regions is greater (7.8-fold) than previously reported in mammals and is even higher in primates (11.1-fold). It exceeds the enrichment of SNP-heritability in nonsynonymous coding variants (7.2-fold) and fine-mapped expression quantitative trait loci (eQTL)–SNPs (4.8-fold).
The enrichment peaks near constrained bases, with a log-linear decrease of SNP-heritability enrichment as a function of the distance to a constrained base. Zoonomia constraint scores improve functionally informed fine-mapping. Variants at sites constrained in mammals and primates have greater posterior inclusion probabilities and higher per-SNP contributions. In addition, using both constraint and functional annotations improves polygenic risk score accuracy across a range of traits. Finally, incorporating constraint information into the analysis of noncoding somatic variants in medulloblastomas identifies new candidate driver genes. CONCLUSION: Genome-wide measures of evolutionary constraint can help discern which variants are functionally important. This information may accelerate the translation of genomic discoveries into the biological, clinical, and therapeutic knowledge that is required to understand and treat human disease. Using evolutionary constraint in genomic studies of human diseases. (A) Constraint was calculated across 240 mammal species, including 43 primates (teal line). (B) Pathogenic ClinVar variants (N = 73,885) are more constrained across mammals than benign variants (N = 231,642; P < 2.2 × 10⁻¹⁶). (C) More-constrained bases are more enriched for trait-associated variants (63 GWASs). (D) Enrichment of heritability is higher in constrained regions than in functional annotations (left), even in a joint model with 106 annotations (right). (E) Fine-mapping (PolyFun) using a model that includes constraint scores identifies an experimentally validated association at rs1421085. Error bars represent 95% confidence intervals. BMI, body mass index; LF, low frequency; PIP, posterior inclusion probability.
  2. Abstract

    Genome‐wide association studies (GWAS) have successfully identified thousands of genetic variants contributing to disease and other phenotypes. However, significant obstacles hamper our ability to elucidate causal variants, identify genes affected by causal variants, and characterize the mechanisms by which genotypes influence phenotypes. The increasing availability of genome‐wide functional annotation data is providing unique opportunities to incorporate prior information into the analysis of GWAS to better understand the impact of variants on disease etiology. Although there have been many advances in incorporating prior information into prioritization of trait‐associated variants in GWAS, functional annotation data have played a secondary role in the joint analysis of GWAS and molecular (i.e., expression) quantitative trait loci (eQTL) data in assessing evidence for association. To address this, we develop a novel mediation framework, iFunMed, to integrate GWAS and eQTL data with the utilization of publicly available functional annotation data. iFunMed extends the scope of standard mediation analysis by incorporating information from multiple genetic variants at a time and leveraging variant‐level summary statistics. Data‐driven computational experiments convey how informative annotations improve single‐nucleotide polymorphism (SNP) selection performance while emphasizing robustness of iFunMed to noninformative annotations. Application to Framingham Heart Study data indicates that iFunMed is able to boost detection of SNPs with mediation effects that can be attributed to regulatory mechanisms.

     
  3. Abstract

    The fire ant Solenopsis invicta exists in two alternate social forms: monogyne nests contain a single reproductive queen and polygyne nests contain multiple reproductive queens. This colony‐level social polymorphism corresponds with individual differences in queen physiology, queen dispersal patterns and worker discrimination behaviours, all evidently regulated by an inversion‐based supergene that spans more than 13 Mb of a “social chromosome,” contains over 400 protein‐coding genes and rarely undergoes recombination. The specific mechanisms by which this supergene influences expression of the many distinctive features that characterize the alternate forms remain almost wholly unknown. To advance our understanding of these mechanisms, we explore the effects of social chromosome genotype and natal colony social form on gene expression in queens sampled as they embarked on nuptial flights, using RNA‐sequencing of brains and ovaries. We observe a large effect of natal social form, that is, of the social/developmental environment, on gene expression profiles, with similarly substantial effects of genotype, including: (a) supergene‐associated gene upregulation, (b) allele‐specific expression and (c) pronounced extra‐supergene trans‐regulatory effects. These findings, along with observed spatial variation in differential and allele‐specific expression within the supergene region, highlight the complex gene regulatory landscape that emerged following divergence of the inversion‐mediated Sb haplotype from its homologue, which presumably largely retained the ancestral gene order. The distinctive supergene‐associated gene expression trajectories we document at the onset of a queen’s reproductive life expand the known record of relevant molecular correlates of a complex social polymorphism and point to putative genetic factors underpinning the alternate social syndromes.

     
  4. The MHC region is highly associated with autoimmune and infectious diseases. Here we conduct an in-depth interrogation of associations between genetic variation, gene expression and disease. We create a comprehensive map of regulatory variation in the MHC region using WGS from 419 individuals to call eight-digit HLA types and RNA-seq data from matched iPSCs. Building on this regulatory map, we explore GWAS signals for 4083 traits, detecting colocalization for 180 disease loci with eQTLs. We show that eQTL analyses taking HLA type haplotypes into account have substantially greater power compared with only using single variants. We examine the association between the 8.1 ancestral haplotype and delayed colonization in cystic fibrosis, postulating that downregulation of RNF5 expression is the likely causal mechanism. Our study provides insights into the genetic architecture of the MHC region and pinpoints disease associations that are due to differential expression of HLA genes and non-HLA genes.
  5. Individuals infected with the SARS-CoV-2 virus present with a wide variety of symptoms ranging from asymptomatic to severe and even lethal outcomes. Past research has revealed a genetic haplotype on chromosome 3 that entered the human population via introgression from Neanderthals as the strongest genetic risk factor for the severe response to COVID-19. However, the specific variants along this introgressed haplotype that contribute to this risk and the biological mechanisms that are involved remain unclear. Here, we assess the variants present on the risk haplotype for their likelihood of driving the genetic predisposition to severe COVID-19 outcomes. We do this by first exploring their impact on the regulation of genes involved in COVID-19 infection using a variety of population genetics and functional genomics tools. We then perform a locus-specific massively parallel reporter assay to individually assess the regulatory potential of each allele on the haplotype in a multipotent immune-related cell line. We ultimately reduce the set of over 600 linked genetic variants to identify four introgressed alleles that are strong functional candidates for driving the association between this locus and severe COVID-19. Using reporter assays in the presence/absence of SARS-CoV-2, we find evidence that these variants respond to viral infection. These variants likely drive the locus’ impact on severity by modulating the regulation of two critical chemokine receptor genes: CCR1 and CCR5. These alleles are ideal targets for future functional investigations into the interaction between host genomics and COVID-19 outcomes.