Title: A scalable estimator of SNP heritability for biobank-scale data
Abstract

Motivation

Heritability, the proportion of variation in a trait that can be explained by genetic variation, is an important parameter in efforts to understand the genetic architecture of complex phenotypes as well as in the design and interpretation of genome-wide association studies. Attempts to understand the heritability of complex phenotypes attributable to genome-wide single nucleotide polymorphism (SNP) variation data have motivated the analysis of large datasets as well as the development of sophisticated tools to estimate heritability in these datasets. Linear mixed models (LMMs) have emerged as a key tool for heritability estimation where the parameters of the LMMs, i.e. the variance components, are related to the heritability attributable to the SNPs analyzed. Likelihood-based inference in LMMs, however, poses serious computational burdens.

Results

We propose a scalable randomized algorithm for estimating variance components in LMMs. Our method is based on a method-of-moment estimator that has a runtime complexity O(NMB) for N individuals and M SNPs (where B is a parameter that controls the number of random matrix-vector multiplications). Further, by leveraging the structure of the genotype matrix, we can reduce the time complexity to O(NMB / max(log_3 N, log_3 M)).
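
As an illustration only, the following is a minimal sketch of a randomized method-of-moments estimator of this kind, not the RHE-reg implementation. It assumes a single variance component, standardized genotypes X, a kinship matrix K = XX^T / M that is never formed explicitly, and Gaussian random vectors for estimating tr(K^2). The names rhe_sketch and simulate_trait are hypothetical. The B random vectors each need two matrix-vector products with X, which is where the O(NMB) cost comes from.

```python
import numpy as np


def simulate_trait(n, m, h2, seed=0):
    """Hypothetical helper: standardized genotypes and a trait with heritability h2."""
    rng = np.random.default_rng(seed)
    maf = rng.uniform(0.1, 0.5, size=m)
    G = rng.binomial(2, maf, size=(n, m)).astype(float)  # genotypes in {0, 1, 2}
    X = (G - G.mean(axis=0)) / G.std(axis=0)             # standardize each SNP
    beta = rng.normal(0.0, np.sqrt(h2 / m), size=m)      # per-SNP effects
    y = X @ beta + rng.normal(0.0, np.sqrt(1.0 - h2), size=n)
    y = (y - y.mean()) / y.std()                         # standardize the trait
    return X, y


def rhe_sketch(X, y, B=10, seed=0):
    """Method-of-moments heritability estimate with a randomized trace (a sketch).

    X: (N, M) standardized genotype matrix; y: (N,) standardized phenotype.
    B: number of random vectors used to estimate tr(K^2), with K = X X^T / M.
    """
    rng = np.random.default_rng(seed)
    N, M = X.shape

    # Estimate tr(K^2) as the average of ||K z||^2 over B random vectors z.
    # K z is computed as X (X^T z) / M, so K itself is never materialized.
    Z = rng.standard_normal((N, B))
    KZ = X @ (X.T @ Z) / M
    tr_K2 = np.sum(KZ ** 2) / B

    # tr(K) and y^T K y are cheap to compute exactly.
    tr_K = np.sum(X ** 2) / M
    Xty = X.T @ y
    yKy = Xty @ Xty / M
    yty = y @ y

    # Moment-matching normal equations for (sigma_g^2, sigma_e^2).
    A = np.array([[tr_K2, tr_K], [tr_K, float(N)]])
    b = np.array([yKy, yty])
    sigma_g2, sigma_e2 = np.linalg.solve(A, b)
    return sigma_g2 / (sigma_g2 + sigma_e2)
```

On simulated data, calling rhe_sketch(*simulate_trait(2000, 500, 0.5), B=50) should return a value near the simulated heritability of 0.5, with the spread depending on N, M and B.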

We demonstrate the scalability and accuracy of our method on simulated as well as on empirical data. On standard hardware, our method computes heritability on a dataset of 500 000 individuals and 100 000 SNPs in 38 min.

Availability and implementation

The RHE-reg software is made freely available to the research community at: https://github.com/sriramlab/RHE-reg.

 
Award ID(s):
1705121
NSF-PAR ID:
10413672
Author(s) / Creator(s):
;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Bioinformatics
Volume:
34
Issue:
13
ISSN:
1367-4803
Page Range / eLocation ID:
p. i187-i194
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    While variance components analysis has emerged as a powerful tool in complex trait genetics, existing methods for fitting variance components do not scale well to large-scale datasets of genetic variation. Here, we present a method for variance components analysis that is accurate and efficient: capable of estimating one hundred variance components on a million individuals genotyped at a million SNPs in a few hours. We illustrate the utility of our method in estimating and partitioning variation in a trait explained by genotyped SNPs (SNP-heritability). Analyzing 22 traits with genotypes from 300,000 individuals across about 8 million common and low frequency SNPs, we observe that per-allele squared effect size increases with decreasing minor allele frequency (MAF) and linkage disequilibrium (LD) consistent with the action of negative selection. Partitioning heritability across 28 functional annotations, we observe enrichment of heritability in FANTOM5 enhancers in asthma, eczema, thyroid and autoimmune disorders.

     
  2. Abstract

    Genetic composition can influence host susceptibility to, and transmission of, pathogens, with potential population‐level consequences. In bighorn sheep (Ovis canadensis), pneumonia epidemics caused by Mycoplasma ovipneumoniae have been associated with severe population declines and limited recovery across North America. Adult survivors either clear the infection or act as carriers that continually shed M. ovipneumoniae and expose their susceptible offspring, resulting in high rates of lamb mortality for years following the outbreak event. Here, we investigated the influence of genomic composition on persistent carriage of M. ovipneumoniae in a well‐studied bighorn sheep herd in the Wallowa Mountains of Oregon, USA. Using 10,605 SNPs generated using RADseq technology for 25 female bighorn sheep, we assessed genomic diversity metrics and employed family‐based genome‐wide association methodologies to understand variant association and genetic architecture underlying chronic carriage. We observed no differences among genome‐wide diversity metrics (heterozygosity and allelic richness) between groups. However, we identified two variant loci of interest and seven associated candidate genes, which may influence carriage status. Further, we found that the SNP panel explained ~55% of the phenotypic variance (SNP‐based heritability) for M. ovipneumoniae carriage, though there was considerable uncertainty in these estimates. While small sample sizes limit conclusions drawn here, our study represents one of the first to assess the genomic factors influencing chronic carriage of a pathogen in a wild population and lays a foundation for understanding genomic influence on pathogen persistence in bighorn sheep and other wildlife populations. Future research should incorporate additional individuals as well as distinct herds to further explore the genomic basis of chronic carriage.

     
  3. Abstract

    Genome‐wide association studies (GWAS) of alcohol dependence (AD) have reliably identified variation within alcohol metabolizing genes (eg, ADH1B) but have inconsistently located other signals, which may be partially attributable to symptom heterogeneity underlying the disorder. We conducted GWAS of DSM‐IV AD (primary analysis), DSM‐IV AD criterion count (secondary analysis), and individual dependence criteria (tertiary analysis) among 7418 (1121 families) European American (EA) individuals from the Collaborative Study on the Genetics of Alcoholism (COGA). Trans‐ancestral meta‐analyses combined these results with data from 3175 (585 families) African‐American (AA) individuals from COGA. In the EA GWAS, three loci were genome‐wide significant: rs1229984 in ADH1B for AD criterion count (P = 4.16E−11) and Desire to cut drinking (P = 1.21E−11); rs188227250 (chromosome 8, Drinking more than intended, P = 6.72E−09); rs1912461 (chromosome 15, Time spent drinking, P = 1.77E−08). In the trans‐ancestral meta‐analysis, rs1229984 was associated with multiple phenotypes and two additional loci were genome‐wide significant: rs61826952 (chromosome 1, DSM‐IV AD, P = 8.42E−11); rs7597960 (chromosome 2, Time spent drinking, P = 1.22E−08). Associations with rs1229984 and rs18822750 were replicated in independent datasets. Polygenic risk scores derived from the EA GWAS of AD predicted AD in two EA datasets (P < .01; 0.61%‐1.82% of variance). Identified novel variants (ie, rs1912461, rs61826952) were associated with differential central evoked theta power (loss − gain; P = .0037) and reward‐related ventral striatum reactivity (P = .008), respectively. This study suggests that studying individual criteria may unveil new insights into the genetic etiology of AD liability.

     
  4. Abstract

    Background

    Genome-wide association studies (GWAS) seek to identify single nucleotide polymorphisms (SNPs) that cause observed phenotypes. However, with highly correlated SNPs, correlated observations, and the number of SNPs being two orders of magnitude larger than the number of observations, GWAS procedures often suffer from high false positive rates.

    Results

    We propose BGWAS, a novel Bayesian variable selection method based on nonlocal priors for linear mixed models specifically tailored for genome-wide association studies. Our proposed method BGWAS uses a novel nonlocal prior for linear mixed models (LMMs). BGWAS has two steps: screening and model selection. The screening step scans through all the SNPs fitting one LMM for each SNP and then uses Bayesian false discovery control to select a set of candidate SNPs. After that, a model selection step searches through the space of LMMs that may have any number of SNPs from the candidate set. A simulation study shows that, when compared to popular GWAS procedures, BGWAS greatly reduces false positives while maintaining the same ability to detect true positive SNPs. We show the utility and flexibility of BGWAS with two case studies: a case study on salt stress in plants, and a case study on alcohol use disorder.

    Conclusions

    BGWAS maintains and in some cases increases the recall of true SNPs while drastically lowering the number of false positives compared to popular SMA procedures.

     
  5. Abstract

    Heritability is well documented for psychiatric disorders and cognitive abilities which are, however, complex, involving both genetic and environmental factors. Hence, it remains challenging to discover which genetic variations contribute to such complex traits, and how. In this article, we propose to use mediation analysis to bridge this gap, where neuroimaging phenotypes were utilized as intermediate variables. The Philadelphia Neurodevelopmental Cohort was investigated using genome‐wide association studies (GWAS) and mediation analyses. Specifically, 951 participants were included with age ranging from 8 to 21 years. Two hundred and four neuroimaging measures were extracted from structural magnetic resonance imaging scans. GWAS were conducted for each measure to evaluate the SNP‐based heritability. Furthermore, mediation analyses were employed to understand the mechanisms by which genetic variants influence pathological behaviors implicitly through neuroimaging phenotypes, and identified SNPs that would not be detected otherwise. Our analyses found that rs10494561, located in the intron region within NMNAT2, was associated with the severity of the prodromal symptoms of psychosis implicitly, mediated through the volume of the left hemisphere of the superior frontal region (). The gene NMNAT2 is known to be associated with brainstem degeneration, and to produce a cytoplasmic enzyme which is mainly expressed in the brain. Another SNP, rs2285351, was found in the intron region of gene IFT122, which may be potentially associated with human spatial orientation ability through the area of the left hemisphere of the isthmus cingulate region (). Hum Brain Mapp 38:4088–4097, 2017. ©2017 Wiley Periodicals, Inc.

     