Abstract. Motivation: Heritability, the proportion of variation in a trait that can be explained by genetic variation, is an important parameter in efforts to understand the genetic architecture of complex phenotypes as well as in the design and interpretation of genome-wide association studies. Attempts to understand the heritability of complex phenotypes attributable to genome-wide single nucleotide polymorphism (SNP) variation have motivated the analysis of large datasets as well as the development of sophisticated tools to estimate heritability in these datasets. Linear mixed models (LMMs) have emerged as a key tool for heritability estimation, where the parameters of the LMMs, i.e. the variance components, are related to the heritability attributable to the SNPs analyzed. Likelihood-based inference in LMMs, however, poses a serious computational burden. Results: We propose a scalable randomized algorithm for estimating variance components in LMMs. Our method is based on a method-of-moments estimator that has a runtime complexity of O(NMB) for N individuals and M SNPs (where B is a parameter that controls the number of random matrix-vector multiplications). Further, by leveraging the structure of the genotype matrix, we can reduce the time complexity to O(NMB / max(log_3 N, log_3 M)). We demonstrate the scalability and accuracy of our method on simulated as well as on empirical data. On standard hardware, our method computes heritability on a dataset of 500 000 individuals and 100 000 SNPs in 38 min. Availability and implementation: The RHE-reg software is made freely available to the research community at: https://github.com/sriramlab/RHE-reg.
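The O(NMB) cost quoted in the abstract above can be illustrated with a minimal sketch of a randomized method-of-moments (Haseman-Elston style) estimator. This is not the RHE-reg implementation: the function name `rhe_heritability`, the single-variance-component model, the Gaussian probe vectors and the standardization steps are all assumptions made for illustration. The idea is that the only expensive quantity, tr(K^2) for the relatedness matrix K = XX^T / M, can be approximated from B random vectors using matrix-vector products with X alone, so K is never formed.

```python
import numpy as np

def rhe_heritability(X, y, B=10, seed=0):
    """Illustrative randomized method-of-moments estimate of SNP heritability.

    X: (N, M) genotype matrix with no monomorphic columns (assumption).
    y: (N,) phenotype vector.
    B: number of random probe vectors for the trace estimate.
    """
    rng = np.random.default_rng(seed)
    N, M = X.shape

    # Standardize SNPs and phenotype (assumed preprocessing).
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    y = (y - y.mean()) / y.std()

    # Randomized (Hutchinson-type) estimate of tr(K^2), K = X X^T / M.
    # K z is computed as X (X^T z) / M, costing O(NM) per probe vector.
    Z = rng.standard_normal((N, B))
    KZ = X @ (X.T @ Z) / M
    tr_K2 = np.sum(KZ ** 2) / B

    # tr(K) equals N exactly once every column has unit variance.
    tr_K = float(N)

    # Quadratic forms y^T K y and y^T y, again without forming K.
    Xty = X.T @ y
    yKy = Xty @ Xty / M
    yty = y @ y

    # Moment equations for E[y y^T] = sigma_g^2 K + sigma_e^2 I.
    A = np.array([[tr_K2, tr_K], [tr_K, float(N)]])
    b = np.array([yKy, yty])
    sigma_g2, sigma_e2 = np.linalg.solve(A, b)

    return sigma_g2 / (sigma_g2 + sigma_e2)
```

Larger B lowers the variance of the trace estimate at a proportional increase in runtime. The further speed-up mentioned in the abstract, which exploits the discrete structure of the genotype matrix, is not reproduced in this sketch.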
Scalable summary-statistics-based heritability estimation method with individual genotype level accuracy
SNP heritability, the proportion of phenotypic variation explained by genotyped SNPs, is an important parameter in understanding the genetic architecture underlying various diseases and traits. Methods that aim to estimate SNP heritability from individual genotype and phenotype data are limited in their ability to scale to biobank-scale data sets and by restrictions on access to individual-level data. These limitations have motivated the development of methods that only require summary statistics. Although the availability of publicly accessible summary statistics makes them widely applicable, these methods lack the accuracy of methods that utilize individual genotypes. Here we present SUMmary-statistics-based Randomized Haseman-Elston regression (SUM-RHE), a method that can estimate the SNP heritability of complex phenotypes with accuracies comparable to approaches that require individual genotypes, while exclusively relying on summary statistics. SUM-RHE employs Genome-Wide Association Study (GWAS) summary statistics and statistics obtained on a reference population, which can be efficiently estimated and readily shared for public use. Our results demonstrate that SUM-RHE obtains estimates of SNP heritability that are substantially more accurate than those of other summary-statistic methods and on par with methods that rely on individual-level data.
- Award ID(s): 1943497
- PAR ID: 10611283
- Publisher / Repository: Cold Spring Harbor Laboratory Press
- Date Published:
- Journal Name: Genome Research
- Volume: 34
- Issue: 9
- ISSN: 1088-9051
- Page Range / eLocation ID: 1286 to 1293
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Abstract. While variance components analysis has emerged as a powerful tool in complex trait genetics, existing methods for fitting variance components do not scale well to large-scale datasets of genetic variation. Here, we present a method for variance components analysis that is accurate and efficient: capable of estimating one hundred variance components on a million individuals genotyped at a million SNPs in a few hours. We illustrate the utility of our method in estimating and partitioning variation in a trait explained by genotyped SNPs (SNP-heritability). Analyzing 22 traits with genotypes from 300,000 individuals across about 8 million common and low-frequency SNPs, we observe that per-allele squared effect size increases with decreasing minor allele frequency (MAF) and linkage disequilibrium (LD), consistent with the action of negative selection. Partitioning heritability across 28 functional annotations, we observe enrichment of heritability in FANTOM5 enhancers in asthma, eczema, thyroid and autoimmune disorders.
Summary. Macroorganisms’ genotypes shape their phenotypes, which in turn shape the habitat available to potential microbial symbionts. This influence of host genotype on microbiome composition has been demonstrated in many systems; however, most previous studies have either compared unrelated genotypes or delved into molecular mechanisms. As a result, it is currently unclear whether the heritability of host‐associated microbiomes follows similar patterns to the heritability of other complex traits. We take a new approach to this question by comparing the microbiomes of diverse maize inbred lines and their F1 hybrid offspring, which we quantified in both rhizosphere and leaves of field‐grown plants using 16S‐v4 and ITS1 amplicon sequencing. We show that inbred lines and hybrids differ consistently in the composition of bacterial and fungal rhizosphere communities, as well as leaf‐associated fungal communities. A wide range of microbiome features display heterosis within individual crosses, consistent with patterns for nonmicrobial maize phenotypes. For leaf microbiomes, these results were supported by the observation that broad‐sense heritability in hybrids was substantially higher than narrow‐sense heritability. Our results support our hypothesis that at least some heterotic host traits affect microbiome composition in maize.
Combining SNP p-values from GWAS summary data is a promising strategy for detecting novel genetic factors. Existing statistical methods for p-value-based SNP-set testing confront two challenges. First, the statistical power of different methods depends on unknown patterns of genetic effects that could vary drastically across SNP sets. Second, they do not identify which SNPs primarily contribute to the global association of the whole set. We propose a new signal-adaptive analysis pipeline to address these challenges using the omnibus thresholding Fisher’s method (oTFisher). The oTFisher remains robustly powerful over various patterns of genetic effects. Its adaptive thresholding can be applied to estimate important SNPs contributing to the overall significance of the given SNP set. We develop efficient calculation algorithms to control the type I error rate while accounting for the linkage disequilibrium among SNPs. Extensive simulations show that the oTFisher has robustly high power and provides a higher balanced accuracy in screening SNPs than the traditional Bonferroni and FDR procedures. We applied the oTFisher to study the genetic association of genes and haplotype blocks with bone density-related traits using the summary data of the Genetic Factors for Osteoporosis Consortium. The oTFisher identified more novel and literature-reported genetic factors than existing p-value combination methods. Relevant computation has been implemented in the R package TFisher to support similar data analysis.
Summary. In this study, we characterized a panel of 1264 maize near‐isogenic lines (NILs), developed from crosses between 18 diverse inbred lines and the recurrent parent B73, referred to as nested NILs (nNILs). Of these, 888 nNILs were genotyped using genotyping‐by‐sequencing (GBS). Subsequently, 24 of these nNILs, and all the parental lines, were re‐genotyped using a high‐density single nucleotide polymorphism (SNP) chip. A novel pipeline for calling introgressions, which does not rely on knowing the donor parent of each nNIL, was developed based on a hidden Markov model (HMM) algorithm. By comparing the introgressions detected using GBS data with those identified using chip data, we optimized the HMM parameters for analyzing the entire nNIL population. A total of 2969 introgressions were identified across the 888 nNILs. Individual introgression blocks ranged from 21 bp to 204 Mbp, with an average size of 17 Mbp. By comparing SNP genotypes within introgressed segments to the known genotypes of the donor lines, we determined that in about one third of the lines, the identity of the donors did not match expectation based on their pedigrees. We characterized the entire nNIL population for three foliar diseases. Using these data, we mapped a number of quantitative trait loci (QTL) for disease resistance in the nNIL population and observed extensive variation in effects among the alleles from different donor parents at most QTL identified. This population will be of significant utility for dissecting complex agronomic traits and allelic series in maize.

