Abstract While variance components analysis has emerged as a powerful tool in complex trait genetics, existing methods for fitting variance components do not scale well to large-scale datasets of genetic variation. Here, we present a method for variance components analysis that is accurate and efficient: capable of estimating one hundred variance components on a million individuals genotyped at a million SNPs in a few hours. We illustrate the utility of our method in estimating and partitioning variation in a trait explained by genotyped SNPs (SNP-heritability). Analyzing 22 traits with genotypes from 300,000 individuals across about 8 million common and low-frequency SNPs, we observe that per-allele squared effect size increases with decreasing minor allele frequency (MAF) and linkage disequilibrium (LD), consistent with the action of negative selection. Partitioning heritability across 28 functional annotations, we observe enrichment of heritability in FANTOM5 enhancers in asthma, eczema, thyroid and autoimmune disorders.
Estimation of regional polygenicity from GWAS provides insights into the genetic architecture of complex traits
The number of variants that have a non-zero effect on a trait (i.e., polygenicity) is a fundamental parameter in the study of the genetic architecture of a complex trait. Although many previous studies have investigated polygenicity at a genome-wide scale, a detailed understanding of how polygenicity varies across genomic regions is currently lacking. In this work, we propose an accurate and scalable statistical framework to estimate regional polygenicity for a complex trait. We show that our approach yields approximately unbiased estimates of regional polygenicity in simulations across a wide range of genetic architectures. We then partition the polygenicity of anthropometric and blood pressure traits across 6-Mb genomic regions (N = 290K, UK Biobank) and observe that all analyzed traits are highly polygenic: over one-third of regions harbor at least one causal variant for each of the traits analyzed. Additionally, we observe wide variation in regional polygenicity: on average across all traits, 48.9% of regions contain at least 5 causal SNPs, and 5.44% of regions contain at least 50 causal SNPs. Finally, we find that heritability is proportional to polygenicity at the regional level, which is consistent with the hypothesis that heritability enrichments are largely driven by the variation in the number of causal SNPs.
- PAR ID: 10337292
- Editor(s): Wheeler, Heather E.
- Date Published:
- Journal Name: PLOS Computational Biology
- Volume: 17
- Issue: 10
- ISSN: 1553-7358
- Page Range / eLocation ID: e1009483
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
The underlying genetic changes that regulate the appearance and disappearance of repeated traits, or serial homologs, remain poorly understood. One hypothesis is that variation in genomic regions flanking master regulatory genes, also known as input–output genes, controls variation in trait number, making the locus of evolution almost predictable. Another hypothesis implicates genetic variation in up- or downstream loci of master control genes. Here, we use the butterfly Bicyclus anynana, a species that exhibits natural variation in eyespot number on the dorsal hindwing, to test these two hypotheses. We first estimated the heritability of dorsal hindwing eyespot number by breeding multiple butterfly families differing in eyespot number and regressing eyespot numbers of offspring on midparent values. We then estimated the number and identity of independent genetic loci contributing to eyespot number variation by performing a genome-wide association study with restriction site-associated DNA sequencing from multiple individuals varying in number of eyespots sampled across a freely breeding laboratory population. We found that dorsal hindwing eyespot number has a moderately high heritability of ∼0.50 and is characterized by a polygenic architecture. Previously identified genomic regions involved in eyespot development, and novel ones, display high association with dorsal hindwing eyespot number, suggesting that homolog number variation is likely determined by regulatory changes at multiple loci that build the trait, and not by variation at single master regulators or input–output genes.
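The midparent-offspring regression mentioned in this abstract can be illustrated with a minimal sketch. Under the standard quantitative-genetics assumption that the slope of offspring values on midparent values estimates narrow-sense heritability, an ordinary least-squares slope is all that is needed. The function name `midparent_heritability` and the family values below are hypothetical and are not taken from the study.

```python
def midparent_heritability(families):
    """Estimate heritability as the OLS slope of offspring on midparent values.

    `families` is a list of (parent1, parent2, offspring_mean) tuples.
    This layout is an assumption made for illustration only.
    """
    midparents = [(p1 + p2) / 2.0 for p1, p2, _ in families]
    offspring = [o for _, _, o in families]
    n = len(families)
    mean_mid = sum(midparents) / n
    mean_off = sum(offspring) / n
    # Slope = cov(midparent, offspring) / var(midparent).
    cov = sum((m - mean_mid) * (o - mean_off)
              for m, o in zip(midparents, offspring))
    var = sum((m - mean_mid) ** 2 for m in midparents)
    if var == 0.0:
        raise ValueError("midparent values must vary across families")
    return cov / var


# Hypothetical eyespot counts: (parent 1, parent 2, mean of offspring).
example_families = [(0, 2, 1.5), (1, 3, 2.0), (2, 4, 2.5), (3, 5, 3.0)]
print(midparent_heritability(example_families))
```

With these made-up families, the offspring mean rises by half a unit for each unit of midparent value, so the sketch returns a slope of 0.5.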
-
Abstract Missing heritability in genome-wide association studies defines a major problem in genetic analyses of complex biological traits [1,2]. The solution to this problem is to identify all causal genetic variants and to measure their individual contributions [3,4]. Here we report a graph pangenome of tomato constructed by precisely cataloguing more than 19 million variants from 838 genomes, including 32 new reference-level genome assemblies. This graph pangenome was used for genome-wide association study analyses and heritability estimation of 20,323 gene-expression and metabolite traits. The average estimated trait heritability is 0.41 compared with 0.33 when using the single linear reference genome. This 24% increase in estimated heritability is largely due to resolving incomplete linkage disequilibrium through the inclusion of additional causal structural variants identified using the graph pangenome. Moreover, by resolving allelic and locus heterogeneity, structural variants improve the power to identify genetic factors underlying agronomically important traits leading to, for example, the identification of two new genes potentially contributing to soluble solid content. The newly identified structural variants will facilitate genetic improvement of tomato through both marker-assisted selection and genomic selection. Our study advances the understanding of the heritability of complex traits and demonstrates the power of the graph pangenome in crop breeding.
-
Abstract Motivation: Heritability, the proportion of variation in a trait that can be explained by genetic variation, is an important parameter in efforts to understand the genetic architecture of complex phenotypes as well as in the design and interpretation of genome-wide association studies. Attempts to understand the heritability of complex phenotypes attributable to genome-wide single nucleotide polymorphism (SNP) variation data have motivated the analysis of large datasets as well as the development of sophisticated tools to estimate heritability in these datasets. Linear mixed models (LMMs) have emerged as a key tool for heritability estimation where the parameters of the LMMs, i.e. the variance components, are related to the heritability attributable to the SNPs analyzed. Likelihood-based inference in LMMs, however, poses serious computational burdens. Results: We propose a scalable randomized algorithm for estimating variance components in LMMs. Our method is based on a method-of-moment estimator that has a runtime complexity O(NMB) for N individuals and M SNPs (where B is a parameter that controls the number of random matrix-vector multiplications). Further, by leveraging the structure of the genotype matrix, we can reduce the time complexity to O(NMB / max(log_3 N, log_3 M)). We demonstrate the scalability and accuracy of our method on simulated as well as on empirical data. On standard hardware, our method computes heritability on a dataset of 500 000 individuals and 100 000 SNPs in 38 min. Availability and implementation: The RHE-reg software is made freely available to the research community at: https://github.com/sriramlab/RHE-reg.
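The idea of a method-of-moments estimator driven by B random matrix-vector multiplications can be sketched for a single variance component. This is not the RHE-reg implementation; it is a minimal pure-Python illustration. It assumes a standardized genotype matrix X, a centered phenotype y, and the relatedness matrix K = X Xᵀ / M, and it estimates tr(K²) with Gaussian random probes. All function names are hypothetical, and the O(NMB / max(log_3 N, log_3 M)) speed-up from the structure of the genotype matrix is not attempted here.

```python
import random


def standardize_columns(G):
    """Center and scale each SNP column of G to mean 0 and variance 1."""
    n, m = len(G), len(G[0])
    X = [[0.0] * m for _ in range(n)]
    for j in range(m):
        col = [G[i][j] for i in range(n)]
        mean = sum(col) / n
        var = sum((g - mean) ** 2 for g in col) / n
        if var == 0.0:
            raise ValueError(f"SNP {j} is monomorphic; drop it first")
        sd = var ** 0.5
        for i in range(n):
            X[i][j] = (col[i] - mean) / sd
    return X


def apply_grm(X, v):
    """Return K v for K = X X^T / M without forming K (O(NM) work)."""
    m = len(X[0])
    xtv = [0.0] * m                      # X^T v
    for row, vi in zip(X, v):
        for j, x in enumerate(row):
            xtv[j] += x * vi
    return [sum(x * w for x, w in zip(row, xtv)) / m for row in X]


def randomized_he(X, y, num_vecs=10, seed=0):
    """Method-of-moments estimates (sigma_g2, sigma_e2) for y ~ N(0, sg*K + se*I).

    tr(K^2) is approximated by averaging ||K z||^2 over `num_vecs` random
    Gaussian vectors z, so the total cost is O(N * M * num_vecs).
    y is assumed to be centered.
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    tr_k2 = 0.0
    for _ in range(num_vecs):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        kz = apply_grm(X, z)
        tr_k2 += sum(a * a for a in kz)  # z^T K^2 z, since K is symmetric
    tr_k2 /= num_vecs
    tr_k = sum(x * x for row in X for x in row) / m
    y_k_y = sum(a * b for a, b in zip(y, apply_grm(X, y)))
    y_y = sum(a * a for a in y)
    # Solve [[tr(K^2), tr(K)], [tr(K), N]] [sg, se]^T = [y^T K y, y^T y]^T.
    det = tr_k2 * n - tr_k * tr_k
    sigma_g2 = (n * y_k_y - tr_k * y_y) / det
    sigma_e2 = (tr_k2 * y_y - tr_k * y_k_y) / det
    return sigma_g2, sigma_e2
```

An SNP-heritability estimate then follows as `sigma_g2 / (sigma_g2 + sigma_e2)`. Because each probe needs only two passes over X, K is never materialized, which is what keeps the cost linear in N and M for a fixed number of probes.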
-
High rates of dispersal can break down coadapted gene complexes. However, concentrated genomic architecture (i.e., genomic islands of divergence) can suppress recombination to allow evolution of local adaptations despite high gene flow. Pacific lamprey (Entosphenus tridentatus) is a highly dispersive anadromous fish. Observed trait diversity and evidence for a genetic basis of traits suggest it may be locally adapted. We addressed whether concentrated genomic architecture could influence local adaptation for Pacific lamprey. Using two new whole genome assemblies and genotypes from 7,716 single nucleotide polymorphism (SNP) loci in 518 individuals from across the species range, we identified four genomic islands of divergence (on chromosomes 01, 02, 04, and 22). We determined robust phenotype-by-genotype relationships by testing multiple traits across geographic sites. These trait associations probably explain genomic divergence across the species’ range. We genotyped a subset of 302 broadly distributed SNPs in 2,145 individuals for association testing for adult body size, sexual maturity, migration distance and timing, adult swimming ability, and larval growth. Body size traits were strongly associated with SNPs on chromosomes 02 and 04. Moderate associations also implicated SNPs on chromosome 01 as being associated with variation in female maturity. Finally, we used candidate SNPs to extrapolate a heterogeneous spatiotemporal distribution of these predicted phenotypes based on independent data sets of larval and adult collections. These maturity and body size results guide future elucidation of factors driving regional optimization of these traits for fitness. Pacific lamprey is culturally important and imperiled. This research addresses biological uncertainties that challenge restoration efforts.

