
Title: Multi-resolution localization of causal variants across the genome

In the statistical analysis of genome-wide association data, it is challenging to precisely localize the variants that affect complex traits, due to linkage disequilibrium, and to maximize power while limiting spurious findings. Here we report on KnockoffZoom: a flexible method that localizes causal variants at multiple resolutions by testing the conditional associations of genetic segments of decreasing width, while provably controlling the false discovery rate. Our method utilizes artificial genotypes as negative controls and is equally valid for quantitative and binary phenotypes, without requiring any assumptions about their genetic architectures. Instead, we rely on well-established genetic models of linkage disequilibrium. We demonstrate that our method can detect more associations than mixed effects models and achieve fine-mapping precision, at comparable computational cost. Lastly, we apply KnockoffZoom to data from 350k subjects in the UK Biobank and report many new findings.
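
The role of the artificial genotypes as negative controls can be illustrated with the generic knockoff filter. The sketch below is not the KnockoffZoom implementation. It assumes that each genetic segment j already has a test statistic W_j, where a large positive value means the real genotypes look more important than their artificial copies and the sign is equally likely to be positive or negative for null segments. The function names `knockoff_plus_threshold` and `select_segments` and the example statistics are hypothetical. The threshold rule is the standard "knockoff+" rule: the smallest t such that (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) is at most the target false discovery rate q.

```python
import numpy as np


def knockoff_plus_threshold(w, q):
    """Return the knockoff+ threshold for statistics w at target FDR q.

    The count of large negative statistics acts as a negative-control
    estimate of the number of false positives among the large positive ones.
    Returns infinity when no threshold meets the target.
    """
    w = np.asarray(w, dtype=float)
    # Only the observed nonzero magnitudes need to be tried as thresholds.
    candidates = np.sort(np.unique(np.abs(w[w != 0])))
    for t in candidates:
        fdp_hat = (1 + np.sum(w <= -t)) / max(1, np.sum(w >= t))
        if fdp_hat <= q:
            return float(t)
    return float("inf")


def select_segments(w, q):
    """Indices of segments whose statistic reaches the knockoff+ threshold."""
    w = np.asarray(w, dtype=float)
    t = knockoff_plus_threshold(w, q)
    return np.flatnonzero(w >= t)


# Hypothetical statistics for ten segments (illustrative values only).
w_example = [5.0, 4.0, 3.5, 3.0, 2.5, 2.0, -1.0, 1.5, 0.5, -0.2]
selected = select_segments(w_example, q=0.2)
```

In a multi-resolution analysis one would presumably repeat such a selection once per resolution, with the segments redefined at each width; that repetition is an assumption of this sketch and is not shown.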

Award ID(s):
1934578 1712800
Journal Name:
Nature Communications
Nature Publishing Group
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Extant conifer species may be susceptible to rapid environmental change owing to their long generation times, but could also be resilient due to high levels of standing genetic diversity. Hybridisation between closely related species can increase genetic diversity and generate novel allelic combinations capable of fuelling adaptive evolution. Our study unravelled the genetic architecture of adaptive evolution in a conifer hybrid zone formed between Pinus strobiformis and P. flexilis. Using a multifaceted approach emphasising the spatial and environmental patterns of linkage disequilibrium and ancestry enrichment, we identified recently introgressed and background genetic variants to be driving adaptive evolution along different environmental gradients. Specifically, recently introgressed variants from P. flexilis were favoured along freeze-related environmental gradients, while background variants were favoured along water availability-related gradients. We posit that such mosaics of allelic variants within conifer hybrid zones will confer upon them greater resilience to ongoing and future environmental change and can be a key resource for conservation efforts.
  2. ABSTRACT Genome-wide association studies (GWAS) can identify genetic variants responsible for naturally occurring and quantitative phenotypic variation. Association studies therefore provide a powerful complement to approaches that rely on de novo mutations for characterizing gene function. Although bacteria should be amenable to GWAS, few GWAS have been conducted on bacteria, and the extent to which nonindependence among genomic variants (e.g., linkage disequilibrium [LD]) and the genetic architecture of phenotypic traits will affect GWAS performance is unclear. We apply association analyses to identify candidate genes underlying variation in 20 biochemical, growth, and symbiotic phenotypes among 153 strains of Ensifer meliloti. For 11 traits, we find genotype-phenotype associations that are stronger than expected by chance, with the candidates in relatively small linkage groups, indicating that LD does not preclude resolving association candidates to relatively small genomic regions. The significant candidates show an enrichment for nucleotide polymorphisms (SNPs) over gene presence-absence variation (PAV), and for five traits, candidates are enriched in large linkage groups, a possible signature of epistasis. Many of the variants most strongly associated with symbiosis phenotypes were in genes previously identified as being involved in nitrogen fixation or nodulation. For other traits, apparently strong associations were not stronger than the range of associations detected in permuted data. In sum, our data show that GWAS in bacteria may be a powerful tool for characterizing genetic architecture and identifying genes responsible for phenotypic variation. However, careful evaluation of candidates is necessary to avoid false signals of association.

    IMPORTANCE Genome-wide association analyses are a powerful approach for identifying gene function. These analyses are becoming commonplace in studies of humans, domesticated animals, and crop plants but have rarely been conducted in bacteria. We applied association analyses to 20 traits measured in Ensifer meliloti, an agriculturally and ecologically important bacterium because it fixes nitrogen when in symbiosis with leguminous plants. We identified candidate alleles and gene presence-absence variants underlying variation in symbiosis traits, antibiotic resistance, and use of various carbon sources; some of these candidates are in genes previously known to affect these traits whereas others were in genes that have not been well characterized. Our results point to the potential power of association analyses in bacteria, but also to the need to carefully evaluate the potential for false associations.
  3. Abstract

    Uncovering whether convergent adaptations share a genetic basis is consequential for understanding the evolution of phenotypic diversity. This information can help us understand the extent to which shared ancestry or independent evolution shape adaptive phenotypes. In this study, we first ask whether the same genes underlie polymorphic mimicry in Papilio swallowtail butterflies. By comparing signatures of genetic variation between polymorphic and monomorphic species, we then investigate how ancestral variation, hybridization, and independent evolution contributed to wing pattern diversity in this group. We report that a single gene, doublesex (dsx), controls mimicry across multiple taxa, but with species-specific patterns of genetic differentiation and linkage disequilibrium. In contrast to widespread examples of phenotypic evolution driven by introgression, our analyses reveal distinct mimicry alleles. We conclude that mimicry evolution in this group was likely facilitated by ancestral polymorphism resulting from early co-option of dsx as a mimicry locus, and that evolutionary turnover of dsx alleles may underlie the wing pattern diversity of extant polymorphic and monomorphic lineages.
  4. Abstract Rationale: Genetic variation has a substantial contribution to chronic obstructive pulmonary disease (COPD) and lung function measurements. Heritability estimates using genome-wide genotyping data can be biased if analyses do not appropriately account for the nonuniform distribution of genetic effects across the allele frequency and linkage disequilibrium (LD) spectrum. In addition, the contribution of rare variants has been unclear. Objectives: We sought to assess the heritability of COPD and lung function using whole-genome sequence data from the Trans-Omics for Precision Medicine program. Methods: Using the genome-based restricted maximum likelihood method, we partitioned the genome into bins based on minor allele frequency and LD scores and estimated heritability of COPD, FEV1% predicted and FEV1/FVC ratio in 11,051 European ancestry and 5,853 African-American participants. Measurements and Main Results: In European ancestry participants, the estimated heritability of COPD, FEV1% predicted and FEV1/FVC ratio were 35.5%, 55.6% and 32.5%, of which 18.8%, 19.7%, 17.8% were from common variants, and 16.6%, 35.8%, and 14.6% were from rare variants. These estimates had wide confidence intervals, with common variants and some sets of rare variants showing a statistically significant contribution (P-value < 0.05). In African-Americans, common variant heritability was similar to European ancestry participants, but lower sample size precluded calculation of rare variant heritability. Conclusions: Our study provides updated and unbiased estimates of heritability for COPD and lung function, and suggests an important contribution of rare variants. Larger studies of more diverse ancestry will improve accuracy of these estimates.
  5. Abstract

    While variance components analysis has emerged as a powerful tool in complex trait genetics, existing methods for fitting variance components do not scale well to large-scale datasets of genetic variation. Here, we present a method for variance components analysis that is accurate and efficient: capable of estimating one hundred variance components on a million individuals genotyped at a million SNPs in a few hours. We illustrate the utility of our method in estimating and partitioning variation in a trait explained by genotyped SNPs (SNP-heritability). Analyzing 22 traits with genotypes from 300,000 individuals across about 8 million common and low-frequency SNPs, we observe that per-allele squared effect size increases with decreasing minor allele frequency (MAF) and linkage disequilibrium (LD) consistent with the action of negative selection. Partitioning heritability across 28 functional annotations, we observe enrichment of heritability in FANTOM5 enhancers in asthma, eczema, thyroid and autoimmune disorders.