skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Leveraging pleiotropy for joint analysis of genome-wide association studies with per trait interpretations
We introduce pleiotropic association test (PAT) for joint analysis of multiple traits using genome-wide association study (GWAS) summary statistics. The method utilizes the decomposition of phenotypic covariation into genetic and environmental components to create a likelihood ratio test statistic for each genetic variant. Though PAT does not directly interpret which trait(s) drive the association, a per trait interpretation of the omnibus p-value is provided through an extension to the meta-analysis framework, m-values. In simulations, we show PAT controls the false positive rate, increases statistical power, and is robust to model misspecifications of genetic effect. Additionally, simulations comparing PAT to three multi-trait methods, HIPO, MTAG, and ASSET, show PAT identified 15.3% more omnibus associations over the next best method. When these associations were interpreted on a per trait level using m-values, PAT had 37.5% more true per trait interpretations with a 0.92% false positive assignment rate. When analyzing four traits from the UK Biobank, PAT discovered 22,095 novel variants. Through the m-values interpretation framework, the number of per trait associations for two traits were almost tripled and were nearly doubled for another trait relative to the original single trait GWAS.  more » « less
Award ID(s):
1910885
PAR ID:
10469977
Author(s) / Creator(s):
; ;
Editor(s):
Epstein, Michael P.
Publisher / Repository:
PLoS
Date Published:
Journal Name:
PLOS Genetics
Volume:
18
Issue:
11
ISSN:
1553-7404
Page Range / eLocation ID:
e1010447
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract MotivationMany variants identified by genome-wide association studies (GWAS) have been found to affect multiple traits, either directly or through shared pathways. There is currently a wealth of GWAS data collected in numerous phenotypes, and analyzing multiple traits at once can increase power to detect shared variant effects. However, traditional meta-analysis methods are not suitable for combining studies on different traits. When applied to dissimilar studies, these meta-analysis methods can be underpowered compared to univariate analysis. The degree to which traits share variant effects is often not known, and the vast majority of GWAS meta-analysis only consider one trait at a time. ResultsHere, we present a flexible method for finding associated variants from GWAS summary statistics for multiple traits. Our method estimates the degree of shared effects between traits from the data. Using simulations, we show that our method properly controls the false positive rate and increases power when an effect is present in a subset of traits. We then apply our method to the North Finland Birth Cohort and UK Biobank datasets using a variety of metabolic traits and discover novel loci. Availability and implementationOur source code is available at https://github.com/lgai/CONFIT. Supplementary informationSupplementary data are available at Bioinformatics online. 
    more » « less
  2. Abstract The genome‐wide association studies (GWAS) typically use linear or logistic regression models to identify associations between phenotypes (traits) and genotypes (genetic variants) of interest. However, the use of regression with the additive assumption has potential limitations. First, the normality assumption of residuals is the one that is rarely seen in practice, and deviation from normality increases the Type‐I error rate. Second, building a model based on such an assumption ignores genetic structures, like, dominant, recessive, and protective‐risk cases. Ignoring genetic variants may result in spurious conclusions about the associations between a variant and a trait. We propose an assumption‐free model built upon data‐consistent inversion (DCI), which is a recently developed measure‐theoretic framework utilized for uncertainty quantification. This proposed DCI‐derived model builds a nonparametric distribution on model inputs that propagates to the distribution of observed data without the required normality assumption of residuals in the regression model. This characteristic enables the proposed DCI‐derived model to cover all genetic variants without emphasizing on additivity of the classic‐GWAS model. Simulations and a replication GWAS with data from the COPDGene demonstrate the ability of this model to control the Type‐I error rate at least as well as the classic‐GWAS (additive linear model) approach while having similar or greater power to discover variants in different genetic modes of transmission. 
    more » « less
  3. Abstract Maize inflorescence is a complex phenotype that involves the physical and developmental interplay of multiple traits. Given the evidence that genes could pleiotropically contribute to several of these traits, we used publicly available maize data to assess the ability of multivariate genome-wide association study (GWAS) approaches to identify pleiotropic quantitative trait loci (pQTL). Our analysis of 23 publicly available inflorescence and leaf-related traits in a diversity panel of n = 281 maize lines genotyped with 376,336 markers revealed that the two multivariate GWAS approaches we tested were capable of identifying pQTL in genomic regions coinciding with similar associations found in previous studies. We then conducted a parallel simulation study on the same individuals, where it was shown that multivariate GWAS approaches yielded a higher true-positive quantitative trait nucleotide (QTN) detection rate than comparable univariate approaches for all evaluated simulation settings except for when the correlated simulated traits had a heritability of 0.9. We therefore conclude that the implementation of state-of-the-art multivariate GWAS approaches is a useful tool for dissecting pleiotropy and their more widespread implementation could facilitate the discovery of genes and other biological mechanisms underlying maize inflorescence. 
    more » « less
  4. ABSTRACT Genome-wide association studies (GWAS) can identify genetic variants responsible for naturally occurring and quantitative phenotypic variation. Association studies therefore provide a powerful complement to approaches that rely on de novo mutations for characterizing gene function. Although bacteria should be amenable to GWAS, few GWAS have been conducted on bacteria, and the extent to which nonindependence among genomic variants (e.g., linkage disequilibrium [LD]) and the genetic architecture of phenotypic traits will affect GWAS performance is unclear. We apply association analyses to identify candidate genes underlying variation in 20 biochemical, growth, and symbiotic phenotypes among 153 strains of Ensifer meliloti . For 11 traits, we find genotype-phenotype associations that are stronger than expected by chance, with the candidates in relatively small linkage groups, indicating that LD does not preclude resolving association candidates to relatively small genomic regions. The significant candidates show an enrichment for nucleotide polymorphisms (SNPs) over gene presence-absence variation (PAV), and for five traits, candidates are enriched in large linkage groups, a possible signature of epistasis. Many of the variants most strongly associated with symbiosis phenotypes were in genes previously identified as being involved in nitrogen fixation or nodulation. For other traits, apparently strong associations were not stronger than the range of associations detected in permuted data. In sum, our data show that GWAS in bacteria may be a powerful tool for characterizing genetic architecture and identifying genes responsible for phenotypic variation. However, careful evaluation of candidates is necessary to avoid false signals of association. IMPORTANCE Genome-wide association analyses are a powerful approach for identifying gene function. These analyses are becoming commonplace in studies of humans, domesticated animals, and crop plants but have rarely been conducted in bacteria. We applied association analyses to 20 traits measured in Ensifer meliloti , an agriculturally and ecologically important bacterium because it fixes nitrogen when in symbiosis with leguminous plants. We identified candidate alleles and gene presence-absence variants underlying variation in symbiosis traits, antibiotic resistance, and use of various carbon sources; some of these candidates are in genes previously known to affect these traits whereas others were in genes that have not been well characterized. Our results point to the potential power of association analyses in bacteria, but also to the need to carefully evaluate the potential for false associations. 
    more » « less
  5. Combining SNP p -values from GWAS summary data is a promising strategy for detecting novel genetic factors. Existing statistical methods for the p -value-based SNP-set testing confront two challenges. First, the statistical power of different methods depends on unknown patterns of genetic effects that could drastically vary over different SNP sets. Second, they do not identify which SNPs primarily contribute to the global association of the whole set. We propose a new signal-adaptive analysis pipeline to address these challenges using the omnibus thresholding Fisher’s method (oTFisher). The oTFisher remains robustly powerful over various patterns of genetic effects. Its adaptive thresholding can be applied to estimate important SNPs contributing to the overall significance of the given SNP set. We develop efficient calculation algorithms to control the type I error rate, which accounts for the linkage disequilibrium among SNPs. Extensive simulations show that the oTFisher has robustly high power and provides a higher balanced accuracy in screening SNPs than the traditional Bonferroni and FDR procedures. We applied the oTFisher to study the genetic association of genes and haplotype blocks of the bone density-related traits using the summary data of the Genetic Factors for Osteoporosis Consortium. The oTFisher identified more novel and literature-reported genetic factors than existing p -value combination methods. Relevant computation has been implemented into the R package TFisher to support similar data analysis. 
    more » « less