

Title: Leveraging pleiotropy for joint analysis of genome-wide association studies with per trait interpretations

We introduce the pleiotropic association test (PAT) for joint analysis of multiple traits using genome-wide association study (GWAS) summary statistics. The method uses the decomposition of phenotypic covariation into genetic and environmental components to create a likelihood ratio test statistic for each genetic variant. Though PAT does not directly indicate which trait(s) drive the association, a per-trait interpretation of the omnibus p-value is provided through m-values, an extension to the meta-analysis framework. In simulations, we show that PAT controls the false positive rate, increases statistical power, and is robust to misspecification of the genetic effect model.

Additionally, simulations comparing PAT to three multi-trait methods, HIPO, MTAG, and ASSET, show that PAT identified 15.3% more omnibus associations than the next best method. When these associations were interpreted at the per-trait level using m-values, PAT had 37.5% more true per-trait interpretations with a 0.92% false positive assignment rate. When analyzing four traits from the UK Biobank, PAT discovered 22,095 novel variants. Through the m-value interpretation framework, the number of per-trait associations was almost tripled for two traits and nearly doubled for another trait relative to the original single-trait GWAS.
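The likelihood ratio construction described in the abstract can be illustrated with a minimal sketch. It assumes that, for one variant, the vector of per-trait z-scores is zero-mean multivariate normal with covariance Σ_e (environmental) under the null and Σ_e + Σ_g (environmental plus genetic) under the alternative, and it calibrates the statistic by plain Monte Carlo sampling from the null. The function names, the example covariance values, and the calibration scheme are illustrative assumptions, not the authors' implementation, which may differ in all of these respects.

```python
import numpy as np


def quad_form(z, cov):
    """Row-wise z' cov^{-1} z for z of shape (n, k)."""
    solved = np.linalg.solve(cov, z.T).T
    return np.einsum("ij,ij->i", z, solved)


def log_lr(z, sigma_e, sigma_g):
    """Log likelihood ratio of N(0, sigma_e + sigma_g) versus N(0, sigma_e).

    z may be one z-score vector (k,) or a batch (n, k); an array of
    shape (n,) is returned. The (2*pi)^k constant cancels in the ratio.
    """
    z = np.atleast_2d(np.asarray(z, dtype=float))
    sigma_alt = sigma_e + sigma_g
    _, logdet_null = np.linalg.slogdet(sigma_e)
    _, logdet_alt = np.linalg.slogdet(sigma_alt)
    ll_null = -0.5 * (logdet_null + quad_form(z, sigma_e))
    ll_alt = -0.5 * (logdet_alt + quad_form(z, sigma_alt))
    return ll_alt - ll_null


def monte_carlo_p_value(z, sigma_e, sigma_g, n_null=20_000, seed=0):
    """Empirical p-value: share of null draws with a statistic >= observed.

    Assumption: simple Monte Carlo calibration; very small p-values
    cannot be resolved below 1 / (n_null + 1) this way.
    """
    rng = np.random.default_rng(seed)
    k = sigma_e.shape[0]
    null_z = rng.multivariate_normal(np.zeros(k), sigma_e, size=n_null)
    null_stats = log_lr(null_z, sigma_e, sigma_g)
    observed = log_lr(z, sigma_e, sigma_g)[0]
    return (1 + np.sum(null_stats >= observed)) / (n_null + 1)


# Hypothetical covariances for two traits (not taken from the paper).
sigma_e = np.array([[1.0, 0.3], [0.3, 1.0]])
sigma_g = np.array([[4.0, 2.0], [2.0, 4.0]])

z_scores = np.array([5.0, 5.0])  # hypothetical per-trait z-scores for one variant
print(monte_carlo_p_value(z_scores, sigma_e, sigma_g))
```

Because Σ_g enters only the alternative, z-score patterns that align with the assumed genetic covariance receive larger statistics than an unweighted combination of the traits would give them, which is one way to read the power gain the abstract reports.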

 
Award ID(s):
1910885
NSF-PAR ID:
10469977
Editor(s):
Epstein, Michael P.
Publisher / Repository:
PLoS
Journal Name:
PLOS Genetics
Volume:
18
Issue:
11
ISSN:
1553-7404
Page Range / eLocation ID:
e1010447
Sponsoring Org:
National Science Foundation
More Like this
  1. Background

    Genome‐wide association studies (GWAS) have succeeded in identifying tens of thousands of genetic variants associated with complex human traits during the past decade; however, they are still hampered by limited statistical power and difficulties in biological interpretation. With the recent progress in expression quantitative trait loci (eQTL) studies, transcriptome‐wide association studies (TWAS) provide a framework to test for gene‐trait associations by integrating information from GWAS and eQTL studies.

    Results

    In this review, we introduce the general framework of TWAS, the relevant resources, and the computational tools. We also discuss extensions of the original TWAS methods. Furthermore, we briefly introduce methods that are closely related to TWAS, including MR‐based methods and colocalization approaches, and discuss the connections and differences between these approaches.

    Conclusion

    Finally, we will summarize strengths, limitations, and potential directions for TWAS.

     
  2. Abstract Motivation

    Many variants identified by genome-wide association studies (GWAS) have been found to affect multiple traits, either directly or through shared pathways. There is currently a wealth of GWAS data collected for numerous phenotypes, and analyzing multiple traits at once can increase power to detect shared variant effects. However, traditional meta-analysis methods are not suitable for combining studies on different traits. When applied to dissimilar studies, these meta-analysis methods can be underpowered compared to univariate analysis. The degree to which traits share variant effects is often not known, and the vast majority of GWAS meta-analyses consider only one trait at a time.

    Results

    Here, we present a flexible method for finding associated variants from GWAS summary statistics for multiple traits. Our method estimates the degree of shared effects between traits from the data. Using simulations, we show that our method properly controls the false positive rate and increases power when an effect is present in a subset of traits. We then apply our method to the North Finland Birth Cohort and UK Biobank datasets using a variety of metabolic traits and discover novel loci.

    Availability and implementation

    Our source code is available at https://github.com/lgai/CONFIT.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  3. Abstract

    Genome‐wide association studies (GWAS) typically use linear or logistic regression models to identify associations between phenotypes (traits) and genotypes (genetic variants) of interest. However, the use of regression with the additive assumption has potential limitations. First, the assumption that residuals are normally distributed rarely holds in practice, and deviation from normality increases the Type‐I error rate. Second, building a model based on such an assumption ignores genetic structures such as dominant, recessive, and protective‐risk cases. Ignoring genetic variants may result in spurious conclusions about the associations between a variant and a trait. We propose an assumption‐free model built upon data‐consistent inversion (DCI), a recently developed measure‐theoretic framework for uncertainty quantification. The proposed DCI‐derived model builds a nonparametric distribution on model inputs that propagates to the distribution of observed data without requiring the normality assumption on residuals made by the regression model. This characteristic enables the proposed DCI‐derived model to cover all genetic variants without relying on the additivity of the classic‐GWAS model. Simulations and a replication GWAS with data from the COPDGene study demonstrate the ability of this model to control the Type‐I error rate at least as well as the classic‐GWAS (additive linear model) approach while having similar or greater power to discover variants in different genetic modes of transmission.

     
  4. Abstract Motivation

    A large proportion of risk regions identified by genome-wide association studies (GWAS) are shared across multiple diseases and traits. Understanding whether this clustering is due to sharing of causal variants or chance colocalization can provide insights into shared etiology of complex traits and diseases.

    Results

    In this work, we propose a flexible, unifying framework called UNITY (Unifying Non-Infinitesimal Trait analYsis) to quantify the overlap between a pair of traits. We formulate a Bayesian generative model that relates the overlap between pairs of traits to GWAS summary statistic data under a non-infinitesimal genetic architecture underlying each trait. We propose a Metropolis–Hastings sampler to compute the posterior density of the genetic overlap parameters in this model. We validate our method through comprehensive simulations and analyze summary statistics from height and body mass index GWAS to show that it produces estimates consistent with the known genetic makeup of both traits.

    Availability and implementation

    The UNITY software is made freely available to the research community at: https://github.com/bogdanlab/UNITY.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  5. Abstract

    We present a novel approach to genome-wide association studies (GWAS) by leveraging unstructured, spoken phenotypic descriptions to identify genomic regions associated with maize traits. Utilizing the Wisconsin Diversity panel, we collected spoken descriptions of Zea mays ssp. mays traits, converting these qualitative observations into quantitative data amenable to GWAS analysis. First, we determined that visually striking phenotypes could be detected from unstructured spoken phenotypic descriptions. Next, we developed two methods to process the same descriptions to derive the trait plant height, a well-characterized phenotypic feature in maize: (1) a semantic similarity metric that assigns a score based on the resemblance of each observation to the concept of ‘tallness’ and (2) a manual scoring system that categorizes and assigns values to phrases related to plant height. Our analysis successfully corroborated known genomic associations and uncovered novel candidate genes potentially linked to plant height. Some of these genes are associated with gene ontology terms that suggest a plausible involvement in determining plant stature. This proof-of-concept demonstrates the viability of spoken phenotypic descriptions in GWAS and introduces a scalable framework for incorporating unstructured language data into genetic association studies. This methodology has the potential not only to enrich the phenotypic data used in GWAS and to enhance the discovery of genetic elements linked to complex traits but also to expand the repertoire of phenotype data collection methods available for use in the field environment.

     