Title: How Well Can Multivariate and Univariate GWAS Distinguish Between True and Spurious Pleiotropy?
Quantification of the simultaneous contributions of loci to multiple traits, a phenomenon called pleiotropy, is facilitated by the increased availability of high-throughput genotypic and phenotypic data. To understand the prevalence and nature of pleiotropy, the ability of multivariate and univariate genome-wide association study (GWAS) models to distinguish between pleiotropic and non-pleiotropic loci in linkage disequilibrium (LD) first needs to be evaluated. Therefore, we used publicly available maize and soybean genotypic data to simulate multiple pairs of traits that were either (i) controlled by quantitative trait nucleotides (QTNs) on separate chromosomes, (ii) controlled by QTNs in various degrees of LD with each other, or (iii) controlled by a single pleiotropic QTN. We showed that multivariate GWAS could not distinguish between QTNs in LD and a single pleiotropic QTN. In contrast, a unique QTN detection rate pattern was observed for univariate GWAS whenever the simulated QTNs were in high LD or pleiotropic. Collectively, these results suggest that multivariate and univariate GWAS should both be used to infer whether or not causal mutations underlying peak GWAS associations are pleiotropic. Therefore, we recommend that future studies use a combination of multivariate and univariate GWAS models, as both models could be useful for identifying and narrowing down candidate loci with potential pleiotropic effects for downstream biological experiments.
Journal Name:
Frontiers in Genetics
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Maize inflorescence is a complex phenotype that involves the physical and developmental interplay of multiple traits. Given the evidence that genes could pleiotropically contribute to several of these traits, we used publicly available maize data to assess the ability of multivariate genome-wide association study (GWAS) approaches to identify pleiotropic quantitative trait loci (pQTL). Our analysis of 23 publicly available inflorescence and leaf-related traits in a diversity panel of n = 281 maize lines genotyped with 376,336 markers revealed that the two multivariate GWAS approaches we tested were capable of identifying pQTL in genomic regions coinciding with similar associations found in previous studies. We then conducted a parallel simulation study on the same individuals, where it was shown that multivariate GWAS approaches yielded a higher true-positive quantitative trait nucleotide (QTN) detection rate than comparable univariate approaches for all evaluated simulation settings except for when the correlated simulated traits had a heritability of 0.9. We therefore conclude that the implementation of state-of-the-art multivariate GWAS approaches is a useful tool for dissecting pleiotropy and their more widespread implementation could facilitate the discovery of genes and other biological mechanisms underlying maize inflorescence.
  2. Abstract Background Advances in genotyping and phenotyping techniques have enabled the acquisition of a great amount of data. Consequently, there is an interest in multivariate statistical analyses that identify genomic regions likely to contain causal mutations affecting multiple traits (i.e., pleiotropy). As the demand for multivariate analyses increases, it is imperative that optimal tools are available to assess their performance. To facilitate the testing and validation of these multivariate approaches, we developed simplePHENOTYPES, an R/CRAN package that simulates pleiotropy, partial pleiotropy, and spurious pleiotropy in a wide range of genetic architectures, including additive, dominance and epistatic models. Results We illustrate simplePHENOTYPES’ ability to simulate thousands of phenotypes in less than one minute. We then provide two vignettes illustrating how to simulate sets of correlated traits in simplePHENOTYPES. Finally, we demonstrate the use of results from simplePHENOTYPES in a standard GWAS software, as well as the equivalence of simulated phenotypes from simplePHENOTYPES and other packages with similar capabilities. Conclusions simplePHENOTYPES is an R/CRAN package that makes it possible to simulate multiple traits controlled by loci with varying degrees of pleiotropy. Its ability to interface with both commonly-used marker data formats and downstream quantitative genetics software and packages should facilitate a rigorous assessment of both existing and emerging statistical GWAS and GS approaches. simplePHENOTYPES is also available at .
  3. Pleiotropy, the concept that a single gene controls multiple distinct traits, is prevalent in most organisms and has broad implications for medicine and agriculture. The identification of the molecular mechanisms underlying pleiotropy has the power to reveal previously unknown biological connections between seemingly unrelated traits. Additionally, the discovery of pleiotropic genes increases our understanding of both genetic and phenotypic complexity by characterizing novel gene functions. Quantitative trait locus (QTL) mapping has been used to identify several pleiotropic regions in many organisms. However, gene knockout studies are needed to eliminate the possibility of tightly linked, non-pleiotropic loci. Here, we use a panel of 296 recombinant inbred advanced intercross lines of Caenorhabditis elegans and a high-throughput fitness assay to identify a single large-effect QTL on the center of chromosome V associated with variation in responses to eight chemotherapeutics. We validate this QTL with near-isogenic lines and pair genome-wide gene expression data with drug response traits to perform mediation analysis, leading to the identification of a pleiotropic candidate gene, scb-1, for some of the eight chemotherapeutics. Using deletion strains created by genome editing, we show that scb-1, which was previously implicated in response to bleomycin, also underlies responses to other double-strand DNA break-inducing chemotherapeutics. This finding provides new evidence for the role of scb-1 in the nematode drug response and highlights the power of mediation analysis to identify causal genes.
  4. Brain imaging genetics studies the genetic basis of brain structures and functionalities via integrating genotypic data such as single nucleotide polymorphisms (SNPs) and imaging quantitative traits (QTs). In this area, both multi-task learning (MTL) and sparse canonical correlation analysis (SCCA) methods are widely used since they are superior to independent and pairwise univariate analyses. MTL methods generally incorporate only a few QTs and cannot select features from multiple QTs, while SCCA methods typically employ one modality of QTs to study its association with SNPs. Both MTL and SCCA are computationally expensive as the number of SNPs increases. In this paper, we propose a novel multi-task SCCA (MTSCCA) method to identify bi-multivariate associations between SNPs and multi-modal imaging QTs. MTSCCA could make use of the complementary information carried by different imaging modalities. MTSCCA enforces sparsity at the group level via the G2,1-norm, and jointly selects features across multiple tasks for SNPs and QTs via the L2,1-norm. A fast optimization algorithm is proposed using the grouping information of SNPs. Compared with conventional SCCA methods, MTSCCA obtains better correlation coefficients and canonical weight patterns. In addition, MTSCCA runs very fast and is easy to implement, indicating its potential power in genome-wide brain-wide imaging genetics.
  5. Schwartz, Russell (Ed.)
    Abstract Motivation While gene–environment (GxE) interactions contribute importantly to many different phenotypes, detecting such interactions requires well-powered studies and has proven difficult. To address this, we combine two approaches to improve GxE power: simultaneously evaluating multiple phenotypes and using a two-step analysis approach. Previous work shows that the power to identify a main genetic effect can be improved by simultaneously analyzing multiple related phenotypes. For a univariate phenotype, two-step methods produce higher power for detecting a GxE interaction compared to single step analysis. Therefore, we propose a two-step approach to test for an overall GxE effect for multiple phenotypes. Results Using simulations we demonstrate that, when more than one phenotype has GxE effect (i.e. GxE pleiotropy), our approach offers substantial gain in power (18–43%) to detect an aggregate-level GxE effect for a multivariate phenotype compared to an analogous two-step method to identify GxE effect for a univariate phenotype. We applied the proposed approach to simultaneously analyze three lipids, LDL, HDL and Triglyceride with the frequency of alcohol consumption as environmental factor in the UK Biobank. The method identified two loci with an overall GxE effect on the vector of lipids, one of which was missed by the competing approaches. Availability and implementation We provide an R package MPGE implementing the proposed approach which is available from CRAN: Supplementary information Supplementary data are available at Bioinformatics online.