skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Multivariate genome-wide association analysis by iterative hard thresholding
Abstract MotivationIn a genome-wide association study, analyzing multiple correlated traits simultaneously is potentially superior to analyzing the traits one by one. Standard methods for multivariate genome-wide association study operate marker-by-marker and are computationally intensive. ResultsWe present a sparsity constrained regression algorithm for multivariate genome-wide association study based on iterative hard thresholding and implement it in a convenient Julia package MendelIHT.jl. In simulation studies with up to 100 quantitative traits, iterative hard thresholding exhibits similar true positive rates, smaller false positive rates, and faster execution times than GEMMA’s linear mixed models and mv-PLINK’s canonical correlation analysis. On UK Biobank data with 470 228 variants, MendelIHT completed a three-trait joint analysis (n=185 656) in 20 h and an 18-trait joint analysis (n=104 264) in 53 h with an 80 GB memory footprint. In short, MendelIHT enables geneticists to fit a single regression model that simultaneously considers the effect of all SNPs and dozens of traits. Availability and implementationSoftware, documentation, and scripts to reproduce our results are available from https://github.com/OpenMendel/MendelIHT.jl.  more » « less
Award ID(s):
2054253 2205441
PAR ID:
10503022
Author(s) / Creator(s):
; ; ; ; ; ;
Editor(s):
Marschall, Tobias
Publisher / Repository:
Oxford Academic
Date Published:
Journal Name:
Bioinformatics
Volume:
39
Issue:
4
ISSN:
1367-4811
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Epstein, Michael P. (Ed.)
    We introduce pleiotropic association test (PAT) for joint analysis of multiple traits using genome-wide association study (GWAS) summary statistics. The method utilizes the decomposition of phenotypic covariation into genetic and environmental components to create a likelihood ratio test statistic for each genetic variant. Though PAT does not directly interpret which trait(s) drive the association, a per trait interpretation of the omnibus p-value is provided through an extension to the meta-analysis framework, m-values. In simulations, we show PAT controls the false positive rate, increases statistical power, and is robust to model misspecifications of genetic effect. Additionally, simulations comparing PAT to three multi-trait methods, HIPO, MTAG, and ASSET, show PAT identified 15.3% more omnibus associations over the next best method. When these associations were interpreted on a per trait level using m-values, PAT had 37.5% more true per trait interpretations with a 0.92% false positive assignment rate. When analyzing four traits from the UK Biobank, PAT discovered 22,095 novel variants. Through the m-values interpretation framework, the number of per trait associations for two traits were almost tripled and were nearly doubled for another trait relative to the original single trait GWAS. 
    more » « less
  2. Abstract Maize inflorescence is a complex phenotype that involves the physical and developmental interplay of multiple traits. Given the evidence that genes could pleiotropically contribute to several of these traits, we used publicly available maize data to assess the ability of multivariate genome-wide association study (GWAS) approaches to identify pleiotropic quantitative trait loci (pQTL). Our analysis of 23 publicly available inflorescence and leaf-related traits in a diversity panel of n = 281 maize lines genotyped with 376,336 markers revealed that the two multivariate GWAS approaches we tested were capable of identifying pQTL in genomic regions coinciding with similar associations found in previous studies. We then conducted a parallel simulation study on the same individuals, where it was shown that multivariate GWAS approaches yielded a higher true-positive quantitative trait nucleotide (QTN) detection rate than comparable univariate approaches for all evaluated simulation settings except for when the correlated simulated traits had a heritability of 0.9. We therefore conclude that the implementation of state-of-the-art multivariate GWAS approaches is a useful tool for dissecting pleiotropy and their more widespread implementation could facilitate the discovery of genes and other biological mechanisms underlying maize inflorescence. 
    more » « less
  3. Abstract Classical genetic studies have identified many cases of pleiotropy where mutations in individual genes alter many different phenotypes. Quantitative genetic studies of natural genetic variants frequently examine one or a few traits, limiting their potential to identify pleiotropic effects of natural genetic variants. Widely adopted community association panels have been employed by plant genetics communities to study the genetic basis of naturally occurring phenotypic variation in a wide range of traits. High-density genetic marker data—18M markers—from 2 partially overlapping maize association panels comprising 1,014 unique genotypes grown in field trials across at least 7 US states and scored for 162 distinct trait data sets enabled the identification of of 2,154 suggestive marker-trait associations and 697 confident associations in the maize genome using a resampling-based genome-wide association strategy. The precision of individual marker-trait associations was estimated to be 3 genes based on a reference set of genes with known phenotypes. Examples were observed of both genetic loci associated with variation in diverse traits (e.g., above-ground and below-ground traits), as well as individual loci associated with the same or similar traits across diverse environments. Many significant signals are located near genes whose functions were previously entirely unknown or estimated purely via functional data on homologs. This study demonstrates the potential of mining community association panel data using new higher-density genetic marker sets combined with resampling-based genome-wide association tests to develop testable hypotheses about gene functions, identify potential pleiotropic effects of natural genetic variants, and study genotype-by-environment interaction. 
    more » « less
  4. Macdonald, S (Ed.)
    Abstract Quantitative genetics in Caenorhabditis elegans seeks to identify naturally segregating genetic variants that underlie complex traits. Genome-wide association studies scan the genome for individual genetic variants that are significantly correlated with phenotypic variation in a population, or quantitative trait loci. Genome-wide association studies are a popular choice for quantitative genetic analyses because the quantitative trait loci that are discovered segregate in natural populations. Despite numerous successful mapping experiments, the empirical performance of genome-wide association study has not, to date, been formally evaluated in C. elegans. We developed an open-source genome-wide association study pipeline called NemaScan and used a simulation-based approach to provide benchmarks of mapping performance in collections of wild C. elegans strains. Simulated trait heritability and complexity determined the spectrum of quantitative trait loci detected by genome-wide association studies. Power to detect smaller-effect quantitative trait loci increased with the number of strains sampled from the C. elegans Natural Diversity Resource. Population structure was a major driver of variation in mapping performance, with populations shaped by recent selection exhibiting significantly lower false discovery rates than populations composed of more divergent strains. We also recapitulated previous genome-wide association studies of experimentally validated quantitative trait variants. Our simulation-based evaluation of performance provides the community with critical context to pursue quantitative genetic studies using the C. elegans Natural Diversity Resource to elucidate the genetic basis of complex traits in C. elegans natural populations. 
    more » « less
  5. Abstract PremiseUnderstanding relationships among grass traits, fire, and herbivores may help improve conservation strategies for savannas that are threatened by novel disturbance regimes. Emerging theory, developed in Africa, emphasizes that functional traits of savanna grasses reflect the distinct ways that fire and grazers consume biomass. Specifically, functional trade‐offs related to flammability and palatability predict that highly flammable grass species will be unpalatable, while highly palatable species will impede fire. MethodsWe quantified six culm and leaf traits of 337 native grasses of Texas—a historical savanna region that has been transformed by fire exclusion, megafaunal extinctions, and domestic livestock. ResultsMultivariate analyses of traits revealed three functional strategies. “Grazer grasses” (N = 50) had culms that were short, narrow, and horizontal, and leaves with high width to length (W:L) and low C to N ratios (C:N)—trait values that attract grazers and avoid fire. “Fire grasses” (N = 104) had culms that were tall, thick, and upright, and leaves that were thick, with low W:L, and high C:N—trait values that promote fire and discourage grazers. “Generalist tolerators” and “generalist avoiders” (N = 183) had trait values that were intermediate to the other groups. ConclusionsOur findings confirm that the flammability–palatability trade‐offs that operate in Africa also explain correlated suites of traits in Texas grasses and highlights that the grass flora of Texas bears the signature of Pleistocene megafauna and the influence of fires that predate human arrival. We suggest that grass functional classifications based on fire and grazer traits can improve prescribed fire and livestock management of savannas of Texas and globally. 
    more » « less