Title: Structure-informed clustering for population stratification in association studies
Abstract Background: Identifying variants associated with complex traits is a challenging task in genetic association studies due to linkage disequilibrium (LD) between genetic variants and population stratification, unrelated to the disease risk. Existing methods of population structure correction use principal component analysis or linear mixed models with a random effect when modeling associations between a trait of interest and genetic markers. However, due to stringent significance thresholds and latent interactions between the markers, these methods often fail to detect genuinely associated variants. Results: To overcome this, we propose CluStrat, which corrects for complex arbitrarily structured populations while leveraging the linkage disequilibrium induced distances between genetic markers. It performs an agglomerative hierarchical clustering using the Mahalanobis distance covariance matrix of the markers. In simulation studies, we show that our method outperforms existing methods in detecting true causal variants. Applying CluStrat on WTCCC2 and UK Biobank cohorts, we found biologically relevant associations in Schizophrenia and Myocardial Infarction. CluStrat was also able to correct for population structure in polygenic adaptation of height in Europeans. Conclusions: CluStrat highlights the advantages of biologically relevant distance metrics, such as the Mahalanobis distance, which captures the cryptic interactions within populations in the presence of LD better than the Euclidean distance.
Award ID(s):
1715202
PAR ID:
10471989
Publisher / Repository:
Springer Science + Business Media
Journal Name:
BMC Bioinformatics
Volume:
24
Issue:
1
ISSN:
1471-2105
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Zeggini, Eleftheria (Ed.)
    Increasingly large Genome-Wide Association Studies (GWAS) have yielded numerous variants associated with many complex traits, motivating the development of “fine mapping” methods to identify which of the associated variants are causal. Additionally, GWAS of the same trait for different populations are increasingly available, raising the possibility of refining fine mapping results further by leveraging different linkage disequilibrium (LD) structures across studies. Here, we introduce multiple study causal variants identification in associated regions (MsCAVIAR), a method that extends the popular CAVIAR fine mapping framework to a multiple study setting using a random effects model. MsCAVIAR only requires summary statistics and LD as input, accounts for uncertainty in association statistics using a multivariate normal model, allows for multiple causal variants at a locus, and explicitly models the possibility of different SNP effect sizes in different populations. We demonstrate the efficacy of MsCAVIAR in both a simulation study and a trans-ethnic, trans-biobank fine mapping analysis of High Density Lipoprotein (HDL). 
  2. Abstract In the statistical analysis of genome-wide association data, it is challenging to precisely localize the variants that affect complex traits, due to linkage disequilibrium, and to maximize power while limiting spurious findings. Here we report on KnockoffZoom: a flexible method that localizes causal variants at multiple resolutions by testing the conditional associations of genetic segments of decreasing width, while provably controlling the false discovery rate. Our method utilizes artificial genotypes as negative controls and is equally valid for quantitative and binary phenotypes, without requiring any assumptions about their genetic architectures. Instead, we rely on well-established genetic models of linkage disequilibrium. We demonstrate that our method can detect more associations than mixed effects models and achieve fine-mapping precision, at comparable computational cost. Lastly, we apply KnockoffZoom to data from 350k subjects in the UK Biobank and report many new findings.
  3. Abstract Aim: As within‐species genomic data have been shown useful in interpreting broader biogeographic trends, we analysed the mode of population genomic isolation involved in a well‐studied intertidal genomic cline to better understand the mechanisms maintaining it. These results were interpreted in the context of spatial variation in habitat use and availability as well as likely fitness consequences for hybridization between the two lineages. Location: Pacific coast of North America. Taxon: Arthropods (Class Maxillopoda, Order Sessilia, Family Balanidae; Balanus glandula). Methods: Genotype‐by‐sequencing approaches were used to generate single‐nucleotide polymorphism markers across sites sampled between southern Alaska and Southern California. Inference using standard population genomic methods, including analysis of population structure, inbreeding and linkage disequilibrium, was used to identify the steepest transitions across the largest number of loci examined. These data were put in the context of observed population density and habitat availability. Results: We show that the majority of markers analysed show strong clinal transitions in a very narrow portion of the California coast. Patterns of linkage disequilibrium among markers, along with prior evidence of variation in reproductive potential by latitude and by mitochondrial lineage, suggest some reproductive isolation among the northern and southern lineages of B. glandula that are concordant with the drop in population density and habitat availability in central California. Main Conclusions: A significant clinal transition in genomic diversity is stronger and more localized than previously recognized and exhibits statistical patterns suggesting that the lineages are reproductively and phenotypically distinct in ways that may be ecologically important. As this species has been used to infer process in coastal biogeography, further study of concordant patterns will be important for advancing our understanding of this region.
  4. Abstract Background: Uncovering the functional relevance underlying verbal declarative memory (VDM) genome-wide association study (GWAS) results may facilitate the development of interventions to reduce age-related memory decline and dementia. Methods: We performed multi-omics and pathway enrichment analyses of paragraph (PAR-dr) and word list (WL-dr) delayed recall GWAS from 29,076 older non-demented individuals of European descent. We assessed the relationship between single-variant associations and expression quantitative trait loci (eQTLs) in 44 tissues and methylation quantitative trait loci (meQTLs) in the hippocampus. We determined the relationship between gene associations and transcript levels in 53 tissues, annotation as immune genes, and regulation by transcription factors (TFs) and microRNAs. To identify significant pathways, gene set enrichment was tested in each cohort and meta-analyzed across cohorts. Analyses of differential expression in brain tissues were conducted for pathway component genes. Results: The single-variant associations of VDM showed significant linkage disequilibrium (LD) with eQTLs across all tissues and meQTLs within the hippocampus. Stronger WL-dr gene associations correlated with reduced expression in four brain tissues, including the hippocampus. More robust PAR-dr and/or WL-dr gene associations were intricately linked with immunity and were influenced by 31 TFs and 2 microRNAs. Six pathways, including type I diabetes, exhibited significant associations with both PAR-dr and WL-dr. These pathways included fifteen MHC genes intricately linked to VDM performance, showing diverse expression patterns based on cognitive status in brain tissues. Conclusions: VDM genetic associations influence expression regulation via eQTLs and meQTLs. The involvement of TFs, microRNAs, MHC genes, and immune-related pathways contributes to VDM performance in older individuals.
  5. Abstract The genomic variation of an invasive species may be affected by complex demographic histories and evolutionary changes during the invasion. Here, we describe the relative influence of bottlenecks, clonality, and population expansion in determining genomic variability of the widespread red macroalga Agarophyton vermiculophyllum. Its introduction from mainland Japan to the estuaries of North America and Europe coincided with shifts from predominantly sexual to partially clonal reproduction and rapid adaptive evolution. A survey of 62,285 SNPs for 351 individuals from 35 populations, aligned to 24 chromosome‐length scaffolds, indicates that linkage disequilibrium (LD), observed heterozygosity (Ho), Tajima's D, and nucleotide diversity (Pi) were greater among non‐native than native populations. Evolutionary simulations indicate LD and Tajima's D were consistent with a severe population bottleneck. Also, the increased rate of clonal reproduction in the non‐native range could not have produced the observed patterns by itself but may have magnified the bottleneck effect on LD. Elevated marker diversity in the genetic source populations could have contributed to the increased Ho and Pi observed in the non‐native range. We refined the previous invasion source region to a ~50 km section of northeastern Honshu Island. Outlier detection methods failed to reveal any consistently differentiated loci shared among invaded regions, probably because of the complex A. vermiculophyllum demographic history. Our results reinforce the importance of demographic history, specifically founder effects, in driving genomic variation of invasive populations, even when localized adaptive evolution and reproductive system shifts are observed.