Recent advances in genotyping with high‐density markers give researchers access to genomic variants, including rare ones. Linkage disequilibrium (LD) is widely used to provide insight into evolutionary history. It is also the basis for association mapping in humans and other species. A better understanding of genomic LD structure may lead to better‐informed statistical tests that can improve the power of association studies. Although rare variant associations with common diseases (RVCD) have been studied extensively in recent years, the LD structure among rare variants, and between rare and common variants, remains poorly understood and even controversial. In fact, many popular RVCD tests assume that rare variants are independent. In this report, we show that two commonly used LD measures are not capable of detecting LD when rare variants are involved. We present this argument from two perspectives: the LD measures themselves and the computational issues associated with them. To address these issues, we propose an alternative LD measure, the polychoric correlation, which was originally designed for detecting associations among categorical variables. Using simulated data as well as data from the 1000 Genomes Project, we explore the performance of these LD measures in detail and discuss their implications for association studies.
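The abstract does not name its "two commonly used LD measures"; assuming they are the usual pair, r² and D', the sketch below computes both from two-locus haplotype frequencies and contrasts them with a tetrachoric correlation, which is the 2×2 special case of the polychoric correlation (each allele indicator is modeled as a thresholded latent standard normal variable). The haplotype frequencies are hypothetical, the function names are illustrative, and this is not the authors' estimator. A genotype-level (0/1/2) polychoric fit would need two thresholds per variant and a likelihood maximization, which is not shown.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.stats import norm


def r2_and_dprime(p_AB, p_Ab, p_aB, p_ab):
    """r^2 and D' from the four two-locus haplotype frequencies."""
    p_A = p_AB + p_Ab                      # frequency of allele A at locus 1
    p_B = p_AB + p_aB                      # frequency of allele B at locus 2
    D = p_AB - p_A * p_B                   # disequilibrium coefficient
    r2 = D ** 2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    return r2, D / d_max


def bivariate_normal_cdf(h, k, rho, lo=-10.0):
    """P(X <= h, Y <= k) for a standard bivariate normal with correlation rho."""
    if h <= lo or k <= lo:
        return 0.0                         # mass below -10 sd is negligible
    s = np.sqrt(1.0 - rho ** 2)

    def integrand(x):
        # Density of X times P(Y <= k | X = x).
        return norm.pdf(x) * norm.cdf((k - rho * x) / s)

    # The conditional CDF changes fastest near x = k / rho; hint quad there.
    points = [k / rho] if rho != 0 and lo < k / rho < h else None
    value, _ = quad(integrand, lo, h, points=points, limit=200)
    return value


def tetrachoric(p_AB, p_Ab, p_aB, p_ab, bound=0.995):
    """Tetrachoric (2x2 polychoric) correlation between the allele indicators.

    Allele A is present when latent X > t_A, allele B when latent Y > t_B.
    The thresholds reproduce the margins; rho is then chosen so the model
    reproduces the a/b haplotype frequency exactly (the 2x2 table is saturated).
    """
    t_A = norm.ppf(1.0 - (p_AB + p_Ab))
    t_B = norm.ppf(1.0 - (p_AB + p_aB))

    def gap(rho):
        return bivariate_normal_cdf(t_A, t_B, rho) - p_ab

    return brentq(gap, -bound, bound)      # raises if no root in the bracket


# Hypothetical haplotype frequencies: a rare allele A (1%) carried almost
# exclusively on haplotypes with a common allele B (30%).
freqs = dict(p_AB=0.009, p_Ab=0.001, p_aB=0.291, p_ab=0.699)
r2, dprime = r2_and_dprime(**freqs)
rho = tetrachoric(**freqs)
print(f"r2 = {r2:.4f}, D' = {dprime:.3f}, tetrachoric rho = {rho:.3f}")
```

With these frequencies D' is about 0.86 while r² stays below 0.02, because r² is capped by the mismatch in allele frequencies; the latent-variable correlation has no such frequency-dependent ceiling.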
Identifying variants associated with complex traits is a challenging task in genetic association studies because of linkage disequilibrium (LD) between genetic variants and population stratification unrelated to disease risk. Existing methods of population structure correction use principal component analysis or linear mixed models with a random effect when modeling associations between a trait of interest and genetic markers. However, due to stringent significance thresholds and latent interactions between the markers, these methods often fail to detect genuinely associated variants.
To overcome this, we propose CluStrat, which corrects for complex, arbitrarily structured populations while leveraging the linkage disequilibrium-induced distances between genetic markers. It performs agglomerative hierarchical clustering using the Mahalanobis distance computed from the covariance matrix of the markers. In simulation studies, we show that our method outperforms existing methods in detecting true causal variants. Applying CluStrat to the WTCCC2 and UK Biobank cohorts, we found biologically relevant associations in schizophrenia and myocardial infarction. CluStrat was also able to correct for population structure in polygenic adaptation of height in Europeans.
CluStrat highlights the advantages of biologically relevant distance metrics, such as the Mahalanobis distance, which captures the cryptic interactions within populations in the presence of LD better than the Euclidean distance.
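A minimal sketch of the clustering step described above, assuming it amounts to computing pairwise Mahalanobis distances between individuals under the marker covariance (LD) matrix and then running agglomerative clustering. The pseudo-inverse for a singular covariance, average linkage, the function names, and the synthetic genotypes are all assumptions for illustration, not CluStrat's implementation; the association testing that follows clustering is not shown.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def mahalanobis_whiten(G, tol=1e-10):
    """Map individuals (rows of G) to coordinates in which Euclidean distance
    equals the Mahalanobis distance under the marker covariance (LD) matrix.

    The covariance is inverted on its non-null eigen-directions only (a
    pseudo-inverse), so the sketch also runs when markers outnumber people.
    """
    C = np.cov(G, rowvar=False)                 # markers x markers covariance
    w, V = np.linalg.eigh(C)                    # C = V diag(w) V^T
    keep = w > tol * w.max()                    # drop (near-)null directions
    return (G - G.mean(axis=0)) @ (V[:, keep] / np.sqrt(w[keep]))


def cluster_individuals(G, n_clusters):
    """Agglomerative (average-linkage) clustering on Mahalanobis distances."""
    Y = mahalanobis_whiten(G)
    Z = linkage(pdist(Y, metric="euclidean"), method="average")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    return labels, Z


# Hypothetical data: 60 individuals x 20 markers, genotypes coded 0/1/2.
rng = np.random.default_rng(0)
G = rng.binomial(2, 0.3, size=(60, 20)).astype(float)
labels, Z = cluster_individuals(G, n_clusters=3)
print(np.unique(labels, return_counts=True))
```

The whitening step turns Mahalanobis distances into ordinary Euclidean ones, so any SciPy linkage method can be applied to the transformed coordinates; average linkage is an arbitrary choice here.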
- Award ID(s): 1715202
- PAR ID: 10471989
- Publisher / Repository: Springer Science + Business Media
- Journal Name: BMC Bioinformatics
- Volume: 24
- Issue: 1
- ISSN: 1471-2105
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
In the statistical analysis of genome-wide association data, it is challenging to precisely localize the variants that affect complex traits, due to linkage disequilibrium, and to maximize power while limiting spurious findings. Here we report on KnockoffZoom: a flexible method that localizes causal variants at multiple resolutions by testing the conditional associations of genetic segments of decreasing width, while provably controlling the false discovery rate. Our method utilizes artificial genotypes as negative controls and is equally valid for quantitative and binary phenotypes, without requiring any assumptions about their genetic architectures. Instead, we rely on well-established genetic models of linkage disequilibrium. We demonstrate that our method can detect more associations than mixed effects models and achieve fine-mapping precision, at comparable computational cost. Lastly, we apply KnockoffZoom to data from 350k subjects in the UK Biobank and report many new findings.
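KnockoffZoom's full pipeline (generating artificial genotypes from a linkage disequilibrium model and testing segments at several resolutions) does not fit in a short example. What can be sketched is the generic knockoff+ selection rule that knockoff methods use to control the false discovery rate: given signed statistics W_j, where large positive values mean an original variable or segment beats its artificial copy, pick a data-dependent threshold and keep everything above it. That KnockoffZoom applies exactly this rule at each resolution is an assumption here, and the W values are made up.

```python
import numpy as np


def knockoff_plus_threshold(W, q):
    """Knockoff+ threshold for target FDR level q.

    W[j] is a signed importance statistic: large positive values mean the
    original variable (or group) j looks more important than its knockoff.
    Returns the smallest t in {|W_j| : W_j != 0} satisfying
        (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= q,
    or np.inf when no such t exists (nothing is selected).
    """
    W = np.asarray(W, dtype=float)
    for t in np.sort(np.abs(W[W != 0])):
        fdp_estimate = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_estimate <= q:
            return t
    return np.inf


def knockoff_select(W, q):
    """Indices j with W_j >= T, T being the knockoff+ threshold."""
    T = knockoff_plus_threshold(W, q)
    return np.flatnonzero(np.asarray(W, dtype=float) >= T)


# Made-up statistics for ten variables or genomic segments.
W = np.array([8.0, 6.5, 5.0, 4.2, 3.1, -0.4, 0.9, -1.2, 0.3, 2.7])
print(knockoff_plus_threshold(W, q=0.2), knockoff_select(W, q=0.2))
```

With q = 0.2 the threshold lands at 2.7 and six of the ten entries are selected; at q = 0.05 nothing passes, because with only ten statistics the estimated false discovery proportion can never fall that low.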
As human complex diseases are influenced by the interaction between genetics and the environment, identifying gene–environment interactions (G×E) is crucial for understanding disease mechanisms and predicting risk. Developing robust quantitative tools for G×E analysis can enhance the study of complex diseases. However, many existing methods that explore G×E focus on the interplay between an environmental factor and genetic variants, exclusively for either common or rare variants. In this study, we developed MAGEIT_RAN and MAGEIT_FIX to identify interactions between an environmental factor and a set of genetic markers, including both rare and common variants, based on the MinQue for Summary statistics. The genetic main effects in MAGEIT_RAN and MAGEIT_FIX are modeled as random and fixed effects, respectively. Simulation studies showed that both tests controlled the type I error, with MAGEIT_RAN being the most powerful test. Applying MAGEIT to a genome-wide analysis of gene–alcohol interactions on hypertension and seated systolic blood pressure in the Multiethnic Study of Atherosclerosis revealed genes such as EIF2AK2, CCNDBP1, and EPB42 that influence blood pressure through interaction with alcohol. Pathway analysis identified one apoptosis and survival pathway involving PKR and two signal transduction pathways associated with hypertension and alcohol intake, demonstrating MAGEIT_RAN's ability to detect biologically relevant gene–environment interactions.
Zeggini, Eleftheria (Ed.)
Increasingly large Genome-Wide Association Studies (GWAS) have yielded numerous variants associated with many complex traits, motivating the development of “fine mapping” methods to identify which of the associated variants are causal. Additionally, GWAS of the same trait for different populations are increasingly available, raising the possibility of refining fine mapping results further by leveraging different linkage disequilibrium (LD) structures across studies. Here, we introduce multiple study causal variants identification in associated regions (MsCAVIAR), a method that extends the popular CAVIAR fine mapping framework to a multiple study setting using a random effects model. MsCAVIAR only requires summary statistics and LD as input, accounts for uncertainty in association statistics using a multivariate normal model, allows for multiple causal variants at a locus, and explicitly models the possibility of different SNP effect sizes in different populations. We demonstrate the efficacy of MsCAVIAR in both a simulation study and a trans-ethnic, trans-biobank fine mapping analysis of High Density Lipoprotein (HDL).
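MsCAVIAR's multi-study random-effects extension is not reproduced here. As a hedged illustration of the single-study CAVIAR-style model it builds on, the sketch below assumes the standard form z ~ N(0, R + R D_c R), where R is the LD matrix and D_c puts a prior effect variance on the SNPs in causal configuration c, and enumerates configurations up to a fixed size. The values of sigma2, gamma, and max_causal, the function name, and the toy LD and z-scores are hypothetical.

```python
import itertools

import numpy as np
from scipy.stats import multivariate_normal


def caviar_posteriors(z, R, sigma2=25.0, gamma=0.01, max_causal=2):
    """Single-study, CAVIAR-style posterior over causal configurations.

    Assumed model: z ~ N(0, R + R D_c R), where R is the SNP LD (correlation)
    matrix and D_c is diagonal with sigma2 for SNPs in configuration c and 0
    elsewhere. Each SNP is a priori causal with probability gamma.
    Returns (configurations, posterior probabilities, per-SNP inclusion probs).
    """
    z = np.asarray(z, dtype=float)
    m = len(z)
    configs, log_post = [], []
    for k in range(max_causal + 1):
        for c in itertools.combinations(range(m), k):
            idx = np.array(c, dtype=int)
            D = np.zeros((m, m))
            D[idx, idx] = sigma2               # prior effect variance on causal SNPs
            cov = R + R @ D @ R
            log_lik = multivariate_normal.logpdf(z, mean=np.zeros(m), cov=cov)
            log_prior = k * np.log(gamma) + (m - k) * np.log(1 - gamma)
            configs.append(c)
            log_post.append(log_lik + log_prior)
    log_post = np.array(log_post)
    post = np.exp(log_post - log_post.max())   # stabilize before normalizing
    post /= post.sum()
    pip = np.zeros(m)
    for c, p in zip(configs, post):
        pip[np.array(c, dtype=int)] += p
    return configs, post, pip


# Hypothetical three-SNP locus: SNPs 0 and 1 in strong LD, SNP 2 nearly independent.
R = np.array([[1.0, 0.8, 0.1],
              [0.8, 1.0, 0.1],
              [0.1, 0.1, 1.0]])
z = np.array([5.5, 4.6, 0.3])
configs, post, pip = caviar_posteriors(z, R)
print(np.round(pip, 3))
```

Exhaustive enumeration grows combinatorially with max_causal, which is why methods of this kind usually cap the number of causal variants per locus.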
Aim: As within‐species genomic data have been shown to be useful in interpreting broader biogeographic trends, we analysed the mode of population genomic isolation involved in a well‐studied intertidal genomic cline to better understand the mechanisms maintaining it. These results were interpreted in the context of spatial variation in habitat use and availability as well as likely fitness consequences of hybridization between the two lineages.
Location: Pacific coast of North America.
Taxon: Arthropods (Class Maxillopoda, Order Sessilia, Family Balanidae; Balanus glandula).
Methods: Genotyping‐by‐sequencing approaches were used to generate single‐nucleotide polymorphism markers across sites sampled between southern Alaska and southern California. Standard population genomic methods, including analysis of population structure, inbreeding and linkage disequilibrium, were used to identify the steepest transitions across the largest number of loci examined. These data were put in the context of observed population density and habitat availability.
Results: We show that the majority of markers analysed show strong clinal transitions in a very narrow portion of the California coast. Patterns of linkage disequilibrium among markers, along with prior evidence of variation in reproductive potential by latitude and by mitochondrial lineage, suggest some reproductive isolation between the northern and southern lineages of B. glandula that is concordant with the drop in population density and habitat availability in central California.
Main conclusions: A significant clinal transition in genomic diversity is stronger and more localized than previously recognized and exhibits statistical patterns suggesting that the lineages are reproductively and phenotypically distinct in ways that may be ecologically important. As this species has been used to infer process in coastal biogeography, further study of concordant patterns will be important for advancing our understanding of this region.