NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

A unifying framework for joint trait analysis under a non-infinitesimal model

https://doi.org/10.1093/bioinformatics/bty254

Johnson, Ruth; Shi, Huwenbo; Pasaniuc, Bogdan; Sankararaman, Sriram (June 2018, Bioinformatics)

Abstract MotivationA large proportion of risk regions identified by genome-wide association studies (GWAS) are shared across multiple diseases and traits. Understanding whether this clustering is due to sharing of causal variants or chance colocalization can provide insights into shared etiology of complex traits and diseases. ResultsIn this work, we propose a flexible, unifying framework to quantify the overlap between a pair of traits called UNITY (Unifying Non-Infinitesimal Trait analYsis). We formulate a Bayesian generative model that relates the overlap between pairs of traits to GWAS summary statistic data under a non-infinitesimal genetic architecture underlying each trait. We propose a Metropolis–Hastings sampler to compute the posterior density of the genetic overlap parameters in this model. We validate our method through comprehensive simulations and analyze summary statistics from height and body mass index GWAS to show that it produces estimates consistent with the known genetic makeup of both traits. Availability and implementationThe UNITY software is made freely available to the research community at: https://github.com/bogdanlab/UNITY. Supplementary informationSupplementary data are available at Bioinformatics online.
more » « less
A scalable estimator of SNP heritability for biobank-scale data

https://doi.org/10.1093/bioinformatics/bty253

Wu, Yue; Sankararaman, Sriram (June 2018, Bioinformatics)

Abstract MotivationHeritability, the proportion of variation in a trait that can be explained by genetic variation, is an important parameter in efforts to understand the genetic architecture of complex phenotypes as well as in the design and interpretation of genome-wide association studies. Attempts to understand the heritability of complex phenotypes attributable to genome-wide single nucleotide polymorphism (SNP) variation data has motivated the analysis of large datasets as well as the development of sophisticated tools to estimate heritability in these datasets. Linear mixed models (LMMs) have emerged as a key tool for heritability estimation where the parameters of the LMMs, i.e. the variance components, are related to the heritability attributable to the SNPs analyzed. Likelihood-based inference in LMMs, however, poses serious computational burdens. ResultsWe propose a scalable randomized algorithm for estimating variance components in LMMs. Our method is based on a method-of-moment estimator that has a runtime complexity O(NMB) for N individuals and M SNPs (where B is a parameter that controls the number of random matrix-vector multiplications). Further, by leveraging the structure of the genotype matrix, we can reduce the time complexity to O(NMBmax( log⁡3N, log⁡3M)).We demonstrate the scalability and accuracy of our method on simulated as well as on empirical data. On standard hardware, our method computes heritability on a dataset of 500 000 individuals and 100 000 SNPs in 38 min. Availability and implementationThe RHE-reg software is made freely available to the research community at: https://github.com/sriramlab/RHE-reg.
more » « less
Efficient variance components analysis across millions of genomes

https://doi.org/10.1038/s41467-020-17576-9

Pazokitoroudi, Ali; Wu, Yue; Burch, Kathryn S.; Hou, Kangcheng; Zhou, Aaron; Pasaniuc, Bogdan; Sankararaman, Sriram (August 2020, Nature Communications)

Abstract While variance components analysis has emerged as a powerful tool in complex trait genetics, existing methods for fitting variance components do not scale well to large-scale datasets of genetic variation. Here, we present a method for variance components analysis that is accurate and efficient: capable of estimating one hundred variance components on a million individuals genotyped at a million SNPs in a few hours. We illustrate the utility of our method in estimating and partitioning variation in a trait explained by genotyped SNPs (SNP-heritability). Analyzing 22 traits with genotypes from 300,000 individuals across about 8 million common and low frequency SNPs, we observe that per-allele squared effect size increases with decreasing minor allele frequency (MAF) and linkage disequilibrium (LD) consistent with the action of negative selection. Partitioning heritability across 28 functional annotations, we observe enrichment of heritability in FANTOM5 enhancers in asthma, eczema, thyroid and autoimmune disorders.
more » « less
Robust Mendelian randomization in the presence of residual population stratification, batch effects and horizontal pleiotropy

https://doi.org/10.1038/s41467-022-28553-9

Cinelli, Carlos; LaPierre, Nathan; Hill, Brian L.; Sankararaman, Sriram; Eskin, Eleazar (December 2022, Nature Communications)

Abstract Mendelian Randomization (MR) studies are threatened by population stratification, batch effects, and horizontal pleiotropy. Although a variety of methods have been proposed to mitigate those problems, residual biases may still remain, leading to highly statistically significant false positives in large databases. Here we describe a suite of sensitivity analysis tools that enables investigators to quantify the robustness of their findings against such validity threats. Specifically, we propose the routine reporting of sensitivity statistics that reveal the minimal strength of violations necessary to explain away the MR results. We further provide intuitive displays of the robustness of the MR estimate to any degree of violation, and formal bounds on the worst-case bias caused by violations multiple times stronger than observed variables. We demonstrate how these tools can aid researchers in distinguishing robust from fragile findings by examining the effect of body mass index on diastolic blood pressure and Townsend deprivation index.
more » « less
Full Text Available
Inferring population structure in biobank-scale genomic data

https://doi.org/10.1016/j.ajhg.2022.02.015

Chiu, Alec M.; Molloy, Erin K.; Tan, Zilong; Talwalkar, Ameet; Sankararaman, Sriram (April 2022, The American Journal of Human Genetics)

Full Text Available
Evaluating supervised and unsupervised background noise correction in human gut microbiome data

https://doi.org/10.1371/journal.pcbi.1009838

Briscoe, Leah; Balliu, Brunilda; Sankararaman, Sriram; Halperin, Eran; Garud, Nandita R. (February 2022, PLOS Computational Biology)
Segata, Nicola (Ed.)
The ability to predict human phenotypes and identify biomarkers of disease from metagenomic data is crucial for the development of therapeutics for microbiome-associated diseases. However, metagenomic data is commonly affected by technical variables unrelated to the phenotype of interest, such as sequencing protocol, which can make it difficult to predict phenotype and find biomarkers of disease. Supervised methods to correct for background noise, originally designed for gene expression and RNA-seq data, are commonly applied to microbiome data but may be limited because they cannot account for unmeasured sources of variation. Unsupervised approaches address this issue, but current methods are limited because they are ill-equipped to deal with the unique aspects of microbiome data, which is compositional, highly skewed, and sparse. We perform a comparative analysis of the ability of different denoising transformations in combination with supervised correction methods as well as an unsupervised principal component correction approach that is presently used in other domains but has not been applied to microbiome data to date. We find that the unsupervised principal component correction approach has comparable ability in reducing false discovery of biomarkers as the supervised approaches, with the added benefit of not needing to know the sources of variation apriori. However, in prediction tasks, it appears to only improve prediction when technical variables contribute to the majority of variance in the data. As new and larger metagenomic datasets become increasingly available, background noise correction will become essential for generating reproducible microbiome analyses.
more » « less
Full Text Available
STENSL: Microbial Source Tracking with ENvironment SeLection

Ulzee An, Liat Shenhav (January 2022, mSystems)

Microbial source tracking analysis has emerged as a widespread technique for characterizing the properties of complex microbial communities. However, this analysis is currently limited to source environments sampled in a specific study. In order to expand the scope beyond one single study and allow the exploration of source environments using large databases and repositories, such as the Earth Microbiome Project, a source selection procedure is required. Such a procedure will allow differentiating between contributing environments and nuisance ones when the number of potential sources considered is high. Here, we introduce STENSL (microbial Source Tracking with ENvironment SeLection), a machine learning method that extends common microbial source tracking analysis by performing an unsupervised source selection and enabling sparse identification of latent source environments. By incorporating sparsity into the estimation of potential source environments, STENSL improves the accuracy of true source contribution, while significantly reducing the noise introduced by noncontributing ones. We therefore anticipate that source selection will augment microbial source tracking analyses, enabling exploration of multiple source environments from publicly available repositories while maintaining high accuracy of the statistical inference.
more » « less
Full Text Available
Fast estimation of genetic correlation for biobank-scale data

https://doi.org/10.1016/j.ajhg.2021.11.015

Wu, Yue; Burch, Kathryn S.; Ganna, Andrea; Pajukanta, Päivi; Pasaniuc, Bogdan; Sankararaman, Sriram (January 2022, The American Journal of Human Genetics)

Full Text Available
Estimation of regional polygenicity from GWAS provides insights into the genetic architecture of complex traits

https://doi.org/10.1371/journal.pcbi.1009483

Johnson, Ruth; Burch, Kathryn S.; Hou, Kangcheng; Paciuc, Mario; Pasaniuc, Bogdan; Sankararaman, Sriram (October 2021, PLOS Computational Biology)
Wheeler, Heather E. (Ed.)
The number of variants that have a non-zero effect on a trait ( i.e . polygenicity) is a fundamental parameter in the study of the genetic architecture of a complex trait. Although many previous studies have investigated polygenicity at a genome-wide scale, a detailed understanding of how polygenicity varies across genomic regions is currently lacking. In this work, we propose an accurate and scalable statistical framework to estimate regional polygenicity for a complex trait. We show that our approach yields approximately unbiased estimates of regional polygenicity in simulations across a wide-range of various genetic architectures. We then partition the polygenicity of anthropometric and blood pressure traits across 6-Mb genomic regions ( N = 290K, UK Biobank) and observe that all analyzed traits are highly polygenic: over one-third of regions harbor at least one causal variant for each of the traits analyzed. Additionally, we observe wide variation in regional polygenicity: on average across all traits, 48.9% of regions contain at least 5 causal SNPs, 5.44% of regions contain at least 50 causal SNPs. Finally, we find that heritability is proportional to polygenicity at the regional level, which is consistent with the hypothesis that heritability enrichments are largely driven by the variation in the number of causal SNPs.
more » « less
Full Text Available
Functional dynamic genetic effects on gene regulation are specific to particular cell types and environmental conditions

https://doi.org/https://doi.org/10.7554/elife.67077

Findley, A.S. (May 2021, eLife)
null (Ed.)
Full Text Available

« Prev Next »

Search for: All records