skip to main content

Search for: All records

Award ID contains: 1712933

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Abstract

    Mediation hypothesis testing for a large number of mediators is challenging due to the composite structure of the null hypothesis, (: effect of the exposure on the mediator after adjusting for confounders; : effect of the mediator on the outcome after adjusting for exposure and confounders). In this paper, we reviewed three classes of methods for large‐scale one at a time mediation hypothesis testing. These methods are commonly used for continuous outcomes and continuous mediators assuming there is no exposure‐mediator interaction so that the product has a causal interpretation as the indirect effect. The first class of methods ignores the impact of different structures under the composite null hypothesis, namely, (1) ; (2) ; and (3) . The second class of methods weights the reference distribution under each case of the null to form a mixture reference distribution. The third class constructs a composite test statistic using the threepvalues obtained under each case of the null so that the reference distribution of the composite statistic is approximately . In addition to these existing methods, we developed the Sobel‐comp method belonging to the second class, which uses a corrected mixture reference distribution for Sobel's test statistic. We performed extensive simulation studies to compare all six methods belonging to these three classes in terms of the false positive rates (FPRs) under the null hypothesis and the true positive rates under the alternative hypothesis. We found that the second class of methods which uses a mixture reference distribution could best maintain the FPRs at the nominal level under the null hypothesis and had the greatest true positive rates under the alternative hypothesis. We applied all methods to study the mediation mechanism of DNA methylation sites in the pathway from adult socioeconomic status to glycated hemoglobin level using data from the Multi‐Ethnic Study of Atherosclerosis (MESA). We provide guidelines for choosing the optimal mediation hypothesis testing method in practice and develop an R packagemedScanavailable on the CRAN for implementing all the six methods.

    more » « less
  2. Robinson, Peter (Ed.)
    Abstract Motivation

    Identifying cis-acting genetic variants associated with gene expression levels—an analysis commonly referred to as expression quantitative trait loci (eQTLs) mapping—is an important first step toward understanding the genetic determinant of gene expression variation. Successful eQTL mapping requires effective control of confounding factors. A common method for confounding effects control in eQTL mapping studies is the probabilistic estimation of expression residual (PEER) analysis. PEER analysis extracts PEER factors to serve as surrogates for confounding factors, which is further included in the subsequent eQTL mapping analysis. However, it is computationally challenging to determine the optimal number of PEER factors used for eQTL mapping. In particular, the standard approach to determine the optimal number of PEER factors examines one number at a time and chooses a number that optimizes eQTLs discovery. Unfortunately, this standard approach involves multiple repetitive eQTL mapping procedures that are computationally expensive, restricting its use in large-scale eQTL mapping studies that being collected today.


    Here, we present a simple and computationally scalable alternative, Effect size Correlation for COnfounding determination (ECCO), to determine the optimal number of PEER factors used for eQTL mapping studies. Instead of performing repetitive eQTL mapping, ECCO jointly applies differential expression analysis and Mendelian randomization analysis, leading to substantial computational savings. In simulations and real data applications, we show that ECCO identifies a similar number of PEER factors required for eQTL mapping analysis as the standard approach but is two orders of magnitude faster. The computational scalability of ECCO allows for optimized eQTL discovery across 48 GTEx tissues for the first time, yielding an overall 5.89% power gain on the number of eQTL harboring genes (eGenes) discovered as compared to the previous GTEx recommendation that does not attempt to determine tissue-specific optimal number of PEER factors.

    Availabilityand implementation

    Our method is implemented in the ECCO software, which, along with its GTEx mapping results, is freely available at All R scripts used in this study are also available at this site.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

    more » « less
  3. Background

    Genome‐wide association studies (GWASs) have identified thousands of genetic variants that are associated with many complex traits. However, their biological mechanisms remain largely unknown. Transcriptome‐wide association studies (TWAS) have been recently proposed as an invaluable tool for investigating the potential gene regulatory mechanisms underlying variant‐trait associations. Specifically, TWAS integrate GWAS with expression mapping studies based on a common set of variants and aim to identify genes whose GReX is associated with the phenotype. Various methods have been developed for performing TWAS and/or similar integrative analysis. Each such method has a different modeling assumption and many were initially developed to answer different biological questions. Consequently, it is not straightforward to understand their modeling property from a theoretical perspective.


    We present a technical review on thirteen TWAS methods. Importantly, we show that these methods can all be viewed as two‐sample Mendelian randomization (MR) analysis, which has been widely applied in GWASs for examining the causal effects of exposure on outcome. Viewing different TWAS methods from an MR perspective provides us a unique angle for understanding their benefits and pitfalls. We systematically introduce the MR analysis framework, explain how features of the GWAS and expression data influence the adaptation of MR for TWAS, and re‐interpret the modeling assumptions made in different TWAS methods from an MR angle. We finally describe future directions for TWAS methodology development.


    We hope that this review would serve as a useful reference for both methodologists who develop TWAS methods and practitioners who perform TWAS analysis.

    more » « less
  4. Abstract

    Spatial transcriptomic studies are becoming increasingly common and large, posing important statistical and computational challenges for many analytic tasks. Here, we present SPARK-X, a non-parametric method for rapid and effective detection of spatially expressed genes in large spatial transcriptomic studies. SPARK-X not only produces effective type I error control and high power but also brings orders of magnitude computational savings. We apply SPARK-X to analyze three large datasets, one of which is only analyzable by SPARK-X. In these data, SPARK-X identifies many spatially expressed genes including those that are spatially expressed within the same cell type, revealing new biological insights.

    more » « less
  5. Abstract

    Causal mediation analysis aims to characterize an exposure's effect on an outcome and quantify the indirect effect that acts through a given mediator or a group of mediators of interest. With the increasing availability of measurements on a large number of potential mediators, like the epigenome or the microbiome, new statistical methods are needed to simultaneously accommodate high-dimensional mediators while directly target penalization of the natural indirect effect (NIE) for active mediator identification. Here, we develop two novel prior models for identification of active mediators in high-dimensional mediation analysis through penalizing NIEs in a Bayesian paradigm. Both methods specify a joint prior distribution on the exposure-mediator effect and mediator-outcome effect with either (a) a four-component Gaussian mixture prior or (b) a product threshold Gaussian prior. By jointly modelling the two parameters that contribute to the NIE, the proposed methods enable penalization on their product in a targeted way. Resultant inference can take into account the four-component composite structure underlying the NIE. We show through simulations that the proposed methods improve both selection and estimation accuracy compared to other competing methods. We applied our methods for an in-depth analysis of two ongoing epidemiologic studies: the Multi-Ethnic Study of Atherosclerosis (MESA) and the LIFECODES birth cohort. The identified active mediators in both studies reveal important biological pathways for understanding disease mechanisms.

    more » « less
  6. Abstract

    Many existing pipelines for scRNA-seq data apply pre-processing steps such as normalization or imputation to account for excessive zeros or “drop-outs. Here, we extensively analyze diverse UMI data sets to show that clustering should be the foremost step of the workflow. We observe that most drop-outs disappear once cell-type heterogeneity is resolved, while imputing or normalizing heterogeneous data can introduce unwanted noise. We propose a novel framework HIPPO (Heterogeneity-Inspired Pre-Processing tOol) that leverages zero proportions to explain cellular heterogeneity and integrates feature selection with iterative clustering. HIPPO leads to downstream analysis with greater flexibility and interpretability compared to alternatives.

    more » « less
  7. Abstract

    Integrating results from genome-wide association studies (GWASs) and gene expression studies through transcriptome-wide association study (TWAS) has the potential to shed light on the causal molecular mechanisms underlying disease etiology. Here, we present a probabilistic Mendelian randomization (MR) method, PMR-Egger, for TWAS applications. PMR-Egger relies on a MR likelihood framework that unifies many existing TWAS and MR methods, accommodates multiple correlated instruments, tests the causal effect of gene on trait in the presence of horizontal pleiotropy, and is scalable to hundreds of thousands of individuals. In simulations, PMR-Egger provides calibrated type I error control for causal effect testing in the presence of horizontal pleiotropic effects, is reasonably robust under various types of model misspecifications, is more powerful than existing TWAS/MR approaches, and can directly test for horizontal pleiotropy. We illustrate the benefits of PMR-Egger in applications to 39 diseases and complex traits obtained from three GWASs including the UK Biobank.

    more » « less
  8. Abstract

    Causal mediation analysis aims to examine the role of a mediator or a group of mediators that lie in the pathway between an exposure and an outcome. Recent biomedical studies often involve a large number of potential mediators based on high‐throughput technologies. Most of the current analytic methods focus on settings with one or a moderate number of potential mediators. With the expanding growth of ‐omics data, joint analysis of molecular‐level genomics data with epidemiological data through mediation analysis is becoming more common. However, such joint analysis requires methods that can simultaneously accommodate high‐dimensional mediators and that are currently lacking. To address this problem, we develop a Bayesian inference method using continuous shrinkage priors to extend previous causal mediation analysis techniques to a high‐dimensional setting. Simulations demonstrate that our method improves the power of global mediation analysis compared to simpler alternatives and has decent performance to identify true nonnull contributions to the mediation effects of the pathway. The Bayesian method also helps us to understand the structure of the composite null cases for inactive mediators in the pathway. We applied our method to Multi‐Ethnic Study of Atherosclerosis and identified DNA methylation regions that may actively mediate the effect of socioeconomic status on cardiometabolic outcomes.

    more » « less
  9. Kutalik, Zoltán (Ed.)

    Epigenetic researchers often evaluate DNA methylation as a potential mediator of the effect of social/environmental exposures on a health outcome. Modern statistical methods for jointly evaluating many mediators have not been widely adopted. We compare seven methods for high-dimensional mediation analysis with continuous outcomes through both diverse simulations and analysis of DNAm data from a large multi-ethnic cohort in the United States, while providing an R package for their seamless implementation and adoption. Among the considered choices, the best-performing methods for detecting active mediators in simulations are the Bayesian sparse linear mixed model (BSLMM) and high-dimensional mediation analysis (HDMA); while the preferred methods for estimating the global mediation effect are high-dimensional linear mediation analysis (HILMA) and principal component mediation analysis (PCMA). We provide guidelines for epigenetic researchers on choosing the best method in practice and offer suggestions for future methodological development.

    more » « less
    Free, publicly-accessible full text available November 7, 2024