This content will become publicly available on June 1, 2026

Title: Post‐selection inference for high‐dimensional mediation analysis with survival outcomes
ABSTRACT It is of substantial scientific interest to detect mediators that lie in the causal pathway from an exposure to a survival outcome. However, with high‐dimensional mediators, as often encountered in modern genomic data settings, there is a lack of powerful methods that can provide valid post‐selection inference for the identified marginal mediation effect. To resolve this challenge, we develop a post‐selection inference procedure for the maximally selected natural indirect effect using a semiparametric efficient influence function approach. To this end, we establish the asymptotic normality of a stabilized one‐step estimator that takes the selection of the mediator into account. Simulation studies show that our proposed method has good empirical performance. We further apply our proposed approach to a lung cancer dataset and find multiple DNA methylation CpG sites that might mediate the effect of cigarette smoking on lung cancer survival.
Award ID(s):
2112938
PAR ID:
10627800
Author(s) / Creator(s):
; ;
Publisher / Repository:
John Wiley & Sons
Date Published:
Journal Name:
Scandinavian Journal of Statistics
Volume:
52
Issue:
2
ISSN:
0303-6898
Page Range / eLocation ID:
756 to 776
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract In this study, we focus on estimating the heterogeneous treatment effect (HTE) for survival outcome. The outcome is subject to censoring and the number of covariates is high-dimensional. We utilize data from both the randomized controlled trial (RCT), considered as the gold standard, and real-world data (RWD), possibly affected by hidden confounding factors. To achieve a more efficient HTE estimate, such integrative analysis requires great insight into the data generation mechanism, particularly the accurate characterization of unmeasured confounding effects/bias. With this aim, we propose a penalized-regression-based integrative approach that allows for the simultaneous estimation of parameters, selection of variables, and identification of the existence of unmeasured confounding effects. The consistency, asymptotic normality, and efficiency gains are rigorously established for the proposed estimate. Finally, we apply the proposed method to estimate the HTE of lobar/sublobar resection on the survival of lung cancer patients. The RCT is a multicenter non-inferiority randomized phase 3 trial, and the RWD comes from a clinical oncology cancer registry in the United States. The analysis reveals that the unmeasured confounding exists and the integrative approach does enhance the efficiency for the HTE estimation. 
  2. Yanwu, Xu (Ed.)
    Lung cancer is a major cause of cancer-related deaths, and early diagnosis and treatment are crucial for improving patients’ survival outcomes. In this paper, we propose to employ convolutional neural networks to model the non-linear relationship between the risk of lung cancer and the lungs’ morphology revealed in the CT images. We apply a mini-batched loss that extends the Cox proportional hazards model to handle the non-convexity induced by neural networks, which also enables the training of large data sets. Additionally, we propose to combine mini-batched loss and binary cross-entropy to predict both lung cancer occurrence and the risk of mortality. Simulation results demonstrate the effectiveness of both the mini-batched loss with and without the censoring mechanism, as well as its combination with binary cross-entropy. We evaluate our approach on the National Lung Screening Trial data set with several 3D convolutional neural network architectures, achieving high AUC and C-index scores for lung cancer classification and survival prediction. These results, obtained from simulations and real data experiments, highlight the potential of our approach to improving the diagnosis and treatment of lung cancer. 
  3. Abstract Causal mediation analysis aims to characterize an exposure's effect on an outcome and quantify the indirect effect that acts through a given mediator or a group of mediators of interest. With the increasing availability of measurements on a large number of potential mediators, like the epigenome or the microbiome, new statistical methods are needed to simultaneously accommodate high-dimensional mediators while directly targeting penalization of the natural indirect effect (NIE) for active mediator identification. Here, we develop two novel prior models for identification of active mediators in high-dimensional mediation analysis through penalizing NIEs in a Bayesian paradigm. Both methods specify a joint prior distribution on the exposure-mediator effect and mediator-outcome effect with either (a) a four-component Gaussian mixture prior or (b) a product threshold Gaussian prior. By jointly modelling the two parameters that contribute to the NIE, the proposed methods enable penalization on their product in a targeted way. Resultant inference can take into account the four-component composite structure underlying the NIE. We show through simulations that the proposed methods improve both selection and estimation accuracy compared to other competing methods. We applied our methods for an in-depth analysis of two ongoing epidemiologic studies: the Multi-Ethnic Study of Atherosclerosis (MESA) and the LIFECODES birth cohort. The identified active mediators in both studies reveal important biological pathways for understanding disease mechanisms. 
  4. The majority of lung cancer patients are diagnosed with metastatic disease. This study identified a set of 73 microRNAs (miRNAs) that classified lung cancer tumors from normal lung tissues with an overall accuracy of 96.3% in the training patient cohort (n = 109) and 91.7% in unsupervised classification and 92.3% in supervised classification in the validation set (n = 375). Based on association with patient survival (n = 1016), 10 miRNAs were identified as potential tumor suppressors (hsa-miR-144, hsa-miR-195, hsa-miR-223, hsa-miR-30a, hsa-miR-30b, hsa-miR-30d, hsa-miR-335, hsa-miR-363, hsa-miR-451, and hsa-miR-99a), and 4 were identified as potential oncogenes (hsa-miR-21, hsa-miR-31, hsa-miR-411, and hsa-miR-494) in lung cancer. Experimentally confirmed target genes were identified for the 73 diagnostic miRNAs, from which proliferation genes were selected from CRISPR-Cas9/RNA interference (RNAi) screening assays. Pansensitive and panresistant genes to 21 NCCN-recommended drugs with concordant mRNA and protein expression were identified. DGKE and WDR47 were found with significant associations with responses to both systemic therapies and radiotherapy in lung cancer. Based on our identified miRNA-regulated molecular machinery, an inhibitor of PDK1/Akt BX-912, an anthracycline antibiotic daunorubicin, and a multi-targeted protein kinase inhibitor midostaurin were discovered as potential repositioning drugs for treating lung cancer. These findings have implications for improving lung cancer diagnosis, optimizing treatment selection, and discovering new drug options for better patient outcomes. 
  5. There has been a growing interest in incorporating auxiliary summary information from external studies into the analysis of internal individual‐level data. In this paper, we propose an adaptive estimation procedure for an additive risk model to integrate auxiliary subgroup survival information via a penalized method of moments technique. Our approach can accommodate information from heterogeneous data. Parameters to quantify the magnitude of potential incomparability between internal data and external auxiliary information are introduced in our framework while nonzero components of these parameters suggest a violation of the homogeneity assumption. We further develop an efficient computational algorithm to solve the numerical optimization problem by profiling out the nuisance parameters. In an asymptotic sense, our method can be as efficient as if all the incomparable auxiliary information is accurately acknowledged and has been automatically excluded from consideration. The asymptotic normality of the proposed estimator of the regression coefficients is established, with an explicit formula for the asymptotic variance‐covariance matrix that can be consistently estimated from the data. Simulation studies show that the proposed method yields a substantial gain in statistical efficiency over the conventional method using the internal data only, and reduces estimation biases when the given auxiliary survival information is incomparable. We illustrate the proposed method with a lung cancer survival study. 