This article proposes a novel causal discovery and inference method called GrIVET for a Gaussian directed acyclic graph with unmeasured confounders. GrIVET consists of an order-based causal discovery method and a likelihood-based inferential procedure. For causal discovery, we generalize the existing peeling algorithm to estimate the ancestral relations and candidate instruments in the presence of hidden confounders. Based on this, we propose a new procedure for instrumental variable estimation of each direct effect by separating it from any mediation effects. For inference, we develop a new likelihood ratio test of multiple causal effects that is able to account for the unmeasured confounders. Theoretically, we prove that the proposed method has desirable guarantees, including robustness to invalid instruments and uncertain interventions, estimation consistency, low-order polynomial time complexity, and validity of asymptotic inference. Numerically, GrIVET performs well and compares favorably against state-of-the-art competitors. Furthermore, we demonstrate the utility and effectiveness of the proposed method through an application inferring regulatory pathways from Alzheimer’s disease gene expression data.
more »
« less
High-Dimensional Inference for Generalized Linear Models with Hidden Confounding
Statistical inferences for high-dimensional regression models have been extensively studied for their wide applications ranging from genomics, neuroscience, to economics. However, in practice, there are often potential unmeasured confounders associated with both the response and covariates, which can lead to invalidity of standard debiasing methods. This paper focuses on a generalized linear regression framework with hidden confounding and proposes a debiasing approach to address this high-dimensional problem, by adjusting for the effects induced by the unmeasured confounders. We establish consistency and asymp- totic normality for the proposed debiased estimator. The finite sample performance of the proposed method is demonstrated through extensive numerical studies and an application to a genetic data set.
more »
« less
- PAR ID:
- 10505213
- Publisher / Repository:
- Microtome Publishing
- Date Published:
- Journal Name:
- Journal of Machine Learning Research
- ISSN:
- 1532-4435
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Machine learning (ML) methods for causal inference have gained popularity due to their flexibility to predict the outcome model and the propensity score. In this article, we provide a within-group approach for ML-based causal inference methods in order to robustly estimate average treatment effects in multilevel studies when there is cluster-level unmeasured confounding. We focus on one particular ML-based causal inference method based on the targeted maximum likelihood estimation (TMLE) with an ensemble learner called SuperLearner. Through our simulation studies, we observe that training TMLE within groups of similar clusters helps remove bias from cluster-level unmeasured confounders. Also, using within-group propensity scores estimated from fixed effects logistic regression increases the robustness of the proposed within-group TMLE method. Even if the propensity scores are partially misspecified, the within-group TMLE still produces robust ATE estimates due to double robustness with flexible modeling, unlike parametric-based inverse propensity weighting methods. We demonstrate our proposed methods and conduct sensitivity analyses against the number of groups and individual-level unmeasured confounding to evaluate the effect of taking an eighth-grade algebra course on math achievement in the Early Childhood Longitudinal Study.more » « less
-
Abstract Propensity score weighting is a tool for causal inference to adjust for measured confounders in observational studies. In practice, data often present complex structures, such as clustering, which make propensity score modeling and estimation challenging. In addition, for clustered data, there may be unmeasured cluster-level covariates that are related to both the treatment assignment and outcome. When such unmeasured cluster-specific confounders exist and are omitted in the propensity score model, the subsequent propensity score adjustment may be biased. In this article, we propose a calibration technique for propensity score estimation under the latent ignorable treatment assignment mechanism, i. e., the treatment-outcome relationship is unconfounded given the observed covariates and the latent cluster-specific confounders. We impose novel balance constraints which imply exact balance of the observed confounders and the unobserved cluster-level confounders between the treatment groups. We show that the proposed calibrated propensity score weighting estimator is doubly robust in that it is consistent for the average treatment effect if either the propensity score model is correctly specified or the outcome follows a linear mixed effects model. Moreover, the proposed weighting method can be combined with sampling weights for an integrated solution to handle confounding and sampling designs for causal inference with clustered survey data. In simulation studies, we show that the proposed estimator is superior to other competitors. We estimate the effect of School Body Mass Index Screening on prevalence of overweight and obesity for elementary schools in Pennsylvania.more » « less
-
High‐dimensional multinomial regression models are very useful in practice but have received less research attention than logistic regression models, especially from the perspective of statistical inference. In this work, we analyze the estimation and prediction error of the contrast‐based ‐penalized multinomial regression model and extend the debiasing method to the multinomial case, providing a valid confidence interval for each coefficient and value of the individual hypothesis test. We also examine cases of model misspecification and non‐identically distributed data to demonstrate the robustness of our method when some assumptions are violated. We apply the debiasing method to identify important predictors in the progression into dementia of different subtypes. Results from extensive simulations show the superiority of the debiasing method compared to other inference methods.more » « less
-
Abstract In this study, we focus on estimating the heterogeneous treatment effect (HTE) for survival outcome. The outcome is subject to censoring and the number of covariates is high-dimensional. We utilize data from both the randomized controlled trial (RCT), considered as the gold standard, and real-world data (RWD), possibly affected by hidden confounding factors. To achieve a more efficient HTE estimate, such integrative analysis requires great insight into the data generation mechanism, particularly the accurate characterization of unmeasured confounding effects/bias. With this aim, we propose a penalized-regression-based integrative approach that allows for the simultaneous estimation of parameters, selection of variables, and identification of the existence of unmeasured confounding effects. The consistency, asymptotic normality, and efficiency gains are rigorously established for the proposed estimate. Finally, we apply the proposed method to estimate the HTE of lobar/sublobar resection on the survival of lung cancer patients. The RCT is a multicenter non-inferiority randomized phase 3 trial, and the RWD comes from a clinical oncology cancer registry in the United States. The analysis reveals that the unmeasured confounding exists and the integrative approach does enhance the efficiency for the HTE estimation.more » « less
An official website of the United States government

