
Title: Propensity Score Modeling in Electronic Health Records with Time-to-Event Endpoints: Application to Kidney Transplantation
For large observational studies lacking a control group (unlike randomized controlled trials, RCTs), propensity scores (PS) are often the method of choice to account for pre-treatment confounding in baseline characteristics, and thereby avoid substantial bias in treatment effect estimation. The vast majority of PS techniques focus on average treatment effect estimation, without any clear consensus on how to account for confounders, especially in a multiple-treatment setting. Furthermore, for time-to-event outcomes, the analytical framework is further complicated in the presence of high censoring rates (sometimes due to non-susceptibility of study units to a disease), imbalance between treatment groups, and the clustered nature of the data (where survival outcomes appear in groups). Motivated by a right-censored kidney transplantation dataset derived from the United Network for Organ Sharing (UNOS), we investigate and compare two recent promising PS procedures, (a) the generalized boosted model (GBM), and (b) the covariate-balancing propensity score (CBPS), in an attempt to decouple the causal effects of treatments (here, study subgroups, such as hepatitis C virus (HCV) positive/negative donors, and positive/negative recipients) on time to death of kidney recipients due to kidney failure, post transplantation. For estimation, we employ a 2-step procedure which addresses various complexities observed in the UNOS database within a unified paradigm. First, to adjust for the large number of confounders on the multiple sub-groups, we fit multinomial PS models via procedures (a) and (b).
In the next stage, the estimated PS is incorporated into the likelihood of a semi-parametric cure rate Cox proportional hazards frailty model via inverse probability of treatment weighting, adjusted for multi-center clustering and excess censoring. Our data analysis reveals a more informative and superior performance of the full model in terms of treatment effect estimation, over sub-models that relax the various features of the event time dataset.
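The weighting step that links the two stages can be illustrated with a minimal sketch. It assumes the multinomial PS have already been estimated (by GBM, CBPS, or any other method) and are supplied as a matrix with one column per treatment group. The function name `iptw_weights`, the toy probabilities, and the optional stabilization by marginal group frequencies are illustrative assumptions, not the authors' implementation; the cure rate frailty model itself is not shown.

```python
import numpy as np


def iptw_weights(ps, treatment, stabilize=False):
    """Inverse probability of treatment weights for K treatment groups.

    ps        : (n, K) array, ps[i, k] = estimated P(T_i = k | X_i)
    treatment : (n,) integer array with values in 0..K-1
    stabilize : if True, multiply by the marginal frequency of the
                received treatment (a common variance-reduction choice,
                assumed here for illustration)
    """
    ps = np.asarray(ps, dtype=float)
    treatment = np.asarray(treatment, dtype=int)
    n, k = ps.shape
    if not np.allclose(ps.sum(axis=1), 1.0):
        raise ValueError("each row of ps must sum to 1")
    # Probability of the treatment each unit actually received.
    p_received = ps[np.arange(n), treatment]
    if np.any(p_received <= 0):
        raise ValueError("received-treatment probabilities must be positive")
    weights = 1.0 / p_received
    if stabilize:
        marginal = np.bincount(treatment, minlength=k) / n
        weights = weights * marginal[treatment]
    return weights


# Hypothetical toy input: 4 units, 3 treatment groups.
ps_toy = np.array([
    [0.50, 0.30, 0.20],
    [0.20, 0.50, 0.30],
    [0.10, 0.10, 0.80],
    [0.25, 0.25, 0.50],
])
t_toy = np.array([0, 1, 2, 0])

w_plain = iptw_weights(ps_toy, t_toy)
w_stab = iptw_weights(ps_toy, t_toy, stabilize=True)
```

Weights of this kind would then enter the second-stage likelihood as per-subject weights; units with a small probability of their received treatment get large weights, which is why balance and extreme-weight diagnostics are usually checked before fitting the outcome model.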
Journal Name:
Journal of Data Science
Page Range / eLocation ID:
188 to 208
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary

    Comparative effectiveness research often involves evaluating the differences in the risks of an event of interest between two or more treatments using observational data. Often, the post‐treatment outcome of interest is whether the event happens within a pre‐specified time window, which leads to a binary outcome. One source of bias for estimating the causal treatment effect is the presence of confounders, which are usually controlled using propensity score‐based methods. An additional source of bias is right‐censoring, which occurs when the information on the outcome of interest is not completely available due to dropout, study termination, or treatment switch before the event of interest. We propose an inverse probability weighted regression‐based estimator that can simultaneously handle both confounding and right‐censoring, calling the method CIPWR, with the letter C highlighting the censoring component. CIPWR estimates the average treatment effects by averaging the predicted outcomes obtained from a logistic regression model that is fitted using a weighted score function. The CIPWR estimator has a double robustness property such that estimation consistency can be achieved when either the model for the outcome or the models for both treatment and censoring are correctly specified. We establish the asymptotic properties of the CIPWR estimator for conducting inference, and compare its finite sample performance with that of several alternatives through simulation studies. The methods under comparison are applied to a cohort of prostate cancer patients from an insurance claims database for comparing the adverse effects of four candidate drugs for advanced stage prostate cancer.

  2. Abstract

    Marginal structural models (MSMs) can be used to estimate the causal effect of a potentially time-varying treatment in the presence of time-dependent confounding via weighted regression. The standard approach of using inverse probability of treatment weighting (IPTW) can be sensitive to model misspecification and lead to high-variance estimates due to extreme weights. Various methods have been proposed to partially address this, including covariate balancing propensity score (CBPS) to mitigate treatment model misspecification, and truncation and stabilized-IPTW (sIPTW) to temper extreme weights. In this article, we present kernel optimal weighting (KOW), a convex-optimization-based approach that finds weights for fitting the MSMs that flexibly balance time-dependent confounders while simultaneously penalizing extreme weights, directly addressing the above limitations. We further extend KOW to control for informative censoring. We evaluate the performance of KOW in a simulation study, comparing it with IPTW, sIPTW, and CBPS. We demonstrate the use of KOW in studying the effect of treatment initiation on time-to-death among people living with human immunodeficiency virus and the effect of negative advertising on elections in the United States.

  3. Abstract STUDY QUESTION

    To what extent is preconception maternal or paternal coronavirus disease 2019 (COVID-19) vaccination associated with miscarriage incidence?


    COVID-19 vaccination in either partner at any time before conception is not associated with an increased rate of miscarriage.


    Several observational studies have evaluated the safety of COVID-19 vaccination during pregnancy and found no association with miscarriage, though no study prospectively evaluated the risk of early miscarriage (gestational weeks [GW] <8) in relation to COVID-19 vaccination. Moreover, no study has evaluated the role of preconception vaccination in both male and female partners.


    An Internet-based, prospective preconception cohort study of couples residing in the USA and Canada. We analyzed data from 1815 female participants who conceived during December 2020–November 2022, including 1570 couples with data on male partner vaccination.


    Eligible female participants were aged 21–45 years and were trying to conceive without use of fertility treatment at enrollment. Female participants completed questionnaires at baseline, every 8 weeks until pregnancy, and during early and late pregnancy; they could also invite their male partners to complete a baseline questionnaire. We collected data on COVID-19 vaccination (brand and date of doses), history of SARS-CoV-2 infection (yes/no and date of positive test), potential confounders (demographic, reproductive, and lifestyle characteristics), and pregnancy status on all questionnaires. Vaccination status was categorized as never (0 doses before conception), ever (≥1 dose before conception), having a full primary sequence before conception, and completing the full primary sequence ≤3 months before conception. These categories were not mutually exclusive. Participants were followed up from their first positive pregnancy test until miscarriage or a censoring event (induced abortion, ectopic pregnancy, loss to follow-up, 20 weeks’ gestation), whichever occurred first. We estimated incidence rate ratios (IRRs) for miscarriage and corresponding 95% CIs using Cox proportional hazards models with GW as the time scale. We used propensity score fine stratification weights to adjust for confounding.


    Among 1815 eligible female participants, 75% had received at least one dose of a COVID-19 vaccine by the time of conception. Almost one-quarter of pregnancies resulted in miscarriage, and 75% of miscarriages occurred <8 weeks’ gestation. The propensity score-weighted IRR comparing female participants who received at least one dose any time before conception versus those who had not been vaccinated was 0.85 (95% CI: 0.63, 1.14). COVID-19 vaccination was not associated with increased risk of either early miscarriage (GW: <8) or late miscarriage (GW: 8–19). There was no indication of an increased risk of miscarriage associated with male partner vaccination (IRR = 0.90; 95% CI: 0.56, 1.44).


    The present study relied on self-reported vaccination status and infection history. Thus, there may be some non-differential misclassification of exposure status. While misclassification of miscarriage is also possible, the preconception cohort design and high prevalence of home pregnancy testing in this cohort reduced the potential for under-ascertainment of miscarriage. As in all observational studies, residual or unmeasured confounding is possible.


    This is the first study to evaluate prospectively the relation between preconception COVID-19 vaccination in both partners and miscarriage, with more complete ascertainment of early miscarriages than earlier studies of vaccination. The findings are informative for individuals planning a pregnancy and their healthcare providers.


    This work was supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development, the National Institutes of Health [R01-HD086742 (PI: L.A.W.); R01-HD105863S1 (PI: L.A.W. and M.L.E.)], the National Institute of Allergy and Infectious Diseases (R03-AI154544; PI: A.K.R.), and the National Science Foundation (NSF-1914792; PI: L.A.W.). The funders had no role in the study design, data collection, analysis and interpretation of data, writing of the report, or the decision to submit the paper for publication. L.A.W. is a fibroid consultant for AbbVie, Inc. She also receives in-kind donations from Swiss Precision Diagnostics (Clearblue home pregnancy tests) and (fertility apps). M.L.E. received consulting fees from Ro, Hannah, Dadi, VSeat, and Underdog, holds stock in Ro, Hannah, Dadi, and Underdog, is a past president of SSMR, and is a board member of SMRU. K.F.H. reports being an investigator on grants to her institution from UCB and Takeda, unrelated to this study. S.H.-D. reports being an investigator on grants to her institution from Takeda, unrelated to this study, and a methods consultant for UCB and Roche for unrelated drugs. The authors report no other relationships or activities that could appear to have influenced the submitted work.



  4. Abstract Propensity score weighting is a tool for causal inference to adjust for measured confounders in observational studies. In practice, data often present complex structures, such as clustering, which make propensity score modeling and estimation challenging. In addition, for clustered data, there may be unmeasured cluster-level covariates that are related to both the treatment assignment and outcome. When such unmeasured cluster-specific confounders exist and are omitted in the propensity score model, the subsequent propensity score adjustment may be biased. In this article, we propose a calibration technique for propensity score estimation under the latent ignorable treatment assignment mechanism, i.e., the treatment-outcome relationship is unconfounded given the observed covariates and the latent cluster-specific confounders. We impose novel balance constraints which imply exact balance of the observed confounders and the unobserved cluster-level confounders between the treatment groups. We show that the proposed calibrated propensity score weighting estimator is doubly robust in that it is consistent for the average treatment effect if either the propensity score model is correctly specified or the outcome follows a linear mixed effects model. Moreover, the proposed weighting method can be combined with sampling weights for an integrated solution to handle confounding and sampling designs for causal inference with clustered survey data. In simulation studies, we show that the proposed estimator is superior to other competitors. We estimate the effect of School Body Mass Index Screening on prevalence of overweight and obesity for elementary schools in Pennsylvania.
  5. Abstract

    The noniterative conditional expectation (NICE) parametric g-formula can be used to estimate the causal effect of sustained treatment strategies. In addition to identifiability conditions, the validity of the NICE parametric g-formula generally requires the correct specification of models for time-varying outcomes, treatments, and confounders at each follow-up time point. An informal approach for evaluating model specification is to compare the observed distributions of the outcome, treatments, and confounders with their parametric g-formula estimates under the “natural course.” In the presence of loss to follow-up, however, the observed and natural-course risks can differ even if the identifiability conditions of the parametric g-formula hold and there is no model misspecification. Here, we describe 2 approaches for evaluating model specification when using the parametric g-formula in the presence of censoring: 1) comparing factual risks estimated by the g-formula with nonparametric Kaplan-Meier estimates and 2) comparing natural-course risks estimated by inverse probability weighting with those estimated by the g-formula. We also describe how to correctly compute natural-course estimates of time-varying covariate means when using a computationally efficient g-formula algorithm. We evaluate the proposed methods via simulation and implement them to estimate the effects of dietary interventions in 2 cohort studies.

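Item 5 above describes checking g-formula risk estimates against nonparametric Kaplan-Meier estimates under right-censoring. The nonparametric side of that comparison can be sketched as below. This is a minimal illustration in plain NumPy, with a hypothetical function name and toy data; it is not taken from any of the listed papers.

```python
import numpy as np


def kaplan_meier(time, event):
    """Kaplan-Meier survival estimate for right-censored data.

    time  : (n,) follow-up times
    event : (n,) 1 if the event was observed, 0 if censored
    Returns the distinct event times and S(t) just after each of them.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    event_times = np.unique(time[event == 1])
    surv = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(time >= t)                  # still under follow-up at t
        deaths = np.sum((time == t) & (event == 1))  # events exactly at t
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return event_times, np.array(surv)


# Hypothetical toy data: 5 subjects, 2 of them censored.
t_km = np.array([2.0, 3.0, 3.0, 5.0, 8.0])
e_km = np.array([1, 1, 0, 1, 0])

times_km, surv_km = kaplan_meier(t_km, e_km)
risk_km = 1.0 - surv_km  # cumulative risk, the quantity a model-based estimate is compared with
```

Plotting `risk_km` against model-based risks at the same time points gives an informal visual check of model specification; large, systematic gaps would suggest revisiting the fitted models.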