

Title: Evaluating Model Specification When Using the Parametric G-Formula in the Presence of Censoring
Abstract

The noniterative conditional expectation (NICE) parametric g-formula can be used to estimate the causal effect of sustained treatment strategies. In addition to identifiability conditions, the validity of the NICE parametric g-formula generally requires the correct specification of models for time-varying outcomes, treatments, and confounders at each follow-up time point. An informal approach for evaluating model specification is to compare the observed distributions of the outcome, treatments, and confounders with their parametric g-formula estimates under the “natural course.” In the presence of loss to follow-up, however, the observed and natural-course risks can differ even if the identifiability conditions of the parametric g-formula hold and there is no model misspecification. Here, we describe 2 approaches for evaluating model specification when using the parametric g-formula in the presence of censoring: 1) comparing factual risks estimated by the g-formula with nonparametric Kaplan-Meier estimates and 2) comparing natural-course risks estimated by inverse probability weighting with those estimated by the g-formula. We also describe how to correctly compute natural-course estimates of time-varying covariate means when using a computationally efficient g-formula algorithm. We evaluate the proposed methods via simulation and implement them to estimate the effects of dietary interventions in 2 cohort studies.
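
The logic behind these comparisons can be illustrated on a toy dataset. The Python sketch below is not the authors' implementation and fits no parametric models. It assumes a hypothetical person-interval layout (`ROWS`), hypothetical function names, and a single binary covariate `L`. It also uses nonparametric (stratified) estimates in place of the parametric models of the NICE algorithm. It computes (a) a discrete-time Kaplan-Meier risk, (b) a natural-course risk with inverse probability of censoring weights, assuming censoring depends only on the current `L`, and (c) a nonparametric g-formula natural-course risk, computed by exact summation over covariate histories rather than by Monte Carlo simulation.

```python
from collections import defaultdict

# Hypothetical toy data (not from the paper). Each row is one person-interval
# (person_id, k, L, C, Y) for an interval k entered event-free and uncensored:
#   L: binary covariate measured at the start of interval k
#   C: 1 if lost to follow-up during interval k (Y then unobserved, stored as 0)
#   Y: 1 if the event occurs by the end of interval k
ROWS = [
    (1, 0, 1, 1, 0), (2, 0, 1, 1, 0), (3, 0, 1, 0, 1), (4, 0, 1, 0, 0),
    (5, 0, 0, 0, 1), (6, 0, 0, 0, 0), (7, 0, 0, 0, 0), (8, 0, 0, 0, 0),
    (4, 1, 1, 0, 1), (6, 1, 0, 1, 0), (7, 1, 0, 0, 0), (8, 1, 0, 0, 0),
]


def product_limit(hazards):
    """Convert per-interval hazards into cumulative risks 1 - prod(1 - h)."""
    surv, risks = 1.0, []
    for h in hazards:
        surv *= 1.0 - h
        risks.append(1.0 - surv)
    return risks


def km_risk(rows, K):
    """Discrete-time Kaplan-Meier: censored person-intervals leave the risk set."""
    hazards = []
    for k in range(K):
        obs = [r for r in rows if r[1] == k and r[3] == 0]
        hazards.append(sum(r[4] for r in obs) / len(obs))
    return product_limit(hazards)


def ipw_risk(rows, K):
    """Natural-course risk with inverse probability of censoring weights.

    Assumes Pr(C_k = 0) depends only on (k, L_k), estimated by stratification.
    """
    n, n_unc = defaultdict(int), defaultdict(int)
    for _, k, l, c, _ in rows:
        n[(k, l)] += 1
        n_unc[(k, l)] += int(c == 0)
    by_person, weight = defaultdict(list), {}
    for r in rows:
        by_person[r[0]].append(r)
    for pid, prow in by_person.items():
        w = 1.0
        for _, k, l, c, _ in sorted(prow, key=lambda r: r[1]):
            if c == 1:
                break
            w *= n[(k, l)] / n_unc[(k, l)]  # multiply by 1 / Pr(uncensored at k | L_k)
            weight[(pid, k)] = w
    hazards = []
    for k in range(K):
        obs = [r for r in rows if r[1] == k and r[3] == 0]
        num = sum(weight[(r[0], k)] * r[4] for r in obs)
        hazards.append(num / sum(weight[(r[0], k)] for r in obs))
    return product_limit(hazards)


def gformula_risk(rows, K):
    """Nonparametric g-formula: exact sum over L histories, no Monte Carlo."""
    by_person, hist = defaultdict(list), {}
    for r in rows:
        by_person[r[0]].append(r)
    for prow in by_person.values():
        h = ()
        for pid, k, l, _, _ in sorted(prow, key=lambda r: r[1]):
            h += (l,)
            hist[(pid, k)] = h  # covariate history (L_0, ..., L_k)

    def risk_from(k, prefix, horizon):
        if k == horizon:
            return 0.0
        entered = [r for r in rows if r[1] == k and hist[(r[0], k)][:-1] == prefix]
        total = 0.0
        for l in (0, 1):
            stratum = [r for r in entered if r[2] == l]
            if not stratum:
                continue
            p_l = len(stratum) / len(entered)  # Pr(L_k = l | history)
            obs = [r for r in stratum if r[3] == 0]  # uncensored during interval k
            haz = sum(r[4] for r in obs) / len(obs)  # Pr(Y_k = 1 | history, L_k = l)
            later = risk_from(k + 1, prefix + (l,), horizon) if haz < 1 else 0.0
            total += p_l * (haz + (1.0 - haz) * later)
        return total

    return [risk_from(0, (), t) for t in range(1, K + 1)]


for name, fn in [("KM", km_risk), ("IPW", ipw_risk), ("g-formula", gformula_risk)]:
    print(name, [round(x, 4) for x in fn(ROWS, 2)])
```

Run as is, this prints `KM [0.3333, 0.5556]`, `IPW [0.375, 0.625]` and `g-formula [0.375, 0.625]`. In the toy data, censoring is more common when `L = 1`, and `L = 1` also predicts the event, so in this example the Kaplan-Meier risks understate the natural-course risks even though nothing is misspecified. The IPW and nonparametric g-formula estimates agree here. With parametric models, a material gap between them would be the kind of discrepancy the second approach is meant to flag.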

 
NSF-PAR ID:
10502190
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
American Journal of Epidemiology
Volume:
192
Issue:
11
ISSN:
0002-9262
Pages:
1887-1895
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary

    Comparative effectiveness research often involves evaluating the differences in the risks of an event of interest between two or more treatments using observational data. Often, the post‐treatment outcome of interest is whether the event happens within a pre‐specified time window, which leads to a binary outcome. One source of bias for estimating the causal treatment effect is the presence of confounders, which are usually controlled using propensity score‐based methods. An additional source of bias is right‐censoring, which occurs when the information on the outcome of interest is not completely available due to dropout, study termination, or treatment switch before the event of interest. We propose an inverse probability weighted regression‐based estimator that can simultaneously handle both confounding and right‐censoring, calling the method CIPWR, with the letter C highlighting the censoring component. CIPWR estimates the average treatment effects by averaging the predicted outcomes obtained from a logistic regression model that is fitted using a weighted score function. The CIPWR estimator has a double robustness property such that estimation consistency can be achieved when either the model for the outcome or the models for both treatment and censoring are correctly specified. We establish the asymptotic properties of the CIPWR estimator for conducting inference, and compare its finite sample performance with that of several alternatives through simulation studies. The methods under comparison are applied to a cohort of prostate cancer patients from an insurance claims database for comparing the adverse effects of four candidate drugs for advanced stage prostate cancer.

     
  2. Summary

    Cancer is a major public health burden and is the second leading cause of death in the USA. The US National Cancer Institute estimated overall costs of cancer in 2007 at $219.2 billion. Breast cancer has the highest cancer incidence rates among women and is the second leading cause of cancer death among women. The ‘Surveillance, epidemiology, and end results’ programme of the National Cancer Institute collects and publishes cancer survival data from 17 population-based cancer registries. The CANSURV software of the National Cancer Institute analyses cancer survival data from the programme by using parametric and semiparametric mixture cure models. Another popular approach in cancer survival is the competing risks approach, which considers the simultaneous risks from cancer and various other causes. The paper develops a model that unifies the mixture cure and competing risks approaches and that can handle masked causes of death in a natural way. Markov chain sampling is used for Bayesian analysis of this model, and modelling and computational issues of general and restricted structures are discussed. The various model structures are compared by using Bayes factors. This Bayesian model is used to analyse survival data for the approximately 620000 breast cancer cases from the programme. The estimated cumulative probabilities of death from breast cancer from the proposed mixture cure competing risks model are found to be lower than the estimates obtained from the CANSURV software. Whereas the estimate of the cure fraction is found to depend on the modelling assumptions, the survival and cumulative probability estimates are not sensitive to these assumptions. Breast cancer survival is compared across ethnic subgroups, age subgroups and patients with localized, regional and distant stages of the disease. Breast cancer is found to be the dominant cause of death early in follow-up, whereas other competing causes often become dominant later. This interrelation between breast cancer and other competing risks varies across ethnic groups, stages and age groups.

     
  3. Multi-agent dynamical systems refer to scenarios where multiple units (aka agents) interact with each other and evolve collectively over time. For instance, people’s health conditions are mutually influenced: receiving vaccinations not only strengthens the long-term health status of one unit but also provides protection for those in their immediate surroundings. To make informed decisions in multi-agent dynamical systems, such as determining the optimal vaccine distribution plan, decision-makers need to estimate continuous-time counterfactual outcomes. However, existing studies of causal inference over time assume that units are mutually independent, which does not hold in multi-agent dynamical systems. In this paper, we aim to bridge this gap and study how to estimate counterfactual outcomes in multi-agent dynamical systems. Causal inference in a multi-agent dynamical system has unique challenges: 1) confounders are time-varying and are present both in a unit's own covariates and in those of other units; 2) units are affected not only by their own treatments but also by others’ treatments; 3) treatments are naturally dynamic, such as vaccines and boosters received seasonally. To this end, we model a multi-agent dynamical system as a graph and propose a novel model called CF-GODE (CounterFactual Graph Ordinary Differential Equations). CF-GODE is a causal model that estimates continuous-time counterfactual outcomes in the presence of inter-dependencies between units. To facilitate continuous-time estimation, we propose Treatment-Induced GraphODE, a novel ordinary differential equation based on graph neural networks (GNNs), which can incorporate dynamical treatments as additional inputs to predict potential outcomes over time. To remove confounding bias, we propose two domain-adversarial learning objectives that learn balanced continuous representation trajectories, which are not predictive of treatments and interference. We further provide theoretical justification for their effectiveness. Experiments on two semi-synthetic datasets confirm that CF-GODE outperforms baselines on counterfactual estimation. We also provide extensive analyses to understand how our model works.
  4. For large observational studies lacking a control group (unlike randomized controlled trials, RCTs), propensity scores (PS) are often the method of choice to account for pre-treatment confounding in baseline characteristics, and thereby avoid substantial bias in treatment effect estimation. The vast majority of PS techniques focus on average treatment effect estimation, without any clear consensus on how to account for confounders, especially in a multiple-treatment setting. For time-to-event outcomes, the analytical framework is further complicated by high censoring rates (sometimes due to non-susceptibility of study units to a disease), imbalance between treatment groups, and the clustered nature of the data (where survival outcomes appear in groups). Motivated by a right-censored kidney transplantation dataset derived from the United Network for Organ Sharing (UNOS), we investigate and compare two recent promising PS procedures, (a) the generalized boosted model (GBM) and (b) the covariate-balancing propensity score (CBPS), in an attempt to decouple the causal effects of treatments (here, study subgroups, such as hepatitis C virus (HCV) positive/negative donors and positive/negative recipients) on time to death of kidney recipients due to kidney failure after transplantation. For estimation, we employ a 2-step procedure that addresses various complexities observed in the UNOS database within a unified paradigm. First, to adjust for the large number of confounders on the multiple subgroups, we fit multinomial PS models via procedures (a) and (b). In the next stage, the estimated PS is incorporated into the likelihood of a semi-parametric cure rate Cox proportional hazards frailty model via inverse probability of treatment weighting, adjusted for multi-center clustering and excess censoring. Our data analysis reveals that the full model is more informative and performs better for treatment effect estimation than sub-models that relax the various features of the event time dataset.
  5. Telecystoscopy can lower the barrier to accessing critical urologic diagnostics for patients around the world. A major challenge for robotic control of flexible cystoscopes and intuitive teleoperation is estimating the pose of the scope tip. We propose a novel real-time camera localization method that uses video recordings from a prior cystoscopy and a 3D bladder reconstruction to estimate cystoscope pose within the bladder during follow-up telecystoscopy. We map prior video frames into a low-dimensional space as a dictionary so that a new image can be mapped likewise to efficiently retrieve its nearest neighbor among the dictionary images. The cystoscope pose is then estimated from the correspondence among the new image, its nearest dictionary image, and the prior model from 3D reconstruction. We demonstrate the performance of our methods using bladder phantoms of varying fidelity and a servo-controlled cystoscope to simulate the use case of bladder surveillance through telecystoscopy. The servo-controlled cystoscope, with 3 degrees of freedom (angulation, roll, and insertion axes), was developed for collecting cystoscope videos from bladder phantoms. Cystoscope videos were acquired in a 2.5D bladder phantom (bladder-shaped cross-section plus height) with a panorama of a urothelium attached to the inner surface. Scans of the 2.5D phantom were performed in separate arc trajectories, each generated by actuating the angulation axis at a fixed roll and insertion length. We further included variation in moving speed, imaging distance and the presence of bladder tumors. Cystoscope videos were also acquired in a water-filled 3D silicone bladder phantom with hand-painted vasculature. Scans of the 3D phantom were performed in separate circular trajectories, each generated by actuating the roll axis at a fixed angulation and insertion length. These videos were used to create 3D reconstructions, dictionary sets, and test data sets for evaluating the computational efficiency and accuracy of our proposed method in comparison with a method based on global Scale-Invariant Feature Transform (SIFT) features, named SIFT-only. Our method can retrieve the nearest dictionary image for 94–100% of test frames in under 55 ms per image, whereas the SIFT-only method finds the image match for only 56–100% of test frames in 6000–40000 ms per image, depending on the size of the dictionary set and the richness of SIFT features in the images. Our method, with a speed of around 20 Hz for the retrieval stage, is a promising tool for real-time image-based scope localization in robotic cystoscopy when prior cystoscopy images are available.