Inverse-Weighted Survival Games

Deep models trained through maximum likelihood have achieved state-of-the-art results for survival analysis. Despite this training scheme, practitioners evaluate models under other criteria, such as binary classification losses at a chosen set of time horizons, e.g. Brier score (BS) and Bernoulli log likelihood (BLL). Models trained with maximum likelihood may have poor BS or BLL since maximum likelihood does not directly optimize these criteria. Directly optimizing criteria like BS requires inverse-weighting by the censoring distribution. However, estimating the censoring model under these metrics requires inverse-weighting by the failure distribution. The objective for each model requires the other, but neither is known. To resolve this dilemma, we introduce Inverse-Weighted Survival Games. In these games, objectives for each model are built from re-weighted estimates featuring the other model, where the latter is held fixed during training. When the loss is proper, we show that the games always have the true failure and censoring distributions as a stationary point. This means models in the game do not leave the correct distributions once reached. We construct one case where this stationary point is unique. We show that these games optimize BS on simulations and then apply these principles to real-world cancer and critically ill patient data.
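The inverse-weighting described above can be made concrete with the standard inverse-probability-of-censoring-weighted (IPCW) Brier score at a single horizon t. What follows is a minimal sketch under stated assumptions, not the paper's implementation. It assumes continuous time with no ties, independent censoring, and covariate-free exponential survival curves as stand-in models. All function names (`ipcw_brier_failure`, `ipcw_brier_censoring`, `exp_surv`, `simulate`) are hypothetical. The failure model's score divides by the censoring survival function G, and the mirrored censoring score divides by the failure survival function S. This is the circular dependence that the games resolve by holding the other model fixed. The last lines mimic one player's move: with the censoring model frozen at its true value, a grid search over failure rates should recover the true rate, which is consistent with the stationary-point claim for a proper loss.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Survival = Callable[[float], float]  # maps t to P(time > t)


def ipcw_brier_failure(t: float, times: Sequence[float], events: Sequence[int],
                       surv_fail: Survival, surv_cens: Survival) -> float:
    """IPCW Brier score of the failure model S at horizon t.

    times[i] = min(T_i, C_i); events[i] = 1 if the failure was observed (T_i <= C_i).
    Assumes continuous time with no ties, so G(u-) is replaced by G(u).
    """
    s_t, g_t = surv_fail(t), surv_cens(t)
    total = 0.0
    for u, d in zip(times, events):
        if u <= t and d == 1:
            # Failure observed by t: label 0 ("not survived"), weight 1 / G(u).
            total += (0.0 - s_t) ** 2 / surv_cens(u)
        elif u > t:
            # Still at risk at t: label 1 ("survived"), weight 1 / G(t).
            total += (1.0 - s_t) ** 2 / g_t
        # Censored before t: label unknown, contributes 0; the weights compensate.
    return total / len(times)


def ipcw_brier_censoring(t: float, times: Sequence[float], events: Sequence[int],
                         surv_fail: Survival, surv_cens: Survival) -> float:
    """Mirror image: the censoring model G is scored by inverse-weighting with S."""
    g_t, s_t = surv_cens(t), surv_fail(t)
    total = 0.0
    for u, d in zip(times, events):
        if u <= t and d == 0:
            # Censoring observed by t: weight 1 / S(u).
            total += (0.0 - g_t) ** 2 / surv_fail(u)
        elif u > t:
            # Neither event by t: weight 1 / S(t).
            total += (1.0 - g_t) ** 2 / s_t
    return total / len(times)


def exp_surv(rate: float) -> Survival:
    """Hypothetical covariate-free exponential survival curve."""
    return lambda t: math.exp(-rate * t)


def simulate(n: int, lam: float, mu: float, seed: int = 0) -> Tuple[List[float], List[int]]:
    """Independent exponential failure (rate lam) and censoring (rate mu) times."""
    rng = random.Random(seed)
    times, events = [], []
    for _ in range(n):
        t_fail, t_cens = rng.expovariate(lam), rng.expovariate(mu)
        times.append(min(t_fail, t_cens))
        events.append(1 if t_fail <= t_cens else 0)
    return times, events


times, events = simulate(100_000, lam=1.0, mu=0.5)
S_true, G_true = exp_surv(1.0), exp_surv(0.5)
bs_fail = ipcw_brier_failure(1.0, times, events, S_true, G_true)
bs_cens = ipcw_brier_censoring(1.0, times, events, S_true, G_true)

# One player's move: freeze the censoring model and search a grid of failure rates.
grid = [0.5 + 0.1 * k for k in range(11)]
best_rate = min(grid, key=lambda r: ipcw_brier_failure(1.0, times, events, exp_surv(r), G_true))
```

With the true S and G plugged in, the two scores should land near S(t)(1 - S(t)) and G(t)(1 - G(t)), respectively. These are the Brier scores of the true marginal survival probabilities at horizon t.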

- Award ID(s): 1815633
- PAR ID: 10397873
- Date Published:
- Journal Name: Thirty-Fifth Conference on Neural Information Processing Systems, NeurIPS 2021
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation

More like this

- Structural failure time models are causal models for estimating the effect of time-varying treatments on a survival outcome. G-estimation and artificial censoring have been proposed for estimating the model parameters in the presence of time-dependent confounding and administrative censoring. However, most existing methods require manually pre-processing data into regularly spaced data, which may invalidate the subsequent causal analysis. Moreover, the computation and inference are challenging due to the nonsmoothness of artificial censoring. We propose a class of continuous-time structural failure time models that respects the continuous-time nature of the underlying data processes. Under a martingale condition of no unmeasured confounding, we show that the model parameters are identifiable from a potentially infinite number of estimating equations. Using semiparametric efficiency theory, we derive the first semiparametric doubly robust estimators, which are consistent if the model for the treatment process or the failure time model, but not necessarily both, is correctly specified. Moreover, we propose using inverse probability of censoring weighting to deal with dependent censoring. In contrast to artificial censoring, our weighting strategy does not introduce nonsmoothness in estimation and ensures that resampling methods can be used for inference.
- For large observational studies lacking a control group (unlike randomized controlled trials, RCT), propensity scores (PS) are often the method of choice to account for pre-treatment confounding in baseline characteristics, and thereby avoid substantial bias in treatment estimation. The vast majority of PS techniques focus on average treatment effect estimation, without any clear consensus on how to account for confounders, especially in a multiple treatment setting. Furthermore, for time-to-event outcomes, the analytical framework is complicated by high censoring rates (sometimes due to non-susceptibility of study units to a disease), imbalance between treatment groups, and the clustered nature of the data (where survival outcomes appear in groups). Motivated by a right-censored kidney transplantation dataset derived from the United Network of Organ Sharing (UNOS), we investigate and compare two recent promising PS procedures, (a) the generalized boosted model (GBM), and (b) the covariate-balancing propensity score (CBPS), in an attempt to decouple the causal effects of treatments (here, study subgroups, such as hepatitis C virus (HCV) positive/negative donors, and positive/negative recipients) on time to death of kidney recipients due to kidney failure, post-transplantation. For estimation, we employ a two-step procedure that addresses various complexities observed in the UNOS database within a unified paradigm. First, to adjust for the large number of confounders on the multiple sub-groups, we fit multinomial PS models via procedures (a) and (b). In the next stage, the estimated PS is incorporated into the likelihood of a semi-parametric cure rate Cox proportional hazards frailty model via inverse probability of treatment weighting, adjusted for multi-center clustering and excess censoring. Our data analysis shows that the full model is more informative and performs better in treatment effect estimation than sub-models that relax the various features of the event time dataset.
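The inverse probability of treatment weighting step can be sketched for a multi-arm treatment as follows. This is a hypothetical, minimal example. The propensity matrix comes from a simulated three-arm assignment rule with known probabilities, not from a GBM or CBPS fit. The weights are the unstabilized 1 / P(A = a_i | x_i). Plugging them into a cure rate Cox frailty likelihood is not shown, and the names `iptw_weights` and `simulate_arms` are illustrative only. The final check shows why weighting helps: the raw arm sizes are imbalanced and depend on the confounder, yet in the reweighted pseudo-population each arm has roughly n units.

```python
import random
from typing import List, Sequence, Tuple


def iptw_weights(ps: Sequence[Sequence[float]], treatment: Sequence[int]) -> List[float]:
    """Inverse probability of treatment weights w_i = 1 / P(A = a_i | x_i).

    ps[i][k] is a (here: known, in practice estimated) probability that unit i
    receives treatment k.
    """
    weights = []
    for probs, a in zip(ps, treatment):
        p = probs[a]
        if p <= 0.0:
            raise ValueError("positivity violated: propensity score must be > 0")
        weights.append(1.0 / p)
    return weights


def simulate_arms(n: int, seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Hypothetical 3-arm assignment whose probabilities depend on a binary confounder."""
    rng = random.Random(seed)
    ps_rows, arms = [], []
    for _ in range(n):
        x = rng.random() < 0.5
        probs = [0.6, 0.3, 0.1] if x else [0.2, 0.3, 0.5]
        arms.append(rng.choices(range(3), weights=probs)[0])
        ps_rows.append(probs)
    return ps_rows, arms


n = 50_000
ps_rows, arms = simulate_arms(n)
w = iptw_weights(ps_rows, arms)
raw_size = [arms.count(k) for k in range(3)]
# In the weighted pseudo-population every arm has approximately n units.
weighted_size = [sum(wi for wi, a in zip(w, arms) if a == k) for k in range(3)]
```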
- Marginal structural models (MSMs) can be used to estimate the causal effect of a potentially time-varying treatment in the presence of time-dependent confounding via weighted regression. The standard approach of using inverse probability of treatment weighting (IPTW) can be sensitive to model misspecification and lead to high-variance estimates due to extreme weights. Various methods have been proposed to partially address this, including covariate balancing propensity score (CBPS) to mitigate treatment model misspecification, and truncation and stabilized-IPTW (sIPTW) to temper extreme weights. In this article, we present kernel optimal weighting (KOW), a convex-optimization-based approach that finds weights for fitting the MSMs that flexibly balance time-dependent confounders while simultaneously penalizing extreme weights, directly addressing the above limitations. We further extend KOW to control for informative censoring. We evaluate the performance of KOW in a simulation study, comparing it with IPTW, sIPTW, and CBPS. We demonstrate the use of KOW in studying the effect of treatment initiation on time-to-death among people living with human immunodeficiency virus and the effect of negative advertising on elections in the United States.
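The two baseline remedies named above, stabilized IPTW and truncation, can be sketched in a few lines for contrast with KOW. This is an illustrative sketch, not the authors' code and not KOW itself. It assumes the per-time probabilities of each unit's observed treatment have already been estimated, with a numerator model that excludes confounders and a denominator model that includes them. It truncates at empirical quantiles chosen by simple index lookup, which is one of several common conventions. The function names are hypothetical.

```python
from typing import List, Sequence


def stabilized_weights(num_probs: Sequence[Sequence[float]],
                       den_probs: Sequence[Sequence[float]]) -> List[float]:
    """sIPTW for a time-varying treatment.

    num_probs[i][t]: P(A_t = a_it | treatment history), model without confounders.
    den_probs[i][t]: P(A_t = a_it | treatment and confounder history).
    Each unit's weight is the product over time of numerator / denominator.
    """
    weights = []
    for num, den in zip(num_probs, den_probs):
        w = 1.0
        for p_num, p_den in zip(num, den):
            w *= p_num / p_den
        weights.append(w)
    return weights


def truncate(weights: Sequence[float], lower_q: float = 0.01,
             upper_q: float = 0.99) -> List[float]:
    """Clip weights to empirical quantiles (index-based, no interpolation)."""
    ordered = sorted(weights)
    last = len(ordered) - 1
    lo = ordered[int(lower_q * last)]
    hi = ordered[int(upper_q * last)]
    return [min(max(w, lo), hi) for w in weights]


# Two units, two time points.
sw = stabilized_weights([[0.5, 0.5], [0.5, 0.5]], [[0.25, 0.5], [0.5, 1.0]])
clipped = truncate([float(k) for k in range(11)], lower_q=0.1, upper_q=0.9)
```

Stabilization keeps the marginal treatment probabilities in the numerator, so weights stay near 1 when confounding is weak. Truncation trades a little bias for lower variance by capping the most extreme values.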
- Sparse decision trees are one of the most common forms of interpretable models. While recent advances have produced algorithms that fully optimize sparse decision trees for prediction, that work does not address policy design, because the algorithms cannot handle weighted data samples. Specifically, they rely on the discreteness of the loss function, which means that real-valued weights cannot be directly used. For example, none of the existing techniques produce policies that incorporate inverse propensity weighting on individual data points. We present three algorithms for efficient sparse weighted decision tree optimization. The first approach directly optimizes the weighted loss function; however, it tends to be computationally inefficient for large datasets. Our second approach, which scales more efficiently, transforms weights to integer values and uses data duplication to transform the weighted decision tree optimization problem into an unweighted (but larger) counterpart. Our third algorithm, which scales to much larger datasets, uses a randomized procedure that samples each data point with a probability proportional to its weight. We present theoretical bounds on the error of the two fast methods and show experimentally that these methods can be two orders of magnitude faster than the direct optimization of the weighted loss, without losing significant accuracy.
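The two scalable transformations described above reduce a weighted problem to an unweighted one. The sketch below shows only that data transformation, evaluated on a fixed depth-1 tree; it does not implement the sparse tree optimizer itself. Several choices are assumptions for illustration: the scale factor of 10, Python's `round` as the integer rule, and sampling with replacement via `random.choices`. All names are hypothetical. When the weights are exact multiples of 1 / scale, the unweighted error on the duplicated data matches the weighted error exactly. Weight-proportional sampling matches it only approximately.

```python
import random
from typing import Callable, List, Sequence, Tuple

Row = Tuple[float, int]  # (feature value, binary label)


def weighted_error(rows: Sequence[Row], weights: Sequence[float],
                   predict: Callable[[float], int]) -> float:
    """Weighted misclassification rate of a fixed classifier (the direct objective)."""
    wrong = sum(w for (x, y), w in zip(rows, weights) if predict(x) != y)
    return wrong / sum(weights)


def duplicate_by_weights(rows: Sequence[Row], weights: Sequence[float],
                         scale: int = 10) -> List[Row]:
    """Round weight * scale to an integer count and repeat each row that many times."""
    out: List[Row] = []
    for row, w in zip(rows, weights):
        out.extend([row] * round(w * scale))
    return out


def sample_by_weights(rows: Sequence[Row], weights: Sequence[float],
                      m: int, seed: int = 0) -> List[Row]:
    """Draw m rows with replacement, each with probability proportional to its weight."""
    return random.Random(seed).choices(rows, weights=weights, k=m)


def stump(x: float) -> int:
    """Hypothetical fixed depth-1 tree: predict 1 when x >= 2."""
    return 1 if x >= 2.0 else 0


rows = [(0.0, 0), (1.0, 1), (2.0, 0), (3.0, 1)]
weights = [0.5, 1.2, 2.0, 0.3]

exact = weighted_error(rows, weights, stump)
dup = duplicate_by_weights(rows, weights)
via_dup = weighted_error(dup, [1.0] * len(dup), stump)
sampled = sample_by_weights(rows, weights, m=100_000)
via_sample = weighted_error(sampled, [1.0] * len(sampled), stump)
```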
 An official website of the United States government