

Title: A Bayesian survival treed hazards model using latent Gaussian processes
Abstract

Survival models are used to analyze time-to-event data in a variety of disciplines. Proportional hazards models provide interpretable parameter estimates, but the proportional hazards assumption is not always appropriate. Non-parametric models are more flexible but often lack a clear inferential framework. We propose a Bayesian treed hazards partition model that combines flexibility with a clear inferential framework: inference is obtained through the posterior tree structure, and flexibility is preserved by modeling the log-hazard function in each partition element using a latent Gaussian process. An efficient reversible jump Markov chain Monte Carlo algorithm is obtained by marginalizing the parameters in each partition element via a Laplace approximation. Consistency properties for the estimator are established. The method can be used to help identify subgroups as well as prognostic and/or predictive biomarkers in time-to-event data. The method is compared with existing methods on simulated data and a liver cirrhosis dataset.
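
To make the marginalization step concrete, here is a minimal, illustrative sketch (not the authors' implementation) of a Laplace approximation to the marginal likelihood of a latent Gaussian-process log-hazard within a single partition element. It assumes time has been discretized into bins so that the likelihood takes the standard Poisson form for piecewise-constant hazards, and it uses a squared-exponential kernel; the bin structure, kernel hyperparameters, and all variable names are placeholders.

```python
# A minimal, illustrative sketch (not the authors' implementation) of a Laplace
# approximation used to marginalize a latent Gaussian-process log-hazard within
# one partition element.  Assumptions: time is discretized into bins, the
# likelihood takes the Poisson form for piecewise-constant hazards, and the GP
# uses a squared-exponential kernel; all names and settings are placeholders.
import numpy as np

def sq_exp_kernel(t, lengthscale=1.0, variance=1.0, jitter=1e-6):
    """Squared-exponential covariance matrix on the bin midpoints t."""
    d2 = (t[:, None] - t[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2) + jitter * np.eye(len(t))

def laplace_log_marginal(d, E, K, n_newton=50, tol=1e-8):
    """Laplace approximation to the log marginal likelihood of one partition element.

    d : event counts per time bin
    E : total exposure (person-time at risk) per time bin
    K : GP prior covariance of the log-hazard at the bin midpoints
    """
    Kinv = np.linalg.inv(K)
    f = np.zeros(len(d))                       # latent log-hazard, start at 0
    for _ in range(n_newton):
        mu = E * np.exp(f)                     # expected events under current f
        grad = d - mu - Kinv @ f               # gradient of the log posterior
        H = np.diag(mu) + Kinv                 # negative Hessian of the log posterior
        step = np.linalg.solve(H, grad)
        f = f + step
        if np.max(np.abs(step)) < tol:         # Newton iterations have converged
            break
    mu = E * np.exp(f)
    loglik = np.sum(d * f - mu)                # Poisson log-likelihood (up to a constant)
    B = np.eye(len(d)) + np.sqrt(mu)[:, None] * K * np.sqrt(mu)[None, :]
    _, logdet = np.linalg.slogdet(B)           # log|I + W^{1/2} K W^{1/2}|
    return loglik - 0.5 * f @ Kinv @ f - 0.5 * logdet

# Toy usage: 20 time bins with made-up event counts and exposures.
rng = np.random.default_rng(0)
t = np.linspace(0.5, 19.5, 20)
d = rng.poisson(2.0, size=20).astype(float)
E = np.full(20, 30.0)
print(laplace_log_marginal(d, E, sq_exp_kernel(t, lengthscale=3.0)))
```

Within a reversible jump sampler, a quantity of this kind can be evaluated for each proposed tree, so the latent log-hazard values do not need to be carried as part of the sampler state.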

 
NSF-PAR ID:
10491177
Author(s) / Creator(s):
; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Biometrics
Volume:
80
Issue:
1
ISSN:
0006-341X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Analysis of time‐to‐event data using Cox's proportional hazards (PH) model is ubiquitous in scientific research. A sample is taken from the population of interest and covariate information is collected on everyone. If the event of interest is rare and covariate information is difficult to collect, the nested case‐control (NCC) design reduces costs with minimal impact on inferential precision. Under PH, application of the Cox model to data from an NCC sample provides consistent estimation of the hazard ratio. However, under non‐PH, the finite‐sample estimates corresponding to the Cox estimator depend on the number of controls sampled and the censoring distribution. We propose two estimators based on a binary predictor of interest: one recovers the estimand corresponding to the Cox model under a simple random sample, while the other recovers an estimand that does not depend on the censoring distribution. We derive the asymptotic distribution and provide finite‐sample variance estimators.

     
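
    To fix ideas, below is a small illustrative sketch of nested case-control risk-set sampling; the cohort, the variable names, and the choice of m = 2 controls per case are invented. Each case is matched to controls drawn from the subjects still at risk at the case's event time, and the resulting matched sets would then typically be analyzed with conditional logistic regression, which under proportional hazards recovers the Cox hazard ratio.

    ```python
    # A small illustrative sketch of nested case-control (NCC) risk-set sampling;
    # the cohort, variable names, and m = 2 controls per case are all invented.
    import numpy as np

    def ncc_sample(time, event, m=2, rng=None):
        """Return matched sets as (case index, array of sampled control indices)."""
        rng = np.random.default_rng(rng)
        matched_sets = []
        for i in np.flatnonzero(event == 1):
            # Risk set: subjects whose observed time is at least the case's event
            # time, excluding the case itself.
            at_risk = np.flatnonzero((time >= time[i]) & (np.arange(len(time)) != i))
            if len(at_risk) >= m:
                controls = rng.choice(at_risk, size=m, replace=False)
                matched_sets.append((i, controls))
        return matched_sets

    # Toy cohort: exponential event times with independent censoring.
    rng = np.random.default_rng(1)
    n = 200
    t_event = rng.exponential(10.0, n)
    t_cens = rng.exponential(15.0, n)
    time = np.minimum(t_event, t_cens)
    event = (t_event <= t_cens).astype(int)
    sets = ncc_sample(time, event, m=2, rng=2)
    print(f"{event.sum()} cases, {len(sets)} matched sets of 1 case + 2 controls")
    ```
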
  2. Unlike standard prediction tasks, survival analysis requires modeling right-censored data, which must be treated with care. While deep neural networks excel in traditional supervised learning, it remains unclear how to best utilize these models in survival analysis. A key question is which data-generating assumptions of traditional survival models should be retained and which should be made more flexible via the function-approximating capabilities of neural networks. Rather than estimating the survival function targeted by most existing methods, we introduce a Deep Extended Hazard (DeepEH) model to provide a flexible and general framework for deep survival analysis. The extended hazard model includes the conventional Cox proportional hazards and accelerated failure time models as special cases, so DeepEH subsumes the popular Deep Cox proportional hazards (DeepSurv) and Deep Accelerated Failure Time (DeepAFT) models. We additionally provide theoretical support for the proposed DeepEH model by establishing the consistency and convergence rate of the survival function estimator, which underscores the attractive feature that deep learning can detect low-dimensional structure in high-dimensional data. Numerical experiments also provide evidence that the proposed methods outperform existing statistical and deep learning approaches to survival analysis.
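
    As a concrete (and deliberately simplified) illustration of deep survival modeling, the sketch below trains a DeepSurv-style network, i.e., one of the special cases that the extended hazard model subsumes, rather than the DeepEH estimator itself. The network size, optimizer settings, and simulated data are arbitrary placeholders; the loss is the negative Cox partial log-likelihood, and tied event times are assumed away for simplicity.

    ```python
    # A minimal DeepSurv-style sketch: a neural network trained with the negative
    # Cox partial log-likelihood.  Architecture, optimizer settings, and the
    # simulated data are placeholders; this is not the DeepEH estimator itself.
    import torch
    import torch.nn as nn

    def neg_cox_partial_loglik(log_risk, event):
        """Negative Cox partial log-likelihood (no tied event times assumed).
        Rows must be sorted by observed time in descending order, so that the
        cumulative log-sum-exp over earlier rows equals the sum over the risk set."""
        log_risk_sum = torch.logcumsumexp(log_risk, dim=0)
        return -torch.sum((log_risk - log_risk_sum) * event) / event.sum()

    net = nn.Sequential(nn.Linear(5, 32), nn.ReLU(), nn.Linear(32, 1))

    # Simulated right-censored data: 5 covariates, exponential times, random censoring.
    torch.manual_seed(0)
    n = 500
    x = torch.randn(n, 5)
    true_risk = x[:, 0] - 0.5 * x[:, 1]
    t_event = torch.distributions.Exponential(torch.exp(true_risk)).sample()
    t_cens = torch.distributions.Exponential(torch.ones(n) * 0.5).sample()
    time = torch.minimum(t_event, t_cens)
    event = (t_event <= t_cens).float()

    # Sort once by descending time so the risk-set sums become cumulative sums.
    order = torch.argsort(time, descending=True)
    x, time, event = x[order], time[order], event[order]

    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for epoch in range(200):
        opt.zero_grad()
        loss = neg_cox_partial_loglik(net(x).squeeze(-1), event)
        loss.backward()
        opt.step()
    print(float(loss))
    ```
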
  3. Summary

    The paper considers the problem of hypothesis testing and confidence intervals in high dimensional proportional hazards models. Motivated by a geometric projection principle, we propose a unified likelihood ratio inferential framework, including score, Wald and partial likelihood ratio statistics for hypothesis testing. Without assuming model selection consistency, we derive the asymptotic distributions of these test statistics, establish their semiparametric optimality and conduct power analysis under Pitman alternatives. We also develop new procedures to construct pointwise confidence intervals for the baseline hazard function and conditional hazard function. Simulation studies show that all tests proposed perform well in controlling type I errors. Moreover, the partial likelihood ratio test is empirically more powerful than the other tests. The methods proposed are illustrated by an example of a gene expression data set.

     
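
    For intuition, the sketch below carries out a classical partial likelihood ratio test for a single Cox coefficient in a low-dimensional setting; it is the textbook analogue of the test being generalized, not the high-dimensional procedure developed in the paper. The data and variable names are simulated placeholders, and tied event times are assumed away for simplicity.

    ```python
    # A back-of-the-envelope partial likelihood ratio test for one Cox coefficient.
    # Low-dimensional classical analogue only; data and names are placeholders.
    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import chi2

    def neg_partial_loglik(beta, x, time, event):
        """Negative Cox partial log-likelihood (no tied event times assumed)."""
        eta = x @ beta
        order = np.argsort(-time)                    # descending observed time
        eta_s, event_s = eta[order], event[order]
        log_risk_sum = np.logaddexp.accumulate(eta_s)  # log-sum over the risk set
        return -np.sum((eta_s - log_risk_sum)[event_s == 1])

    rng = np.random.default_rng(0)
    n, p = 400, 3
    x = rng.normal(size=(n, p))
    t_event = rng.exponential(1.0 / np.exp(x @ np.array([0.5, 0.0, -0.3])))
    t_cens = rng.exponential(2.0, n)
    time, event = np.minimum(t_event, t_cens), (t_event <= t_cens).astype(int)

    # Maximize the partial likelihood with and without the first covariate.
    full = minimize(neg_partial_loglik, np.zeros(p), args=(x, time, event))
    null = minimize(neg_partial_loglik, np.zeros(p - 1),
                    args=(x[:, 1:], time, event))    # H0: beta_1 = 0
    lr_stat = 2.0 * (null.fun - full.fun)
    print("LR statistic", lr_stat, "p-value", chi2.sf(lr_stat, df=1))
    ```
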
  4. Abstract

    Event logs, comprising data on the occurrence of different types of events and associated times, are commonly collected during the operation of modern industrial machines and systems. It is widely believed that the rich information embedded in event logs can be used to predict the occurrence of critical events. In this paper, we propose a recurrent neural network model that uses time‐to‐event data from event logs not only to predict the time of occurrence of a target event of interest, but also to interpret, from the trained model, the significant events leading to the target event. To improve the performance of our model, we employ sampling techniques and methods for handling censored data. The proposed model is tested on both simulated data and real‐world datasets. Through these comparison studies, we show that the deep learning approach can often achieve better prediction performance than traditional statistical models such as the Cox proportional hazards model. The real‐world case study also shows that the model interpretation algorithm proposed in this work can reveal the underlying physical relationships among events.

     
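
    The sketch below is a minimal recurrent model of the same general flavor: it maps an event-log sequence to a predicted time until a target event and uses a simple censoring-aware loss (squared error when the target event is observed, a one-sided penalty when it is censored). The architecture, the loss, and the toy data are illustrative choices, not the model proposed in the paper.

    ```python
    # A minimal recurrent sketch for event-log time-to-event prediction with a
    # simple censoring-aware loss; architecture, loss, and data are illustrative.
    import torch
    import torch.nn as nn

    class EventLogRNN(nn.Module):
        def __init__(self, n_event_types=20, emb_dim=16, hidden=32):
            super().__init__()
            self.emb = nn.Embedding(n_event_types, emb_dim)
            self.rnn = nn.LSTM(emb_dim + 1, hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)

        def forward(self, events, gaps):
            # events: (batch, seq) integer event codes; gaps: (batch, seq) inter-event times
            z = torch.cat([self.emb(events), gaps.unsqueeze(-1)], dim=-1)
            _, (h, _) = self.rnn(z)
            return self.head(h[-1]).squeeze(-1)        # predicted time to target event

    def censored_loss(pred, target, observed):
        se = (pred - target) ** 2                      # exact target time known
        hinge = torch.clamp(target - pred, min=0) ** 2 # only a lower bound is known
        return torch.mean(torch.where(observed.bool(), se, hinge))

    # Toy batch: 8 sequences of 15 logged events each.
    torch.manual_seed(0)
    model = EventLogRNN()
    events = torch.randint(0, 20, (8, 15))
    gaps = torch.rand(8, 15)
    target = torch.rand(8) * 10
    observed = torch.randint(0, 2, (8,))
    loss = censored_loss(model(events, gaps), target, observed)
    loss.backward()
    print(float(loss))
    ```
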
  5. Matching on an estimated propensity score is frequently used to estimate the effects of treatments from observational data. Since the 1970s, different authors have proposed methods to combine matching at the design stage with regression adjustment at the analysis stage when estimating treatment effects for continuous outcomes. Previous work has consistently shown that the combination generally has better statistical properties than either method by itself. In biomedical and epidemiological research, survival or time-to-event outcomes are common. We propose a method that combines regression adjustment and propensity score matching to estimate survival curves and hazard ratios. It is based on imputing a potential outcome under control for each successfully matched treated subject, using either an accelerated failure time parametric survival model or a Cox proportional hazards model fit to the matched control subjects. The fitted model is then applied to the matched treated subjects to simulate the missing potential outcome under control for each treated subject. Conventional survival analyses (e.g., estimation of survival curves and hazard ratios) can then be conducted using the observed outcome under treatment and the imputed outcome under control. We evaluated the repeated-sampling bias of the proposed methods using simulations. When using nearest neighbor matching, the proposed method resulted in decreased bias compared to crude analyses in the matched sample. We illustrate the method with an example involving the prescribing of beta-blockers at hospital discharge to patients hospitalized with heart failure.

     
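
    A condensed sketch of this pipeline appears below: logistic-regression propensity scores, nearest-neighbor matching with replacement, a parametric AFT model fitted to the matched controls only, and imputation of each matched treated subject's potential outcome under control. It assumes the pandas, scikit-learn, and lifelines APIs, uses invented data, and imputes the model's predicted median survival time, which is a simplification of the simulation step described above.

    ```python
    # A condensed, illustrative pipeline: propensity scores, 1:1 nearest-neighbor
    # matching with replacement, a Weibull AFT model fit to the matched controls,
    # and imputation of each treated subject's potential outcome under control.
    # Library APIs (pandas / scikit-learn / lifelines) and data are assumptions.
    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import NearestNeighbors
    from lifelines import WeibullAFTFitter

    rng = np.random.default_rng(0)
    n = 1000
    df = pd.DataFrame({"age": rng.normal(60, 10, n), "sev": rng.normal(0, 1, n)})
    df["treated"] = rng.binomial(1, 1 / (1 + np.exp(-(0.03 * (df.age - 60) + 0.5 * df.sev))))
    rate = np.exp(-3 + 0.02 * df.age + 0.3 * df.sev - 0.4 * df.treated)
    t_event = rng.exponential(1 / rate)
    t_cens = rng.exponential(20, n)
    df["time"], df["event"] = np.minimum(t_event, t_cens), (t_event <= t_cens).astype(int)

    # 1) Propensity scores and greedy 1:1 nearest-neighbor matching on the score.
    X = df[["age", "sev"]]
    ps = LogisticRegression().fit(X, df.treated).predict_proba(X)[:, 1]
    treated, controls = df[df.treated == 1], df[df.treated == 0]
    knn = NearestNeighbors(n_neighbors=1).fit(ps[df.treated == 0].reshape(-1, 1))
    _, idx = knn.kneighbors(ps[df.treated == 1].reshape(-1, 1))
    matched_controls = controls.iloc[idx.ravel()].reset_index(drop=True)

    # 2) Fit a Weibull AFT model to the matched controls only.
    aft = WeibullAFTFitter().fit(matched_controls[["age", "sev", "time", "event"]],
                                 duration_col="time", event_col="event")

    # 3) Impute each treated subject's outcome under control from the fitted model.
    imputed_control_time = aft.predict_median(treated[["age", "sev"]])
    print(imputed_control_time.head())
    ```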