Popular parametric and semiparametric hazards regression models for clustered survival data are inadequate when the unknown effects of different covariates and clustering are complex. This calls for a flexible modeling framework to yield efficient survival prediction. Moreover, in survival studies involving the time to occurrence of asymptomatic events, survival times are typically interval censored between consecutive clinical inspections. In this article, we propose a robust semiparametric model for clustered interval‐censored survival data under a paradigm of Bayesian ensemble learning, called soft Bayesian additive regression trees or SBART (Linero and Yang, 2018), which combines multiple sparse (soft) decision trees to attain excellent predictive accuracy. We develop a novel semiparametric hazards regression model by modeling the hazard function as a product of a parametric baseline hazard function and a nonparametric component that uses SBART to incorporate clustering, unknown functional forms of the main effects, and interaction effects of various covariates. In addition to being applicable to left‐censored, right‐censored, and interval‐censored survival data, our methodology is implemented using a data augmentation scheme that allows existing Bayesian backfitting algorithms to be used. We illustrate the practical implementation and advantages of our method via simulation studies and an analysis of a prostate cancer surgery study where dependence on the experience and skill level of the physicians leads to clustering of survival times. We conclude by discussing our method's applicability in studies involving high‐dimensional data with complex underlying associations.
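As a rough illustration of the likelihood structure described in the abstract, the sketch below computes the contribution of a single interval‐censored observation when the hazard factors as a parametric baseline times a positive covariate‐dependent multiplier. The Weibull baseline, the function names, and the fixed time‐invariant multiplier `g_x` (a stand‐in for the SBART‐driven nonparametric component) are assumptions made purely for illustration; the abstract does not specify the baseline family or the exact form of the nonparametric term, and this is not the authors' implementation.

```python
import math


def weibull_cum_hazard(t, shape, scale):
    """Cumulative baseline hazard H0(t) = (t / scale) ** shape (assumed Weibull)."""
    if t <= 0:
        return 0.0
    if math.isinf(t):
        return math.inf
    return (t / scale) ** shape


def survival(t, g_x, shape, scale):
    """S(t | x) = exp(-g(x) * H0(t)) when h(t | x) = h0(t) * g(x), with g(x) > 0."""
    return math.exp(-g_x * weibull_cum_hazard(t, shape, scale))


def interval_censored_likelihood(left, right, g_x, shape, scale):
    """Probability that the event falls in (left, right]: S(left) - S(right).

    left = 0 corresponds to left censoring and right = math.inf to right
    censoring, so all three censoring types share one expression.
    """
    return survival(left, g_x, shape, scale) - survival(right, g_x, shape, scale)


# Hypothetical subject whose event occurred between inspections at t = 1 and t = 2.
contribution = interval_censored_likelihood(1.0, 2.0, g_x=1.5, shape=1.2, scale=3.0)
```

Here a larger `g_x` raises the hazard at every time point, so two subjects in the same cluster could share a common component of `g_x`; how the clustering actually enters the model is determined by the paper, not by this sketch.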
- Award ID(s): 1853099
- NSF-PAR ID: 10353183
- Date Published:
- Journal Name: Statistical Methods in Medical Research
- Volume: 30
- Issue: 2
- ISSN: 0962-2802
- Page Range / eLocation ID: 508 to 522
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Summary Existing Bayesian model selection procedures require the specification of prior distributions on the parameters appearing in every model in the selection set. In practice, this requirement limits the application of Bayesian model selection methodology. To overcome this limitation, we propose a new approach towards Bayesian model selection that uses classical test statistics to compute Bayes factors between possible models. In several test cases, our approach produces results that are similar to previously proposed Bayesian model selection and model averaging techniques in which prior distributions were carefully chosen. In addition to eliminating the requirement to specify complicated prior distributions, this method offers important computational and algorithmic advantages over existing simulation-based methods. Because it is easy to evaluate the operating characteristics of this procedure for a given sample size and specified number of covariates, our method facilitates the selection of hyperparameter values through prior-predictive simulation.
-
Abstract In this paper, the panel count data analysis for recurrent events is considered. Such analysis is useful for studying tumor or infection recurrences in both clinical trial and observational studies. A bivariate Gaussian Cox process model is proposed to jointly model the observation process and the recurrent event process. Bayesian nonparametric inference is proposed for simultaneously estimating regression parameters, bivariate frailty effects, and baseline intensity functions. Inference is done through Markov chain Monte Carlo, with fully developed computational techniques. Predictive inference is also discussed under the Bayesian setting. The proposed method is shown to be efficient via simulation studies. A clinical trial dataset on skin cancer patients is analyzed to illustrate the proposed approach.
-
Abstract We consider theoretical and practical issues in using a large number of covariates in clinical trials to achieve various design objectives without model misspecification. Specifically, we propose a new family of semiparametric covariate‐adjusted response‐adaptive randomization (CARA) designs and use targeted maximum likelihood estimation (TMLE) to analyze the correlated data from CARA designs. Our approach can flexibly achieve multiple objectives and correctly incorporate the effect of a large number of covariates on the responses without model misspecification. We also establish consistency and asymptotic normality for the estimators of the target parameters, allocation probabilities, and allocation proportions. Numerical studies demonstrate that our approach has advantages over existing approaches, even when the data‐generating distribution is complicated.
-
Abstract In clinical research and practice, landmark models are commonly used to predict the risk of an adverse future event, using patients' longitudinal biomarker data as predictors. However, these data are often observable only at intermittent visits, making their measurement times irregularly spaced and unsynchronized across different subjects. This poses challenges to conducting dynamic prediction at any post‐baseline time. A simple solution is the last‐value‐carry‐forward method, but this may result in bias for the risk model estimation and prediction. Another option is to jointly model the longitudinal and survival processes with a shared random effects model. However, when dealing with multiple biomarkers, this approach often results in high‐dimensional integrals without a closed‐form solution, and thus the computational burden limits its software development and practical use. In this article, we propose to process the longitudinal data by functional principal component analysis techniques, and then use the processed information as predictors in a class of flexible linear transformation models to predict the distribution of residual time‐to‐event occurrence. The measurement schemes for multiple biomarkers are allowed to be different within subject and across subjects. Dynamic prediction can be performed in a real‐time fashion. The advantages of our proposed method are demonstrated by simulation studies. We apply our approach to the African American Study of Kidney Disease and Hypertension, predicting patients' risk of kidney failure or death by using four important longitudinal biomarkers for renal functions.