Title: GPSRL: Learning Semi-Parametric Bayesian Survival Rule Lists from Heterogeneous Patient Data
Survival data in medical applications are often collected from a heterogeneous population of patients. Popular survival models have historically focused on the average effect of the covariates on survival outcomes, but rapidly advancing sensing and information technologies now make it possible to model both the heterogeneity of the population and the non-linearity of the survival risk. With this motivation, we propose a new semi-parametric Bayesian Survival Rule List model. Our model yields a rule-based decision-making approach, and within the regime defined by each rule, survival risk is modeled via a Gaussian process latent variable model. Markov chain Monte Carlo with a nested Laplace approximation on the Gaussian process posterior is used to search the posterior over rule lists efficiently. The use of ordered rule lists enables us to model heterogeneity while keeping the model complexity in check. Performance evaluations on a synthetic heterogeneous survival dataset and a real-world sepsis survival dataset demonstrate the effectiveness of our model.
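As a rough illustration of the ordered rule list idea described in the abstract, the sketch below assigns each patient to the regime of the first rule that matches, falling back to a default regime. The covariate names (`age`, `lactate`), thresholds, and rule names are hypothetical and are not taken from the paper. The per-regime Gaussian process survival model and the MCMC search over rule lists are not sketched here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Patient = Dict[str, float]


@dataclass
class Rule:
    """One entry in an ordered rule list (names and conditions are hypothetical)."""
    name: str
    condition: Callable[[Patient], bool]


def assign_regime(rules: List[Rule], patient: Patient) -> str:
    """Return the name of the first rule whose condition holds, else 'default'."""
    for rule in rules:
        if rule.condition(patient):
            return rule.name
    return "default"


# Hypothetical rule list: order matters, earlier rules take precedence.
rules = [
    Rule("high_lactate", lambda p: p["lactate"] > 4.0),
    Rule("elderly", lambda p: p["age"] >= 65),
]

patients = [
    {"age": 70, "lactate": 5.1},  # matches both rules; the first one wins
    {"age": 72, "lactate": 1.2},
    {"age": 40, "lactate": 1.0},
]

regimes = [assign_regime(rules, p) for p in patients]
print(regimes)
```

Because the list is ordered, a patient who satisfies several rules belongs to exactly one regime, which is what keeps the partition of the population unambiguous.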
Award ID(s):
1718513
PAR ID:
10220434
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
International Conference on Pattern Recognition
Volume:
2020
ISSN:
1051-4651
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Yang, Junyuan (Ed.)
    In this work, we develop a new set of Bayesian models to perform registration of real-valued functions. A Gaussian process prior is assigned to the parameter space of time warping functions, and a Markov chain Monte Carlo (MCMC) algorithm is utilized to explore the posterior distribution. While the proposed model can be defined on the infinite-dimensional function space in theory, dimension reduction is needed in practice because one cannot store an infinite-dimensional function on the computer. Existing Bayesian models often rely on some pre-specified, fixed truncation rule to achieve dimension reduction, either by fixing the grid size or the number of basis functions used to represent a functional object. In comparison, the new models in this paper randomize the truncation rule. Benefits of the new models include the ability to make inference on the smoothness of the functional parameters, a data-informative feature of the truncation rule, and the flexibility to control the amount of shape-alteration in the registration process. For instance, using both simulated and real data, we show that when the observed functions exhibit more local features, the posterior distribution on the warping functions automatically concentrates on a larger number of basis functions. Supporting materials including code and data to perform registration and reproduce some of the results presented herein are available online. 
  2. Gaussian processes are pervasive in functional data analysis, machine learning, and spatial statistics for modeling complex dependencies. Scientific data are often heterogeneous in their inputs and contain multiple known discrete groups of samples; thus, it is desirable to leverage the similarity among groups while accounting for heterogeneity across groups. We propose multi-group Gaussian processes (MGGPs) defined over R^p × C, where C is a finite set representing the group label, by developing general classes of valid (positive definite) covariance functions on such domains. MGGPs are able to accurately recover relationships between the groups and efficiently share strength across samples from all groups during inference, while capturing distinct group-specific behaviors in the conditional posterior distributions. We demonstrate inference in MGGPs through simulation experiments, and we apply our proposed MGGP regression framework to gene expression data to illustrate the behavior and enhanced inferential capabilities of multi-group Gaussian processes by jointly modeling continuous and categorical variables.
  3. Survival models are used to analyze time-to-event data in a variety of disciplines. Proportional hazard models provide interpretable parameter estimates, but proportional hazard assumptions are not always appropriate. Non-parametric models are more flexible but often lack a clear inferential framework. We propose a Bayesian treed hazards partition model that is both flexible and inferential. Inference is obtained through the posterior tree structure and flexibility is preserved by modeling the log-hazard function in each partition using a latent Gaussian process. An efficient reversible jump Markov chain Monte Carlo algorithm is accomplished by marginalizing the parameters in each partition element via a Laplace approximation. Consistency properties for the estimator are established. The method can be used to help determine subgroups as well as prognostic and/or predictive biomarkers in time-to-event data. The method is compared with some existing methods on simulated data and a liver cirrhosis dataset.
  4. 1. Identifying and accounting for unobserved individual heterogeneity in vital rates in demographic models is important for estimating population-level vital rates and identifying diverse life-history strategies, but much less is known about how this individual heterogeneity influences population dynamics. 2. We aimed to understand how the distribution of individual heterogeneity in reproductive and survival rates influenced population dynamics using vital rates from a Weddell seal population by altering the distribution of individual heterogeneity in reproduction, which also altered the distribution of individual survival rates through the incorporation of our estimate of the correlation between the two rates and assessing resulting changes in population growth. 3. We constructed an integral projection model (IPM) structured by age and reproductive state using estimates of vital rates for a long-lived mammal that has recently been shown to exhibit large individual heterogeneity in reproduction. Using output from the IPM, we evaluated how population dynamics changed with different underlying distributions of unobserved individual heterogeneity in reproduction. 4. Results indicate that the changes to the underlying distribution of individual heterogeneity in reproduction cause very small changes in the population growth rate and other population metrics. The largest difference in the estimated population growth rate resulting from changes to the underlying distribution of individual heterogeneity was less than 1%. 5. population level compared to the individual level. Although individual heterogeneity in reproduction may result in large differences in the lifetime fitness of individuals, changing the proportion of above- or below-average breeders in the population results in much smaller differences in annual population growth rate. 
For a long-lived mammal with stable and high adult-survival that gives birth to a single offspring, individual heterogeneity in reproduction has a limited effect on population dynamics. We posit that the limited effect of individual heterogeneity on population dynamics may be due to canalization of life-history traits. 
  5. BACKGROUND: Lung transplantation is the gold standard for a carefully selected patient population with end-stage lung disease. We sought to create a unique risk stratification model using only preoperative recipient data to predict one-year postoperative mortality during our pre-transplant assessment. METHODS: Data of lung transplant recipients at Houston Methodist Hospital (HMH) from 1/2009 to 12/2014 were extracted from the United Network for Organ Sharing (UNOS) database. Patients were randomly divided into development and validation cohorts. Cox proportional-hazards models were fitted. Variables associated with 1-year mortality post-transplant were assigned weights based on the beta coefficients, and risk scores were derived. Patients were stratified into low-, medium- and high-risk categories. Our model was validated using the validation dataset and data from other US transplant centers in the UNOS database. RESULTS: We randomized 633 lung recipients from HMH into the development (n=317) and validation (n=316) cohorts. One-year survival after transplant was significantly different among risk groups: 95% (low-risk), 84% (medium-risk), and 72% (high-risk) (p<0.001) with a C-statistic of 0.74. Patient survival in the validation cohort was also significantly different among risk groups (85%, 77% and 65%, respectively, p<0.001). Validation of the model with the UNOS dataset included 9,920 patients and found 1-year survival to be 91%, 86% and 82%, respectively (p<0.001). CONCLUSIONS: Using only recipient data collected at the time of pre-listing evaluation, our simple scoring system has good discrimination power and can be a practical tool in the assessment and selection of potential lung transplant recipients.