Title: GPSRL: Learning Semi-Parametric Bayesian Survival Rule Lists from Heterogeneous Patient Data
Survival data is often collected in medical applications from a heterogeneous population of patients. While in the past, popular survival models focused on modeling the average effect of the covariates on survival outcomes, rapidly advancing sensing and information technologies have provided opportunities to further model the heterogeneity of the population as well as the non-linearity of the survival risk. With this motivation, we propose a new semi-parametric Bayesian Survival Rule List model in this paper. Our model derives a rule-based decision-making approach, while within the regime defined by each rule, survival risk is modeled via a Gaussian process latent variable model. Markov chain Monte Carlo with a nested Laplace approximation on the Gaussian process posterior is used to search over the posterior of the rule lists efficiently. The use of ordered rule lists enables us to model heterogeneity while keeping the model complexity in check. Performance evaluations on a synthetic heterogeneous survival dataset and a real-world sepsis survival dataset demonstrate the effectiveness of our model.
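As a rough illustration of the ordered rule list idea described above, the following minimal sketch assigns each patient to the first rule whose condition matches and then evaluates a regime-specific survival curve. It is not the authors' implementation: the feature names (`lactate`, `age`), the thresholds, and the hazard values are hypothetical, and a constant-hazard (exponential) survival curve stands in for the Gaussian process latent variable model used within each regime in the paper. No posterior search over rule lists is performed here.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List

Patient = Dict[str, float]


@dataclass
class Rule:
    """One entry of an ordered rule list (all values are hypothetical)."""
    name: str
    condition: Callable[[Patient], bool]
    hazard: float  # stand-in constant hazard rate for this regime


def assign_regime(rules: List[Rule], patient: Patient) -> Rule:
    """Return the first rule whose condition holds; order matters."""
    for rule in rules:
        if rule.condition(patient):
            return rule
    raise ValueError("no rule matched; the list needs a default rule")


def survival_probability(rules: List[Rule], patient: Patient, t: float) -> float:
    """S(t) = exp(-hazard * t) under the regime selected for the patient."""
    rule = assign_regime(rules, patient)
    return math.exp(-rule.hazard * t)


# Hypothetical rule list: earlier rules take precedence over later ones,
# and the final rule is a catch-all default.
RULES = [
    Rule("high lactate", lambda p: p["lactate"] > 4.0, hazard=0.30),
    Rule("elderly", lambda p: p["age"] >= 65, hazard=0.10),
    Rule("default", lambda p: True, hazard=0.02),
]

if __name__ == "__main__":
    patient = {"lactate": 5.0, "age": 70.0}
    print(assign_regime(RULES, patient).name)
    print(survival_probability(RULES, patient, t=5.0))
```

Because the list is ordered, a patient who satisfies several conditions is governed only by the earliest matching rule, which is what keeps the partition of the population easy to read.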
Award ID(s):
1718513
PAR ID:
10220434
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
International Conference on Pattern Recognition
Volume:
2020
ISSN:
1051-4651
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Yang, Junyuan (Ed.)
    In this work, we develop a new set of Bayesian models to perform registration of real-valued functions. A Gaussian process prior is assigned to the parameter space of time warping functions, and a Markov chain Monte Carlo (MCMC) algorithm is utilized to explore the posterior distribution. While the proposed model can be defined on the infinite-dimensional function space in theory, dimension reduction is needed in practice because one cannot store an infinite-dimensional function on the computer. Existing Bayesian models often rely on some pre-specified, fixed truncation rule to achieve dimension reduction, either by fixing the grid size or the number of basis functions used to represent a functional object. In comparison, the new models in this paper randomize the truncation rule. Benefits of the new models include the ability to make inference on the smoothness of the functional parameters, a data-informative feature of the truncation rule, and the flexibility to control the amount of shape-alteration in the registration process. For instance, using both simulated and real data, we show that when the observed functions exhibit more local features, the posterior distribution on the warping functions automatically concentrates on a larger number of basis functions. Supporting materials including code and data to perform registration and reproduce some of the results presented herein are available online. 
  2. Gaussian processes are pervasive in functional data analysis, machine learning, and spatial statistics for modeling complex dependencies. Scientific data are often heterogeneous in their inputs and contain multiple known discrete groups of samples; thus, it is desirable to leverage the similarity among groups while accounting for heterogeneity across groups. We propose multi-group Gaussian processes (MGGPs) defined over $\mathbb{R}^p \times C$, where $C$ is a finite set representing the group label, by developing general classes of valid (positive definite) covariance functions on such domains. MGGPs are able to accurately recover relationships between the groups and efficiently share strength across samples from all groups during inference, while capturing distinct group-specific behaviors in the conditional posterior distributions. We demonstrate inference in MGGPs through simulation experiments, and we apply our proposed MGGP regression framework to gene expression data to illustrate the behavior and enhanced inferential capabilities of multi-group Gaussian processes by jointly modeling continuous and categorical variables.
  3. Survival models are used to analyze time-to-event data in a variety of disciplines. Proportional hazard models provide interpretable parameter estimates, but proportional hazard assumptions are not always appropriate. Non-parametric models are more flexible but often lack a clear inferential framework. We propose a Bayesian treed hazards partition model that is both flexible and inferential. Inference is obtained through the posterior tree structure and flexibility is preserved by modeling the log-hazard function in each partition using a latent Gaussian process. An efficient reversible jump Markov chain Monte Carlo algorithm is accomplished by marginalizing the parameters in each partition element via a Laplace approximation. Consistency properties for the estimator are established. The method can be used to help determine subgroups as well as prognostic and/or predictive biomarkers in time-to-event data. The method is compared with some existing methods on simulated data and a liver cirrhosis dataset.
  4. 1. Identifying and accounting for unobserved individual heterogeneity in vital rates in demographic models is important for estimating population-level vital rates and identifying diverse life-history strategies, but much less is known about how this individual heterogeneity influences population dynamics. 2. We aimed to understand how the distribution of individual heterogeneity in reproductive and survival rates influenced population dynamics using vital rates from a Weddell seal population by altering the distribution of individual heterogeneity in reproduction, which also altered the distribution of individual survival rates through the incorporation of our estimate of the correlation between the two rates and assessing resulting changes in population growth. 3. We constructed an integral projection model (IPM) structured by age and reproductive state using estimates of vital rates for a long-lived mammal that has recently been shown to exhibit large individual heterogeneity in reproduction. Using output from the IPM, we evaluated how population dynamics changed with different underlying distributions of unobserved individual heterogeneity in reproduction. 4. Results indicate that the changes to the underlying distribution of individual heterogeneity in reproduction cause very small changes in the population growth rate and other population metrics. The largest difference in the estimated population growth rate resulting from changes to the underlying distribution of individual heterogeneity was less than 1%. 5. population level compared to the individual level. Although individual heterogeneity in reproduction may result in large differences in the lifetime fitness of individuals, changing the proportion of above- or below-average breeders in the population results in much smaller differences in annual population growth rate. 
For a long-lived mammal with stable and high adult-survival that gives birth to a single offspring, individual heterogeneity in reproduction has a limited effect on population dynamics. We posit that the limited effect of individual heterogeneity on population dynamics may be due to canalization of life-history traits. 
  5. Antibiotic responses in bacteria are highly dynamic and heterogeneous, with sudden exposure of bacterial colonies to high drug doses resulting in the coexistence of recovered and arrested cells. The dynamics of the response is determined by regulatory circuits controlling the expression of resistance genes, which are in turn modulated by the drug’s action on cell growth and metabolism. Despite advances in understanding gene regulation at the molecular level, we still lack a framework to describe how feedback mechanisms resulting from the interdependence between expression of resistance and cell metabolism can amplify naturally occurring noise and create heterogeneity at the population level. To understand how this interplay affects cell survival upon exposure, we constructed a mathematical model of the dynamics of antibiotic responses that links metabolism and regulation of gene expression, based on the tetracycline resistance tet operon in E. coli. We use this model to interpret measurements of growth and expression of resistance in microfluidic experiments, both in single cells and in biofilms. We also implemented a stochastic model of the drug response, to show that exposure to high drug levels results in large variations of recovery times and heterogeneity at the population level. We show that stochasticity is important to determine how nutrient quality affects cell survival during exposure to high drug concentrations. A quantitative description of how microbes respond to antibiotics in dynamical environments is crucial to understand population-level behaviors such as biofilms and pathogenesis.