skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: A conditional density estimation partition model using logistic Gaussian processes
Summary Conditional density estimation seeks to model the distribution of a response variable conditional on covariates. We propose a Bayesian partition model using logistic Gaussian processes to perform conditional density estimation. The partition takes the form of a Voronoi tessellation and is learned from the data using a reversible jump Markov chain Monte Carlo algorithm. The methodology models data in which the density changes sharply throughout the covariate space, and can be used to determine where important changes in the density occur. The Markov chain Monte Carlo algorithm involves a Laplace approximation on the latent variables of the logistic Gaussian process model which marginalizes the parameters in each partition element, allowing an efficient search of the approximate posterior distribution of the tessellation. The method is consistent when the density is piecewise constant in the covariate space or when the density is Lipschitz continuous with respect to the covariates. In simulation and application to wind turbine data, the model successfully estimates the partition structure and conditional distribution.  more » « less
Award ID(s):
1934904
PAR ID:
10178809
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Biometrika
Volume:
107
Issue:
1
ISSN:
0006-3444
Page Range / eLocation ID:
173 to 190
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary Stochastic gradient Markov chain Monte Carlo algorithms have received much attention in Bayesian computing for big data problems, but they are only applicable to a small class of problems for which the parameter space has a fixed dimension and the log-posterior density is differentiable with respect to the parameters. This paper proposes an extended stochastic gradient Markov chain Monte Carlo algorithm which, by introducing appropriate latent variables, can be applied to more general large-scale Bayesian computing problems, such as those involving dimension jumping and missing data. Numerical studies show that the proposed algorithm is highly scalable and much more efficient than traditional Markov chain Monte Carlo algorithms. 
    more » « less
  2. Like many other clinical and economic studies, each subject of our motivating transplant study is at risk of recurrent events of non-fatal tissue rejections as well as the terminating event of death due to total graft rejection. For such studies, our model and associated Bayesian analysis aim for some practical advantages over competing methods. Our semiparametric latent-class-based joint model has coherent interpretation of the covariate (including race and gender) effects on all functions and model quantities that are relevant for understanding the effects of covariates on future event trajectories. Our fully Bayesian method for estimation and prediction uses a complete specification of the prior process of the baseline functions. We also derive a practical and theoretically justifiable partial likelihood-based semiparametric Bayesian approach to deal with the analysis when there is a lack of prior information about baseline functions. Our model and method can accommodate fixed as well as time-varying covariates. Our Markov Chain Monte Carlo tools for both Bayesian methods are implementable via publicly available software. Our Bayesian analysis of transplant study and simulation study demonstrate practical advantages and improved performance of our approach. 
    more » « less
  3. Gaussian mixtures are commonly used for modeling heavy-tailed error distributions in robust linear regression. Combining the likelihood of a multivariate robust linear regression model with a standard improper prior distribution yields an analytically intractable posterior distribution that can be sampled using a data augmentation algorithm. When the response matrix has missing entries, there are unique challenges to the application and analysis of the convergence properties of the algorithm. Conditions for geometric ergodicity are provided when the incomplete data have a “monotone” structure. In the absence of a monotone structure, an intermediate imputation step is necessary for implementing the algorithm. In this case, we provide sufficient conditions for the algorithm to be Harris ergodic. Finally, we show that, when there is a monotone structure and intermediate imputation is unnecessary, intermediate imputation slows the convergence of the underlying Monte Carlo Markov chain, while post hoc imputation does not. An R package for the data augmentation algorithm is provided. 
    more » « less
  4. Abstract Survival models are used to analyze time-to-event data in a variety of disciplines. Proportional hazard models provide interpretable parameter estimates, but proportional hazard assumptions are not always appropriate. Non-parametric models are more flexible but often lack a clear inferential framework. We propose a Bayesian treed hazards partition model that is both flexible and inferential. Inference is obtained through the posterior tree structure and flexibility is preserved by modeling the log-hazard function in each partition using a latent Gaussian process. An efficient reversible jump Markov chain Monte Carlo algorithm is accomplished by marginalizing the parameters in each partition element via a Laplace approximation. Consistency properties for the estimator are established. The method can be used to help determine subgroups as well as prognostic and/or predictive biomarkers in time-to-event data. The method is compared with some existing methods on simulated data and a liver cirrhosis dataset. 
    more » « less
  5. Stochastic gradient Langevin dynamics (SGLD) and stochastic gradient Hamiltonian Monte Carlo (SGHMC) are two popular Markov Chain Monte Carlo (MCMC) algorithms for Bayesian inference that can scale to large datasets, allowing to sample from the posterior distribution of the parameters of a statistical model given the input data and the prior distribution over the model parameters. However, these algorithms do not apply to the decentralized learning setting, when a network of agents are working collaboratively to learn the parameters of a statistical model without sharing their individual data due to privacy reasons or communication constraints. We study two algorithms: Decentralized SGLD (DE-SGLD) and Decentralized SGHMC (DE-SGHMC) which are adaptations of SGLD and SGHMC methods that allow scaleable Bayesian inference in the decentralized setting for large datasets. We show that when the posterior distribution is strongly log-concave and smooth, the iterates of these algorithms converge linearly to a neighborhood of the target distribution in the 2-Wasserstein distance if their parameters are selected appropriately. We illustrate the efficiency of our algorithms on decentralized Bayesian linear regression and Bayesian logistic regression problems 
    more » « less