

Title: Prior and posterior checking of implicit causal assumptions
Abstract: Causal inference practitioners have increasingly adopted machine learning techniques with the aim of producing principled uncertainty quantification for causal effects while minimizing the risk of model misspecification. Bayesian nonparametric approaches have attracted attention as well, both for their flexibility and for their promise of natural uncertainty quantification. Priors on high‐dimensional or nonparametric spaces, however, can often unintentionally encode prior information that is at odds with substantive knowledge in causal inference: specifically, the regularization required for high‐dimensional Bayesian models to work can indirectly imply that the magnitude of the confounding is negligible. In this paper, we explain this problem and provide tools for (i) verifying that the prior distribution does not encode an inductive bias away from confounded models and (ii) verifying that the posterior distribution contains sufficient information to overcome this issue if it exists. We provide a proof‐of‐concept on simulated data from a high‐dimensional probit‐ridge regression model and illustrate the approach on a Bayesian nonparametric decision tree ensemble applied to a large medical expenditure survey.
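Check (i) can be illustrated with a small prior-predictive simulation. The sketch below is not the paper's diagnostic. It assumes a hypothetical probit-ridge setup with independent isotropic Gaussian (ridge) priors on the outcome coefficients `beta` and the propensity coefficients `gamma`, together with standard normal covariates. It uses the correlation between the two linear predictors as a crude proxy for confounding strength. Independent high-dimensional Gaussian vectors are nearly orthogonal, so the prior mass on strongly confounded models shrinks as `p` grows. This is the kind of implicit inductive bias the abstract describes.

```python
import numpy as np


def implied_confounding_draws(p, n_draws=2000, tau=1.0, seed=0):
    """Prior draws of corr(X @ beta, X @ gamma) under independent ridge priors.

    Hypothetical probit-ridge setup (an illustration, not the paper's model):
      outcome:    E[Y | X] = X @ beta,           beta  ~ N(0, tau^2 / p * I)
      treatment:  P(A = 1 | X) = Phi(X @ gamma), gamma ~ N(0, tau^2 / p * I)
      covariates: X ~ N(0, I_p)
    For X ~ N(0, I_p) the correlation of the two linear predictors equals the
    cosine of the angle between beta and gamma, a crude proxy for how strongly
    the same covariates drive both treatment and outcome.
    """
    rng = np.random.default_rng(seed)
    scale = tau / np.sqrt(p)
    beta = rng.normal(0.0, scale, size=(n_draws, p))
    gamma = rng.normal(0.0, scale, size=(n_draws, p))
    dot = np.sum(beta * gamma, axis=1)
    norms = np.linalg.norm(beta, axis=1) * np.linalg.norm(gamma, axis=1)
    return dot / norms


# Prior check: how much confounding does the prior consider plausible?
for p in (5, 50, 500):
    r = implied_confounding_draws(p)
    print(f"p = {p:3d}: mean |corr| = {np.mean(np.abs(r)):.3f}, "
          f"P(|corr| > 0.3) = {np.mean(np.abs(r) > 0.3):.3f}")
```

The ridge scale `tau` cancels in the correlation. In this toy setup, the concentration near zero therefore comes from independence and dimension alone.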
Award ID(s):
2144933
PAR ID:
10441570
Author(s) / Creator(s):
 
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Biometrics
ISSN:
0006-341X
Sponsoring Org:
National Science Foundation
More Like this
  1. Bayesian additive regression trees (BART) provides a flexible approach to fitting a variety of regression models while avoiding strong parametric assumptions. The sum-of-trees model is embedded in a Bayesian inferential framework to support uncertainty quantification and provide a principled approach to regularization through prior specification. This article presents the basic approach and discusses further development of the original algorithm that supports a variety of data structures and assumptions. We describe augmentations of the prior specification to accommodate higher dimensional data and smoother functions. Recent theoretical developments provide justifications for the performance observed in simulations and other settings. Use of BART in causal inference provides an additional avenue for extensions and applications. We discuss software options as well as challenges and future directions. 
  2. This paper extends the application of quantile-based Bayesian inference to probability distributions defined in terms of quantiles of observable quantities. Quantile-parameterized distributions are characterized by high shape flexibility and parameter interpretability, making them useful for eliciting information about observables. To encode uncertainty in the quantiles elicited from experts, we propose a Bayesian model based on the metalog distribution and a variant of the Dirichlet prior. We discuss the resulting hybrid expert elicitation protocol, which aims to characterize uncertainty in parameters by asking questions about observable quantities. We also compare and contrast this approach with parametric and predictive elicitation methods. 
  3. Double electron-electron resonance (DEER) spectroscopy measures distance distributions between spin labels in proteins, yielding important structural and energetic information about conformational landscapes. Analyzing an experimental DEER signal in terms of a distance distribution is nontrivial because the underlying mathematical inversion problem is ill-posed. This work introduces a Bayesian probabilistic inference approach to analyze DEER data, assuming a nonparametric distance distribution with a Tikhonov smoothness prior. The method uses Markov chain Monte Carlo sampling with a compositional Gibbs sampler to determine a posterior probability distribution over the entire parameter space, including the distance distribution, given an experimental data set. This posterior contains all of the information available from the data, including a full quantification of the uncertainty about the model parameters. The corresponding uncertainty about the distance distribution is visually captured via an ensemble of posterior predictive distributions. Several examples are presented to illustrate the method. Compared with bootstrapping, it is faster and provides slightly larger uncertainty intervals. 
  4. This paper demonstrates the advantages of sharing information about unknown features of covariates across multiple model components in various nonparametric regression problems, including multivariate, heteroscedastic, and semicontinuous responses. We present a methodology that allows information to be shared nonparametrically across model components using Bayesian sum‐of‐tree models. Our simulation results demonstrate that sharing information across related model components is often very beneficial, particularly in sparse high‐dimensional problems in which variable selection must be conducted. We illustrate our methodology by analyzing medical expenditure data from the Medical Expenditure Panel Survey (MEPS). To facilitate the Bayesian nonparametric regression analysis, we develop two novel models for analyzing the MEPS data using Bayesian additive regression trees: a heteroskedastic log‐normal hurdle model with a “shrink‐toward‐homoskedasticity” prior and a gamma hurdle model. 
  5. Multivariate spatially oriented data sets are prevalent in the environmental and physical sciences. Scientists seek to jointly model multiple variables, each indexed by a spatial location, to capture any underlying spatial association for each variable and associations among the different dependent variables. Multivariate latent spatial process models have proved effective in driving statistical inference and rendering better predictive inference at arbitrary locations for the spatial process. High‐dimensional multivariate spatial data, which are the theme of this article, refer to data sets where the number of spatial locations and the number of spatially dependent variables are very large. The field has witnessed substantial developments in scalable models for univariate spatial processes, but such methods for multivariate spatial processes, especially when the number of outcomes is moderately large, are limited in comparison. Here, we extend scalable modeling strategies for a single process to multivariate processes. We pursue Bayesian inference, which is attractive for full uncertainty quantification of the latent spatial process. Our approach exploits distribution theory for the matrix‐normal distribution, which we use to construct scalable versions of a hierarchical linear model of coregionalization (LMC) and spatial factor models that deliver inference over a high‐dimensional parameter space including the latent spatial process. We illustrate the computational and inferential benefits of our algorithms over competing methods using simulation studies and an analysis of a massive vegetation index data set. 
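For the BART overview in item 1, a toy prior simulation shows how prior specification regularizes the sum-of-trees model. This is a minimal sketch under stated assumptions. Trees are reduced to single-split stumps with uniform cut points, which simplifies the real tree prior. Leaf values follow the commonly used BART leaf prior with scale 0.5 / (k * sqrt(m)), for a response rescaled to [-0.5, 0.5]. The function name `draw_sum_of_stumps` is hypothetical.

```python
import numpy as np


def draw_sum_of_stumps(x, m, k=2.0, rng=None):
    """One prior draw of f(x) = sum_{j=1}^m g_j(x) using m single-split trees.

    Simplification (assumption): every tree is a stump with a uniform cut point
    on [0, 1]; the actual BART tree prior allows deeper trees. Leaf values use
    the BART-style prior mu ~ N(0, sigma_mu^2), sigma_mu = 0.5 / (k * sqrt(m)),
    which presumes the response has been rescaled to [-0.5, 0.5].
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma_mu = 0.5 / (k * np.sqrt(m))
    cuts = rng.uniform(0.0, 1.0, size=m)
    left = rng.normal(0.0, sigma_mu, size=m)
    right = rng.normal(0.0, sigma_mu, size=m)
    # goes_left[i, j] is True when x[i] falls in the left leaf of tree j
    goes_left = x[:, None] < cuts[None, :]
    return np.where(goes_left, left[None, :], right[None, :]).sum(axis=1)


rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 11)
for m in (1, 200):
    draws = np.array([draw_sum_of_stumps(x, m, rng=rng) for _ in range(2000)])
    # Each x lands in exactly one leaf per tree, so Var f(x) = m * sigma_mu^2 = (0.5 / k)^2
    print(f"m = {m:3d}: prior sd of f(0.5) = {draws[:, 5].std():.3f} (theory: 0.250)")
```

The leaf scale shrinks like 1/sqrt(m). Adding trees therefore keeps the prior spread of f(x) fixed while making each tree a weak learner.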
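For item 2, a quantile-parameterized distribution can be fitted directly from elicited quantiles. The sketch below fits a three-term metalog quantile function by linear least squares. It assumes Keelin's standard basis (1, logit(y), (y - 0.5) logit(y), (y - 0.5), ...). With exactly three (probability, quantile) pairs, the fit interpolates them. It covers only the deterministic fit, not the Bayesian model with a Dirichlet-type prior described in the abstract. The elicited numbers and function names are hypothetical.

```python
import numpy as np


def metalog_basis(y, k):
    """Design matrix for a k-term metalog quantile function (k <= 4 here).

    Assumed basis (Keelin's metalog): 1, logit(y), (y - 0.5) * logit(y), (y - 0.5).
    """
    y = np.asarray(y, dtype=float)
    logit = np.log(y / (1.0 - y))
    cols = [np.ones_like(y), logit, (y - 0.5) * logit, (y - 0.5)]
    return np.column_stack(cols[:k])


def fit_metalog(probs, quantiles, k=3):
    """Least-squares metalog coefficients from (probability, quantile) pairs.
    With exactly k pairs the fit interpolates the elicited quantiles."""
    A = metalog_basis(probs, k)
    a, *_ = np.linalg.lstsq(A, np.asarray(quantiles, dtype=float), rcond=None)
    return a


def metalog_quantile(y, a):
    """Evaluate the fitted quantile function Q(y) = basis(y) @ a."""
    return metalog_basis(y, len(a)) @ a


# Hypothetical elicitation: an expert's 10th, 50th, and 90th percentiles
probs = [0.1, 0.5, 0.9]
q = [12.0, 20.0, 35.0]
a = fit_metalog(probs, q)
grid = np.linspace(0.01, 0.99, 99)
Q = metalog_quantile(grid, a)
print("coefficients:", np.round(a, 3))
print("monotone (feasible) on grid:", bool(np.all(np.diff(Q) > 0)))
```

A fitted metalog is not automatically a valid distribution. It is one only if its quantile function is increasing, which the grid check tests numerically.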
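For item 3, the core of a model with a Tikhonov smoothness prior can be written as a Gaussian conditional posterior. The sketch below rests on several assumptions:

- It fixes the noise precision `tau` and the smoothness weight `delta`. A Gibbs sampler of the kind described would also update such hyperparameters.
- It uses a hypothetical Gaussian blur kernel in place of the DEER dipolar kernel.
- It imposes no nonnegativity constraint.

It returns the posterior mean and draws an ensemble whose spread reflects uncertainty in the recovered distribution. All function names are hypothetical.

```python
import numpy as np


def second_difference(n):
    """(n - 2) x n second-difference matrix used as the Tikhonov smoothness operator."""
    L = np.zeros((n - 2, n))
    for i in range(n - 2):
        L[i, i:i + 3] = [1.0, -2.0, 1.0]
    return L


def tikhonov_posterior(K, s, tau, delta, jitter=1e-8):
    """Conditional posterior of P for s = K @ P + e, e ~ N(0, I / tau), with prior
    density proportional to exp(-delta / 2 * ||L @ P||^2). Returns mean and precision.
    The small jitter makes the prior proper, since L annihilates linear trends."""
    n = K.shape[1]
    L = second_difference(n)
    precision = tau * K.T @ K + delta * L.T @ L + jitter * np.eye(n)
    mean = np.linalg.solve(precision, tau * K.T @ s)
    return mean, precision


def sample_gaussian(mean, precision, size, rng):
    """Draws from N(mean, precision^-1) via the Cholesky factor of the precision."""
    C = np.linalg.cholesky(precision)  # precision = C @ C.T
    z = rng.standard_normal((mean.size, size))
    # C^{-T} z has covariance (C C^T)^{-1} = precision^{-1}
    return (mean[:, None] + np.linalg.solve(C.T, z)).T


# Toy inverse problem (assumed): blur a smooth bump, add noise, invert.
rng = np.random.default_rng(0)
n = 60
r = np.linspace(0.0, 1.0, n)
K = np.exp(-((r[:, None] - r[None, :]) ** 2) / (2 * 0.05 ** 2))
K /= K.sum(axis=1, keepdims=True)
P_true = np.exp(-((r - 0.4) ** 2) / (2 * 0.06 ** 2))
s = K @ P_true + rng.normal(0.0, 0.01, size=n)

mean, precision = tikhonov_posterior(K, s, tau=1.0 / 0.01 ** 2, delta=50.0)
draws = sample_gaussian(mean, precision, size=200, rng=rng)
print("relative error of posterior mean:",
      round(np.linalg.norm(mean - P_true) / np.linalg.norm(P_true), 3))
print("average pointwise posterior sd:", round(draws.std(axis=0).mean(), 4))
```

Sampling through the Cholesky factor of the precision avoids forming an explicit covariance inverse.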
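For item 4, the observation model behind a log-normal hurdle can be written as a short log-likelihood. This sketch assumes three quantities are already evaluated per observation: the probability of a positive outcome, the log-scale mean, and the log-scale standard deviation. In the abstract's models these would carry sum-of-trees priors. The function name and example numbers are hypothetical, and the gamma hurdle variant is not shown.

```python
import numpy as np


def lognormal_hurdle_loglik(y, p_pos, mu, sigma):
    """Log-likelihood of a heteroscedastic log-normal hurdle model.

    y     : observed outcomes (>= 0); zeros are treated as structural
    p_pos : P(Y > 0 | x) per observation
    mu    : mean of log(Y) given Y > 0 and x
    sigma : sd of log(Y) given Y > 0 and x (may differ across observations)
    """
    y, p_pos, mu, sigma = (np.asarray(a, dtype=float) for a in (y, p_pos, mu, sigma))
    pos = y > 0
    # Hurdle part: Bernoulli log-probability of being zero vs. positive
    ll = np.where(pos, np.log(p_pos), np.log1p(-p_pos))
    # Positive part: log-normal density, used only where y > 0
    log_y = np.log(np.where(pos, y, 1.0))  # 1.0 is a placeholder that avoids log(0)
    z = (log_y - mu) / sigma
    ll_pos = -log_y - np.log(sigma) - 0.5 * np.log(2.0 * np.pi) - 0.5 * z ** 2
    return float(np.sum(ll + np.where(pos, ll_pos, 0.0)))


# Hypothetical values; in item 4, p_pos, mu, and sigma would be functions of x
# with sum-of-trees priors rather than fixed numbers.
y = [0.0, 120.0, 950.0]
print(lognormal_hurdle_loglik(y, p_pos=[0.3, 0.8, 0.9], mu=[4.0, 5.0, 6.5], sigma=[1.0, 1.0, 1.5]))
```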
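For item 5, the basic matrix-normal identity that such constructions build on can be checked by simulation. Suppose X = M + A Z B^T, where U = A A^T, V = B B^T, and Z is standard normal. Then vec(X) has covariance equal to the Kronecker product V ⊗ U. A q-outcome field over n locations can therefore be sampled without forming an nq x nq matrix. This is a minimal sketch, not the article's algorithm. The exponential spatial correlation, the between-outcome covariance `V`, and the function name are assumptions.

```python
import numpy as np


def sample_matrix_normal(M, U, V, rng):
    """One draw of X ~ MN(M, U, V), i.e. vec(X) ~ N(vec(M), kron(V, U)) with
    column-stacking vec. U (n x n) is the row (spatial) covariance and
    V (q x q) the column (between-outcome) covariance."""
    A = np.linalg.cholesky(U)  # U = A @ A.T
    B = np.linalg.cholesky(V)  # V = B @ B.T
    Z = rng.standard_normal(M.shape)
    return M + A @ Z @ B.T


rng = np.random.default_rng(0)
coords = np.linspace(0.0, 1.0, 4)
U = np.exp(-np.abs(coords[:, None] - coords[None, :]) / 0.5)  # assumed exponential correlation
V = np.array([[1.0, 0.6], [0.6, 1.5]])                         # assumed outcome covariance
M = np.zeros((4, 2))

# order="F" stacks columns, matching the vec convention above
draws = np.stack([sample_matrix_normal(M, U, V, rng).ravel(order="F") for _ in range(20000)])
emp = np.cov(draws, rowvar=False)
print("max |empirical cov - kron(V, U)|:", round(np.abs(emp - np.kron(V, U)).max(), 3))
```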