
Title: Integrative Bayesian Models Using Post-Selective Inference: A Case Study in Radiogenomics

Integrative analyses based on statistically relevant associations between genomics and a wealth of intermediary phenotypes (such as imaging) provide vital insights into their clinical relevance in terms of the disease mechanisms. Estimates for uncertainty in the resulting integrative models are however unreliable unless inference accounts for the selection of these associations with accuracy. In this paper, we develop selection-aware Bayesian methods, which (1) counteract the impact of model selection bias through a “selection-aware posterior” in a flexible class of integrative Bayesian models post a selection of promising variables via ℓ1-regularized algorithms; (2) strike an inevitable trade-off between the quality of model selection and inferential power when the same data set is used for both selection and uncertainty estimation. Central to our methodological development, a carefully constructed conditional likelihood function deployed with a reparameterization mapping provides tractable updates when gradient-based Markov chain Monte Carlo (MCMC) sampling is used for estimating uncertainties from the selection-aware posterior. Applying our methods to a radiogenomic analysis, we successfully recover several important gene pathways and estimate uncertainties for their associations with patient survival times.
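The conditioning idea behind the selection-aware posterior can be illustrated, in a heavily simplified form, on a one-dimensional toy problem: observe X ~ N(mu, 1), report mu only when |X| exceeds a threshold, and replace the plain likelihood with the conditional likelihood given selection. The grid-based sketch below illustrates only this conditioning step, not the paper's l1-regularized, MCMC-based method; the threshold, observed value, and flat prior are illustrative assumptions.

```python
from math import erf, exp, sqrt, pi

def phi(z):          # standard normal pdf
    return exp(-0.5 * z * z) / sqrt(2 * pi)

def Phi(z):          # standard normal cdf
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def selection_prob(mu, c):
    # P(|X| > c) when X ~ N(mu, 1): probability the selection event occurs
    return (1.0 - Phi(c - mu)) + Phi(-c - mu)

def posterior_means(x, c, grid):
    # Flat prior on mu; compare the naive posterior (plain likelihood)
    # with the selection-aware posterior, whose likelihood is divided by
    # the probability of selection (the conditional likelihood).
    naive = [phi(x - mu) for mu in grid]
    adjusted = [phi(x - mu) / selection_prob(mu, c) for mu in grid]
    def mean(w):
        z = sum(w)
        return sum(mu * wi for mu, wi in zip(grid, w)) / z
    return mean(naive), mean(adjusted)

grid = [i * 0.01 for i in range(-500, 801)]   # mu in [-5, 8]
naive_mean, adjusted_mean = posterior_means(x=2.2, c=2.0, grid=grid)
# Conditioning on selection pulls the adjusted mean toward zero,
# counteracting the winner's-curse bias of the naive estimate.
```

Dividing by the selection probability down-weights parameter values that would almost always be selected, which is the bias correction the naive posterior lacks.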

Publisher / Repository:
Oxford University Press
Pages: 1801-1813
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Parameters in climate models are usually calibrated manually, exploiting only small subsets of the available data. This precludes both optimal calibration and quantification of uncertainties. Traditional Bayesian calibration methods that allow uncertainty quantification are too expensive for climate models; they are also not robust in the presence of internal climate variability. For example, Markov chain Monte Carlo (MCMC) methods typically require a very large number of model runs and are sensitive to internal variability noise, rendering them infeasible for climate models. Here we demonstrate an approach to model calibration and uncertainty quantification that requires only a small number of model runs and can accommodate internal climate variability. The approach consists of three stages: (a) a calibration stage uses variants of ensemble Kalman inversion to calibrate a model by minimizing mismatches between model and data statistics; (b) an emulation stage emulates the parameter‐to‐data map with Gaussian processes (GP), using the model runs in the calibration stage for training; (c) a sampling stage approximates the Bayesian posterior distributions by sampling the GP emulator with MCMC. We demonstrate the feasibility and computational efficiency of this calibrate‐emulate‐sample (CES) approach in a perfect‐model setting. Using an idealized general circulation model, we estimate parameters in a simple convection scheme from synthetic data generated with the model. The CES approach generates probability distributions of the parameters that are good approximations of the Bayesian posteriors, at a fraction of the computational cost usually required to obtain them. Sampling from this approximate posterior allows the generation of climate predictions with quantified parametric uncertainties.
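The calibration stage of CES rests on ensemble Kalman inversion, whose core update can be sketched for a scalar toy problem. The linear "model" G, the noise level, and the ensemble size below are illustrative assumptions, not the idealized GCM setup of the abstract:

```python
import numpy as np

def eki_step(u, y, gamma, G, rng):
    # One ensemble Kalman inversion update: nudge each particle using
    # empirical covariances between parameters u and model outputs G(u).
    g = np.array([G(ui) for ui in u])
    du, dg = u - u.mean(), g - g.mean()
    c_ug = np.mean(du * dg)          # parameter-output cross-covariance
    c_gg = np.mean(dg * dg)          # output covariance
    # perturbed observations keep the ensemble spread consistent with noise
    y_pert = y + rng.normal(0.0, np.sqrt(gamma), size=u.shape)
    return u + c_ug / (c_gg + gamma) * (y_pert - g)

rng = np.random.default_rng(0)
G = lambda u: 2.0 * u                # stand-in forward model
y, gamma = 6.0, 0.1                  # data consistent with u_true = 3
u = rng.normal(0.0, 2.0, size=200)   # prior ensemble
for _ in range(10):
    u = eki_step(u, y, gamma, G, rng)
# ensemble mean converges toward the data-consistent parameter u = 3
```

In CES, the particles visited by iterations like this one double as training points for the GP emulator, which is then sampled with MCMC in place of the expensive model.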

  2. Summary

    Posterior distributions for the joint projections of future temperature and precipitation trends and changes are derived by applying a Bayesian hierarchical model to a rich data set of simulated climate from general circulation models. The simulations that are analysed here constitute the future projections on which the Intergovernmental Panel on Climate Change based its recent summary report on the future of our planet’s climate, albeit without any sophisticated statistical handling of the data. Here we quantify the uncertainty that is represented by the variable results of the various models and their limited ability to represent the observed climate both at global and at regional scales. We do so in a Bayesian framework, by estimating posterior distributions of the climate change signals in terms of trends or differences between future and current periods, and we fully characterize the uncertain nature of a suite of other parameters, like biases, correlation terms and model-specific precisions. Besides presenting our results in terms of posterior distributions of the climate signals, we offer as an alternative representation of the uncertainties in climate change projections the use of the posterior predictive distribution of a new model’s projections. The results from our analysis can find straightforward applications in impact studies, which necessitate not only best guesses but also a full representation of the uncertainty in climate change projections. For water resource and crop models, for example, it is vital to use joint projections of temperature and precipitation to represent the characteristics of future climate best, and our statistical analysis delivers just that.
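At its simplest, combining multi-model projections with model-specific precisions reduces to a conjugate normal update; the hierarchical model in the abstract additionally learns the precisions, biases, and correlations from data. A minimal sketch, assuming the precisions are known and the trend values are made up:

```python
def combine_projections(trends, precisions, prior_mean=0.0, prior_prec=1e-6):
    # Conjugate normal update: each model i reports a trend x_i with
    # model-specific precision lambda_i; the posterior for the true
    # signal nu is normal, centered at the precision-weighted mean.
    post_prec = prior_prec + sum(precisions)
    post_mean = (prior_prec * prior_mean
                 + sum(l * x for l, x in zip(trends, precisions))) / post_prec
    return post_mean, 1.0 / post_prec   # posterior mean and variance

# three hypothetical model temperature trends (deg C / decade), with
# made-up precisions reflecting how well each model matches observations
mean, var = combine_projections([0.18, 0.25, 0.21], [4.0, 1.0, 2.0])
```

The weighted mean favors the better-validated models, and the posterior variance shrinks as precisions accumulate, which is the qualitative behavior the full hierarchical posterior exhibits as well.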

  3. ABSTRACT With high-quality data from programs like the Hubble Frontier Fields, cluster lensing has reached the point that models are dominated by systematic rather than statistical uncertainties. We introduce a Bayesian framework to quantify systematic effects by determining how different lens modelling choices affect the results. Our framework includes a new two-sample test for quantifying the difference between posterior probability distributions that are sampled by methods like Monte Carlo Markov chains. We use the framework to examine choices related to the selection and treatment of cluster member galaxies in two of the Frontier Field clusters: Abell 2744 and MACS J0416.1–2403. When selecting member galaxies, choices about depth and area affect the models; we find that model results are robust for an I-band magnitude limit of mlim ≥ 22.5 mag and a radial cut of rlim ≥ 90 arcsec (from the centre of the field), although the radial limit likely depends on the spatial extent of lensed images. Mass is typically assigned to galaxies using luminosity/mass scaling relations. We find that the slopes of the scaling relations can have significant effects on lens model parameters but only modest effects on lensing magnifications. Interestingly, scatter in the scaling relations affects the two fields differently. This analysis illustrates how our framework can be used to analyse lens modelling choices and guide future cluster lensing programs. 
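A concrete instance of a two-sample test on posterior samples is the energy distance with a permutation null; the abstract's framework introduces its own test, so this sketch is an illustrative stand-in using hypothetical one-dimensional sample arrays:

```python
import numpy as np

def energy_distance(x, y):
    # Energy distance between two posterior-sample arrays: zero (in the
    # population limit) exactly when the underlying distributions agree.
    d = lambda a, b: np.abs(a[:, None] - b[None, :]).mean()
    return 2.0 * d(x, y) - d(x, x) - d(y, y)

def perm_pvalue(x, y, n_perm=500, seed=0):
    # Permutation null: pool the samples, reshuffle the labels, and
    # count how often a shuffled split looks as different as the data.
    rng = np.random.default_rng(seed)
    obs, pool, n = energy_distance(x, y), np.concatenate([x, y]), len(x)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pool)
        count += energy_distance(pool[:n], pool[n:]) >= obs
    return (count + 1) / (n_perm + 1)
```

One caveat for MCMC output: a plain permutation null assumes exchangeable draws, so chains should be thinned or an autocorrelation-aware null used before trusting the p-value.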
  4. Abstract

    Uncertainty quantification of groundwater (GW) aquifer parameters is critical for efficient management and sustainable extraction of GW resources. These uncertainties are introduced by the data, model, and prior information on the parameters. Here, we develop a Bayesian inversion framework that uses Interferometric Synthetic Aperture Radar (InSAR) surface deformation data to infer the laterally heterogeneous permeability of a transient linear poroelastic model of a confined GW aquifer. The Bayesian solution of this inverse problem takes the form of a posterior probability density of the permeability. Exploring this posterior using classical Markov chain Monte Carlo (MCMC) methods is computationally prohibitive due to the large dimension of the discretized permeability field and the expense of solving the poroelastic forward problem. However, in many partial differential equation (PDE)‐based Bayesian inversion problems, the data are only informative in a few directions in parameter space. For the poroelasticity problem, we prove this property theoretically for a one‐dimensional problem and demonstrate it numerically for a three‐dimensional aquifer model. We design a generalized preconditioned Crank‐Nicolson (gpCN) MCMC method that exploits this intrinsic low dimensionality by using a low‐rank‐based Laplace approximation of the posterior as a proposal, which we build scalably. The feasibility of our approach is demonstrated through a real GW aquifer test in Nevada. The inherently two‐dimensional nature of InSAR surface deformation data informs a sufficient number of modes of the permeability field to allow detection of major structures within the aquifer, significantly reducing the uncertainty in the pressure and the displacement quantities of interest.
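The plain pCN proposal underlying the gpCN method can be sketched on a scalar conjugate problem; the generalized variant preconditions with the low-rank Laplace approximation, which is omitted here. The data value, noise variance, and step size beta are assumptions:

```python
import numpy as np

def pcn_chain(y, sigma2, beta, n_iter, seed=0):
    # Preconditioned Crank-Nicolson MCMC for a standard-normal prior:
    # the proposal sqrt(1 - beta^2) * u + beta * xi leaves the prior
    # invariant, so the accept ratio involves only the data misfit Phi.
    rng = np.random.default_rng(seed)
    Phi = lambda u: 0.5 * (y - u) ** 2 / sigma2    # negative log-likelihood
    u, samples = 0.0, []
    for _ in range(n_iter):
        v = np.sqrt(1.0 - beta**2) * u + beta * rng.normal()
        if rng.random() < np.exp(min(0.0, Phi(u) - Phi(v))):
            u = v
        samples.append(u)
    return np.array(samples)

s = pcn_chain(y=2.0, sigma2=1.0, beta=0.5, n_iter=20000)
# analytic posterior for this conjugate toy problem: N(1.0, 0.5)
```

Because the prior cancels in the accept ratio, the acceptance rate is dimension-robust; the gpCN refinement replaces the prior covariance in the proposal with a Laplace-approximation covariance so the proposal also adapts to the data-informed directions.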

  5.
    Dielectric elastomers are employed for a wide variety of adaptive structures. Many of these soft elastomers exhibit significant rate-dependencies in their response. Accurately quantifying this viscoelastic behavior is non-trivial, and in many cases a nonlinear modeling framework is required. Fractional-order operators have been applied to modeling viscoelastic behavior for many years, and recent research has shown fractional-order methods to be effective for nonlinear frameworks. Such implementations can, however, become computationally expensive when an accurate approximation of the fractional-order derivative is needed. Accurate estimation of the elastomer’s viscoelastic behavior to quantify parameter uncertainty motivates the use of Markov chain Monte Carlo (MCMC) methods. Since MCMC is a sampling-based method requiring many model evaluations, efficient estimation of the fractional derivative operator is crucial. In this paper, we demonstrate the effectiveness of using quadrature techniques to approximate the Riemann–Liouville definition for fractional derivatives in the context of estimating the uncertainty of a nonlinear viscoelastic model. We also demonstrate the use of parameter subset selection techniques to isolate parameters that are identifiable in the sense that they are uniquely determined by measured data. For those identifiable parameters, we employ Bayesian inference to compute posterior distributions. Finally, we propagate parameter uncertainties through the models to compute prediction intervals for quantities of interest.
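One inexpensive discretization of a fractional derivative is the Grünwald–Letnikov sum, which converges to the Riemann–Liouville definition for sufficiently smooth functions; the paper's quadrature approach differs in detail, so this is a sketch of the operator being approximated rather than of their scheme:

```python
from math import gamma

def gl_fractional_derivative(f, alpha, t, n):
    # Gruenwald-Letnikov discretization of the order-alpha derivative:
    #   D^alpha f(t) ~ h^(-alpha) * sum_k g_k * f(t - k*h)
    # with g_0 = 1 and the recursion g_k = g_{k-1} * (k - 1 - alpha) / k,
    # which generates the signed binomial coefficients (-1)^k C(alpha, k).
    h = t / n
    g, acc = 1.0, f(t)               # k = 0 term
    for k in range(1, n + 1):
        g *= (k - 1 - alpha) / k
        acc += g * f(t - k * h)
    return acc / h**alpha

# check against the analytic result D^alpha t = t^(1-alpha) / Gamma(2-alpha)
approx = gl_fractional_derivative(lambda s: s, alpha=0.5, t=1.0, n=2000)
exact = 1.0 / gamma(1.5)
```

The cost per evaluation grows with the number of history points n, which is exactly why efficient approximations matter inside an MCMC loop that re-evaluates the model thousands of times.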