We propose a very fast approximate Markov chain Monte Carlo sampling framework that is applicable to a large class of sparse Bayesian inference problems. The computational cost per iteration in several regression models is of order O(n(s+J)), where n is the sample size, s is the underlying sparsity of the model, and J is the size of a randomly selected subset of regressors. This cost can be further reduced by data subsampling when stochastic gradient Langevin dynamics are employed. The algorithm is an extension of the asynchronous Gibbs sampler of Johnson et al. [(2013). Analyzing Hogwild parallel Gaussian Gibbs sampling. In Proceedings of the 26th International Conference on Neural Information Processing Systems (NIPS'13) (Vol. 2, pp. 2715–2723)], but can be viewed from a statistical perspective as a form of Bayesian iterated sure independence screening [Fan, J., Samworth, R., & Wu, Y. (2009). Ultrahigh dimensional feature selection: Beyond the linear model. Journal of Machine Learning Research, 10, 2013–2038]. We show that in high-dimensional linear regression problems, the Markov chain generated by the proposed algorithm admits an invariant distribution that correctly recovers the main signal with high probability under some statistical assumptions. Furthermore, we show that its mixing time is at most linear in the number of regressors. We illustrate the algorithm with several models.
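The data-subsampling idea mentioned in the abstract, stochastic gradient Langevin dynamics, is easy to sketch in isolation. Below is a minimal SGLD sampler for Bayesian linear regression with a Gaussian prior; this is a generic illustration, not the paper's asynchronous Gibbs algorithm, and the step size, prior variance, and unit noise variance are illustrative assumptions.

```python
import numpy as np

def sgld_linear_regression(X, y, n_iters=2000, batch_size=32,
                           step=1e-4, prior_var=10.0, seed=0):
    """SGLD for Bayesian linear regression with a N(0, prior_var I) prior
    on the coefficients (unit observation-noise variance assumed)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    samples = []
    for _ in range(n_iters):
        idx = rng.choice(n, size=batch_size, replace=False)
        # Unbiased minibatch estimate of the log-likelihood gradient,
        # rescaled by n / batch_size.
        grad_lik = (n / batch_size) * X[idx].T @ (y[idx] - X[idx] @ beta)
        grad_prior = -beta / prior_var
        grad = grad_lik + grad_prior
        # Euler discretization of the Langevin diffusion:
        # half-step along the gradient plus injected Gaussian noise.
        beta = beta + 0.5 * step * grad + np.sqrt(step) * rng.standard_normal(p)
        samples.append(beta.copy())
    return np.array(samples)
```

Each iteration touches only a minibatch of the data, so the per-iteration cost is O(Bp) for batch size B rather than O(np).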
NSF-PAR ID: 10462440
Publisher / Repository: Oxford University Press
Journal Name: Journal of the Royal Statistical Society Series B: Statistical Methodology
ISSN: 1369-7412
Sponsoring Org: National Science Foundation

More Like this

Chaudhuri, Kamalika, et al. (Eds.) Spike-and-slab priors are commonly used for Bayesian variable selection, due to their interpretability and favorable statistical properties. However, existing samplers for spike-and-slab posteriors incur prohibitive computational costs when the number of variables is large. In this article, we propose Scalable Spike-and-Slab (S^3), a scalable Gibbs sampling implementation for high-dimensional Bayesian regression with the continuous spike-and-slab prior of George & McCulloch (1993). For a dataset with n observations and p covariates, S^3 has order max{n^2 p_t, np} computational cost at iteration t, where p_t never exceeds the number of covariates switching spike-and-slab states between iterations t and t−1 of the Markov chain. This improves upon the order n^2 p per-iteration cost of state-of-the-art implementations as, typically, p_t is substantially smaller than p. We apply S^3 to synthetic and real-world datasets, demonstrating orders-of-magnitude speedups over existing exact samplers and significant gains in inferential quality over approximate samplers with comparable cost.
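As a point of reference for what S^3 accelerates, here is a naive Gibbs sampler for the George & McCulloch (1993) continuous spike-and-slab prior. It pays the dense full-conditional solve every iteration (roughly the cost S^3 is designed to avoid); all hyperparameter values below are illustrative, not taken from the paper.

```python
import numpy as np

def spike_slab_gibbs(X, y, n_iters=500, tau0=0.01, tau1=1.0,
                     pi=0.2, sigma2=0.25, seed=0):
    """Naive Gibbs sampler for the continuous spike-and-slab prior:
    beta_j ~ (1 - z_j) N(0, tau0^2) + z_j N(0, tau1^2), z_j ~ Bernoulli(pi).
    Returns the sampled inclusion indicators z at every iteration."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    z = np.zeros(p, dtype=bool)
    z_draws = []
    for _ in range(n_iters):
        # beta | z, y: Gaussian full conditional with diagonal prior precision.
        d_inv = np.where(z, 1.0 / tau1**2, 1.0 / tau0**2)
        cov = np.linalg.inv(XtX / sigma2 + np.diag(d_inv))
        beta = rng.multivariate_normal(cov @ (Xty / sigma2), cov)
        # z_j | beta_j: Bernoulli with odds = (prior odds) x (slab/spike density).
        log_slab = -0.5 * beta**2 / tau1**2 - np.log(tau1)
        log_spike = -0.5 * beta**2 / tau0**2 - np.log(tau0)
        prob = 1.0 / (1.0 + (1 - pi) / pi * np.exp(log_spike - log_slab))
        z = rng.random(p) < prob
        z_draws.append(z.copy())
    return np.array(z_draws)
```

The `np.linalg.inv` call is the bottleneck the abstract refers to: it rebuilds and factorizes the full p-by-p conditional precision even when only a handful of indicators changed since the last iteration.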

Wootters, Mary; Sanita, Laura (Eds.) The Swendsen-Wang algorithm is a sophisticated, widely used Markov chain for sampling from the Gibbs distribution for the ferromagnetic Ising and Potts models. This chain has proved difficult to analyze, due in part to the global nature of its updates. We present optimal bounds on the convergence rate of the Swendsen-Wang algorithm for the complete d-ary tree. Our bounds extend to the non-uniqueness region and apply to all boundary conditions. We show that the spatial mixing conditions known as Variance Mixing and Entropy Mixing, introduced in the study of local Markov chains by Martinelli et al. (2003), imply Ω(1) spectral gap and O(log n) mixing time, respectively, for the Swendsen-Wang dynamics on the d-ary tree. We also show that these bounds are asymptotically optimal. As a consequence, we establish Θ(log n) mixing for the Swendsen-Wang dynamics for all boundary conditions throughout the tree uniqueness region; in fact, our bounds hold beyond the uniqueness threshold for the Ising model, and for the q-state Potts model when q is small with respect to d. Our proofs feature a novel spectral view of the Variance Mixing condition inspired by several recent rapid mixing results on high-dimensional expanders and utilize recent work on block factorization of entropy under spatial mixing conditions.
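For concreteness, a single Swendsen-Wang update for the Ising case (q = 2) can be sketched as follows. The graph is given as an explicit edge list and the cluster bookkeeping uses standard union-find; this is a generic textbook implementation, not code from the paper.

```python
import numpy as np

def swendsen_wang_step(spins, edges, beta, rng):
    """One Swendsen-Wang update for the ferromagnetic Ising model.
    spins: array of +/-1 values; edges: list of (i, j) vertex pairs;
    beta: inverse temperature; rng: numpy Generator."""
    n = len(spins)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Open a bond only between agreeing spins, with prob 1 - e^{-2 beta}.
    p_open = 1.0 - np.exp(-2.0 * beta)
    for i, j in edges:
        if spins[i] == spins[j] and rng.random() < p_open:
            parent[find(i)] = find(j)
    # Assign each connected component a fresh uniform spin.
    new_spin = {}
    out = np.empty(n, dtype=int)
    for i in range(n):
        r = find(i)
        if r not in new_spin:
            new_spin[r] = rng.choice([-1, 1])
        out[i] = new_spin[r]
    return out
```

On the complete d-ary tree of the abstract, `edges` would be the parent-child pairs. The update is global in the sense the abstract emphasizes: a single step can flip arbitrarily large clusters at once.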

Bayesian inference provides a systematic framework for integrating data with mathematical models to quantify the uncertainty in the solution of the inverse problem. However, the solution of Bayesian inverse problems governed by complex forward models described by partial differential equations (PDEs) remains prohibitive with black-box Markov chain Monte Carlo (MCMC) methods. We present hIPPYlib-MUQ, an extensible and scalable software framework that contains implementations of state-of-the-art algorithms aimed at overcoming the challenges of high-dimensional, PDE-constrained Bayesian inverse problems. These algorithms accelerate MCMC sampling by exploiting the geometry and intrinsic low-dimensionality of parameter space via derivative information and low-rank approximation. The software integrates two complementary open-source software packages, hIPPYlib and MUQ. hIPPYlib solves PDE-constrained inverse problems using automatically generated adjoint-based derivatives, but it lacks full Bayesian capabilities. MUQ provides a spectrum of powerful Bayesian inversion models and algorithms, but expects forward models to come equipped with gradients and Hessians to permit large-scale solution. By combining these two complementary libraries, we created a robust, scalable, and efficient software framework that realizes the benefits of each and allows us to tackle complex large-scale Bayesian inverse problems across a broad spectrum of scientific and engineering disciplines. To illustrate the capabilities of hIPPYlib-MUQ, we present a comparison of a number of MCMC methods available in the integrated software on several high-dimensional Bayesian inverse problems. These include problems characterized by both linear and nonlinear PDEs, various noise models, and different parameter dimensions. The results demonstrate that large (∼50×) speedups over conventional black-box and gradient-based MCMC algorithms can be obtained by exploiting Hessian information (from the log-posterior), underscoring the power of the integrated hIPPYlib-MUQ framework.

Many problems in the physical sciences, machine learning, and statistical inference necessitate sampling from a high-dimensional, multimodal probability distribution. Markov chain Monte Carlo (MCMC) algorithms, the ubiquitous tool for this task, typically rely on random local updates to propagate configurations of a given system in a way that ensures that generated configurations will be distributed according to a target probability distribution asymptotically. In high-dimensional settings with multiple relevant metastable basins, local approaches require either immense computational effort or intricately designed importance sampling strategies to capture information about, for example, the relative populations of such basins. Here, we analyze an adaptive MCMC algorithm that augments MCMC sampling with non-local transition kernels parameterized by generative models known as normalizing flows. We focus on a setting where there are no pre-existing data, as is commonly the case for problems in which MCMC is used. Our method uses (1) an MCMC strategy that blends local moves obtained from any standard transition kernel with those from a generative model to accelerate the sampling, and (2) the data generated this way to adapt the generative model and improve its efficacy in the MCMC algorithm. We provide a theoretical analysis of the convergence properties of this algorithm and investigate numerically its efficiency, in particular its propensity to equilibrate quickly between metastable modes whose rough locations are known a priori but whose respective probability weights are not. We show that our algorithm can sample effectively across large free-energy barriers, providing dramatic accelerations relative to traditional MCMC algorithms.
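A toy version of the local/non-local blend can be written down with a simple Gaussian independence proposal standing in for the normalizing flow: with some probability the chain proposes from the fitted model (correcting with the independence Metropolis-Hastings ratio), otherwise it makes a local random-walk move, and the model is periodically refit to the chain so far. The bimodal target and all tuning constants are illustrative, and a faithful version would use an actual flow and diminishing adaptation to guarantee convergence.

```python
import numpy as np

def log_target(x):
    """Unnormalized bimodal target: equal mixture of N(-4, 1) and N(4, 1)."""
    return np.logaddexp(-0.5 * (x + 4) ** 2, -0.5 * (x - 4) ** 2)

def adaptive_mixture_mcmc(n_iters=5000, local_step=0.5,
                          p_nonlocal=0.3, adapt_every=500, seed=0):
    rng = np.random.default_rng(seed)
    x = 0.0
    mu, sigma = 0.0, 8.0  # deliberately wide initial "generative model"
    chain = []
    for t in range(n_iters):
        if rng.random() < p_nonlocal:
            # Non-local independence proposal from the fitted model;
            # the MH ratio must include the proposal density.
            prop = rng.normal(mu, sigma)
            log_q = lambda v: -0.5 * ((v - mu) / sigma) ** 2
            log_alpha = (log_target(prop) - log_target(x)
                         + log_q(x) - log_q(prop))
        else:
            # Local random-walk move (symmetric proposal, ratio cancels).
            prop = x + local_step * rng.standard_normal()
            log_alpha = log_target(prop) - log_target(x)
        if np.log(rng.random()) < log_alpha:
            x = prop
        chain.append(x)
        # Periodically refit the model to the data the chain has generated.
        if (t + 1) % adapt_every == 0:
            mu, sigma = np.mean(chain), np.std(chain) + 1e-3
    return np.array(chain)
```

A pure random-walk chain with `local_step=0.5` essentially never crosses the barrier between the modes at ±4; the non-local proposals are what let the sampler jump between basins.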

Stochastic gradient Langevin dynamics (SGLD) and stochastic gradient Hamiltonian Monte Carlo (SGHMC) are two popular Markov chain Monte Carlo (MCMC) algorithms for Bayesian inference that can scale to large datasets, allowing one to sample from the posterior distribution of the parameters of a statistical model given the input data and the prior distribution over the model parameters. However, these algorithms do not apply to the decentralized learning setting, in which a network of agents works collaboratively to learn the parameters of a statistical model without sharing their individual data, whether for privacy reasons or because of communication constraints. We study two algorithms, Decentralized SGLD (DE-SGLD) and Decentralized SGHMC (DE-SGHMC), which are adaptations of SGLD and SGHMC that allow scalable Bayesian inference in the decentralized setting for large datasets. We show that when the posterior distribution is strongly log-concave and smooth, the iterates of these algorithms converge linearly to a neighborhood of the target distribution in the 2-Wasserstein distance if their parameters are selected appropriately. We illustrate the efficiency of our algorithms on decentralized Bayesian linear regression and Bayesian logistic regression problems.
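The decentralized recursion has a compact sketch: each agent gossip-averages its neighbors' iterates through a doubly stochastic mixing matrix W, then takes a Langevin step using only the gradient of its own shard of the (negative log) posterior, so no raw data ever leaves an agent. The model (Bayesian linear regression), step size, and mixing matrix below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def decentralized_sgld(X_parts, y_parts, W, n_iters=2000, step=1e-3,
                       prior_var=10.0, seed=0):
    """DE-SGLD-style sketch for Bayesian linear regression. X_parts /
    y_parts hold each agent's private data shard; W is a doubly
    stochastic mixing matrix encoding the communication network."""
    rng = np.random.default_rng(seed)
    m = len(X_parts)                 # number of agents
    p = X_parts[0].shape[1]
    thetas = np.zeros((m, p))        # one iterate per agent
    for _ in range(n_iters):
        mixed = W @ thetas           # gossip averaging with neighbors
        new = np.empty_like(thetas)
        for a in range(m):
            Xa, ya = X_parts[a], y_parts[a]
            # Gradient of agent a's share of the log posterior: its own
            # likelihood term plus a 1/m share of the Gaussian prior.
            grad = Xa.T @ (ya - Xa @ mixed[a]) - mixed[a] / (m * prior_var)
            new[a] = (mixed[a] + 0.5 * step * grad
                      + np.sqrt(step) * rng.standard_normal(p))
        thetas = new
    return thetas
```

The ring mixing matrix used in the test (each agent averages itself with its two neighbors) is one standard choice; any doubly stochastic W whose spectral gap is bounded away from zero plays the same role.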