- Award ID(s):
- 1811812
- 10172090
- Date Published:
- Journal Name:
- Biometrika
- 0006-3444
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
Stochastic gradient Langevin dynamics (SGLD) and stochastic gradient Hamiltonian Monte Carlo (SGHMC) are two popular Markov Chain Monte Carlo (MCMC) algorithms for Bayesian inference that scale to large datasets, allowing one to sample from the posterior distribution of the parameters of a statistical model given the input data and the prior distribution over the model parameters. However, these algorithms do not apply to the decentralized learning setting, in which a network of agents works collaboratively to learn the parameters of a statistical model without sharing individual data because of privacy concerns or communication constraints. We study two algorithms, Decentralized SGLD (DE-SGLD) and Decentralized SGHMC (DE-SGHMC), which are adaptations of the SGLD and SGHMC methods that allow scalable Bayesian inference in the decentralized setting for large datasets. We show that when the posterior distribution is strongly log-concave and smooth, the iterates of these algorithms converge linearly to a neighborhood of the target distribution in the 2-Wasserstein distance if their parameters are selected appropriately. We illustrate the efficiency of our algorithms on decentralized Bayesian linear regression and Bayesian logistic regression problems.
In causal inference problems, one is often tasked with estimating causal effects which are analytically intractable functionals of the data-generating mechanism. Relevant settings include estimating intention-to-treat effects in longitudinal problems with missing data or computing direct and indirect effects in mediation analysis. One approach to computing these effects is to use the g-formula implemented via Monte Carlo integration; when simulation-based methods such as the nonparametric bootstrap or Markov chain Monte Carlo are used for inference, Monte Carlo integration must be nested within an already computationally intensive algorithm. We develop a widely applicable approach to accelerating this Monte Carlo integration step which greatly reduces the computational burden of existing g-computation algorithms. We refer to our method as accelerated g-computation (AGC). The algorithms we present are similar in spirit to multiple imputation, but require removing within-imputation variance from the standard error rather than adding it. We illustrate the use of AGC on a mediation analysis problem using a beta regression model and in a longitudinal clinical trial subject to nonignorable missingness using a Bayesian additive regression trees model.
Belkin, M.; Kpotufe, S. (Ed.) Langevin algorithms are gradient descent methods with additive noise. They have been used for decades in Markov Chain Monte Carlo (MCMC) sampling, optimization, and learning. Their convergence properties for unconstrained non-convex optimization and learning problems have been studied widely in the last few years. Other work has examined projected Langevin algorithms for sampling from log-concave distributions restricted to convex compact sets. For learning and optimization, log-concave distributions correspond to convex losses. In this paper, we analyze the case of non-convex losses with compact convex constraint sets and IID external data variables. We term the resulting method the projected stochastic gradient Langevin algorithm (PSGLA). We show the algorithm achieves a deviation of $O(T^{-1/4}(\log T)^{1/2})$ from its target distribution in 1-Wasserstein distance. For optimization and learning, we show that the algorithm achieves $\epsilon$-suboptimal solutions, on average, provided that it is run for a time that is polynomial in $\epsilon$ and slightly super-exponential in the problem dimension.
Markov chain Monte Carlo algorithms have important applications in counting problems and in machine learning problems, settings that involve estimating quantities that are difficult to compute exactly. How much can quantum computers speed up classical Markov chain algorithms? In this work we consider the problem of speeding up simulated annealing algorithms, where the stationary distributions of the Markov chains are Gibbs distributions at temperatures specified according to an annealing schedule. We construct a quantum algorithm that both adaptively constructs an annealing schedule and quantum samples at each temperature. Our adaptive annealing schedule roughly matches the length of the best classical adaptive annealing schedules and improves on nonadaptive temperature schedules by roughly a quadratic factor. Our dependence on the Markov chain gap matches other quantum algorithms and is quadratically better than what classical Markov chains achieve. Our algorithm is the first to combine both of these quadratic improvements. Like other quantum walk algorithms, it also improves on classical algorithms by producing “qsamples” instead of classical samples. This means preparing quantum states whose amplitudes are the square roots of the target probability distribution. In constructing the annealing schedule we make use of amplitude estimation, and we introduce a method for making amplitude estimation nondestructive at almost no additional cost, a result that may have independent interest. Finally, we demonstrate how this quantum simulated annealing algorithm can be applied to the problems of estimating partition functions and Bayesian inference.
We investigate solution methods for large-scale inverse problems governed by partial differential equations (PDEs) via Bayesian inference. The Bayesian framework provides a statistical setting to infer uncertain parameters from noisy measurements. To quantify posterior uncertainty, we adopt Markov Chain Monte Carlo (MCMC) approaches for generating samples. To increase the efficiency of these approaches in high dimensions, we make use of local information about the gradient and Hessian of the target potential, including via Hamiltonian Monte Carlo (HMC). Our target application is inferring the soil permeability field from observations of pore pressure, using a nonlinear PDE poromechanics model for predicting pressure from permeability. We compare the performance of different sampling approaches in this and other settings. We also investigate the effect of dimensionality and non-Gaussianity of distributions on the performance of different sampling methods.