
Title: Decentralized Stochastic Gradient Langevin Dynamics and Hamiltonian Monte Carlo
Stochastic gradient Langevin dynamics (SGLD) and stochastic gradient Hamiltonian Monte Carlo (SGHMC) are two popular Markov chain Monte Carlo (MCMC) algorithms for Bayesian inference that can scale to large datasets, allowing one to sample from the posterior distribution of the parameters of a statistical model given the input data and the prior distribution over the model parameters. However, these algorithms do not apply to the decentralized learning setting, in which a network of agents works collaboratively to learn the parameters of a statistical model without sharing their individual data due to privacy reasons or communication constraints. We study two algorithms, Decentralized SGLD (DE-SGLD) and Decentralized SGHMC (DE-SGHMC), which adapt the SGLD and SGHMC methods to allow scalable Bayesian inference in the decentralized setting for large datasets. We show that when the posterior distribution is strongly log-concave and smooth, the iterates of these algorithms converge linearly to a neighborhood of the target distribution in the 2-Wasserstein distance if their parameters are selected appropriately. We illustrate the efficiency of our algorithms on decentralized Bayesian linear regression and Bayesian logistic regression problems.
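To make the decentralized update concrete, here is a minimal sketch of a DE-SGLD-style iteration in Python: each agent averages its iterate with its neighbors through a doubly stochastic mixing matrix, takes a stochastic-gradient step on its own component of the negative log-posterior, and injects Gaussian noise. The variable names, the mixing matrix, and the toy Gaussian target below are illustrative assumptions, not taken from the paper.

    import numpy as np

    def de_sgld_step(X, W, grad_f, eta, rng):
        # X: (n_agents, dim) array; row i is agent i's current iterate
        # W: (n_agents, n_agents) doubly stochastic mixing (gossip) matrix
        # grad_f(i, x): stochastic gradient of agent i's component of the
        #               negative log-posterior, evaluated at x
        # eta: step size
        n_agents, dim = X.shape
        mixed = W @ X                                      # average with neighbors
        grads = np.stack([grad_f(i, X[i]) for i in range(n_agents)])
        noise = rng.standard_normal((n_agents, dim))       # injected Langevin noise
        return mixed - eta * grads + np.sqrt(2.0 * eta) * noise

    # Toy usage: four agents jointly sample a standard Gaussian posterior
    # whose negative log-density is split evenly across the agents.
    rng = np.random.default_rng(0)
    n_agents, dim, eta = 4, 1, 1e-2
    W = np.full((n_agents, n_agents), 1.0 / n_agents)      # fully connected gossip
    grad_f = lambda i, x: x / n_agents                     # gradient of x**2 / (2 * n_agents)
    X = rng.standard_normal((n_agents, dim))
    for _ in range(1000):
        X = de_sgld_step(X, W, grad_f, eta, rng)

A DE-SGHMC-style variant would additionally carry a momentum variable per agent and apply friction, but the Langevin case above is enough to illustrate the gossip-plus-noisy-gradient structure described in the abstract.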
Authors:
Award ID(s):
2053485 1814888
Publication Date:
NSF-PAR ID:
10326387
Journal Name:
Journal of Machine Learning Research
Volume:
22
Page Range or eLocation-ID:
1-69
ISSN:
1532-4435
Sponsoring Org:
National Science Foundation
More Like this
  1. We propose a stochastic variational inference algorithm for training large-scale Bayesian networks, where noisy-OR conditional distributions are used to capture higher-order relationships. One application is to the learning of hierarchical topic models for text data. While previous work has focused on two-layer networks popular in applications like medical diagnosis, we develop scalable algorithms for deep networks that capture a multi-level hierarchy of interactions. Our key innovation is a family of constrained variational bounds that only explicitly optimize posterior probabilities for the sub-graph of topics most related to the sparse observations in a given document. These constrained bounds have comparable accuracy but dramatically reduced computational cost. Using stochastic gradient updates based on our variational bounds, we learn noisy-OR Bayesian networks orders of magnitude faster than was possible with prior Monte Carlo learning algorithms, and provide a new tool for understanding large-scale binary data.
  2. Due to the ease of modern data collection, applied statisticians often have access to a large set of covariates that they wish to relate to some observed outcome. Generalized linear models (GLMs) offer a particularly interpretable framework for such an analysis. In these high-dimensional problems, the number of covariates is often large relative to the number of observations, so we face non-trivial inferential uncertainty; a Bayesian approach allows coherent quantification of this uncertainty. Unfortunately, existing methods for Bayesian inference in GLMs require running times roughly cubic in parameter dimension, and so are limited to settings with at most tens of thousands of parameters. We propose to reduce time and memory costs with a low-rank approximation of the data in an approach we call LR-GLM. When used with the Laplace approximation or Markov chain Monte Carlo, LR-GLM provides a full Bayesian posterior approximation and admits running times reduced by a full factor of the parameter dimension. We rigorously establish the quality of our approximation and show how the choice of rank allows a tunable computational–statistical trade-off. Experiments support our theory and demonstrate the efficacy of LR-GLM on real large-scale datasets. (A rough sketch of this low-rank idea appears after this list.)
  3. Summary: Stochastic gradient Markov chain Monte Carlo algorithms have received much attention in Bayesian computing for big data problems, but they are only applicable to a small class of problems for which the parameter space has a fixed dimension and the log-posterior density is differentiable with respect to the parameters. This paper proposes an extended stochastic gradient Markov chain Monte Carlo algorithm which, by introducing appropriate latent variables, can be applied to more general large-scale Bayesian computing problems, such as those involving dimension jumping and missing data. Numerical studies show that the proposed algorithm is highly scalable and much more efficient than traditional Markov chain Monte Carlo algorithms.
  4. McCulloch, R. (Ed.)
    Varying coefficient models (VCMs) are widely used for estimating nonlinear regression functions for functional data. Their Bayesian variants using Gaussian process priors on the functional coefficients, however, have received limited attention in massive data applications, mainly due to the prohibitively slow posterior computations using Markov chain Monte Carlo (MCMC) algorithms. We address this problem using a divide-and-conquer Bayesian approach. We first create a large number of data subsamples with much smaller sizes. Then, we formulate the VCM as a linear mixed-effects model and develop a data augmentation algorithm for obtaining MCMC draws on all the subsets in parallel. Finally, we aggregate the MCMC-based estimates of subset posteriors into a single Aggregated Monte Carlo (AMC) posterior, which is used as a computationally efficient alternative to the true posterior distribution. Theoretically, we derive minimax optimal posterior convergence rates for the AMC posteriors of both the varying coefficients and the mean regression function. We provide quantification on the orders of subset sample sizes and the number of subsets. The empirical results show that the combination schemes that satisfy our theoretical assumptions, including the AMC posterior, have better estimation performance than their main competitors across diverse simulations and in a real data analysis. (A generic divide-and-conquer sketch appears after this list.)
  5. Electrification of vehicles is becoming one of the main avenues for decarbonization of the transportation market. To reduce stress on the energy grid, large-scale charging will require optimal scheduling of when electricity is delivered to vehicles. Coordinated electric-vehicle charging can produce optimal, flattened loads that would improve reliability of the power system as well as reduce system costs and emissions. However, a challenge for successful introduction of coordinated deadline-scheduling of residential charging comes from the demand side: customers would need to be willing both to defer charging their vehicles and to accept less than a 100% target for battery charge. Within a coordinated electric-vehicle charging pilot run by the local utility in upstate New York, this study analyzes the necessary incentives for customers to accept giving up control of when charging of their vehicles takes place. Using data from a choice experiment implemented in an online survey of electric-vehicle owners and lessees in upstate New York (N=462), we make inference on the willingness to pay for features of hypothetical coordinated electric-vehicle charging programs. To address unobserved preference heterogeneity, we apply Variational Bayes (VB) inference to a mixed logit model. Stochastic variational inference has recently emerged as a fast and computationally efficient alternative to Markov chain Monte Carlo (MCMC) methods for scalable Bayesian estimation of discrete choice models. Our results show that individuals negatively perceive the duration of the timeframe in which the energy provider would be allowed to defer charging, even though both the desired target for battery charge and the deadline would be respected. This negative monetary valuation is evidenced by an expected average reduction in the annual fee of joining the charging program of $2.64 per hour of control yielded to the energy provider. Our results also provide evidence of substantial heterogeneity in preferences. For example, the 25% quantile of the posterior distribution of the mean of the willingness to accept an additional hour of control yielded to the utility is $5.16. However, the negative valuation of the timeframe for deferring charging is compensated by positive valuation of emission savings coming from switching charging to periods of the day with a higher proportion of generation from renewable sources. Customers also positively valued discounts in the price of energy delivery.
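For item 2 above, the following is a minimal sketch of the low-rank idea in Python, assuming the covariates are projected onto their top right singular vectors before a standard Laplace approximation for Bayesian logistic regression. It is a simplified stand-in rather than the exact LR-GLM construction from that paper; the function name and the prior variance are illustrative.

    import numpy as np
    from scipy.optimize import minimize

    def low_rank_laplace_logistic(X, y, rank, prior_var=1.0):
        # Project covariates onto the top-`rank` right singular vectors of X,
        # then run a Laplace approximation in the reduced space.
        _, _, Vt = np.linalg.svd(X, full_matrices=False)
        U = Vt[:rank].T                     # (d, rank) projection basis
        Z = X @ U                           # reduced design matrix, (n, rank)

        def neg_log_post(b):                # negative log-posterior, reduced space
            logits = Z @ b
            return (np.sum(np.logaddexp(0.0, logits)) - y @ logits
                    + 0.5 * b @ b / prior_var)

        b_map = minimize(neg_log_post, np.zeros(rank), method="BFGS").x
        p = 1.0 / (1.0 + np.exp(-(Z @ b_map)))
        H = Z.T @ (Z * (p * (1.0 - p))[:, None]) + np.eye(rank) / prior_var
        cov = np.linalg.inv(H)              # Laplace covariance, reduced space
        # Map the mean and covariance back to the full parameter space.
        return U @ b_map, U @ cov @ U.T

The point of the projection is that the expensive linear algebra scales with the chosen rank rather than with the full covariate dimension, which is the trade-off that abstract describes.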
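For item 4 above, the following is a minimal sketch of the generic divide-and-conquer pattern in Python: split the data into subsets, run an MCMC sampler on each subset (in parallel in practice), and combine the subset draws. The random-walk Metropolis sampler and the draw-averaging combination rule below are placeholders for the paper's mixed-effects data-augmentation sampler and its Aggregated Monte Carlo posterior; all names are illustrative.

    import numpy as np

    def subset_mcmc(X_s, y_s, n_draws, rng, step=0.1):
        # Placeholder sampler: random-walk Metropolis for Bayesian linear
        # regression with a Gaussian likelihood and an N(0, 10 I) prior.
        d = X_s.shape[1]
        log_post = lambda b: -0.5 * np.sum((y_s - X_s @ b) ** 2) - 0.05 * b @ b
        beta, lp = np.zeros(d), log_post(np.zeros(d))
        draws = np.empty((n_draws, d))
        for t in range(n_draws):
            prop = beta + step * rng.standard_normal(d)
            lp_prop = log_post(prop)
            if np.log(rng.random()) < lp_prop - lp:     # Metropolis accept/reject
                beta, lp = prop, lp_prop
            draws[t] = beta
        return draws

    def combined_posterior_draws(X, y, n_subsets=8, n_draws=2000, seed=0):
        rng = np.random.default_rng(seed)
        idx = np.array_split(rng.permutation(len(y)), n_subsets)
        # Each subset sampler is independent and would run in parallel in practice.
        subset_draws = [subset_mcmc(X[i], y[i], n_draws, rng) for i in idx]
        # Placeholder combination: average the t-th draw across subsets.
        return np.mean(np.stack(subset_draws), axis=0)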