Stochastic gradient Langevin dynamics (SGLD) and stochastic gradient Hamiltonian Monte Carlo (SGHMC) are two popular Markov Chain Monte Carlo (MCMC) algorithms for Bayesian inference that can scale to large datasets, allowing to sample from the posterior distribution of the parameters of a statistical model given the input data and the prior distribution over the model parameters. However, these algorithms do not apply to the decentralized learning setting, when a network of agents are working collaboratively to learn the parameters of a statistical model without sharing their individual data due to privacy reasons or communication constraints. We study two algorithms: Decentralized SGLD (DE-SGLD) and Decentralized SGHMC (DE-SGHMC) which are adaptations of SGLD and SGHMC methods that allow scaleable Bayesian inference in the decentralized setting for large datasets. We show that when the posterior distribution is strongly log-concave and smooth, the iterates of these algorithms converge linearly to a neighborhood of the target distribution in the 2-Wasserstein distance if their parameters are selected appropriately. We illustrate the efficiency of our algorithms on decentralized Bayesian linear regression and Bayesian logistic regression problems
more »
« less
Exact Inference with Approximate Computation for Differentially Private Data via Perturbations
This paper discusses how two classes of approximate computation algorithms can be adapted, in a modular fashion, to achieve exact statistical inference from differentially private data products. Considered are approximate Bayesian computation for Bayesian inference, and Monte Carlo Expectation-Maximization for likelihood inference. Up to Monte Carlo error, inference from these algorithms is exact with respect to the joint specification of both the analyst's original data model, and the curator's differential privacy mechanism. Highlighted is a duality between approximate computation on exact data, and exact computation on approximate data, which can be leveraged by a well-designed computational procedure for statistical inference.
more »
« less
- Award ID(s):
- 1916002
- PAR ID:
- 10387759
- Date Published:
- Journal Name:
- Journal of Privacy and Confidentiality
- Volume:
- 12
- Issue:
- 2
- ISSN:
- 2575-8527
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
We give a algorithm for exact sampling from the Bingham distribution p(x)∝exp(x⊤Ax) on the sphere Sd−1 with expected runtime of poly(d,λmax(A)−λmin(A)). The algorithm is based on rejection sampling, where the proposal distribution is a polynomial approximation of the pdf, and can be sampled from by explicitly evaluating integrals of polynomials over the sphere. Our algorithm gives exact samples, assuming exact computation of an inverse function of a polynomial. This is in contrast with Markov Chain Monte Carlo algorithms, which are not known to enjoy rapid mixing on this problem, and only give approximate samples. As a direct application, we use this to sample from the posterior distribution of a rank-1 matrix inference problem in polynomial time.more » « less
-
null (Ed.)We propose a new algorithm for inference of protein-protein interaction (PPI) networks from noisy time series of Liquid- Chromatography Mass-Spectrometry (LC-MS) proteomic expression data based on Approximate Bayesian Computation - Sequential Monte Carlo sampling (ABC-SMC). The algorithm is an extension of our previous framework PALLAS. The proposed algorithm can be easily modified to handle other complex models of expression data, such as LC-MS data, for which the likelihood function is intractable. Results based on synthetic time series of cytokine LC-MS measurements cor- responding to a prototype immunomic network demonstrate that our algorithm is capable of inferring the network topology accurately.more » « less
-
Monte Carlo algorithms, such as Markov chain Monte Carlo (MCMC) and Hamiltonian Monte Carlo (HMC), are routinely used for Bayesian inference; however, these algorithms are prohibitively slow in massive data settings because they require multiple passes through the full data in every iteration. Addressing this problem, we develop a scalable extension of these algorithms using the divide‐and‐conquer (D&C) technique that divides the data into a sufficiently large number of subsets, draws parameters in parallel on the subsets using apoweredlikelihood and produces Monte Carlo draws of the parameter by combining parameter draws obtained from each subset. The combined parameter draws play the role of draws from the original sampling algorithm. Our main contributions are twofold. First, we demonstrate through diverse simulated and real data analyses focusing on generalized linear models (GLMs) that our distributed algorithm delivers comparable results as the current state‐of‐the‐art D&C algorithms in terms of statistical accuracy and computational efficiency. Second, providing theoretical support for our empirical observations, we identify regularity assumptions under which the proposed algorithm leads to asymptotically optimal inference. We also provide illustrative examples focusing on normal linear and logistic regressions where parts of our D&C algorithm are analytically tractable.more » « less
-
Approximate Bayesian inference for neural networks is considered a robust alternative to standard training, often providing good performance on out-of-distribution data. However, Bayesian neural networks (BNNs) with high-fidelity approximate inference via full-batch Hamiltonian Monte Carlo achieve poor generalization under covariate shift, even underperforming classical estimation. We explain this surprising result, showing how a Bayesian model average can in fact be problematic under covariate shift, particularly in cases where linear dependencies in the input features cause a lack of posterior contraction. We additionally show why the same issue does not affect many approximate inference procedures, or classical maximum a-posteriori (MAP) training. Finally, we propose novel priors that improve the robustness of BNNs to many sources of covariate shift.more » « less
An official website of the United States government

