skip to main content

This content will become publicly available on July 7, 2022

Algorithm and Hardware Co-Design for FPGA Acceleration of Hamiltonian Monte Carlo Based No-U-Turn Sampler
Monte Carlo (MC) methods are widely used in many research areas such as physical simulation, statistical analysis, and machine learning. Application of MC methods requires drawing fast mixing samples from a given probability distribution. Among existing sampling methods, the Hamiltonian Monte Carlo (HMC) utilizes gradient information during Hamiltonian simulation and can produce fast mixing samples at the highest efficiency. However, without carefully chosen simulation parameters for a specific problem, HMC generally suffers from simulation locality and computation waste. As a result, the No-U-Turn Sampler (NUTS) has been proposed to automatically tune these parameters during simulation and is the current state-of-the-art sampling algorithm. However, application of NUTS requires frequent gradient calculation of a given distribution and high-volume vector processing, especially for large-scale problems, leading to drawing an expensively large number of samples and a desire of hardware acceleration. While some hardware acceleration works have been proposed for traditional Markov Chain Monte Carlo (MCMC) and HMC methods, there is no existing work targeting hardware acceleration of the NUTS algorithm. In this paper, we present the first NUTS accelerator on FPGA while addressing the high complexity of this state-of-the-art algorithm. Our hardware and algorithm co-optimizations include an incremental resampling technique which leads to more »
Authors:
;
Award ID(s):
Publication Date:
NSF-PAR ID:
10250144
Journal Name:
The 32nd IEEE International Conference on Application-specific Systems, Architectures and Processors
Sponsoring Org:
National Science Foundation
##### More Like this
1. We propose a fast stochastic Hamilton Monte Carlo (HMC) method, for sampling from a smooth and strongly log-concave distribution. At the core of our proposed method is a variance reduction technique inspired by the recent advance in stochastic optimization. We show that, to achieve $\epsilon$ accuracy in 2-Wasserstein distance, our algorithm achieves $\tilde O\big(n+\kappa^{2}d^{1/2}/\epsilon+\kappa^{4/3}d^{1/3}n^{2/3}/\epsilon^{2/3}%\wedge\frac{\kappa^2L^{-2}d\sigma^2}{\epsilon^2} \big)$ gradient complexity (i.e., number of component gradient evaluations), which outperforms the state-of-the-art HMC and stochastic gradient HMC methods in a wide regime. We also extend our algorithm for sampling from smooth and general log-concave distributions, and prove the corresponding gradient complexity as well. Experiments onmore »
2. Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) methods have been widely used to sample from certain probability distributions, incorporating (kernel) density derivatives and/or given datasets. Instead of exploring new samples from kernel spaces, this piece of work proposed a novel SGHMC sampler, namely Spectral Hamiltonian Monte Carlo (SpHMC), that produces the high dimensional sparse representations of given datasets through sparse sensing and SGHMC. Inspired by compressed sensing, we assume all given samples are low-dimensional measurements of certain high-dimensional sparse vectors, while a continuous probability distribution exists in such high-dimensional space. Specifically, given a dictionary for sparse coding, SpHMC first derives amore »
3. Stochastic gradient Langevin dynamics (SGLD) and stochastic gradient Hamiltonian Monte Carlo (SGHMC) are two popular Markov Chain Monte Carlo (MCMC) algorithms for Bayesian inference that can scale to large datasets, allowing to sample from the posterior distribution of the parameters of a statistical model given the input data and the prior distribution over the model parameters. However, these algorithms do not apply to the decentralized learning setting, when a network of agents are working collaboratively to learn the parameters of a statistical model without sharing their individual data due to privacy reasons or communication constraints. We study two algorithms: Decentralizedmore »
4. We give a algorithm for exact sampling from the Bingham distribution p(x)∝exp(x⊤Ax) on the sphere Sd−1 with expected runtime of poly(d,λmax(A)−λmin(A)). The algorithm is based on rejection sampling, where the proposal distribution is a polynomial approximation of the pdf, and can be sampled from by explicitly evaluating integrals of polynomials over the sphere. Our algorithm gives exact samples, assuming exact computation of an inverse function of a polynomial. This is in contrast with Markov Chain Monte Carlo algorithms, which are not known to enjoy rapid mixing on this problem, and only give approximate samples. As a direct application, we usemore »
5. Differential privacy has been widely adopted to release continuous- and scalar-valued information on a database without compromising the privacy of individual data records in it. The problem of querying binary- and matrix-valued information on a database in a differentially private manner has rarely been studied. However, binary- and matrix-valued data are ubiquitous in real-world applications, whose privacy concerns may arise under a variety of circumstances. In this paper, we devise an exclusive or (XOR) mechanism that perturbs binary- and matrix-valued query result by conducting an XOR operation on the query result with calibrated noises attributed to a matrix-valued Bernoulli distribution.more »