skip to main content


Title: Algorithm and Hardware Co-Design for FPGA Acceleration of Hamiltonian Monte Carlo Based No-U-Turn Sampler
Monte Carlo (MC) methods are widely used in many research areas such as physical simulation, statistical analysis, and machine learning. Application of MC methods requires drawing fast mixing samples from a given probability distribution. Among existing sampling methods, the Hamiltonian Monte Carlo (HMC) utilizes gradient information during Hamiltonian simulation and can produce fast mixing samples at the highest efficiency. However, without carefully chosen simulation parameters for a specific problem, HMC generally suffers from simulation locality and computation waste. As a result, the No-U-Turn Sampler (NUTS) has been proposed to automatically tune these parameters during simulation and is the current state-of-the-art sampling algorithm. However, application of NUTS requires frequent gradient calculation of a given distribution and high-volume vector processing, especially for large-scale problems, leading to drawing an expensively large number of samples and a desire of hardware acceleration. While some hardware acceleration works have been proposed for traditional Markov Chain Monte Carlo (MCMC) and HMC methods, there is no existing work targeting hardware acceleration of the NUTS algorithm. In this paper, we present the first NUTS accelerator on FPGA while addressing the high complexity of this state-of-the-art algorithm. Our hardware and algorithm co-optimizations include an incremental resampling technique which leads to a more memory efficient architecture and pipeline optimization for multi-chain sampling to maximize the throughput. We also explore three levels of parallelism in the NUTS accelerator to further boost performance. Compared with optimized C++ NUTS package: RSTAN, our NUTS accelerator can reach a maximum speedup of 50.6X and an energy improvement of 189.7X.  more » « less
Award ID(s):
1741173
NSF-PAR ID:
10250144
Author(s) / Creator(s):
;
Date Published:
Journal Name:
The 32nd IEEE International Conference on Application-specific Systems, Architectures and Processors
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We investigate solution methods for large-scale inverse problems governed by partial differential equations (PDEs) via Bayesian inference. The Bayesian framework provides a statistical setting to infer uncertain parameters from noisy measurements. To quantify posterior uncertainty, we adopt Markov Chain Monte Carlo (MCMC) approaches for generating samples. To increase the efficiency of these approaches in high-dimension, we make use of local information about gradient and Hessian of the target potential, also via Hamiltonian Monte Carlo (HMC). Our target application is inferring the field of soil permeability processing observations of pore pressure, using a nonlinear PDE poromechanics model for predicting pressure from permeability. We compare the performance of different sampling approaches in this and other settings. We also investigate the effect of dimensionality and non-gaussianity of distributions on the performance of different sampling methods.

     
    more » « less
  2. Abstract

    Molecular ecology regularly requires the analysis of count data that reflect the relative abundance of features of a composition (e.g., taxa in a community, gene transcripts in a tissue). The sampling process that generates these data can be modelled using the multinomial distribution. Replicate multinomial samples inform the relative abundances of features in an underlying Dirichlet distribution. These distributions together form a hierarchical model for relative abundances among replicates and sampling groups. This type of Dirichlet‐multinomial modelling (DMM) has been described previously, but its benefits and limitations are largely untested. With simulated data, we quantified the ability of DMM to detect differences in proportions between treatment and control groups, and compared the efficacy of three computational methods to implement DMM—Hamiltonian Monte Carlo (HMC), variational inference (VI), and Gibbs Markov chain Monte Carlo. We report that DMM was better able to detect shifts in relative abundances than analogous analytical tools, while identifying an acceptably low number of false positives. Among methods for implementing DMM, HMC provided the most accurate estimates of relative abundances, and VI was the most computationally efficient. The sensitivity of DMM was exemplified through analysis of previously published data describing lung microbiomes. We report that DMM identified several potentially pathogenic, bacterial taxa as more abundant in the lungs of children who aspirated foreign material during swallowing; these differences went undetected with different statistical approaches. Our results suggest that DMM has strong potential as a statistical method to guide inference in molecular ecology.

     
    more » « less
  3. We propose a fast stochastic Hamilton Monte Carlo (HMC) method, for sampling from a smooth and strongly log-concave distribution. At the core of our proposed method is a variance reduction technique inspired by the recent advance in stochastic optimization. We show that, to achieve $\epsilon$ accuracy in 2-Wasserstein distance, our algorithm achieves $\tilde O\big(n+\kappa^{2}d^{1/2}/\epsilon+\kappa^{4/3}d^{1/3}n^{2/3}/\epsilon^{2/3}%\wedge\frac{\kappa^2L^{-2}d\sigma^2}{\epsilon^2} \big)$ gradient complexity (i.e., number of component gradient evaluations), which outperforms the state-of-the-art HMC and stochastic gradient HMC methods in a wide regime. We also extend our algorithm for sampling from smooth and general log-concave distributions, and prove the corresponding gradient complexity as well. Experiments on both synthetic and real data demonstrate the superior performance of our algorithm. 
    more » « less
  4. Monte Carlo algorithms, such as Markov chain Monte Carlo (MCMC) and Hamiltonian Monte Carlo (HMC), are routinely used for Bayesian inference; however, these algorithms are prohibitively slow in massive data settings because they require multiple passes through the full data in every iteration. Addressing this problem, we develop a scalable extension of these algorithms using the divide‐and‐conquer (D&C) technique that divides the data into a sufficiently large number of subsets, draws parameters in parallel on the subsets using apoweredlikelihood and produces Monte Carlo draws of the parameter by combining parameter draws obtained from each subset. The combined parameter draws play the role of draws from the original sampling algorithm. Our main contributions are twofold. First, we demonstrate through diverse simulated and real data analyses focusing on generalized linear models (GLMs) that our distributed algorithm delivers comparable results as the current state‐of‐the‐art D&C algorithms in terms of statistical accuracy and computational efficiency. Second, providing theoretical support for our empirical observations, we identify regularity assumptions under which the proposed algorithm leads to asymptotically optimal inference. We also provide illustrative examples focusing on normal linear and logistic regressions where parts of our D&C algorithm are analytically tractable.

     
    more » « less
  5. We propose Pathfinder, a variational method for approximately sampling from differentiable probability densities. Starting from a random initialization, Pathfinder locates normal approximations to the target density along a quasi-Newton optimization path, with local covariance estimated using the inverse Hessian estimates produced by the optimizer. Pathfinder returns draws from the approximation with the lowest estimated Kullback-Leibler (KL) divergence to the target distribution. We evaluate Pathfinder on a wide range of posterior distributions, demonstrating that its approximate draws are better than those from automatic differentiation variational inference (ADVI) and comparable to those produced by short chains of dynamic Hamiltonian Monte Carlo (HMC), as measured by 1-Wasserstein distance. Compared to ADVI and short dynamic HMC runs, Pathfinder requires one to two orders of magnitude fewer log density and gradient evaluations, with greater reductions for more challenging posteriors. Importance resampling over multiple runs of Pathfinder improves the diversity of approximate draws, reducing 1-Wasserstein distance further and providing a measure of robustness to optimization failures on plateaus, saddle points, or in minor modes. The Monte Carlo KL divergence estimates are embarrassingly parallelizable in the core Pathfinder algorithm, as are multiple runs in the resampling version, further increasing Pathfinder's speed advantage with multiple cores. 
    more » « less