

Title: Global Convergence of Stochastic Gradient Hamiltonian Monte Carlo for Nonconvex Stochastic Optimization: Nonasymptotic Performance Bounds and Momentum-Based Acceleration
Stochastic gradient Hamiltonian Monte Carlo (SGHMC) is a variant of stochastic gradient descent with momentum in which controlled, properly scaled Gaussian noise is added to the stochastic gradients to steer the iterates toward a global minimum. Many works report its empirical success in practice for solving stochastic nonconvex optimization problems; in particular, it has been observed to outperform overdamped Langevin Monte Carlo–based methods, such as stochastic gradient Langevin dynamics (SGLD), in many applications. Although the asymptotic global convergence properties of SGHMC are well known, its finite-time performance is not well understood. In this work, we study two variants of SGHMC based on two alternative discretizations of the underdamped Langevin diffusion. We provide finite-time performance bounds for the global convergence of both SGHMC variants for solving stochastic nonconvex optimization problems with explicit constants. Our results lead to nonasymptotic guarantees for both population and empirical risk minimization problems. For a fixed target accuracy level on a class of nonconvex problems, we obtain complexity bounds for SGHMC that can be tighter than those available for SGLD.
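As a rough illustration of the algorithm under study (a sketch, not the authors' exact discretization schemes), a first-order Euler-type SGHMC update damps the momentum with a friction term, follows a stochastic gradient, and injects scaled Gaussian noise; the parameter names eta (step size), gamma (friction), and beta (inverse temperature) are illustrative placeholders:

    import numpy as np

    def sghmc_step(x, v, stoch_grad, eta=1e-3, gamma=1.0, beta=1e4, rng=None):
        # One Euler-type discretization step of the underdamped Langevin diffusion.
        rng = np.random.default_rng() if rng is None else rng
        noise = np.sqrt(2.0 * gamma * eta / beta) * rng.standard_normal(x.shape)
        v = v - eta * gamma * v - eta * stoch_grad(x) + noise  # momentum update
        x = x + eta * v                                        # position update
        return x, v

Larger beta concentrates the stationary distribution of the diffusion near global minima, which is the mechanism behind global-convergence guarantees of this kind.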
Award ID(s):
2053485 1814888
NSF-PAR ID:
10326390
Author(s) / Creator(s):
Date Published:
Journal Name:
Operations Research
ISSN:
0030-364X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Stochastic gradient Langevin dynamics (SGLD) and stochastic gradient Hamiltonian Monte Carlo (SGHMC) are two popular Markov chain Monte Carlo (MCMC) algorithms for Bayesian inference that can scale to large datasets, making it possible to sample from the posterior distribution of the parameters of a statistical model given the input data and the prior distribution over the model parameters. However, these algorithms do not apply to the decentralized learning setting, in which a network of agents works collaboratively to learn the parameters of a statistical model without sharing their individual data, whether for privacy reasons or because of communication constraints. We study two algorithms, Decentralized SGLD (DE-SGLD) and Decentralized SGHMC (DE-SGHMC), which are adaptations of the SGLD and SGHMC methods that allow scalable Bayesian inference in the decentralized setting for large datasets. We show that when the posterior distribution is strongly log-concave and smooth, the iterates of these algorithms converge linearly to a neighborhood of the target distribution in the 2-Wasserstein distance if their parameters are selected appropriately. We illustrate the efficiency of our algorithms on decentralized Bayesian linear regression and Bayesian logistic regression problems.
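    As a minimal sketch of the kind of recursion DE-SGLD performs (assuming, as in the general decentralized-Langevin literature rather than quoting this paper's exact update, that each agent gossip-averages with its neighbors through a doubly stochastic mixing matrix W before taking a noisy stochastic-gradient step):

        import numpy as np

        def de_sgld_round(X, W, stoch_grads, eta=1e-2, rng=None):
            # X: (n_agents, dim) iterates, one row per agent.
            # W: (n_agents, n_agents) doubly stochastic mixing (gossip) matrix.
            # stoch_grads: one stochastic-gradient callable per agent.
            rng = np.random.default_rng() if rng is None else rng
            G = np.stack([g(x) for g, x in zip(stoch_grads, X)])
            noise = np.sqrt(2.0 * eta) * rng.standard_normal(X.shape)
            # Average with neighbors, step against own gradient, inject noise.
            return W @ X - eta * G + noise

    No raw data leaves an agent: only the iterates in X are exchanged, which is what makes schemes of this form attractive under privacy or communication constraints.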
  2. Kamalika Chaudhuri, Stefanie Jegelka (Ed.)
    While low-precision optimization has been widely used to accelerate deep learning, low-precision sampling remains largely unexplored. As a consequence, sampling is simply infeasible in many large-scale scenarios, despite providing remarkable benefits to generalization and uncertainty estimation for neural networks. In this paper, we provide the first study of low-precision Stochastic Gradient Langevin Dynamics (SGLD), showing that its costs can be significantly reduced without sacrificing performance, due to its intrinsic ability to handle system noise. We prove that the convergence of low-precision SGLD with full-precision gradient accumulators is less affected by the quantization error than its SGD counterpart in the strongly convex setting. To further enable low-precision gradient accumulators, we develop a new quantization function for SGLD that preserves the variance in each update step. We demonstrate that low-precision SGLD achieves comparable performance to full-precision SGLD with only 8 bits on a variety of deep learning tasks. 
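    The paper's variance-preserving quantization function is not reproduced here; as background, the standard unbiased building block in low-precision training is stochastic rounding to a fixed-point grid, sketched below (the grid spacing delta is a placeholder parameter):

        import numpy as np

        def stochastic_round(x, delta, rng=None):
            # Round each entry of x to the grid {k * delta} so that the
            # result is unbiased: E[stochastic_round(x)] == x.
            rng = np.random.default_rng() if rng is None else rng
            low = np.floor(x / delta) * delta      # grid point just below x
            p_up = (x - low) / delta               # probability of rounding up
            return low + delta * (rng.random(x.shape) < p_up)

    Because the rounding error is zero-mean, it behaves like additional system noise, which is exactly the kind of perturbation Langevin-type updates are built to tolerate.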
    Variational quantum algorithms have the potential for significant impact on high-dimensional optimization, with applications in classical combinatorics, quantum chemistry, and condensed matter. Nevertheless, the optimization landscape of these algorithms is generally nonconvex, leading the algorithms to converge to local, rather than global, minima and the production of suboptimal solutions. In this work, we introduce a variational quantum algorithm that couples classical Markov chain Monte Carlo techniques with variational quantum algorithms, allowing the former to provably converge to global minima and thus assure solution quality. Due to the generality of our approach, it is suitable for a myriad of quantum minimization problems, including optimization and quantum state preparation. Specifically, we devise a Metropolis–Hastings method that is suitable for variational quantum devices and use it, in conjunction with quantum optimization, to construct quantum ensembles that converge to Gibbs states. These performance guarantees are derived from the ergodicity of our algorithm’s state space and enable us to place analytic bounds on its time-complexity. We demonstrate both the effectiveness of our technique and the validity of our analysis through quantum circuit simulations for MaxCut instances, solving these problems deterministically and with perfect accuracy, as well as large-scale quantum Ising and transverse field spin models of up to 50 qubits. Our technique stands to broadly enrich the field of variational quantum algorithms, improving and guaranteeing the performance of these promising, yet often heuristic, methods.
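    As a purely classical sketch of the ergodicity mechanism (the variational quantum circuit that generates candidate states is abstracted into a hypothetical propose callable), a Metropolis–Hastings chain targeting a Gibbs state p(x) ∝ exp(-beta * energy(x)) under a symmetric proposal looks like:

        import numpy as np

        def mh_gibbs_chain(energy, propose, x0, beta=1.0, n_steps=1000, rng=None):
            # Metropolis-Hastings with a symmetric proposal; the acceptance
            # probability min(1, exp(-beta * (E(y) - E(x)))) targets the Gibbs state.
            rng = np.random.default_rng() if rng is None else rng
            x, e = x0, energy(x0)
            chain = [x]
            for _ in range(n_steps):
                y = propose(x, rng)
                e_y = energy(y)
                if rng.random() < np.exp(min(0.0, -beta * (e_y - e))):
                    x, e = y, e_y
                chain.append(x)
            return chain

    The convergence guarantee rests on the ergodicity of this accept/reject chain, provided the proposal can eventually reach the whole state space.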
  4. Belkin, M.; Kpotufe, S. (Eds.)
    Langevin algorithms are gradient descent methods with additive noise. They have been used for decades in Markov chain Monte Carlo (MCMC) sampling, optimization, and learning. Their convergence properties for unconstrained non-convex optimization and learning problems have been studied widely in the last few years. Other work has examined projected Langevin algorithms for sampling from log-concave distributions restricted to convex compact sets. For learning and optimization, log-concave distributions correspond to convex losses. In this paper, we analyze the case of non-convex losses with compact convex constraint sets and IID external data variables. We term the resulting method the projected stochastic gradient Langevin algorithm (PSGLA). We show the algorithm achieves a deviation of O(T^{-1/4}(log T)^{1/2}) from its target distribution in 1-Wasserstein distance. For optimization and learning, we show that the algorithm achieves ε-suboptimal solutions, on average, provided that it is run for a time that is polynomial in ε and slightly super-exponential in the problem dimension.
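    A minimal sketch of a PSGLA-style update under the stated setting, assuming a user-supplied Euclidean projection onto the compact convex constraint set (parameter names are illustrative):

        import numpy as np

        def psgla_step(x, stoch_grad, project, eta=1e-3, beta=1.0, rng=None):
            # Noisy stochastic-gradient step followed by projection back
            # onto the compact convex constraint set.
            rng = np.random.default_rng() if rng is None else rng
            noise = np.sqrt(2.0 * eta / beta) * rng.standard_normal(x.shape)
            return project(x - eta * stoch_grad(x) + noise)

    For instance, projection onto the unit ball can be passed as project=lambda z: z / max(1.0, np.linalg.norm(z)).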
    Monte-Carlo planning, as exemplified by Monte-Carlo Tree Search (MCTS), has demonstrated remarkable performance in applications with finite spaces. In this paper, we consider Monte-Carlo planning in an environment with continuous state-action spaces, a much less understood problem with important applications in control and robotics. We introduce POLY-HOOT, an algorithm that augments MCTS with a continuous armed bandit strategy named Hierarchical Optimistic Optimization (HOO) (Bubeck et al., 2011). Specifically, we enhance HOO by using an appropriate polynomial, rather than logarithmic, bonus term in the upper confidence bounds. Such a polynomial bonus is motivated by its empirical successes in AlphaGo Zero (Silver et al., 2017b), as well as its significant role in achieving theoretical guarantees of finite space MCTS (Shah et al., 2019). We investigate, for the first time, the regret of the enhanced HOO algorithm in non-stationary bandit problems. Using this result as a building block, we establish non-asymptotic convergence guarantees for POLY-HOOT: the value estimate converges to an arbitrarily small neighborhood of the optimal value function at a polynomial rate. We further provide experimental results that corroborate our theoretical findings.
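    Schematically, the change is confined to the exploration bonus inside the upper confidence bound; in the sketch below the exponent alpha is a placeholder, not the paper's tuned constant:

        import math

        def hoo_bonus_log(n_total, n_node, nu=1.0, rho=0.5, depth=0):
            # Classical HOO: logarithmic exploration term plus the
            # diameter term nu * rho**depth for a node at the given depth.
            return math.sqrt(2.0 * math.log(n_total) / n_node) + nu * rho ** depth

        def hoo_bonus_poly(n_total, n_node, alpha=0.25, nu=1.0, rho=0.5, depth=0):
            # POLY-HOOT-style variant: the log term is replaced by a
            # polynomial n_total**alpha (alpha is an illustrative value).
            return math.sqrt(n_total ** alpha / n_node) + nu * rho ** depth

    The polynomial term grows faster than the logarithmic one, forcing more exploration, which is what the non-stationary bandit analysis exploits.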