skip to main content

Attention:

The NSF Public Access Repository (PAR) system and access will be unavailable from 8:00 PM ET on Friday, March 21 until 8:00 AM ET on Saturday, March 22 due to maintenance. We apologize for the inconvenience.


Title: Dynamics of Coordinate Ascent Variational Inference: A Case Study in 2D Ising Models
Variational algorithms have gained prominence over the past two decades as a scalable computational environment for Bayesian inference. In this article, we explore tools from the dynamical systems literature to study the convergence of coordinate ascent algorithms for mean field variational inference. Focusing on the Ising model defined on two nodes, we fully characterize the dynamics of the sequential coordinate ascent algorithm and its parallel version. We observe that in the regime where the objective function is convex, both the algorithms are stable and exhibit convergence to the unique fixed point. Our analyses reveal interesting discordances between these two versions of the algorithm in the region when the objective function is non-convex. In fact, the parallel version exhibits a periodic oscillatory behavior which is absent in the sequential version. Drawing intuition from the Markov chain Monte Carlo literature, we empirically show that a parameter expansion of the Ising model, popularly called the Edward–Sokal coupling, leads to an enlargement of the regime of convergence to the global optima.  more » « less
Award ID(s):
1653404 1916371
PAR ID:
10254073
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Entropy
Volume:
22
Issue:
11
ISSN:
1099-4300
Page Range / eLocation ID:
1263
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Meila, Marina ; Zhang, Tong (Ed.)
    Black-box variational inference algorithms use stochastic sampling to analyze diverse statistical models, like those expressed in probabilistic programming languages, without model-specific derivations. While the popular score-function estimator computes unbiased gradient estimates, its variance is often unacceptably large, especially in models with discrete latent variables. We propose a stochastic natural gradient estimator that is as broadly applicable and unbiased, but improves efficiency by exploiting the curvature of the variational bound, and provably reduces variance by marginalizing discrete latent variables. Our marginalized stochastic natural gradients have intriguing connections to classic coordinate ascent variational inference, but allow parallel updates of variational parameters, and provide superior convergence guarantees relative to naive Monte Carlo approximations. We integrate our method with the probabilistic programming language Pyro and evaluate real-world models of documents, images, networks, and crowd-sourcing. Compared to score-function estimators, we require far fewer Monte Carlo samples and consistently convergence orders of magnitude faster. 
    more » « less
  2. We propose a broadly applicable variational inference algorithm for probabilistic models with binary latent variables, using sampling to approximate expectations required for coordinate ascent updates. Applied to three real-world models for text and image and network data, our approach converges much faster than REINFORCE-style stochastic gradient algorithms, and requires fewer Monte Carlo samples. Compared to hand-crafted variational bounds with model-dependent auxiliary variables, our approach leads to tighter likelihood bounds and greater robustness to local optima. Our method is designed to integrate easily with probabilistic programming languages for effective, scalable, black-box variational inference. 
    more » « less
  3. Abstract

    This article considers Bayesian model selection via mean-field (MF) variational approximation. Towards this goal, we study the non-asymptotic properties of MF inference that allows latent variables and model misspecification. Concretely, we show a Bernstein–von Mises (BvM) theorem for the variational distribution from MF under possible model misspecification, which implies the distributional convergence of MF variational approximation to a normal distribution centring at the maximal likelihood estimator. Motivated by the BvM theorem, we propose a model selection criterion using the evidence lower bound (ELBO), and demonstrate that the model selected by ELBO tends to asymptotically agree with the one selected by the commonly used Bayesian information criterion (BIC) as the sample size tends to infinity. Compared to BIC, ELBO tends to incur smaller approximation error to the log-marginal likelihood (a.k.a. model evidence) due to a better dimension dependence and full incorporation of the prior information. Moreover, we show the geometric convergence of the coordinate ascent variational inference algorithm, which provides a practical guidance on how many iterations one typically needs to run when approximating the ELBO. These findings demonstrate that variational inference is capable of providing a computationally efficient alternative to conventional approaches in tasks beyond obtaining point estimates.

     
    more » « less
  4. Motivated by approximation Bayesian computation using mean-field variational approximation and the computation of equilibrium in multi-species systems with cross-interaction, this paper investigates the composite geodesically convex optimization problem over multiple distributions. The objective functional under consideration is composed of a convex potential energy on a product of Wasserstein spaces and a sum of convex self-interaction and internal energies associated with each distribution. To efficiently solve this problem, we introduce the Wasserstein Proximal Coordinate Gradient (WPCG) algorithms with parallel, sequential, and random update schemes. Under a quadratic growth (QG) condition that is weaker than the usual strong convexity requirement on the objective functional, we show that WPCG converges exponentially fast to the unique global optimum. In the absence of the QG condition, WPCG is still demonstrated to converge to the global optimal solution, albeit at a slower polynomial rate. Numerical results for both motivating examples are consistent with our theoretical findings. 
    more » « less
  5. We provide new adaptive first-order methods for constrained convex optimization. Our main algorithms AdaACSA and AdaAGD+ are accelerated methods, which are universal in the sense that they achieve nearly-optimal convergence rates for both smooth and non-smooth functions, even when they only have access to stochastic gradients. In addition, they do not require any prior knowledge on how the objective function is parametrized, since they automatically adjust their per-coordinate learning rate. These can be seen as truly accelerated Adagrad methods for constrained optimization. We complement them with a simpler algorithm AdaGrad+ which enjoys the same features, and achieves the standard non-accelerated convergence rate. We also present a set of new results involving adaptive methods for unconstrained optimization and variational inequalities arising from monotone operators. 
    more » « less