We propose the use of U-statistics to reduce variance for gradient estimation in importance-weighted variational inference. The key observation is that, given a base gradient estimator that requires m > 1 samples and a total of n > m samples available for estimation, lower variance is achieved by averaging the base estimator over overlapping batches of size m than over disjoint batches, as is current practice. We use classical U-statistic theory to analyze the variance reduction, and propose novel approximations with theoretical guarantees to ensure computational efficiency. We find empirically that U-statistic variance reduction can lead to modest to significant improvements in inference performance on a range of models, with little computational cost.
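The core idea admits a compact sketch. In this hypothetical Python illustration (the function names are ours, not the paper's), `base_estimator` stands in for any symmetric m-sample estimator; averaging it over all, or a random subsample of, overlapping size-m batches gives a complete or incomplete U-statistic, in contrast to the usual disjoint-batch average:

```python
import itertools
import random

def disjoint_average(samples, base_estimator, m):
    """Average the m-sample base estimator over disjoint batches of size m."""
    batches = [samples[i:i + m] for i in range(0, len(samples) - m + 1, m)]
    return sum(base_estimator(b) for b in batches) / len(batches)

def u_statistic_average(samples, base_estimator, m, num_subsets=None, seed=0):
    """Average over overlapping size-m subsets: a complete U-statistic,
    or an incomplete one when num_subsets caps the number of subsets
    for computational efficiency."""
    subsets = list(itertools.combinations(samples, m))
    if num_subsets is not None and num_subsets < len(subsets):
        subsets = random.Random(seed).sample(subsets, num_subsets)
    return sum(base_estimator(s) for s in subsets) / len(subsets)
```

For a linear kernel such as the batch mean the two averages coincide; the variance reduction appears for genuinely nonlinear m-sample estimators, where classical theory makes the complete U-statistic the minimum-variance unbiased symmetric estimator.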
Linear Convergence of Black-Box Variational Inference: Should We Stick the Landing?
We prove that black-box variational inference (BBVI) with control variates, particularly the sticking-the-landing (STL) estimator, converges at a geometric (traditionally called “linear”) rate under perfect variational family specification. In particular, we prove a quadratic bound on the gradient variance of the STL estimator, one which encompasses misspecified variational families. Combined with previous works on the quadratic variance condition, this directly implies convergence of BBVI with the use of projected stochastic gradient descent. For the projection operator, we consider a domain with triangular scale matrices, onto which the projection is computable in O(𝑑) time, where 𝑑 is the dimensionality of the target posterior. We also improve existing analysis on the regular closed-form entropy gradient estimators, which enables comparison against the STL estimator, providing explicit non-asymptotic complexity guarantees for both.
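For context, the STL estimator can be stated in a few lines using standard reparameterization notation (the shorthand z = T_λ(ε) is ours, not necessarily the paper's):

```latex
% Reparameterized ELBO gradient with z = T_\lambda(\epsilon):
\nabla_\lambda \mathrm{ELBO}(\lambda)
  = \mathbb{E}_{\epsilon}\!\left[
      \nabla_z \bigl( \log p(x, z) - \log q_\lambda(z) \bigr)\,
      \nabla_\lambda T_\lambda(\epsilon)
    \right]
  - \mathbb{E}_{\epsilon}\!\left[
      \nabla_\lambda \log q_\lambda(z) \,\big|_{z = T_\lambda(\epsilon)\ \text{held fixed}}
    \right].
% The second (score) term has zero expectation, so STL drops it:
g_{\mathrm{STL}}(\epsilon)
  = \nabla_z \bigl( \log p(x, z) - \log q_\nu(z) \bigr) \big|_{\nu = \lambda}\,
    \nabla_\lambda T_\lambda(\epsilon).
% When q_\lambda matches the target posterior exactly, g_{STL} vanishes
% pointwise, i.e., the estimator has zero variance at the optimum.
```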
- Award ID(s):
- 2145644
- PAR ID:
- 10544964
- Publisher / Repository:
- Conference on Artificial Intelligence and Statistics (AISTATS 2024)
- Date Published:
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Meila, Marina; Zhang, Tong (Ed.) Black-box variational inference algorithms use stochastic sampling to analyze diverse statistical models, like those expressed in probabilistic programming languages, without model-specific derivations. While the popular score-function estimator computes unbiased gradient estimates, its variance is often unacceptably large, especially in models with discrete latent variables. We propose a stochastic natural gradient estimator that is as broadly applicable and unbiased, but improves efficiency by exploiting the curvature of the variational bound, and provably reduces variance by marginalizing discrete latent variables. Our marginalized stochastic natural gradients have intriguing connections to classic coordinate ascent variational inference, but allow parallel updates of variational parameters, and provide superior convergence guarantees relative to naive Monte Carlo approximations. We integrate our method with the probabilistic programming language Pyro and evaluate real-world models of documents, images, networks, and crowd-sourcing. Compared to score-function estimators, we require far fewer Monte Carlo samples and consistently converge orders of magnitude faster.
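The marginalization step can be illustrated on a single categorical latent. This toy Python sketch (names and setup are ours, not the paper's) contrasts a score-function estimate of the gradient of E[r(z)] with the exact gradient obtained by summing out z:

```python
import numpy as np

def score_function_grad(logits, reward_fn, num_samples, rng):
    """Score-function (REINFORCE) estimate of the gradient of
    E_{z ~ Categorical(softmax(logits))}[reward_fn(z)] w.r.t. logits."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    grads = []
    for _ in range(num_samples):
        z = rng.choice(len(probs), p=probs)
        score = -probs.copy()
        score[z] += 1.0              # gradient of log p(z) w.r.t. logits
        grads.append(reward_fn(z) * score)
    return np.mean(grads, axis=0)

def marginalized_grad(logits, reward_fn):
    """Exact gradient obtained by summing over the discrete latent."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    rewards = np.array([reward_fn(z) for z in range(len(probs))])
    # d/d logits_k of sum_z probs_z * r_z = probs_k * (r_k - E[r])
    return probs * (rewards - probs @ rewards)
```

The score-function estimate fluctuates around the exact value; summing out the discrete latent removes that Monte Carlo noise entirely, which is the source of the variance reduction the abstract describes.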
-
Variational inference for state space models (SSMs) is known to be hard in general. Recent works focus on deriving variational objectives for SSMs from unbiased sequential Monte Carlo estimators. We reveal that the marginal particle filter is obtained from sequential Monte Carlo by applying Rao-Blackwellization operations, which sacrifices the trajectory information for reduced variance and differentiability. We propose the variational marginal particle filter (VMPF), which is a differentiable and reparameterizable variational filtering objective for SSMs based on an unbiased estimator. We find that VMPF with biased gradients gives tighter bounds than previous objectives, and the unbiased reparameterization gradients are sometimes beneficial.
-
Banerjee, Arindam; Fukumizu, Kenji (Ed.) We consider the contextual bandit problem, where a player sequentially makes decisions based on past observations to maximize the cumulative reward. Although many algorithms have been proposed for contextual bandit, most of them rely on finding the maximum likelihood estimator at each iteration, which requires 𝑂(𝑡) time at the 𝑡-th iteration and is memory-inefficient. A natural way to resolve this problem is to apply online stochastic gradient descent (SGD) so that the per-step time and memory complexity can be reduced to constant with respect to 𝑡, but a contextual bandit policy based on online SGD updates that balances exploration and exploitation has remained elusive. In this work, we show that online SGD can be applied to the generalized linear bandit problem. The proposed SGD-TS algorithm, which uses a single-step SGD update to exploit past information and uses Thompson Sampling for exploration, achieves Õ(√𝑇) regret with the total time complexity that scales linearly in 𝑇 and 𝑑, where 𝑇 is the total number of rounds and 𝑑 is the number of features. Experimental results show that SGD-TS consistently outperforms existing algorithms on both synthetic and real datasets.
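A minimal sketch of the two ingredients — perturbation-based Thompson-style exploration plus a single O(𝑑) SGD step — might look as follows. This is our simplified linear-reward reading, not the authors' code; the published SGD-TS targets generalized linear rewards and uses a more careful exploration schedule:

```python
import numpy as np

def sgd_ts_round(theta, contexts, step_size, noise_scale, get_reward, rng):
    """One round of a simplified SGD-TS-style bandit policy:
    sample a perturbed parameter for exploration, pull the greedy arm
    under it, then take one SGD step on the squared reward error."""
    theta_sample = theta + noise_scale * rng.standard_normal(theta.shape)
    arm = int(np.argmax(contexts @ theta_sample))   # explore via perturbation
    x = contexts[arm]
    r = get_reward(arm)
    # single-step SGD on (x . theta - r)^2 / 2 -> O(d) time and memory
    theta = theta - step_size * (x @ theta - r) * x
    return theta, arm, r
```

Each round touches only one 𝑑-dimensional vector, so per-step time and memory are constant in 𝑡, unlike recomputing a maximum likelihood estimate from all past data.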
-
Abstract This article presents an extremum‐seeking control (ESC) algorithm for unmodeled nonlinear systems with known steady‐state gain and generally non‐convex cost functions with bounded curvature. The main contribution of this article is a novel gradient estimator, which uses a polyhedral set that characterizes all gradient estimates consistent with the collected data. The gradient estimator is posed as a quadratic program, which selects the gradient estimate that provides the best worst‐case convergence of the closed‐loop Lyapunov function. We show that the polyhedral‐based gradient estimator ensures the stability of the closed‐loop system formed by the plant and optimization algorithm. Furthermore, the estimated gradient provably produces the optimal robust convergence. We demonstrate our ESC controller through three benchmark examples and one practical example, which shows our ESC has fast and robust convergence to the optimal equilibrium.
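In one dimension the consistency set reduces to an interval, which makes the construction easy to sketch. The following Python illustration is our hypothetical simplification, not the paper's algorithm: in higher dimensions each measurement contributes a pair of half-spaces and selecting an estimate from the resulting polyhedron requires the quadratic program described in the abstract.

```python
def gradient_interval(x, fx, data, curvature_bound):
    """1-D consistency set for the gradient at x. For each measured pair
    (x_i, f(x_i)), bounded curvature |f''| <= L implies
        f(x_i) = f(x) + g * (x_i - x) + r_i,  |r_i| <= (L/2) * (x_i - x)**2,
    so each measurement bounds g to an interval; intersecting them gives
    the 1-D analogue of the polyhedral set of consistent gradients."""
    lo, hi = float("-inf"), float("inf")
    for xi, fi in data:
        dx = xi - x
        if dx == 0:
            continue
        slack = 0.5 * curvature_bound * dx * dx
        a = (fi - fx - slack) / dx
        b = (fi - fx + slack) / dx
        g_lo, g_hi = (a, b) if dx > 0 else (b, a)   # dividing by dx < 0 flips
        lo, hi = max(lo, g_lo), min(hi, g_hi)
    return lo, hi
```

Measurements on both sides of x pinch the interval from both directions, which is why the set of consistent gradients shrinks as data accumulates.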