Title: Amortized Population Gibbs Samplers with Neural Sufficient Statistics
We develop amortized population Gibbs (APG) samplers, a class of scalable methods that frame structured variational inference as adaptive importance sampling. APG samplers construct high-dimensional proposals by iterating over updates to lower-dimensional blocks of variables. We train each conditional proposal by minimizing the inclusive KL divergence with respect to the conditional posterior. To appropriately account for the size of the input data, we develop a new parameterization in terms of neural sufficient statistics. Experiments show that APG samplers can be used to train highly-structured deep generative models in an unsupervised manner, and achieve substantial improvements in inference accuracy relative to standard autoencoding variational methods.
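As a rough illustration of a single APG step, the sketch below trains one conditional proposal with the self-normalized inclusive-KL gradient the abstract describes. This is a minimal sketch in PyTorch, not the authors' implementation: `encoder`, `log_joint`, and the dict-of-blocks representation of the latent variables are all hypothetical stand-ins.

```python
import torch

def apg_block_update(encoder, log_joint, x, z, b, num_samples=10):
    """One APG update for block b: propose new values of z_b from the
    conditional proposal q_phi(z_b | x, z_-b), then step phi toward
    minimizing the inclusive KL(p(z_b | x, z_-b) || q_phi)."""
    q = encoder(x, z, block=b)                # a torch.distributions object
    z_b = q.sample((num_samples,))            # S candidate values for block b

    # Self-normalized importance weights w_s ∝ p(x, z) / q(z_b | x, z_-b);
    # log_joint is assumed to broadcast over the leading sample dimension.
    log_w = log_joint(x, {**z, b: z_b}) - q.log_prob(z_b)
    w = torch.softmax(log_w.detach(), dim=0)  # no gradient through weights

    # Inclusive-KL gradient: raise the proposal density on heavy samples.
    loss = -(w * q.log_prob(z_b)).sum()

    idx = torch.multinomial(w, 1).item()      # resample one candidate
    return loss, z_b[idx]
```

The neural-sufficient-statistics parameterization would live inside `encoder`: per-datapoint features are pooled (for example, summed) across the dataset before being mapped to proposal parameters, so that the proposal accounts for the size of the input data.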
Award ID(s):
1835309
NSF-PAR ID:
10280401
Author(s) / Creator(s):
Editor(s):
Daumé III, H.; Singh, A.
Date Published:
Journal Name:
Proceedings of the 37th International Conference on Machine Learning
Volume:
119
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Ranzato, M.; Beygelzimer, A.; Dauphin, Y.; Liang, P.S.; Wortman Vaughan, J. (Ed.)
    We develop nested variational inference (NVI), a family of methods that learn proposals for nested importance samplers by minimizing a forward or reverse KL divergence at each level of nesting. NVI is applicable to many commonly used importance sampling strategies and provides a mechanism for learning intermediate densities, which can serve as heuristics to guide the sampler. Our experiments apply NVI to (a) sample from a multimodal distribution using a learned annealing path, (b) learn heuristics that approximate the likelihood of future observations in a hidden Markov model, and (c) perform amortized inference in hierarchical deep generative models. We observe that optimizing nested objectives leads to improved sample quality in terms of log average weight and effective sample size.
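In symbols the abstract only implies (the intermediate targets and proposal parameters below are my own labels, a minimal sketch rather than the paper's notation), the two per-level objectives take the standard forms:

```latex
% Forward (inclusive) and reverse (exclusive) KL objectives for the
% proposal q_{\phi_k} at nesting level k, with intermediate target \pi_k:
\mathcal{L}^{\mathrm{fwd}}_k(\phi_k)
  = \mathrm{KL}\!\left(\pi_k \,\middle\|\, q_{\phi_k}\right)
  = \mathbb{E}_{z \sim \pi_k}\!\left[\log \frac{\pi_k(z)}{q_{\phi_k}(z)}\right],
\qquad
\mathcal{L}^{\mathrm{rev}}_k(\phi_k)
  = \mathrm{KL}\!\left(q_{\phi_k} \,\middle\|\, \pi_k\right).
```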
  2. Abstract

    Recent advances in deep learning for neural networks with large numbers of parameters have been enabled by automatic differentiation, an algorithmic technique for calculating gradients of measures of model fit with respect to model parameters. Estimation of high-dimensional parameter sets is an important problem within the hydrological sciences. Here, we demonstrate the effectiveness of gradient-based estimation techniques for high-dimensional inverse estimation problems using a conceptual rainfall-runoff model. In particular, we compare the effectiveness of Hamiltonian Monte Carlo and automatic differentiation variational inference against two nongradient-dependent methods, random walk Metropolis and differential evolution Metropolis. We show that the former two techniques exhibit superior performance for inverse estimation of daily rainfall values and are much more computationally efficient on larger data sets in an experiment with synthetic data. We also present a case study evaluating the effectiveness of automatic differentiation variational inference for inverse estimation over 25 years of daily precipitation conditional on streamflow observations at three catchments and show that it is scalable to very high-dimensional parameter spaces. The presented results highlight the power of combining hydrological process-based models with optimization techniques from deep learning for high-dimensional estimation problems.

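To make the gradient-based approach concrete, here is a minimal reparameterized ELBO step of the kind automatic differentiation variational inference performs, with a mean-field Gaussian approximation. Everything here is a generic sketch: `log_posterior` is a hypothetical stand-in for the unnormalized log density of a model such as the rainfall-runoff model.

```python
import math
import torch

def advi_step(mu, log_sigma, log_posterior, optimizer, num_samples=8):
    """One stochastic ELBO ascent step for a mean-field Gaussian
    q(theta) = N(mu, diag(exp(log_sigma)^2)), via the
    reparameterization trick."""
    optimizer.zero_grad()
    eps = torch.randn(num_samples, mu.numel())
    theta = mu + log_sigma.exp() * eps        # reparameterized draws, (S, d)
    # Gaussian entropy in closed form: 0.5*d*(1 + log 2*pi) + sum(log sigma).
    entropy = 0.5 * mu.numel() * (1 + math.log(2 * math.pi)) + log_sigma.sum()
    # log_posterior maps an (S, d) batch of parameters to (S,) log densities.
    elbo = log_posterior(theta).mean() + entropy
    (-elbo).backward()                        # minimize the negative ELBO
    optimizer.step()
    return elbo.item()
```

Here `mu` and `log_sigma` would be created with `requires_grad=True` and handed to the optimizer, e.g. `torch.optim.Adam([mu, log_sigma])`.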
  3. Abstract

    The conditional moment problem is a powerful formulation for describing structural causal parameters in terms of observables, a prominent example being instrumental variable regression. We introduce a very general class of estimators called the variational method of moments (VMM), motivated by a variational minimax reformulation of optimally weighted generalized method of moments for finite sets of moments. VMM controls infinitely many moments characterized by flexible function classes such as neural nets and kernel methods while, unlike existing related minimax estimators, provably maintaining statistical efficiency. We also develop inference algorithms and demonstrate the empirical strengths of VMM estimation and inference in experiments.

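For readers unfamiliar with the setup, the conditional moment restriction and its instrumental-variable instance can be written as below; the notation is standard rather than taken from this paper, and the exact VMM minimax objective is omitted because its weighting is paper-specific. Minimax estimators of this type test the restriction against a flexible function class F via quantities like sup over f in F of E[f(Z) rho(X; theta)].

```latex
% Conditional moment restriction: the residual rho(X; theta) has zero
% conditional mean at the true parameter theta_0, given the instrument Z.
\mathbb{E}\bigl[\rho(X;\theta_0) \mid Z\bigr] = 0 \quad \text{almost surely.}
% Instrumental variable regression is the special case with residual
% rho(X; theta) = Y - g_theta(T):
\mathbb{E}\bigl[\,Y - g_{\theta_0}(T) \mid Z\,\bigr] = 0 .
```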
  4. Representation learning over graph-structured data has been mostly studied in static graph settings, while efforts to model dynamic graphs remain scant. In this paper, we develop a novel hierarchical variational model that introduces additional latent random variables to jointly model the hidden states of a graph recurrent neural network (GRNN), capturing both topology and node-attribute changes in dynamic graphs. We argue that the use of high-level latent random variables in this variational GRNN (VGRNN) can better capture the potential variability observed in dynamic graphs, as well as the uncertainty of node latent representations. With semi-implicit variational inference developed for this new VGRNN architecture (SI-VGRNN), we show that flexible non-Gaussian latent representations can further help dynamic graph analytic tasks. Our experiments with multiple real-world dynamic graph datasets demonstrate that SI-VGRNN and VGRNN consistently outperform existing baseline and state-of-the-art methods by a significant margin in dynamic link prediction.
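As a schematic of this kind of architecture, the sketch below shows one timestep of a variational recurrent cell with a latent variable conditioned on the recurrent state. It is a minimal sketch only: the real VGRNN conditions through graph convolutions, which the plain linear layers here elide, and all names are stand-ins.

```python
import torch
import torch.nn as nn

class VGRNNCell(nn.Module):
    """One timestep of a variational (graph) RNN: a latent z_t is drawn
    from a posterior conditioned on the features x_t and recurrent state
    h_{t-1}; a prior on z_t given h_{t-1} supplies the KL term."""
    def __init__(self, feat_dim, hidden_dim, latent_dim):
        super().__init__()
        self.prior = nn.Linear(hidden_dim, 2 * latent_dim)            # -> (mu, logvar)
        self.posterior = nn.Linear(feat_dim + hidden_dim, 2 * latent_dim)
        self.rnn = nn.GRUCell(feat_dim + latent_dim, hidden_dim)

    def forward(self, x_t, h_prev):
        # Posterior q(z_t | x_t, h_{t-1}) via the reparameterization trick.
        mu_q, logvar_q = self.posterior(torch.cat([x_t, h_prev], -1)).chunk(2, -1)
        z_t = mu_q + (0.5 * logvar_q).exp() * torch.randn_like(mu_q)
        # Prior p(z_t | h_{t-1}); analytic KL between diagonal Gaussians.
        mu_p, logvar_p = self.prior(h_prev).chunk(2, -1)
        kl = 0.5 * (logvar_p - logvar_q
                    + ((mu_q - mu_p).pow(2) + logvar_q.exp()) / logvar_p.exp()
                    - 1).sum(-1)
        # Both the sampled latent and the features drive the next state.
        h_t = self.rnn(torch.cat([x_t, z_t], -1), h_prev)
        return h_t, z_t, kl
```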