
Title: Conjugate Energy-Based Models
In this paper, we propose conjugate energy-based models (CEBMs), a new class of energy-based models that define a joint density over data and latent variables. The joint density of a CEBM decomposes into an intractable distribution over data and a tractable posterior over latent variables. CEBMs have use cases similar to those of variational autoencoders, in the sense that they learn an unsupervised mapping from data to latent variables. However, these models omit a generator network, which allows them to learn more flexible notions of similarity between data points. Our experiments demonstrate that CEBMs achieve competitive results in terms of image modelling, predictive power of the latent space, and out-of-domain detection on a variety of datasets.
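The decomposition described in the abstract can be illustrated with a generic exponential-family construction. This is a sketch under stated assumptions, not necessarily the exact formulation used in the paper: the symbols $h$, $s$, $A$, $\lambda$, $t_\theta$, and $Z$ are introduced here for illustration only, and $\lambda + t_\theta(x)$ is assumed to lie in the natural parameter space for every $x$.

```latex
% Assumed notation (not taken from the abstract):
%   h(z)        base measure of an exponential family over z
%   s(z)        sufficient statistics of z
%   A(\eta)     log-normalizer: A(\eta) = \log \int h(z)\, e^{\langle s(z), \eta \rangle}\, dz
%   \lambda     fixed natural parameters (a bias term)
%   t_\theta(x) output of an encoder network applied to the data x
%   Z           normalizing constant of the joint density
\begin{align}
  % Joint density defined through an energy function
  p(x, z) &= \frac{1}{Z}\, h(z)\,
             \exp\big( \langle s(z),\, \lambda + t_\theta(x) \rangle \big) \\
  % Integrating out z gives a marginal that still depends on Z (intractable)
  p(x)    &= \int p(x, z)\, dz
           = \frac{1}{Z}\, \exp\big( A(\lambda + t_\theta(x)) \big) \\
  % Dividing the joint by the marginal cancels Z (tractable posterior)
  p(z \mid x) &= h(z)\,
             \exp\big( \langle s(z),\, \lambda + t_\theta(x) \rangle
                       - A(\lambda + t_\theta(x)) \big)
\end{align}
```

Under these assumptions the posterior is a member of the same exponential family with natural parameters $\lambda + t_\theta(x)$, so it can be evaluated and sampled in closed form, while the marginal over data stays unnormalized because $Z = \int \exp\big(A(\lambda + t_\theta(x))\big)\, dx$ has no closed form in general.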
Editor(s):
Meila, M.; Zhang, T.
Journal Name:
Proceedings of the 38th International Conference on Machine Learning
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper studies the fundamental problem of learning multi-layer generator models. The multi-layer generator model builds multiple layers of latent variables as a prior model on top of the generator, which benefits learning complex data distributions and hierarchical representations. However, such a prior model usually focuses on modeling inter-layer relations between latent variables by assuming non-informative (conditional) Gaussian distributions, which can limit model expressivity. To tackle this issue and learn more expressive prior models, we propose an energy-based model (EBM) on the joint latent space over all layers of latent variables with the multi-layer generator as its backbone. Such a joint latent space EBM prior model captures the intra-layer contextual relations at each layer through layer-wise energy terms, and latent variables across different layers are jointly corrected. We develop a joint training scheme via maximum likelihood estimation (MLE), which involves Markov Chain Monte Carlo (MCMC) sampling for both prior and posterior distributions of the latent variables from different layers. To ensure efficient inference and learning, we further propose a variational training scheme where an inference model is used to amortize the costly posterior MCMC sampling. Our experiments demonstrate that the learned model can be expressive in generating high-quality images and capturing hierarchical features for better outlier detection.
  2. This paper studies the fundamental problem of multi-layer generator models in learning hierarchical representations. The multi-layer generator model, which consists of multiple layers of latent variables organized in a top-down architecture, tends to learn multiple levels of data abstraction. However, such multi-layer latent variables are typically parameterized to be Gaussian, which can be less informative in capturing complex abstractions, resulting in limited success in hierarchical representation learning. On the other hand, the energy-based model (EBM) prior is known to be expressive in capturing data regularities, but it often lacks the hierarchical structure needed to capture different levels of representation. In this paper, we propose a joint latent space EBM prior model with multi-layer latent variables for effective hierarchical representation learning. We develop a variational joint learning scheme that seamlessly integrates an inference model for efficient inference. Our experiments demonstrate that the proposed joint EBM prior is effective and expressive in capturing hierarchical representations and modeling data distribution.
  3. Joint modeling of spatially oriented dependent variables is commonplace in the environmental sciences, where scientists seek to estimate the relationships among a set of environmental outcomes accounting for dependence among these outcomes and the spatial dependence for each outcome. Such modeling is now sought for massive data sets with variables measured at a very large number of locations. Bayesian inference, while attractive for accommodating uncertainties through hierarchical structures, can become computationally onerous for modeling massive spatial data sets because of its reliance on iterative estimation algorithms. This article develops a conjugate Bayesian framework for analyzing multivariate spatial data using analytically tractable posterior distributions that obviate iterative algorithms. We discuss differences between modeling the multivariate response itself as a spatial process and that of modeling a latent process in a hierarchical model. We illustrate the computational and inferential benefits of these models using simulation studies and analysis of a vegetation index data set with spatially dependent observations numbering in the millions.
  4. The process of evapotranspiration transfers liquid water from vegetation and soil surfaces to the atmosphere, the so-called latent heat flux ($Q_{LE}$), and modulates the Earth’s energy, water, and carbon cycle. Vegetation controls $Q_{LE}$ by regulating leaf stomata opening (surface resistance $r_s$ in the Big Leaf approach) and by altering surface roughness (aerodynamic resistance $r_a$). Estimating $r_s$ and $r_a$ across different vegetation types is a key challenge in predicting $Q_{LE}$. We propose a hybrid approach that combines mechanistic modeling and machine learning for modeling $Q_{LE}$. The hybrid model combines a feed-forward neural network, which estimates the resistances from observations as intermediate variables, and a mechanistic model in an end-to-end setting. In the hybrid modeling setup, we make use of the Penman–Monteith equation in conjunction with multi-year flux measurements across different forest and grassland sites from the FLUXNET database. This hybrid model setup is successful in predicting $Q_{LE}$; however, this approach leads to equifinal solutions in terms of estimated physical parameters. We follow two different strategies to constrain the hybrid model and therefore control for the equifinality that arises when the two resistances are estimated simultaneously. One strategy is to impose an a priori constraint on $r_a$ based on mechanistic assumptions (theory-driven strategy), while the other strategy makes use of more observational data and adds a constraint in predicting $r_a$ through multi-task learning of both latent and sensible heat flux ($Q_H$; data-driven strategy). Our results show that all hybrid models predict the target variables with a high degree of success, with $R^2$ = 0.82–0.89 for grasslands and $R^2$ = 0.70–0.80 for forest sites at the mean diurnal scale. The predicted $r_s$ and $r_a$ show strong physical consistency across the two regularized hybrid models, but are physically implausible in the under-constrained hybrid model. The hybrid models are robust in reproducing consistent results for energy fluxes and resistances across different scales (diurnal, seasonal, and interannual), reflecting their ability to learn the physical dependence of the target variables on the meteorological inputs. As a next step, we propose to test these heavily observation-informed parameterizations derived through hybrid modeling as a substitute for ad hoc formulations in Earth system models.
  5. Marginalization of latent variables or nuisance parameters is a fundamental aspect of Bayesian inference and uncertainty quantification. In this work, we focus on scalable marginalization of latent variables in modeling correlated data, such as spatio-temporal or functional observations. We first introduce Gaussian processes (GPs) for modeling correlated data and highlight the computational challenge: the computational complexity grows cubically with the number of observations. We then review the connection between the state space model and GPs with Matérn covariance for temporal inputs. The Kalman filter and Rauch-Tung-Striebel smoother are introduced as a scalable marginalization technique for computing the likelihood and making predictions of GPs without approximation. We introduce recent efforts on extending the scalable marginalization idea to the linear model of coregionalization for multivariate correlated output and spatio-temporal observations. In the final part of this work, we introduce a novel marginalization technique to estimate interaction kernels and forecast particle trajectories. The computational progress lies in the sparse representation of the inverse covariance matrix of the latent variables, then applying conjugate gradient for improving predictive accuracy with large data sets. The computational advances achieved in this work outline a wide range of applications in molecular dynamics simulation, cellular migration, and agent-based models.