Latent space Energy-Based Models (EBMs), also known as energy-based priors, have drawn growing interest in generative modeling. Fueled by their flexible formulation and the strong modeling power of the latent space, recent works built upon them have made interesting attempts at interpretable text modeling. However, latent space EBMs also inherit some flaws from EBMs in data space; the degenerate MCMC sampling quality in practice can lead to poor generation quality and instability in training, especially on data with complex latent structures. Inspired by recent efforts that leverage diffusion recovery likelihood learning as a cure for the sampling issue, we introduce a novel symbiosis between diffusion models and latent space EBMs in a variational learning framework, coined the latent diffusion energy-based model. We develop a geometric clustering-based regularization jointly with the information bottleneck to further improve the quality of the learned latent space. Experiments on several challenging tasks demonstrate the superior performance of our model on interpretable text modeling over strong counterparts.
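The diffusion side of this idea can be illustrated with a minimal sketch: latent vectors are perturbed along an increasing noise schedule, so each conditional model only has to bridge two adjacent noise levels rather than the whole gap from noise to data. Everything below (the schedule, the dimensions) is a hypothetical toy stand-in, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(4)

sigmas = np.linspace(0.1, 1.0, 6)     # hypothetical increasing noise schedule

def noisy_latents(z0):
    """Trajectory of progressively noisier latents; traj[0] is the clean latent."""
    traj = [z0]
    for s_prev, s in zip(sigmas[:-1], sigmas[1:]):
        inc = np.sqrt(s ** 2 - s_prev ** 2)            # incremental noise std between levels
        traj.append(traj[-1] + inc * rng.standard_normal(z0.shape))
    return traj

traj = noisy_latents(rng.standard_normal((2000, 4)))
```

Each conditional distribution along this trajectory is close to Gaussian, which is what makes its MCMC sampling well behaved compared with sampling the marginal directly.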
MCMC should mix: learning energy-based model with neural transport latent space MCMC.
Learning an energy-based model (EBM) requires MCMC sampling of the learned model as an inner loop of the learning algorithm. However, MCMC sampling of EBMs in high-dimensional data space is generally not mixing, because the energy function, which is usually parameterized by a deep network, is highly multi-modal in the data space. This is a serious handicap for both the theory and practice of EBMs. In this paper, we propose to learn the EBM with a flow-based model (or, in general, a latent variable model) serving as a backbone, so that the EBM is a correction or an exponential tilting of the flow-based model. We show that the model has a particularly simple form in the space of the latent variables of the generative model, and MCMC sampling of the EBM in the latent space mixes well and traverses modes in the data space. This enables proper sampling and learning of EBMs.
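The latent-space sampling described above can be sketched in a few lines: the target density is the base Gaussian prior tilted by a correction term, and Langevin dynamics follows the sum of the two log-density gradients. The quadratic correction below is a toy stand-in for the learned tilting term f(g(z)), chosen so the behavior is easy to check.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_log_base(z):            # gradient of log N(0, I), the flow's latent prior
    return -z

def grad_correction(z):          # toy stand-in for the learned tilting term's gradient
    return -(z - 1.0)

def langevin_latent(z, steps=200, step=0.05):
    """Langevin MCMC in latent space targeting base prior x exp(correction)."""
    for _ in range(steps):
        grad = grad_log_base(z) + grad_correction(z)
        z = z + 0.5 * step * grad + np.sqrt(step) * rng.standard_normal(z.shape)
    return z

samples = langevin_latent(rng.standard_normal((1000, 2)))
# each coordinate's mean should land near 0.5, the tilted density's mode
```

Because the latent density is close to unimodal, this chain mixes quickly; the claim of the paper is that the same dynamics in data space would stall in one mode.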
Award ID(s): 2015577
NSF-PAR ID: 10351392
Journal Name: International Conference on Learning Representations (ICLR 2022)
Sponsoring Org: National Science Foundation
More Like this


This paper studies the fundamental problem of learning multi-layer generator models. The multi-layer generator model builds multiple layers of latent variables as a prior model on top of the generator, which benefits learning complex data distributions and hierarchical representations. However, such a prior model usually focuses on modeling inter-layer relations between latent variables by assuming non-informative (conditional) Gaussian distributions, which can be limited in model expressivity. To tackle this issue and learn more expressive prior models, we propose an energy-based model (EBM) on the joint latent space over all layers of latent variables, with the multi-layer generator as its backbone. Such a joint latent space EBM prior model captures the intra-layer contextual relations at each layer through layer-wise energy terms, and latent variables across different layers are jointly corrected. We develop a joint training scheme via maximum likelihood estimation (MLE), which involves Markov chain Monte Carlo (MCMC) sampling for both prior and posterior distributions of the latent variables from different layers. To ensure efficient inference and learning, we further propose a variational training scheme where an inference model is used to amortize the costly posterior MCMC sampling. Our experiments demonstrate that the learned model can be expressive in generating high-quality images and capturing hierarchical features for better outlier detection.
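The layer-wise energy terms can be written down concretely for a two-layer toy case: the joint negative log-prior is the generator's conditional Gaussian backbone plus a per-layer energy correction. The quadratic f1 and f2 below are hypothetical placeholders for learned energy heads, and the backbone assumes z1 is generated from z2 with unit-variance Gaussians.

```python
import numpy as np

def f1(z1):                          # layer-1 intra-layer energy term (toy quadratic)
    return 0.5 * np.sum(z1 ** 2, axis=-1)

def f2(z2):                          # layer-2 intra-layer energy term (toy quadratic)
    return 0.5 * np.sum(z2 ** 2, axis=-1)

def neg_log_joint_prior(z1, z2):
    """-log p(z1, z2) up to a constant: Gaussian backbone plus layer-wise energies."""
    backbone = 0.5 * np.sum(z2 ** 2, axis=-1) + 0.5 * np.sum((z1 - z2) ** 2, axis=-1)
    return backbone + f1(z1) + f2(z2)

z1, z2 = np.ones(2), np.zeros(2)
```

Setting f1 = f2 = 0 recovers the plain Gaussian prior of the backbone generator; the energy terms are the expressivity correction the abstract argues for.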

We propose to learn an energy-based model (EBM) in the latent space of a generator model, so that the EBM serves as a prior model that stands on the top-down network of the generator model. Both the latent space EBM and the top-down network can be learned jointly by maximum likelihood, which involves short-run MCMC sampling from both the prior and posterior distributions of the latent vector. Due to the low dimensionality of the latent space and the expressiveness of the top-down network, a simple EBM in latent space can capture regularities in the data effectively, and MCMC sampling in latent space is efficient and mixes well. We show that the learned model exhibits strong performance in terms of image and text generation and anomaly detection. The one-page code can be found in the supplementary materials.
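The short-run MCMC mentioned above amounts to a short, fixed-length Langevin chain started from a fixed Gaussian initialization. A minimal sketch, with a toy quadratic energy U standing in for the learned energy head:

```python
import numpy as np

rng = np.random.default_rng(1)

def grad_U(z):                       # toy energy gradient; hypothetical stand-in
    return 0.5 * z

def short_run_sample(n, dim, K=30, step=0.1):
    """K-step (short-run) Langevin chain from a fixed Gaussian initialization."""
    z = rng.standard_normal((n, dim))
    for _ in range(K):
        grad_log_p = -z - grad_U(z)  # gradient of log [N(z; 0, I) exp(-U(z))]
        z = z + 0.5 * step * grad_log_p + np.sqrt(step) * rng.standard_normal(z.shape)
    return z

z = short_run_sample(2000, 2)
```

Keeping K small and fixed is the point: in a low-dimensional latent space even a short chain lands close to the prior, which is what makes joint maximum likelihood training practical.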

While energy-based models (EBMs) exhibit a number of desirable properties, training and sampling on high-dimensional datasets remain challenging. Inspired by recent progress on diffusion probabilistic models, we present a diffusion recovery likelihood method to tractably learn and sample from a sequence of EBMs trained on increasingly noisy versions of a dataset. Each EBM is trained with recovery likelihood, which maximizes the conditional probability of the data at a certain noise level given their noisy versions at a higher noise level. Optimizing recovery likelihood is more tractable than marginal likelihood, as sampling from the conditional distributions is much easier than sampling from the marginal distributions. After training, synthesized images can be generated by a sampling process that initializes from the Gaussian white noise distribution and progressively samples the conditional distributions at decreasingly lower noise levels. Our method generates high-fidelity samples on various image datasets. On unconditional CIFAR-10, our method achieves FID 9.58 and inception score 8.30, superior to the majority of GANs. Moreover, we demonstrate that, unlike previous work on EBMs, our long-run MCMC samples from the conditional distributions do not diverge and still represent realistic images, allowing us to accurately estimate the normalized density of data even for high-dimensional datasets. Our implementation is available at https://github.com/ruiqigao/recovery_likelihood.
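The tractability argument can be made concrete: the recovery likelihood targets p(x | x_tilde), proportional to exp(f(x)) N(x_tilde; x, sigma^2 I), and the Gaussian term keeps the chain localized near the noisy observation, making the conditional nearly unimodal. A sketch with a toy quadratic energy gradient standing in for the learned f:

```python
import numpy as np

rng = np.random.default_rng(2)

def grad_f(x):                        # toy gradient of the marginal energy; hypothetical
    return -x

def sample_conditional(x_tilde, sigma, steps=100, step=0.01):
    """Langevin sampling of p(x | x_tilde), i.e. exp(f(x)) N(x_tilde; x, sigma^2 I)."""
    x = x_tilde.copy()                # initialize at the noisy observation
    for _ in range(steps):
        grad = grad_f(x) + (x_tilde - x) / sigma ** 2    # energy pull + Gaussian anchor
        x = x + 0.5 * step * grad + np.sqrt(step) * rng.standard_normal(x.shape)
    return x

x_tilde = np.ones((5000, 1))
x = sample_conditional(x_tilde, sigma=0.5)
```

Generation then chains these conditional samplers from the highest noise level down to the data, each step only solving the easy, localized problem.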

This paper studies the unsupervised cross-domain translation problem by proposing a generative framework in which the probability distribution of each domain is represented by a generative cooperative network that consists of an energy-based model and a latent variable model. The use of the generative cooperative network enables maximum likelihood learning of the domain model by MCMC teaching, where the energy-based model seeks to fit the data distribution of the domain and distills its knowledge to the latent variable model via MCMC. Specifically, in the MCMC teaching process, the latent variable model, parameterized by an encoder-decoder, maps examples from the source domain to the target domain, while the energy-based model further refines the mapped results by Langevin revision such that the revised results match the examples in the target domain in terms of the statistical properties defined by the learned energy function. To build up a correspondence between two unpaired domains, the proposed framework simultaneously learns a pair of cooperative networks with cycle consistency, accounting for a two-way translation between the two domains, by alternating MCMC teaching. Experiments show that the proposed framework is useful for unsupervised image-to-image translation and unpaired image sequence translation.
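The MCMC teaching loop can be caricatured in one dimension: the latent variable model proposes samples, Langevin revision pulls the proposals toward the energy model's statistics, and the proposer is then updated to chase its own revised samples. All quantities below are toy stand-ins, not the paper's actual networks.

```python
import numpy as np

rng = np.random.default_rng(3)

TARGET_MEAN = 2.0                    # stands in for the mode of the learned energy

def langevin_revise(x, steps=20, step=0.05):
    """Langevin revision pulling samples toward the energy model's statistics."""
    for _ in range(steps):
        x = x - 0.5 * step * (x - TARGET_MEAN) + np.sqrt(step) * rng.standard_normal(x.shape)
    return x

proposal_mean = 0.0                  # the latent variable model's current output statistic
for _ in range(50):
    x0 = proposal_mean + rng.standard_normal(500)    # proposals from the generator
    x1 = langevin_revise(x0)                         # energy model refines via Langevin
    proposal_mean += 0.5 * (float(x1.mean()) - proposal_mean)  # chase the revisions
```

After enough rounds the proposer's statistic drifts to the energy model's target, which is the "teaching" in MCMC teaching: the revision gap is the learning signal.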