
Title: Learning energy-based models by diffusion recovery likelihood
While energy-based models (EBMs) exhibit a number of desirable properties, training and sampling on high-dimensional datasets remains challenging. Inspired by recent progress on diffusion probabilistic models, we present a diffusion recovery likelihood method to tractably learn and sample from a sequence of EBMs trained on increasingly noisy versions of a dataset. Each EBM is trained with recovery likelihood, which maximizes the conditional probability of the data at a certain noise level given their noisy versions at a higher noise level. Optimizing recovery likelihood is more tractable than optimizing marginal likelihood, as sampling from the conditional distributions is much easier than sampling from the marginal distributions. After training, synthesized images can be generated by a sampling process that initializes from a Gaussian white noise distribution and progressively samples the conditional distributions at successively lower noise levels. Our method generates high-fidelity samples on various image datasets. On unconditional CIFAR-10, our method achieves an FID of 9.58 and an inception score of 8.30, superior to the majority of GANs. Moreover, we demonstrate that, unlike previous work on EBMs, our long-run MCMC samples from the conditional distributions do not diverge and still represent realistic images, allowing us to accurately estimate the normalized density of data even for high-dimensional datasets. Our implementation is available at
Journal Name: International Conference on Learning Representations (ICLR 2021)
Sponsoring Org: National Science Foundation
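The conditional sampling step described in the abstract above can be illustrated with a minimal sketch. It assumes the simplified noise model x̃ = x + σε (the paper's rescaled diffusion step is not reproduced here) and a recovery conditional of the form p(x | x̃) ∝ exp(f(x) − ‖x̃ − x‖²/(2σ²)), where f is the EBM's negative energy. The function name `conditional_langevin`, the step size, and the number of steps are illustrative assumptions. Chains start at the noisy observation, and the toy energy f(x) = −x²/2 is chosen because its conditional is Gaussian and can be checked analytically.

```python
import numpy as np


def conditional_langevin(grad_f, x_tilde, sigma, step_size=0.01, n_steps=500, rng=None):
    """Unadjusted Langevin sampling from p(x | x_tilde) ∝ exp(f(x) - ||x_tilde - x||^2 / (2 sigma^2)).

    grad_f: gradient of f (the EBM's negative energy, i.e. its unnormalized log density).
    Assumes the simplified noise model x_tilde = x + sigma * eps (an illustrative choice).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x_tilde, dtype=float)  # initialize each chain at its noisy observation
    for _ in range(n_steps):
        # gradient of the conditional log density: EBM term plus Gaussian recovery term
        grad = grad_f(x) + (x_tilde - x) / sigma**2
        x = x + 0.5 * step_size * grad + np.sqrt(step_size) * rng.standard_normal(x.shape)
    return x


# Toy EBM f(x) = -x^2 / 2, so grad_f(x) = -x and the exact conditional is Gaussian.
rng = np.random.default_rng(0)
samples = conditional_langevin(lambda x: -x, np.full(10_000, 2.0), sigma=0.5, rng=rng)
print(samples.mean(), samples.var())  # should be close to the analytic mean 1.6 and variance 0.2
```

With σ = 0.5 and x̃ = 2, the exact conditional has precision 1 + 1/σ² = 5, so its mean is (x̃/σ²)/5 = 1.6 and its variance is 0.2. Unadjusted Langevin adds a small step-size bias to the variance, which is one reason practical samplers tune or anneal the step size.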
More like this
  1. Computer-aided diagnosis (CAD) systems must constantly cope with changes in data distribution caused by different sensing technologies, imaging protocols, and patient populations. Adapting these systems to new domains often requires significant amounts of labeled data for retraining, a labor-intensive and time-consuming process. We propose a memory-augmented capsule network for the rapid adaptation of CAD models to new domains. It consists of a capsule network that extracts feature embeddings from high-dimensional input and a memory-augmented task network that exploits knowledge stored from the target domains. Our network can efficiently adapt to unseen domains using only a few annotated samples. We evaluate our method using a large-scale public lung nodule dataset (LUNA), coupled with our own collected lung nodule and incidental lung nodule datasets. When trained on the LUNA dataset, our network requires only 30 additional samples from our collected lung nodule and incidental lung nodule datasets to achieve clinically relevant performance (area under the receiver operating characteristic curve (AUROC) of 0.925 and 0.891, respectively). This amounts to achieving the same performance with two orders of magnitude less labeled training data. We further evaluate our method by introducing heavy noise, artifacts, and adversarial attacks. Under these severe conditions, our network's AUROC remains above 0.7, while the performance of state-of-the-art approaches drops to chance level.
  2. Yap, Pew-Thian (Ed.)
    Diffusion-weighted imaging (DWI) with multiple high b-values is critical for extracting tissue microstructure measurements; however, high b-value DWI images contain high noise levels that can overwhelm the signal of interest and bias microstructural measurements. Here, we propose a simple denoising method that can be applied to any dataset, provided a low-noise, single-subject dataset is acquired using the same DWI sequence. The method uses a one-dimensional convolutional neural network (1D-CNN) trained voxel-by-voxel on the low-noise dataset. The trained model can then be applied to high-noise datasets from other subjects. We validated the 1D-CNN denoising method by first demonstrating, using simulated DWI data, that it produced DWI images more similar to the noise-free ground truth than comparable denoising methods, e.g., MP-PCA. Using the same DWI acquisition reconstructed with two common reconstruction methods, i.e., SENSE1 and sum-of-squares, to generate a pair of low-noise and high-noise datasets, we then demonstrated that 1D-CNN denoising of high-noise DWI data collected from human subjects showed promising results in three domains: DWI images, diffusion metrics, and tractography. In particular, the denoised images were more similar to a low-noise reference image of the same subject than repeated low-noise images were to each other (i.e., computational reproducibility). Finally, we demonstrated the use of the 1D-CNN method in two practical examples to reduce noise from parallel imaging and simultaneous multi-slice acquisition. We conclude that the 1D-CNN denoising method is a simple, effective denoising method for DWI images that overcomes some of the limitations of current state-of-the-art denoising methods, such as the need for a large number of training subjects and the need to account for the rectified noise floor.
  3. Optimal transport (OT) distances such as the Wasserstein distance have been used in several areas, such as GANs and domain adaptation. OT, however, is very sensitive to outliers (samples with large noise) in the data since, in its objective function, every sample, including outliers, is weighted equally due to the marginal constraints. To remedy this issue, robust formulations of OT with unbalanced marginal constraints have previously been proposed. However, employing these methods in deep learning problems such as GANs and domain adaptation is challenging due to the instability of their dual optimization solvers. In this paper, we resolve these issues by deriving a computationally efficient dual form of the robust OT optimization that is amenable to modern deep learning applications. We demonstrate the effectiveness of our formulation in two applications: GANs and domain adaptation. Our approach can train state-of-the-art GAN models on noisy datasets corrupted with outlier distributions. In particular, our optimization computes weights for training samples reflecting how difficult it is for those samples to be generated by the model. In domain adaptation, our robust OT formulation leads to improved accuracy compared to standard adversarial adaptation methods.
  4. Latent space energy-based models (EBMs), also known as energy-based priors, have drawn growing interest in generative modeling. Fueled by their flexible formulation and the strong modeling power of the latent space, recent work built upon them has made interesting attempts at interpretable text modeling. However, latent space EBMs also inherit some flaws from EBMs in data space: the degenerate MCMC sampling quality in practice can lead to poor generation quality and training instability, especially on data with complex latent structures. Inspired by recent efforts that leverage diffusion recovery likelihood learning as a cure for this sampling issue, we introduce a novel symbiosis between diffusion models and latent space EBMs in a variational learning framework, coined the latent diffusion energy-based model. We develop a geometric clustering-based regularization, jointly with the information bottleneck, to further improve the quality of the learned latent space. Experiments on several challenging tasks demonstrate the superior performance of our model on interpretable text modeling over strong counterparts.
  5. Deep neural networks have provided state-of-the-art solutions for problems such as image denoising, which implicitly rely on a prior probability model of natural images. Two recent lines of work, Denoising Score Matching and Plug-and-Play, propose methodologies for drawing samples from this implicit prior and for using it to solve inverse problems, respectively. Here, we develop a parsimonious and robust generalization of these ideas. We rely on a classic statistical result that shows the least-squares solution for removing additive Gaussian noise can be written directly in terms of the gradient of the log of the noisy signal density. We use this to derive a stochastic coarse-to-fine gradient ascent procedure for drawing high-probability samples from the implicit prior embedded within a CNN trained to perform blind denoising. A generalization of this algorithm to constrained sampling provides a method for using the implicit prior to solve any deterministic linear inverse problem, with no additional training, thus extending the power of supervised learning for denoising to a much broader set of problems. The algorithm relies on minimal assumptions and exhibits robust convergence over a wide range of parameter choices. To demonstrate the generality of our method, we use it to obtain state-of-the-art levels of unsupervised performance for deblurring, super-resolution, and compressive sensing.
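Item 1 above describes a memory-augmented network that adapts to a new domain from a few annotated samples, but the abstract does not say how its memory is read or written. The following is therefore only a generic sketch of a memory-augmented few-shot classifier, not the paper's architecture. Embeddings (assumed to come from a feature extractor such as the capsule network) are written to memory with their labels, and a query is classified by softmax attention over cosine similarities. The class name `EmbeddingMemory`, the `temperature` parameter, and the toy data are hypothetical.

```python
import numpy as np


class EmbeddingMemory:
    """Hypothetical key-value memory for few-shot classification (not the paper's design)."""

    def __init__(self, n_classes, temperature=0.1):
        self.n_classes = n_classes
        self.temperature = temperature  # lower values give sharper attention
        self.keys = []
        self.labels = []

    def write(self, embeddings, labels):
        """Store L2-normalized embeddings with their class labels."""
        for e, y in zip(embeddings, labels):
            self.keys.append(e / np.linalg.norm(e))
            self.labels.append(int(y))

    def read(self, query):
        """Return class probabilities from softmax attention over cosine similarities."""
        keys = np.stack(self.keys)
        sims = keys @ (query / np.linalg.norm(query)) / self.temperature
        weights = np.exp(sims - sims.max())  # subtract the max for numerical stability
        weights /= weights.sum()
        probs = np.zeros(self.n_classes)
        np.add.at(probs, np.array(self.labels), weights)  # sum attention weight per class
        return probs


# Few-shot adaptation: write five labelled embeddings per class, then classify a query.
rng = np.random.default_rng(0)
dim = 8
class_directions = np.eye(dim)[:2]  # two hypothetical class centers in embedding space
memory = EmbeddingMemory(n_classes=2)
for label in (0, 1):
    shots = class_directions[label] + 0.1 * rng.standard_normal((5, dim))
    memory.write(shots, np.full(5, label))
probs = memory.read(class_directions[1] + 0.1 * rng.standard_normal(dim))
print(probs)
```

Adapting to a new domain in this sketch only requires writing new labelled embeddings; no weights are retrained, which is the property a memory component is meant to provide.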
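Item 2 above trains a 1D-CNN voxel-by-voxel. A minimal NumPy sketch can make the data layout and one convolutional layer concrete: each voxel's signal across the diffusion-weighted volumes is treated as a 1-D sequence. The helper names, the (X, Y, Z, n_volumes) array layout, and "same" zero padding are assumptions. The network depth, loss, and optimizer are not given in the abstract and are omitted.

```python
import numpy as np


def to_voxel_signals(dwi, mask=None):
    """Flatten a 4-D DWI array (X, Y, Z, n_volumes) into one 1-D signal per voxel."""
    signals = dwi.reshape(-1, dwi.shape[-1])
    if mask is not None:
        signals = signals[mask.reshape(-1)]  # keep only voxels inside the (X, Y, Z) mask
    return signals


def conv1d_same(signals, kernels, bias):
    """One 1-D convolutional layer along the volume axis, zero-padded to keep the length.

    signals: (n_voxels, length, in_ch); kernels: (k, in_ch, out_ch) with odd k; bias: (out_ch,).
    Computes cross-correlation, as most deep-learning libraries do for "convolution".
    """
    k = kernels.shape[0]
    pad = k // 2
    length = signals.shape[1]
    padded = np.pad(signals, ((0, 0), (pad, pad), (0, 0)))
    out = np.zeros(signals.shape[:2] + (kernels.shape[2],))
    for i in range(k):
        # contribution of kernel tap i: (n, length, in_ch) @ (in_ch, out_ch)
        out += padded[:, i:i + length, :] @ kernels[i]
    return out + bias


# Hypothetical 4 x 4 x 2 volume with 30 diffusion-weighted measurements per voxel.
rng = np.random.default_rng(0)
dwi = rng.random((4, 4, 2, 30))
x = to_voxel_signals(dwi)[:, :, None]  # (32 voxels, 30 samples, 1 channel)
kernels = 0.1 * rng.standard_normal((3, 1, 16))  # 16 output channels, kernel width 3
hidden = np.maximum(conv1d_same(x, kernels, np.zeros(16)), 0.0)  # ReLU activation
print(hidden.shape)
```

Because every voxel is an independent training example, even a single low-noise subject supplies many thousands of (noisy, clean) signal pairs, which is consistent with the abstract's claim that one subject suffices.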
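Item 3 above derives its own dual form for robust OT, which the abstract does not spell out. As a stand-in that illustrates the same effect (unbalanced marginal constraints letting outliers receive little mass), here is the standard entropic unbalanced OT scaling algorithm with KL-penalized marginals. This is a swapped-in technique, not the authors' formulation. The function name and the values of `eps` (entropic regularization) and `rho` (marginal-relaxation weight) are illustrative choices.

```python
import numpy as np


def unbalanced_sinkhorn(a, b, cost, eps=1.0, rho=1.0, n_iters=500):
    """Entropic OT with KL-relaxed marginals; returns the transport plan P.

    a, b: source and target masses; cost: (len(a), len(b)) cost matrix.
    The exponent rho / (rho + eps) replaces the hard marginal projection of
    balanced Sinkhorn, so mass that is too expensive to move can be dropped.
    """
    K = np.exp(-cost / eps)
    fi = rho / (rho + eps)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = (a / (K @ v)) ** fi
        v = (b / (K.T @ u)) ** fi
    return u[:, None] * K * v[None, :]


a = np.full(3, 1 / 3)
b = np.full(3, 1 / 3)
src = np.array([0.0, 1.0, 2.0])
tgt = np.array([0.0, 1.0, 10.0])  # 10.0 plays the role of an outlier
cost = (src[:, None] - tgt[None, :]) ** 2
P = unbalanced_sinkhorn(a, b, cost)
print(P.sum(axis=0))  # mass received by each target point
```

Balanced OT would be forced to deliver the full 1/3 to the point at 10.0 and pay its large cost; the relaxed marginals let the plan assign it essentially zero mass instead.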
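Item 4 above builds on diffusion recovery likelihood in a latent space. A minimal sketch of the forward noising sequence on which such conditional models are trained follows, assuming the variance-preserving step z_t = sqrt(1 − σ_t²) z_{t−1} + σ_t ε_t. The noise schedule, the 1-D toy latents, and the function name `diffuse` are hypothetical, and the EBM and variational components of that paper are not modeled.

```python
import numpy as np


def diffuse(z0, sigmas, rng):
    """Variance-preserving forward diffusion z_t = sqrt(1 - sigma_t^2) z_{t-1} + sigma_t eps_t.

    Returns [z_0, z_1, ..., z_T]; consecutive pairs (z_{t-1}, z_t) are the kind of
    pairs a recovery likelihood p(z_{t-1} | z_t) would be trained on at each noise level.
    """
    zs = [z0]
    z = z0
    for s in sigmas:
        z = np.sqrt(1.0 - s**2) * z + s * rng.standard_normal(z.shape)
        zs.append(z)
    return zs


rng = np.random.default_rng(0)
z0 = 3.0 + 0.1 * rng.standard_normal(100_000)  # hypothetical 1-D latent codes
zs = diffuse(z0, [0.5] * 20, rng)
print(zs[-1].mean(), zs[-1].var())  # drifts toward the standard normal N(0, 1)
```

After T steps with constant σ = 0.5, the mean shrinks by 0.75^(T/2) and the variance becomes 0.75^T · Var(z_0) + (1 − 0.75^T), so the last level is close to N(0, 1). That is what lets sampling start from Gaussian noise and walk back down the levels.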
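The "classic statistical result" in item 5 is commonly known as Tweedie's formula: for y = x + σε with Gaussian ε, the least-squares estimate is E[x | y] = y + σ² ∇_y log p_σ(y), where p_σ is the density of the noisy signal. A small numerical check, assuming a 1-D Gaussian prior x ~ N(μ, s²) so that the noisy density N(μ, s² + σ²) and its score are known in closed form. The function name and the parameter values are illustrative; the coarse-to-fine sampler itself is not reproduced here.

```python
import numpy as np


def tweedie_denoise(y, score_fn, sigma):
    """Least-squares (MMSE) denoiser E[x | y] = y + sigma^2 * d/dy log p_sigma(y)."""
    return y + sigma**2 * score_fn(y)


# Toy prior x ~ N(mu, s^2); the noisy density is N(mu, s^2 + sigma^2), so its score is analytic.
mu, s, sigma = 1.0, 2.0, 1.5
score = lambda y: -(y - mu) / (s**2 + sigma**2)

rng = np.random.default_rng(0)
x = mu + s * rng.standard_normal(200_000)
y = x + sigma * rng.standard_normal(x.shape)

# Empirical least-squares fit of x on y (a linear fit is optimal for a Gaussian prior).
slope, intercept = np.polyfit(y, x, 1)
x_hat = tweedie_denoise(y, score, sigma)
print(slope, intercept)  # compare with the Tweedie coefficients
```

For these values, Tweedie's formula gives x̂ = 0.64·y + 0.36, and the fitted regression of x on y recovers approximately the same slope and intercept. A trained blind denoiser plays the role of `score_fn` implicitly: its residual x̂(y) − y is σ² times the score, which is the quantity the coarse-to-fine ascent follows.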