skip to main content

This content will become publicly available on July 14, 2023

Title: Learning and Inference in Sparse Coding Models With Langevin Dynamics
Abstract We describe a stochastic, dynamical system capable of inference and learning in a probabilistic latent variable model. The most challenging problem in such models—sampling the posterior distribution over latent variables—is proposed to be solved by harnessing natural sources of stochasticity inherent in electronic and neural systems. We demonstrate this idea for a sparse coding model by deriving a continuous-time equation for inferring its latent variables via Langevin dynamics. The model parameters are learned by simultaneously evolving according to another continuous-time equation, thus bypassing the need for digital accumulators or a global clock. Moreover, we show that Langevin dynamics lead to an efficient procedure for sampling from the posterior distribution in the L0 sparse regime, where latent variables are encouraged to be set to zero as opposed to having a small L1 norm. This allows the model to properly incorporate the notion of sparsity rather than having to resort to a relaxed version of sparsity to make optimization tractable. Simulations of the proposed dynamical system on both synthetic and natural image data sets demonstrate that the model is capable of probabilistically correct inference, enabling learning of the dictionary as well as parameters of the prior.
; ; ; ;
Award ID(s):
Publication Date:
Journal Name:
Neural Computation
Page Range or eLocation-ID:
1676 to 1700
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper studies the fundamental problem of learning deep generative models that consist of multiple layers of latent variables organized in top-down architectures. Such models have high expressivity and allow for learning hierarchical representations. Learning such a generative model requires inferring the latent variables for each training example based on the posterior distribution of these latent variables. The inference typically requires Markov chain Monte Caro (MCMC) that can be time consuming. In this paper, we propose to use noise initialized non-persistent short run MCMC, such as nite step Langevin dynamics initialized from the prior distribution of the latent variables, as an approximate inference engine, where the step size of the Langevin dynamics is variationally optimized by minimizing the Kullback-Leibler divergence between the distribution produced by the short run MCMC and the posterior distribution. Our experiments show that the proposed method outperforms variational auto-encoder (VAE) in terms of reconstruction error and synthesis quality. The advantage of the proposed method is that it is simple and automatic without the need to design an inference model.
  2. Latent variable models for text, when trained successfully, accurately model the data distribution and capture global semantic and syntactic features of sentences. The prominent approach to train such models is variational autoencoders (VAE). It is nevertheless challenging to train and often results in a trivial local optimum where the latent variable is ignored and its posterior collapses into the prior, an issue known as posterior collapse. Various techniques have been proposed to mitigate this issue. Most of them focus on improving the inference model to yield latent codes of higher quality. The present work proposes a short run dynamics for inference. It is initialized from the prior distribution of the latent variable and then runs a small number (e.g., 20) of Langevin dynamics steps guided by its posterior distribution. The major advantage of our method is that it does not require a separate inference model or assume simple geometry of the posterior distribution, thus rendering an automatic, natural and flexible inference engine. We show that the models trained with short run dynamics more accurately model the data, compared to strong language model and VAE baselines, and exhibit no sign of posterior collapse. Analyses of the latent space show that interpolationmore »in the latent space is able to generate coherent sentences with smooth transition and demonstrate improved classification over strong baselines with latent features from unsupervised pretraining. These results together expose a well-structured latent space of our generative model.« less
  3. Many real-world applications involve longitudinal data, consisting of observations of several variables, where different subsets of variables are sampled at irregularly spaced time points. We introduce the Longitudinal Gaussian Process Latent Variable Model (L-GPLVM), a variant of the Gaussian Process Latent Variable Model, for learning compact representations of such data. L-GPLVM overcomes a key limitation of the Dynamic Gaussian Process Latent Variable Model and its variants, which rely on the assumption that the data are fully observed over all of the sampled time points. We describe an effective approach to learning the parameters of L-GPLVM from sparse observations, by coupling the dynamical model with a Multitask Gaussian Process model for sampling of the missing observations at each step of the gradient-based optimization of the variational lower bound. We further show the advantage of the Sparse Process Convolution framework to learn the latent representation of sparsely and irregularly sampled longitudinal data with minimal computational overhead relative to a standard Latent Variable Model. We demonstrated experiments with synthetic data as well as variants of MOCAP data with varying degrees of sparsity of observations that show that L-GPLVM substantially and consistently outperforms the state-of-the-art alternatives in recovering the missing observations even when themore »available data exhibits a high degree of sparsity. The compact representations of irregularly sampled and sparse longitudinal data can be used to perform a variety of machine learning tasks, including clustering, classification, and regression.« less
  4. Nonlinear state-space models are powerful tools to describe dynamical structures in complex time series. In a streaming setting where data are processed one sample at a time, simultaneous inference of the state and its nonlinear dynamics has posed significant challenges in practice. We develop a novel online learning framework, leveraging variational inference and sequential Monte Carlo, which enables flexible and accurate Bayesian joint filtering. Our method provides an approximation of the filtering posterior which can be made arbitrarily close to the true filtering distribution for a wide class of dynamics models and observation models. Specifically, the proposed framework can efficiently approximate a posterior over the dynamics using sparse Gaussian processes, allowing for an interpretable model of the latent dynamics. Constant time complexity per sample makes our approach amenable to online learning scenarios and suitable for real-time applications.
  5. Small integration time steps limit molecular dynamics (MD) simulations to millisecond time scales. Markov state models (MSMs) and equation-free approaches learn low-dimensional kinetic models from MD simulation data by performing configurational or dynamical coarse-graining of the state space. The learned kinetic models enable the efficient generation of dynamical trajectories over vastly longer time scales than are accessible by MD, but the discretization of configurational space and/or absence of a means to reconstruct molecular configurations precludes the generation of continuous all-atom molecular trajectories. We propose latent space simulators (LSS) to learn kinetic models for continuous all-atom simulation trajectories by training three deep learning networks to (i) learn the slow collective variables of the molecular system, (ii) propagate the system dynamics within this slow latent space, and (iii) generatively reconstruct molecular configurations. We demonstrate the approach in an application to Trp-cage miniprotein to produce novel ultra-long synthetic folding trajectories that accurately reproduce all-atom molecular structure, thermodynamics, and kinetics at six orders of magnitude lower cost than MD. The dramatically lower cost of trajectory generation enables greatly improved sampling and greatly reduced statistical uncertainties in estimated thermodynamic averages and kinetic rates.