skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Simulation-Based Inference with Approximately Correct Parameters via Maximum Entropy
Inferring the input parameters of simulators from observations is a crucial challenge with applications from epidemiology to molecular dynamics. Here we show a simple approach in the regime of sparse data and approximately correct models, which is common when trying to use an existing model to infer latent variables with observed data. This approach is based on the principle of maximum entropy (MaxEnt) and provably makes the smallest change in the latent joint distribution to fit new data. This method requires no likelihood or model derivatives and its fit is insensitive to prior strength, removing the need to balance observed data fit with prior belief. The method requires the ansatz that data is fit in expectation, which is true in some settings and may be reasonable in all with few data points. The method is based on sample reweighting, so its asymptotic run time is independent of prior distribution dimension. We demonstrate this MaxEnt approach and compare with other likelihood-free inference methods across three systems: a point particle moving in a gravitational field, a compartmental model of epidemic spread and finally molecular dynamics simulation of a protein.  more » « less
Award ID(s):
2029095
PAR ID:
10289805
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
ArXivorg
ISSN:
2331-8422
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Latent variable models for text, when trained successfully, accurately model the data distribution and capture global semantic and syntactic features of sentences. The prominent approach to train such models is variational autoencoders (VAE). It is nevertheless challenging to train and often results in a trivial local optimum where the latent variable is ignored and its posterior collapses into the prior, an issue known as posterior collapse. Various techniques have been proposed to mitigate this issue. Most of them focus on improving the inference model to yield latent codes of higher quality. The present work proposes a short run dynamics for inference. It is initialized from the prior distribution of the latent variable and then runs a small number (e.g., 20) of Langevin dynamics steps guided by its posterior distribution. The major advantage of our method is that it does not require a separate inference model or assume simple geometry of the posterior distribution, thus rendering an automatic, natural and flexible inference engine. We show that the models trained with short run dynamics more accurately model the data, compared to strong language model and VAE baselines, and exhibit no sign of posterior collapse. Analyses of the latent space show that interpolation in the latent space is able to generate coherent sentences with smooth transition and demonstrate improved classification over strong baselines with latent features from unsupervised pretraining. These results together expose a well-structured latent space of our generative model. 
    more » « less
  2. This paper studies the fundamental problem of learning deep generative models that consist of multiple layers of latent variables organized in top-down architectures. Such models have high expressivity and allow for learning hierarchical representations. Learning such a generative model requires inferring the latent variables for each training example based on the posterior distribution of these latent variables. The inference typically requires Markov chain Monte Caro (MCMC) that can be time consuming. In this paper, we propose to use noise initialized non-persistent short run MCMC, such as nite step Langevin dynamics initialized from the prior distribution of the latent variables, as an approximate inference engine, where the step size of the Langevin dynamics is variationally optimized by minimizing the Kullback-Leibler divergence between the distribution produced by the short run MCMC and the posterior distribution. Our experiments show that the proposed method outperforms variational auto-encoder (VAE) in terms of reconstruction error and synthesis quality. The advantage of the proposed method is that it is simple and automatic without the need to design an inference model. 
    more » « less
  3. Abstract Stochastic epidemic models (SEMs) fit to incidence data are critical to elucidating outbreak dynamics, shaping response strategies, and preparing for future epidemics. SEMs typically represent counts of individuals in discrete infection states using Markov jump processes (MJPs), but are computationally challenging as imperfect surveillance, lack of subject‐level information, and temporal coarseness of the data obscure the true epidemic. Analytic integration over the latent epidemic process is impossible, and integration via Markov chain Monte Carlo (MCMC) is cumbersome due to the dimensionality and discreteness of the latent state space. Simulation‐based computational approaches can address the intractability of the MJP likelihood, but are numerically fragile and prohibitively expensive for complex models. A linear noise approximation (LNA) that approximates the MJP transition density with a Gaussian density has been explored for analyzing prevalence data in large‐population settings, but requires modification for analyzing incidence counts without assuming that the data are normally distributed. We demonstrate how to reparameterize SEMs to appropriately analyze incidence data, and fold the LNA into a data augmentation MCMC framework that outperforms deterministic methods, statistically, and simulation‐based methods, computationally. Our framework is computationally robust when the model dynamics are complex and applies to a broad class of SEMs. We evaluate our method in simulations that reflect Ebola, influenza, and SARS‐CoV‐2 dynamics, and apply our method to national surveillance counts from the 2013–2015 West Africa Ebola outbreak. 
    more » « less
  4. We propose a data-driven learning framework for the analytic continuation problem in numerical quantum many-body physics. Designing an accurate and efficient framework for the analytic continuation of imaginary time using computational data is a grand challenge that has hindered meaningful links with experimental data. The standard Maximum Entropy (MaxEnt)-based method is limited by the quality of the computational data and the availability of prior information. Also, the MaxEnt is not able to solve the inversion problem under high level of noise in the data. Here we introduce a novel learning model for the analytic continuation problem using a Adams-Bashforth residual neural network (AB-ResNet). The advantage of this deep learning network is that it is model independent and, therefore, does not require prior information concerning the quantity of interest given by the spectral function. More importantly, the ResNet-based model achieves higher accuracy than MaxEnt for data with higher level of noise. Finally, numerical examples show that the developed AB-ResNet is able to recover the spectral function with accuracy comparable to MaxEnt where the noise level is relatively small. 
    more » « less
  5. This paper studies the fundamental problem of learning multi-layer generator models. The multi-layer generator model builds multiple layers of latent variables as a prior model on top of the generator, which benefits learning complex data distribution and hierarchical representations. However, such a prior model usually focuses on modeling inter-layer relations between latent variables by assuming non-informative (conditional) Gaussian distributions, which can be limited in model expressivity. To tackle this issue and learn more expressive prior models, we propose an energy-based model (EBM) on the joint latent space over all layers of latent variables with the multi-layer generator as its backbone. Such joint latent space EBM prior model captures the intra-layer contextual relations at each layer through layer-wise energy terms, and latent variables across different layers are jointly corrected. We develop a joint training scheme via maximum likelihood estimation (MLE), which involves Markov Chain Monte Carlo (MCMC) sampling for both prior and posterior distributions of the latent variables from different layers. To ensure efficient inference and learning, we further propose a variational training scheme where an inference model is used to amortize the costly posterior MCMC sampling. Our experiments demonstrate that the learned model can be expressive in generating high-quality images and capturing hierarchical features for better outlier detection. 
    more » « less