Title: On Importance Sampling-Based Evaluation of Latent Language Models
Language models that use additional latent structures (e.g., syntax trees, coreference chains, knowledge graph links) provide several advantages over traditional language models. However, likelihood-based evaluation of these models is often intractable as it requires marginalizing over the latent space. Existing works avoid this issue by using importance sampling. Although this approach has asymptotic guarantees, analysis is rarely conducted on the effect of decisions such as sample size and choice of proposal distribution on the reported estimates. In this paper, we carry out this analysis for three models: RNNG, EntityNLM, and KGLM. In addition, we elucidate subtle differences in how importance sampling is applied in these works that can have substantial effects on the final estimates, as well as provide theoretical results which reinforce the validity of this technique.
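As a rough illustration of the estimator the abstract refers to, the sketch below applies importance sampling to a toy model with a four-state discrete latent variable, where the exact marginal can be summed directly for comparison. It is not the procedure used for RNNG, EntityNLM, or KGLM; the joint probabilities, the function names `log_mean_exp` and `is_log_marginal`, and both proposal distributions are assumptions made purely for illustration.

```python
import math
import random


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def is_log_marginal(log_joint, proposal_probs, num_samples, rng):
    """Importance-sampling estimate of log p(x) = log sum_z p(x, z).

    log_joint[z] holds log p(x, z) for a fixed observation x, and
    proposal_probs[z] holds q(z | x). Both are hypothetical toy inputs.
    """
    states = list(range(len(proposal_probs)))
    samples = rng.choices(states, weights=proposal_probs, k=num_samples)
    # Importance weight of each sample in log space: log p(x, z) - log q(z | x).
    log_weights = [log_joint[z] - math.log(proposal_probs[z]) for z in samples]
    return log_mean_exp(log_weights)


# Toy joint p(x, z) over four latent states (made-up numbers).
log_joint = [math.log(p) for p in (0.02, 0.10, 0.005, 0.001)]
exact = math.log(sum(math.exp(v) for v in log_joint))

# Two proposals: a uniform one, and the exact posterior p(z | x).
uniform = [0.25, 0.25, 0.25, 0.25]
posterior = [math.exp(v - exact) for v in log_joint]

rng = random.Random(0)
for n in (1, 10, 100, 1000):
    est_uniform = is_log_marginal(log_joint, uniform, n, rng)
    est_posterior = is_log_marginal(log_joint, posterior, n, rng)
    print(n, round(est_uniform, 4), round(est_posterior, 4), round(exact, 4))
```

With the exact posterior as the proposal, every importance weight equals p(x), so the estimate matches the exact value at any sample size. With the uniform proposal the estimate varies from run to run and only settles as the sample size grows, which is the kind of sensitivity to sample size and proposal choice that the abstract describes.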
Award ID(s):
1817183
PAR ID:
10180483
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Annual Meeting of the Association for Computational Linguistics (ACL)
Page Range / eLocation ID:
2171 to 2176
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Meila, Marina; Zhang, Tong (Ed.)
    Black-box variational inference algorithms use stochastic sampling to analyze diverse statistical models, like those expressed in probabilistic programming languages, without model-specific derivations. While the popular score-function estimator computes unbiased gradient estimates, its variance is often unacceptably large, especially in models with discrete latent variables. We propose a stochastic natural gradient estimator that is as broadly applicable and unbiased, but improves efficiency by exploiting the curvature of the variational bound, and provably reduces variance by marginalizing discrete latent variables. Our marginalized stochastic natural gradients have intriguing connections to classic coordinate ascent variational inference, but allow parallel updates of variational parameters, and provide superior convergence guarantees relative to naive Monte Carlo approximations. We integrate our method with the probabilistic programming language Pyro and evaluate real-world models of documents, images, networks, and crowd-sourcing. Compared to score-function estimators, we require far fewer Monte Carlo samples and consistently converge orders of magnitude faster.
  2. Latent space Energy-Based Models (EBMs), also known as energy-based priors, have drawn growing interest in generative modeling. Fueled by their flexible formulation and the strong modeling power of the latent space, recent works built upon them have made interesting attempts at interpretable text modeling. However, latent space EBMs also inherit some flaws from EBMs in data space; the degenerate MCMC sampling quality in practice can lead to poor generation quality and instability in training, especially on data with complex latent structures. Inspired by recent efforts that leverage diffusion recovery likelihood learning as a cure for the sampling issue, we introduce a novel symbiosis between diffusion models and latent space EBMs in a variational learning framework, coined as the latent diffusion energy-based model. We develop a geometric clustering-based regularization jointly with the information bottleneck to further improve the quality of the learned latent space. Experiments on several challenging tasks demonstrate the superior performance of our model on interpretable text modeling over strong counterparts.
  3. Random parameter logit models address unobserved preference heterogeneity in discrete choice analysis. The latent class logit model assumes a discrete heterogeneity distribution, by combining a conditional logit model of economic choices with a multinomial logit (MNL) for stochastic assignment to classes. Whereas point estimation of latent class logit models is widely applied in practice, stochastic assignment of individuals to classes needs further analysis. In this paper we analyze the statistical behavior of six competing class assignment strategies, namely: maximum prior MNL probabilities, class drawn from prior MNL probabilities, maximum posterior assignment, drawn posterior assignment, conditional individual-specific estimates, and conditional individual estimates combined with the Krinsky–Robb method to account for uncertainty. Using both a Monte Carlo study and two empirical case studies, we show that assigning individuals to classes based on maximum MNL probabilities behaves better than randomly drawn classes in market share predictions. However, randomly drawn classes have higher accuracy in predicted class shares. Finally, class assignment based on individual-level conditional estimates that account for the sampling distribution of the assignment parameters shows superior behavior for a larger number of choice occasions per individual. 
  4. Inherent vulnerabilities in a cyber network’s constituent machine services can be exploited by malicious agents. As a result, the machines on any network are at risk. Security specialists seek to mitigate the risk of intrusion events through network reconfiguration and defense. When dealing with rare cyber events, high-quality risk estimates using standard simulation approaches may be unattainable, or have significant attached uncertainty, even with a large computational simulation budget. To address this issue, an efficient rare event simulation modeling and analysis technique, namely, importance sampling for cyber networks, is developed. The importance sampling method parametrically amplifies certain aspects of the network in order to cause a rare event to happen more frequently. Output collected under these amplified conditions is then scaled back into the context of the original network to provide meaningful statistical inferences. The importance sampling methodology is tailored to cyber network attacks and takes the attacker’s successes and failures as well as the attacker’s targeting choices into account. The methodology is shown to produce estimates of higher quality than standard simulation with greater computational efficiency. 
  5. Area-level models for small area estimation typically rely on areal random effects to shrink design-based direct estimates towards a model-based predictor. Incorporating the spatial dependence of the random effects into these models can further improve the estimates when there are not enough covariates to fully account for the spatial dependence of the areal means. A number of recent works have investigated models that include random effects for only a subset of areas, in order to improve the precision of estimates. However, such models do not readily handle spatial dependence. In this paper, we introduce a model that accounts for spatial dependence in both the random effects and the latent process that selects the effects. We show how this model can significantly improve predictive accuracy via an empirical simulation study based on data from the American Community Survey, and illustrate its properties via an application to estimate county-level median rent burden.