On Importance Sampling-Based Evaluation of Latent Language Models
Language models that use additional latent structures (e.g., syntax trees, coreference chains, knowledge graph links) provide several advantages over traditional language models. However, likelihood-based evaluation of these models is often intractable as it requires marginalizing over the latent space. Existing works avoid this issue by using importance sampling. Although this approach has asymptotic guarantees, analysis is rarely conducted on the effect of decisions such as sample size and choice of proposal distribution on the reported estimates. In this paper, we carry out this analysis for three models: RNNG, EntityNLM, and KGLM. In addition, we elucidate subtle differences in how importance sampling is applied in these works that can have substantial effects on the final estimates, as well as provide theoretical results which reinforce the validity of this technique.
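As background for the analysis described above, the following is a minimal sketch, under assumed toy-model settings, of the importance-sampling estimator of log-marginal likelihood and of how sample size and proposal choice shift the reported value. The Gaussian latent-variable model, the two proposals, and the sample sizes are illustrative assumptions, not the models studied in the paper.

```python
# A minimal sketch (not the paper's code) of importance-sampling evaluation of a
# latent-variable model: estimate log p(x) by sampling latents z from a proposal q
# and averaging the weights p(x | z) p(z) / q(z). Toy model chosen so the exact
# answer is available for comparison.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Toy model: z ~ N(0, 1), x | z ~ N(z, 1)  =>  p(x) = N(x; 0, 2) exactly.
x = 1.5
exact_logpx = stats.norm(0.0, np.sqrt(2.0)).logpdf(x)

def is_log_marginal(num_samples, prop_mean, prop_sd):
    """log (1/K) sum_k p(x | z_k) p(z_k) / q(z_k), with z_k drawn from the proposal q."""
    z = rng.normal(prop_mean, prop_sd, size=num_samples)
    log_w = (stats.norm(z, 1.0).logpdf(x)                  # log p(x | z)
             + stats.norm(0.0, 1.0).logpdf(z)              # + log p(z)
             - stats.norm(prop_mean, prop_sd).logpdf(z))   # - log q(z)
    # log-mean-exp for numerical stability
    return np.logaddexp.reduce(log_w) - np.log(num_samples)

for K in (10, 100, 10000):
    prior_prop = is_log_marginal(K, 0.0, 1.0)                 # proposal = prior
    post_prop = is_log_marginal(K, x / 2.0, np.sqrt(0.5))     # proposal = exact posterior
    print(f"K={K:6d}  prior proposal: {prior_prop:.4f}  "
          f"posterior proposal: {post_prop:.4f}  exact: {exact_logpx:.4f}")
```

With the prior as proposal the estimate approaches the exact value only slowly as K grows, while the exact-posterior proposal gives constant weights and recovers the exact value at any K; this is the kind of sensitivity to sample size and proposal choice that the analysis above examines.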
- Award ID(s): 1817183
- PAR ID: 10180483
- Date Published:
- Journal Name: Annual Meeting of the Association for Computational Linguistics (ACL)
- Page Range / eLocation ID: 2171 to 2176
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
It is standard statistical practice to provide measures of uncertainty around parameter estimates. Unfortunately, this very basic and necessary enterprise is often absent in macroevolutionary studies using maximum likelihood estimates (MLEs). dentist is an R package that approximates confidence intervals (CIs) around parameter estimates without requiring an analytic solution to the likelihood equations. The package works by ‘denting’ the likelihood surface, sampling points a specified distance around the MLE following what is essentially a Metropolis-Hastings walk. We describe the importance of estimating uncertainty around parameter estimates, demonstrate the ability of dentist to accurately approximate CIs, and introduce several plotting tools to visualize the results of a dentist analysis. dentist is freely available from https://github.com/bomeara/dentist, written in the R language, and can be used for any given likelihood function. (A hedged sketch of this denting idea appears after this list.)
-
Latent space Energy-Based Models (EBMs), also known as energy-based priors, have drawn growing interest in generative modeling. Fueled by the flexibility of their formulation and the strong modeling power of the latent space, recent works built upon them have made interesting attempts at interpretable text modeling. However, latent space EBMs also inherit some flaws from EBMs in data space: degenerate MCMC sampling quality in practice can lead to poor generation quality and instability in training, especially on data with complex latent structures. Inspired by recent efforts that leverage diffusion recovery likelihood learning as a cure for the sampling issue, we introduce a novel symbiosis between diffusion models and latent space EBMs in a variational learning framework, coined the latent diffusion energy-based model. We develop a geometric clustering-based regularization jointly with the information bottleneck to further improve the quality of the learned latent space. Experiments on several challenging tasks demonstrate the superior performance of our model on interpretable text modeling over strong counterparts. (A hedged sketch of latent-space Langevin sampling, the step whose degeneracy motivates this work, appears after this list.)
-
Meila, Marina; Zhang, Tong (Eds.) Black-box variational inference algorithms use stochastic sampling to analyze diverse statistical models, like those expressed in probabilistic programming languages, without model-specific derivations. While the popular score-function estimator computes unbiased gradient estimates, its variance is often unacceptably large, especially in models with discrete latent variables. We propose a stochastic natural gradient estimator that is as broadly applicable and unbiased, but improves efficiency by exploiting the curvature of the variational bound, and provably reduces variance by marginalizing discrete latent variables. Our marginalized stochastic natural gradients have intriguing connections to classic coordinate ascent variational inference, but allow parallel updates of variational parameters and provide superior convergence guarantees relative to naive Monte Carlo approximations. We integrate our method with the probabilistic programming language Pyro and evaluate real-world models of documents, images, networks, and crowd-sourcing. Compared to score-function estimators, we require far fewer Monte Carlo samples and consistently converge orders of magnitude faster. (A hedged sketch contrasting the score-function estimator with exact marginalization appears after this list.)
-
The goal of item response theoretic (IRT) models is to provide estimates of latent traits from binary observed indicators and, at the same time, to learn the item response functions (IRFs) that map from latent trait to observed response. However, in many cases observed behavior can deviate significantly from the parametric assumptions of traditional IRT models. Nonparametric IRT (NIRT) models overcome these challenges by relaxing assumptions about the form of the IRFs, but standard tools are unable to simultaneously estimate flexible IRFs and recover ability estimates for respondents. We propose a Bayesian nonparametric model that solves this problem by placing Gaussian process priors on the latent functions defining the IRFs. This allows us to relax assumptions about the shape of the IRFs while preserving the ability to estimate latent traits, which in turn allows us to easily extend the model to further tasks such as active learning. GPIRT therefore provides a simple and intuitive solution to several longstanding problems in the IRT literature. (A hedged sketch of GP-distributed item response functions appears after this list.)
-
Inherent vulnerabilities in a cyber network’s constituent machine services can be exploited by malicious agents. As a result, the machines on any network are at risk. Security specialists seek to mitigate the risk of intrusion events through network reconfiguration and defense. When dealing with rare cyber events, high-quality risk estimates obtained with standard simulation approaches may be unattainable, or may carry significant uncertainty, even with a large computational simulation budget. To address this issue, an efficient rare-event simulation modeling and analysis technique, namely importance sampling for cyber networks, is developed. The importance sampling method parametrically amplifies certain aspects of the network in order to cause a rare event to happen more frequently. Output collected under these amplified conditions is then scaled back into the context of the original network to provide meaningful statistical inferences. The methodology is tailored to cyber network attacks and takes the attacker’s successes and failures, as well as the attacker’s targeting choices, into account. It is shown to produce estimates of higher quality than standard simulation with greater computational efficiency. (A hedged sketch of this amplify-and-reweight recipe appears after this list.)
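For the first item above (dentist), this is a minimal Python sketch of the general idea rather than the package's R API: walk around the MLE, keep every visited point, and read off, per parameter, the range of sampled values whose log-likelihood stays within a chi-square cutoff of the maximum. The toy normal model, the step size, and the number of steps are assumptions.

```python
# Illustration of the "denting" idea: approximate per-parameter confidence intervals
# from points sampled around the MLE, without an analytic solution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.normal(loc=2.0, scale=1.5, size=200)            # toy data

def log_lik(params):
    mu, sigma = params
    if sigma <= 0:
        return -np.inf
    return stats.norm(mu, sigma).logpdf(data).sum()

mle = np.array([data.mean(), data.std()])                   # analytic MLE for the toy model
best = log_lik(mle)
cutoff = best - stats.chi2.ppf(0.95, df=1) / 2.0             # ~1.92 log-likelihood units

# Metropolis-style walk around the MLE, recording every visited point.
current, current_ll, kept = mle.copy(), best, []
for _ in range(20000):
    proposal = current + rng.normal(0.0, 0.05, size=2)
    proposal_ll = log_lik(proposal)
    if np.log(rng.uniform()) < proposal_ll - current_ll:
        current, current_ll = proposal, proposal_ll
    kept.append((current_ll, *current))

kept = np.array(kept)
inside = kept[kept[:, 0] >= cutoff]                          # points within the cutoff
for name, col in (("mu", 1), ("sigma", 2)):
    print(f"{name}: approximate 95% CI "
          f"[{inside[:, col].min():.3f}, {inside[:, col].max():.3f}]")
```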
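For the second item above, a heavily hedged sketch of the latent-space MCMC step whose degeneracy motivates that work: short-run Langevin dynamics targeting an energy-based prior p(z) proportional to exp(-E(z)) N(z; 0, I). The quadratic toy energy, step size, and chain length are assumptions; the paper's actual construction (diffusion recovery likelihood, variational learning, clustering regularization) is not reproduced here.

```python
# Short-run Langevin sampling from a toy latent-space energy-based prior.
import numpy as np

rng = np.random.default_rng(2)
dim = 8
A = rng.normal(size=(dim, dim)) * 0.1

def energy(z):
    """Toy quadratic energy; a real latent EBM would use a small neural network here."""
    return 0.5 * np.sum((A @ z) ** 2)

def grad_energy(z):
    return A.T @ (A @ z)

def langevin_sample(steps=60, step_size=0.05):
    z = rng.normal(size=dim)                      # initialize from N(0, I)
    for _ in range(steps):
        # gradient of -log p(z) = grad E(z) + z   (Gaussian reference term)
        g = grad_energy(z) + z
        z = z - 0.5 * step_size * g + np.sqrt(step_size) * rng.normal(size=dim)
    return z

zs = np.stack([langevin_sample() for _ in range(256)])
print("mean energy of samples:", np.mean([energy(z) for z in zs]).round(3))
print("sample mean norm:", np.linalg.norm(zs.mean(axis=0)).round(3))
```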
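For the third item above, a small sketch contrasting the score-function (REINFORCE) gradient estimator with exact marginalization of a discrete latent variable, the variance-reduction idea that abstract highlights. The four-state latent and the toy objective are assumptions, and the natural-gradient preconditioning is omitted.

```python
# Score-function estimator vs. exact marginalization for d/d logits E_{q(c)}[f(c)].
import numpy as np

rng = np.random.default_rng(3)
logits = np.array([0.2, -0.3, 0.1, 0.4])
f = np.array([1.0, 3.0, -2.0, 0.5])               # toy per-state objective values

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def score_function_grad(num_samples):
    """Monte Carlo estimate via f(c) * grad log q(c), c sampled from q."""
    probs = softmax(logits)
    c = rng.choice(len(probs), size=num_samples, p=probs)
    grad_log_q = np.eye(len(probs))[c] - probs      # d log q(c) / d logits, per sample
    return (f[c][:, None] * grad_log_q).mean(axis=0)

def marginalized_grad():
    """Exact gradient of sum_c q(c) f(c): zero Monte Carlo variance."""
    probs = softmax(logits)
    return probs * (f - probs @ f)

exact = marginalized_grad()
estimates = np.stack([score_function_grad(10) for _ in range(2000)])
print("exact gradient:        ", exact.round(4))
print("score-function mean:   ", estimates.mean(axis=0).round(4))
print("score-function std dev:", estimates.std(axis=0).round(4))
```

The printed standard deviation is the Monte Carlo noise that marginalizing the discrete latent removes entirely.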
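For the fourth item above, a minimal sketch of the prior construction: item response functions drawn from a Gaussian process over the latent trait and passed through a logistic link. The RBF kernel, its hyperparameters, and the trait grid are assumptions; GPIRT's full inference is not shown.

```python
# Sample item response functions from a GP prior over the latent trait.
import numpy as np

rng = np.random.default_rng(4)
theta = np.linspace(-3.0, 3.0, 50)                  # grid of latent trait values

def rbf_kernel(a, b, lengthscale=1.0, variance=4.0):
    sq = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq / lengthscale ** 2)

K = rbf_kernel(theta, theta) + 1e-8 * np.eye(len(theta))   # jitter for stability
L = np.linalg.cholesky(K)

# Three random IRFs: f ~ GP(0, K), response probability = sigmoid(f(theta)).
for item in range(3):
    f = L @ rng.normal(size=len(theta))
    prob_correct = 1.0 / (1.0 + np.exp(-f))
    print(f"item {item}: P(correct) ranges from "
          f"{prob_correct.min():.2f} to {prob_correct.max():.2f} across the trait grid")
```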
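For the last item above, a sketch of the amplify-and-reweight recipe on a toy network: simulate under an amplified compromise probability so the rare event occurs often, then scale each run back with the likelihood ratio of the nominal network. The 50-service network, the probabilities, and the "at least 10 compromises" event are assumptions, not the paper's cyber model.

```python
# Rare-event importance sampling: amplify, simulate, reweight by the likelihood ratio.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n_services, p_nominal, p_amplified, threshold = 50, 0.01, 0.2, 10
num_runs = 100_000

# Standard simulation: the event almost never happens, so the estimate is usually 0.
naive = (rng.binomial(n_services, p_nominal, size=num_runs) >= threshold).mean()

# Importance sampling: simulate under the amplified probability, then reweight each
# run by (p/p')^k * ((1-p)/(1-p'))^(n-k), the likelihood ratio of the original network.
k = rng.binomial(n_services, p_amplified, size=num_runs)
log_w = (k * np.log(p_nominal / p_amplified)
         + (n_services - k) * np.log((1 - p_nominal) / (1 - p_amplified)))
is_estimate = np.mean((k >= threshold) * np.exp(log_w))

exact = stats.binom.sf(threshold - 1, n_services, p_nominal)
print(f"naive MC: {naive:.2e}   importance sampling: {is_estimate:.2e}   exact: {exact:.2e}")
```

At this budget the naive estimate is typically exactly zero, while the reweighted estimate lands near the analytic value, which is the quality gap the abstract describes.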