Gradient-based approximate inference methods, such as Stein variational gradient descent (SVGD), provide simple and general-purpose inference engines for differentiable continuous distributions. However, existing forms of SVGD cannot be directly applied to discrete distributions. In this work, we fill this gap by proposing a simple yet general framework that transforms discrete distributions into equivalent piecewise continuous distributions, on which gradient-free SVGD is applied to perform efficient approximate inference. Empirical results show that our method outperforms traditional algorithms such as Gibbs sampling and discontinuous Hamiltonian Monte Carlo on various challenging benchmarks of discrete graphical models. We demonstrate that our method provides a promising tool for learning ensembles of binarized neural networks (BNNs), outperforming other widely used ensemble methods when learning a binarized AlexNet on the CIFAR-10 dataset. In addition, such a transform can be straightforwardly employed in the gradient-free kernelized Stein discrepancy to perform goodness-of-fit (GOF) tests on discrete distributions. Our proposed method outperforms existing GOF test methods for intractable discrete distributions.
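To make the idea above concrete, here is a minimal, hypothetical sketch (not the authors' code): a binary target over {-1,+1}^d is relaxed to a piecewise continuous density p_tilde(x) proportional to p(sign(x)) times a Gaussian base, and a set of particles is updated with a gradient-free SVGD-style rule that uses the gradient of a smooth surrogate together with importance weights. All function names, the surrogate, and the parameter choices are illustrative assumptions.

```python
# Hypothetical sketch: relax a binary model to a piecewise continuous density and run a
# gradient-free SVGD-style particle update against it. Names and settings are illustrative.
import numpy as np

def p_discrete(z):
    """Unnormalized probability of a toy Ising-like chain on z in {-1,+1}^d."""
    return np.exp(0.5 * np.sum(z[:, :-1] * z[:, 1:], axis=1))

def p_tilde(x):
    """Piecewise continuous relaxation: p(sign(x)) times a smooth Gaussian base density."""
    return p_discrete(np.sign(x)) * np.exp(-0.5 * np.sum(x**2, axis=1))

def gf_svgd_step(x, stepsize=0.05, h=1.0):
    """One gradient-free SVGD-style step with a standard-normal surrogate rho(x)."""
    rho = np.exp(-0.5 * np.sum(x**2, axis=1))           # unnormalized surrogate density
    w = p_tilde(x) / rho                                # importance weights p_tilde / rho
    w = w / w.sum()
    diff = x[:, None, :] - x[None, :, :]                # x_j - x_i, shape (n, n, d)
    k = np.exp(-np.sum(diff**2, axis=-1) / (2 * h**2))  # RBF kernel k(x_j, x_i)
    grad_k = -diff / h**2 * k[:, :, None]               # grad_{x_j} k(x_j, x_i)
    grad_log_rho = -x                                   # gradient of the surrogate's log-density
    phi = np.sum(w[:, None, None] * (grad_log_rho[:, None, :] * k[:, :, None] + grad_k), axis=0)
    return x + stepsize * phi

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    particles = rng.normal(size=(200, 5))
    for _ in range(500):
        particles = gf_svgd_step(particles)
    z = np.sign(particles)
    print("neighbour agreement E[z_0 * z_1]:", np.mean(z[:, 0] * z[:, 1]))
```

Under this relaxation the sign pattern of the particles approximates draws from the original discrete model, which is what the final print inspects.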
A Stein-Papangelou Goodness-of-Fit Test for Point Processes.
Point processes provide a powerful framework for modeling the distribution and interactions of events in time or space. Their flexibility has given rise to a variety of sophisticated models in statistics and machine learning, yet model diagnostic and criticism techniques remain underdeveloped. In this work, we propose a general Stein operator for point processes based on the Papangelou conditional intensity function. We then establish a kernel goodness-of-fit test by defining a Stein discrepancy measure for general point processes. Notably, our test also applies to non-Poisson point processes whose intensity functions contain intractable normalization constants due to the presence of complex interactions among points. We apply our proposed test to several point process models, and show that it outperforms a two-sample test based on the maximum mean discrepancy.
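The test statistic in this abstract pairs a Stein operator with a reproducing kernel to obtain a quadratic-form discrepancy. As a hedged illustration of that recipe only, the sketch below computes the standard kernel Stein discrepancy U-statistic for a density on R^d with a known score function; it does not implement the Papangelou-conditional-intensity operator developed in the paper.

```python
# Illustrative only: standard KSD U-statistic for a density on R^d with known score
# s(x) = grad log p(x), using an RBF kernel; not the point-process Stein operator.
import numpy as np

def ksd_u_statistic(x, score, h=1.0):
    """U-statistic estimate of the squared kernel Stein discrepancy for samples x (n, d)."""
    n, d = x.shape
    s = score(x)                                            # score evaluated at each sample
    diff = x[:, None, :] - x[None, :, :]                    # x_i - x_j, shape (n, n, d)
    sq = np.sum(diff**2, axis=-1)
    k = np.exp(-sq / (2 * h**2))                            # RBF kernel k(x_i, x_j)
    term1 = (s @ s.T) * k                                   # s_i . s_j k
    term2 = np.einsum('id,ijd->ij', s, diff) * k / h**2     # s_i . grad_{x_j} k
    term3 = -np.einsum('jd,ijd->ij', s, diff) * k / h**2    # s_j . grad_{x_i} k
    term4 = (d / h**2 - sq / h**4) * k                      # trace of the mixed second derivative
    u = term1 + term2 + term3 + term4
    np.fill_diagonal(u, 0.0)                                # U-statistic: drop i == j terms
    return u.sum() / (n * (n - 1))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    score_std_normal = lambda x: -x                         # score of N(0, I)
    print("null:       ", ksd_u_statistic(rng.normal(size=(300, 2)), score_std_normal))
    print("alternative:", ksd_u_statistic(rng.normal(loc=0.7, size=(300, 2)), score_std_normal))
```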
- Award ID(s): 1816499
- PAR ID: 10105515
- Date Published:
- Journal Name: Artificial Intelligence and Statistics (AISTATS 2019)
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
We develop a prior probability model for temporal Poisson process intensities through structured mixtures of Erlang densities with a common scale parameter, mixing on the integer shape parameters. The mixture weights are constructed through increments of a cumulative intensity function which is modeled nonparametrically with a gamma process prior. Such a model specification provides a novel extension of Erlang mixtures for density estimation to the intensity estimation setting. The prior model structure supports general shapes for the point process intensity function, and it also enables effective handling of the Poisson process likelihood normalizing term, resulting in efficient posterior simulation. The Erlang mixture modeling approach is further elaborated to develop an inference method for spatial Poisson processes. The methodology is examined relative to existing Bayesian nonparametric modeling approaches, including empirical comparison with Gaussian process prior based models, and is illustrated with synthetic and real data examples.
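As a rough illustration of the building block described above, the sketch below evaluates an intensity formed as a weighted mixture of Erlang densities with a common scale and samples a Poisson process from it by thinning. The weights here are fixed by hand; in the paper they arise from increments of a cumulative intensity with a gamma process prior, so everything below is an assumption for demonstration.

```python
# Hypothetical sketch of the building block above: an intensity that is a weighted mixture
# of Erlang densities with a common scale, plus Poisson-process sampling by thinning.
import math
import numpy as np

def erlang_pdf(t, shape, scale):
    """Erlang density (gamma with integer shape)."""
    return t**(shape - 1) * np.exp(-t / scale) / (scale**shape * math.factorial(shape - 1))

def intensity(t, weights, scale):
    """lambda(t) = sum_j w_j * Erlang(t; shape=j, scale); the w_j need not sum to one."""
    return sum(w * erlang_pdf(t, j + 1, scale) for j, w in enumerate(weights))

def sample_poisson_process(weights, scale, t_max, rng):
    """Draw event times on (0, t_max] by thinning a dominating homogeneous process."""
    grid = np.linspace(1e-6, t_max, 1000)
    lam_bar = intensity(grid, weights, scale).max()         # approximate dominating rate
    n = rng.poisson(lam_bar * t_max)
    candidates = rng.uniform(1e-6, t_max, size=n)
    keep = rng.uniform(0.0, lam_bar, size=n) < intensity(candidates, weights, scale)
    return np.sort(candidates[keep])

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    weights, scale = [5.0, 0.0, 12.0, 8.0], 2.0             # mixes Erlang shapes 1..4
    events = sample_poisson_process(weights, scale, t_max=20.0, rng=rng)
    print(len(events), "events; roughly", sum(weights), "expected, since each Erlang integrates to ~1")
```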
We characterize the asymptotic performance of nonparametric goodness-of-fit testing. The exponential decay rate of the type-II error probability is used as the asymptotic performance metric, and a test is optimal if it achieves the maximum rate subject to a constant level constraint on the type-I error probability. We show that two classes of Maximum Mean Discrepancy (MMD) based tests attain this optimality on R^d, while the quadratic-time Kernel Stein Discrepancy (KSD) based tests achieve the maximum exponential decay rate under a relaxed level constraint. Under the same performance metric, we proceed to show that the quadratic-time MMD based two-sample tests are also optimal for general two-sample problems, provided that kernels are bounded, continuous, and characteristic. Key to our approach are Sanov's theorem from large deviation theory and the weak metrizable properties of the MMD and KSD.
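For reference, here is a minimal sketch of the quadratic-time MMD statistic discussed above, using an RBF kernel (which is bounded, continuous, and characteristic, matching the condition in the abstract) and a simple permutation calibration. This is illustrative code, not an implementation of the paper's asymptotic analysis; bandwidth and permutation count are arbitrary.

```python
# Illustrative quadratic-time MMD^2 estimator with an RBF kernel and a permutation test;
# the kernel bandwidth and permutation count are arbitrary choices for this sketch.
import numpy as np

def mmd2_unbiased(x, y, h=1.0):
    """Unbiased quadratic-time estimate of MMD^2 between samples x (n, d) and y (m, d)."""
    def k(a, b):
        return np.exp(-np.sum((a[:, None, :] - b[None, :, :])**2, axis=-1) / (2 * h**2))
    kxx, kyy, kxy = k(x, x), k(y, y), k(x, y)
    n, m = len(x), len(y)
    np.fill_diagonal(kxx, 0.0)                              # drop diagonal terms (unbiased form)
    np.fill_diagonal(kyy, 0.0)
    return kxx.sum() / (n * (n - 1)) + kyy.sum() / (m * (m - 1)) - 2.0 * kxy.mean()

def permutation_pvalue(x, y, n_perm=200, seed=0):
    """Calibrate the statistic by permuting the pooled sample."""
    rng = np.random.default_rng(seed)
    observed = mmd2_unbiased(x, y)
    pooled = np.vstack([x, y])
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        exceed += mmd2_unbiased(pooled[idx[:len(x)]], pooled[idx[len(x):]]) >= observed
    return (exceed + 1) / (n_perm + 1)

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    x = rng.normal(size=(100, 2))
    y = rng.normal(loc=0.5, size=(100, 2))                  # mean-shifted alternative
    print("permutation p-value:", permutation_pvalue(x, y))
```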
Diversification has been shown to be a powerful mechanism for learning robust models in non-convex settings. A notable example is learning mixture models, in which enforcing diversity between the different mixture components allows us to prevent the model-collapsing phenomenon and capture more patterns from the observed data. In this work, we present a variational approach for diversity-promoting learning, which leverages the entropy functional as a natural mechanism for enforcing diversity. We develop a simple and efficient functional gradient-based algorithm for optimizing the variational objective function, which provides a significant generalization of Stein variational gradient descent (SVGD). We test our method on various challenging real-world problems, including deep embedded clustering and deep anomaly detection. Empirical results show that our method provides an effective mechanism for diversity-promoting learning, achieving substantial improvement over existing methods.
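For context on the algorithm being generalized, here is a bare-bones sketch of the standard SVGD update. The second term, the kernel gradient, is the repulsive force that keeps particles spread apart; it is the kind of diversity mechanism the entropy-based objective above builds on. The target, kernel, and step size below are illustrative assumptions.

```python
# Bare-bones standard SVGD update (illustrative); the kernel-gradient term is the
# repulsive, diversity-enforcing part.
import numpy as np

def svgd_step(x, grad_logp, stepsize=0.1, h=1.0):
    """One SVGD update for particles x (n, d) given grad_logp(x) = grad log p(x)."""
    diff = x[:, None, :] - x[None, :, :]                    # x_j - x_i, shape (n, n, d)
    k = np.exp(-np.sum(diff**2, axis=-1) / (2 * h**2))      # RBF kernel k(x_j, x_i)
    grad_k = -diff / h**2 * k[:, :, None]                   # grad_{x_j} k(x_j, x_i): repulsion
    phi = (k.T @ grad_logp(x) + grad_k.sum(axis=0)) / len(x)
    return x + stepsize * phi

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    grad_logp = lambda x: -(x - np.array([1.0, -1.0]))      # target N([1, -1], I)
    particles = rng.normal(size=(100, 2)) + 5.0             # initialize far from the target
    for _ in range(300):
        particles = svgd_step(particles, grad_logp)
    print("particle mean (should be near [1, -1]):", particles.mean(axis=0))
```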
Social decision making involves balancing conflicts between selfishness and pro-sociality. The cognitive processes underlying such decisions are not well understood: some argue for a single comparison process, while others argue for dual processes (one intuitive and one deliberative). Here, we propose a way to reconcile these two opposing frameworks. We argue that behavior attributed to intuition can instead be seen as a starting point bias of a sequential sampling model (SSM) process, analogous to a prior in a Bayesian framework. Using mini-dictator games in which subjects make binary decisions about how to allocate money between themselves and another participant, we find that pro-social subjects become more pro-social under time pressure and less pro-social under time delay, while selfish subjects do the opposite. Our findings help reconcile the conflicting results concerning the cognitive processes of social decision making and highlight the importance of modeling the dynamics of the choice process.
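A toy simulation, with entirely hypothetical parameters rather than values fit to the study's data, of the mechanism described above: in a two-boundary sequential sampling model, a starting point shifted toward one option behaves like a prior and dominates fast (time-pressured) choices, while the drift term, the deliberative evidence, dominates slower ones.

```python
# Toy diffusion simulation with hypothetical parameters: the starting point is biased
# toward the upper ("pro-social") boundary while the drift pulls toward the lower one.
import numpy as np

def simulate_choices(start_bias, drift, n_trials=2000, bound=1.0, dt=0.01, noise=1.0, seed=5):
    """Return choices (+1 upper / -1 lower boundary) and response times for each trial."""
    rng = np.random.default_rng(seed)
    choices, rts = np.zeros(n_trials), np.zeros(n_trials)
    for i in range(n_trials):
        x, t = start_bias, 0.0
        while abs(x) < bound:                               # accumulate noisy evidence
            x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
            t += dt
        choices[i], rts[i] = np.sign(x), t
    return choices, rts

if __name__ == "__main__":
    choices, rts = simulate_choices(start_bias=0.3, drift=-0.5)
    fast = rts < np.median(rts)
    print("P(upper | fast trials):", (choices[fast] > 0).mean())   # starting point dominates
    print("P(upper | slow trials):", (choices[~fast] > 0).mean())  # drift dominates
```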