- Award ID(s):
- 1935555
- PAR ID:
- 10396642
- Date Published:
- Journal Name:
- The 39th International Conference on Machine Learning
- Volume:
- 162
- Page Range / eLocation ID:
- 23857-23896
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Due to the ease of modern data collection, applied statisticians often have access to a large set of covariates that they wish to relate to some observed outcome. Generalized linear models (GLMs) offer a particularly interpretable framework for such an analysis. In these high-dimensional problems, the number of covariates is often large relative to the number of observations, so we face non-trivial inferential uncertainty; a Bayesian approach allows coherent quantification of this uncertainty. Unfortunately, existing methods for Bayesian inference in GLMs require running times roughly cubic in parameter dimension, and so are limited to settings with at most tens of thousands of parameters. We propose to reduce time and memory costs with a low-rank approximation of the data in an approach we call LR-GLM. When used with the Laplace approximation or Markov chain Monte Carlo, LR-GLM provides a full Bayesian posterior approximation and admits running times reduced by a full factor of the parameter dimension. We rigorously establish the quality of our approximation and show how the choice of rank allows a tunable computational–statistical trade-off. Experiments support our theory and demonstrate the efficacy of LR-GLM on real large-scale datasets.
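As a rough illustration of what a low-rank approximation of the data can look like, the sketch below projects a design matrix onto its top right singular vectors and compares the exact linear predictor with its low-rank counterpart. This is a minimal sketch under stated assumptions, not the LR-GLM implementation: the truncated-SVD projection, the synthetic data, and the names `low_rank_project`, `N`, `D`, `M`, and `beta` are all hypothetical choices made for illustration.

```python
import numpy as np


def low_rank_project(X, rank):
    """Return (U, XU): the top-`rank` right singular vectors of X as columns
    of U (shape D x rank), and the projected data XU = X @ U (shape N x rank).

    Assumption: a truncated SVD is used here purely for illustration.
    """
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    U = Vt[:rank].T
    return U, X @ U


# Synthetic data with approximately rank-M structure (illustrative only).
rng = np.random.default_rng(0)
N, D, M = 200, 50, 5
X = rng.normal(size=(N, M)) @ rng.normal(size=(M, D))
X += 0.01 * rng.normal(size=(N, D))

U, XU = low_rank_project(X, M)

# Compare the exact linear predictor X @ beta with the low-rank version
# (X @ U) @ (U.T @ beta), which only touches an N x M matrix.
beta = rng.normal(size=D)
exact = X @ beta
approx = XU @ (U.T @ beta)
rel_err = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
print(f"relative error of low-rank linear predictor: {rel_err:.2e}")
```

Once `XU` is computed, each evaluation of the linear predictor costs on the order of `N * M` operations rather than `N * D`, which is one way to see how the choice of rank trades accuracy for computation.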
-
Type Ia supernovae (SNe Ia) are standardizable candles whose observed light curves can be used to infer their distances, which can in turn be used in cosmological analyses. As the quantity of observed SNe Ia grows with current and upcoming surveys, increasingly scalable analyses are necessary to take full advantage of these new data sets for precise estimation of cosmological parameters. Bayesian inference methods enable fitting SN Ia light curves with robust uncertainty quantification, but traditional posterior sampling using Markov Chain Monte Carlo (MCMC) is computationally expensive. We present an implementation of variational inference (VI) to accelerate the fitting of SN Ia light curves using the BayeSN hierarchical Bayesian model for time-varying SN Ia spectral energy distributions. We demonstrate and evaluate its performance on both simulated light curves and data from the Foundation Supernova Survey with two different forms of surrogate posterior (a multivariate normal and a custom multivariate zero-lower-truncated normal distribution) and compare them with the Laplace Approximation and full MCMC analysis. To validate our variational approximation, we calculate the Pareto-smoothed importance sampling diagnostic and perform variational simulation-based calibration. The VI approximation achieves similar results to MCMC but with an order-of-magnitude speed-up for the inference of the photometric distance moduli. Overall, we show that VI is a promising method for scalable parameter inference that enables analysis of larger data sets for precision cosmology.
-
Pierre Alquier (Ed.) A systematic approach to finding a variational approximation in an otherwise intractable non-conjugate model is to exploit the general principle of convex duality by minorizing the marginal likelihood in a way that renders the problem tractable. While such approaches are popular in the context of variational inference in non-conjugate Bayesian models, theoretical guarantees on statistical optimality and algorithmic convergence are lacking. Focusing on logistic regression models, we provide mild conditions on the data-generating process to derive non-asymptotic upper bounds to the risk incurred by the variational optima. We demonstrate that these assumptions can be completely relaxed if one considers a slight variation of the algorithm by raising the likelihood to a fractional power. Next, we utilize the theory of dynamical systems to provide convergence guarantees for such algorithms in logistic and multinomial logit regression. In particular, we establish local asymptotic stability of the algorithm without any assumptions on the data-generating process. We explore a special case involving a semi-orthogonal design under which global convergence is obtained. The theory is further illustrated using several numerical studies.
-
We propose a stochastic variational inference algorithm for training large-scale Bayesian networks, where noisy-OR conditional distributions are used to capture higher-order relationships. One application is to the learning of hierarchical topic models for text data. While previous work has focused on two-layer networks popular in applications like medical diagnosis, we develop scalable algorithms for deep networks that capture a multi-level hierarchy of interactions. Our key innovation is a family of constrained variational bounds that only explicitly optimize posterior probabilities for the sub-graph of topics most related to the sparse observations in a given document. These constrained bounds have comparable accuracy but dramatically reduced computational cost. Using stochastic gradient updates based on our variational bounds, we learn noisy-OR Bayesian networks orders of magnitude faster than was possible with prior Monte Carlo learning algorithms, and provide a new tool for understanding large-scale binary data.
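For readers unfamiliar with the noisy-OR conditional distribution mentioned above, the sketch below evaluates the standard noisy-OR activation probability for a single binary node. It is only an illustration of the conditional distribution, not the variational algorithm from the abstract; the function name `noisy_or_prob` and the `leak` parameter are hypothetical names chosen here.

```python
import numpy as np


def noisy_or_prob(parents, weights, leak=0.0):
    """Probability that a binary child node is on under a noisy-OR model.

    parents: 0/1 states of the parent nodes.
    weights: per-parent probability that an active parent turns the child on.
    leak:    probability that the child turns on with no active parent
             (an assumed, commonly used extension).
    """
    parents = np.asarray(parents, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # The child stays off only if the leak and every active parent all fail.
    p_off = (1.0 - leak) * np.prod((1.0 - weights) ** parents)
    return 1.0 - p_off


# Two active parents with weights 0.5 and 0.2; the inactive parent is ignored.
p = noisy_or_prob([1, 0, 1], [0.5, 0.9, 0.2])
p_leaky = noisy_or_prob([1, 0, 1], [0.5, 0.9, 0.2], leak=0.1)
```

Here `p` is 1 - 0.5 × 0.8 = 0.6, and adding a leak of 0.1 gives `p_leaky` = 1 - 0.9 × 0.4 = 0.64.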
-
Nonlinear state-space models are powerful tools to describe dynamical structures in complex time series. In a streaming setting where data are processed one sample at a time, simultaneous inference of the state and its nonlinear dynamics has posed significant challenges in practice. We develop a novel online learning framework, leveraging variational inference and sequential Monte Carlo, which enables flexible and accurate Bayesian joint filtering. Our method provides an approximation of the filtering posterior which can be made arbitrarily close to the true filtering distribution for a wide class of dynamics models and observation models. Specifically, the proposed framework can efficiently approximate a posterior over the dynamics using sparse Gaussian processes, allowing for an interpretable model of the latent dynamics. Constant time complexity per sample makes our approach amenable to online learning scenarios and suitable for real-time applications.