Title: Bayesian model selection via mean-field variational approximation

This article considers Bayesian model selection via mean-field (MF) variational approximation. Towards this goal, we study the non-asymptotic properties of MF inference allowing for latent variables and model misspecification. Concretely, we show a Bernstein–von Mises (BvM) theorem for the variational distribution from MF under possible model misspecification, which implies the distributional convergence of the MF variational approximation to a normal distribution centred at the maximum likelihood estimator. Motivated by the BvM theorem, we propose a model selection criterion using the evidence lower bound (ELBO), and demonstrate that the model selected by ELBO tends to agree with the one selected by the commonly used Bayesian information criterion (BIC) as the sample size tends to infinity. Compared to BIC, ELBO tends to incur smaller approximation error to the log-marginal likelihood (a.k.a. model evidence) due to a better dimension dependence and full incorporation of the prior information. Moreover, we show the geometric convergence of the coordinate ascent variational inference algorithm, which provides practical guidance on how many iterations one typically needs to run when approximating the ELBO. These findings demonstrate that variational inference is capable of providing a computationally efficient alternative to conventional approaches in tasks beyond obtaining point estimates.
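
The geometric convergence of coordinate ascent variational inference (CAVI) mentioned above can be illustrated on a toy problem. The sketch below is not the setting analysed in the article: it assumes a bivariate normal target with unit variances and correlation rho, for which the mean-field coordinate updates are available in closed form. The function names `cavi_gaussian` and `second_coordinate_errors` are hypothetical and chosen only for this illustration.

```python
def cavi_gaussian(mu, rho, init, sweeps):
    """Run CAVI on a mean-field approximation q(x1) q(x2) to a bivariate
    normal with mean `mu`, unit variances and correlation `rho`
    (an assumed toy target, not the model class of the article).

    Each factor of q is normal with fixed variance 1 - rho**2, so only the
    variational means (m1, m2) need to be iterated.
    Returns the list of (m1, m2) pairs, starting with `init`.
    """
    m1, m2 = init
    history = [(m1, m2)]
    for _ in range(sweeps):
        # Update q(x1) given the current mean of q(x2).
        m1 = mu[0] + rho * (m2 - mu[1])
        # Update q(x2) given the freshly updated mean of q(x1).
        m2 = mu[1] + rho * (m1 - mu[0])
        history.append((m1, m2))
    return history


def second_coordinate_errors(history, mu):
    """Absolute error of the second variational mean after each sweep."""
    return [abs(m2 - mu[1]) for _, m2 in history]


if __name__ == "__main__":
    mu, rho = (1.0, -2.0), 0.8
    hist = cavi_gaussian(mu, rho, init=(0.0, 0.0), sweeps=10)
    errs = second_coordinate_errors(hist, mu)
    for k in range(1, len(errs)):
        # Each full sweep shrinks the error by the same factor rho**2.
        print(k, errs[k], errs[k] / errs[k - 1])
```

In this toy case every full sweep multiplies the error by rho squared, so the number of sweeps needed for a given tolerance grows only logarithmically in the inverse tolerance. This is the kind of iteration-count guidance that a geometric rate provides, although the rate constant in the article's setting will differ.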

Publisher / Repository:
Oxford University Press
Journal Name:
Journal of the Royal Statistical Society Series B: Statistical Methodology
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary

    Model selection for marginal regression analysis of longitudinal data is challenging owing to the presence of correlation and the difficulty of specifying the full likelihood, particularly for correlated categorical data. The paper introduces a novel Bayesian information criterion type model selection procedure based on the quadratic inference function, which does not require the full likelihood or quasi-likelihood. With probability approaching 1, the criterion selects the most parsimonious correct model. Although a working correlation matrix is assumed, there is no need to estimate the nuisance parameters in the working correlation matrix; moreover, the model selection procedure is robust against the misspecification of the working correlation matrix. The criterion proposed can also be used to construct a data-driven Neyman smooth test for checking the goodness of fit of a postulated model. This test is especially useful and often yields much higher power in situations where the classical directional test behaves poorly. The finite sample performance of the model selection and model checking procedures is demonstrated through Monte Carlo studies and analysis of a clinical trial data set.

  2. Abstract

    We propose a model-based clustering method for high-dimensional longitudinal data via regularization in this paper. This study was motivated by the Trial of Activity in Adolescent Girls (TAAG), which aimed to examine multilevel factors related to the change of physical activity by following up a cohort of 783 girls over 10 years from adolescence to early adulthood. Our goal is to identify the intrinsic grouping of subjects with similar patterns of physical activity trajectories and the most relevant predictors within each group. The previous analyses conducted clustering and variable selection in two steps, while our new method can perform the tasks simultaneously. Within each cluster, a linear mixed-effects model (LMM) is fitted with a doubly penalized likelihood to induce sparsity for parameter estimation and effect selection. The large-sample joint properties are established, allowing the dimensions of both fixed and random effects to increase at an exponential rate of the sample size, with a general class of penalty functions. Assuming subjects are drawn from a Gaussian mixture distribution, model effects and cluster labels are estimated via a coordinate descent algorithm nested inside the Expectation-Maximization (EM) algorithm. Bayesian Information Criterion (BIC) is used to determine the optimal number of clusters and the values of tuning parameters. Our numerical studies show that the new method has satisfactory performance and is able to accommodate complex data with multilevel and/or longitudinal effects.

  3. Abstract

    The classical multiple testing model remains an important practical area of statistics with new approaches still being developed. In this paper we develop a new multiple testing procedure inspired by a method sometimes used in a problem with a different focus, namely the inference after model selection problem. We note that solutions to that problem are often accomplished by making use of a penalized likelihood function. A classic example is the Bayesian information criterion (BIC) method. In this paper we construct a generalized BIC method and evaluate its properties as a multiple testing procedure. The procedure is applicable to a wide variety of statistical models including regression, contrasts, treatment versus control, change point, and others. Numerical work indicates that, in particular, for sparse models the new generalized BIC would be preferred over existing multiple testing procedures.

  4. Summary

    We obtain the residual information criterion RIC, a selection criterion based on the residual log-likelihood, for regression models including classical regression models, Box–Cox transformation models, weighted regression models and regression models with autoregressive moving average errors. We show that RIC is a consistent criterion, and that simulation studies for each of the four models indicate that RIC provides better model order choices than the Akaike information criterion, corrected Akaike information criterion, final prediction error, Cp and adjusted R², except when the sample size is small and the signal-to-noise ratio is weak. In this case, none of the criteria performs well. Monte Carlo results also show that RIC is superior to the consistent Bayesian information criterion BIC when the signal-to-noise ratio is not weak, and it is comparable with BIC when the signal-to-noise ratio is weak and the sample size is large.

  5. Pierre Alquier (Ed.)
    A systematic approach to finding a variational approximation in an otherwise intractable non-conjugate model is to exploit the general principle of convex duality by minorizing the marginal likelihood in a way that renders the problem tractable. While such approaches are popular in the context of variational inference in non-conjugate Bayesian models, theoretical guarantees on statistical optimality and algorithmic convergence are lacking. Focusing on logistic regression models, we provide mild conditions on the data generating process to derive non-asymptotic upper bounds to the risk incurred by the variational optima. We demonstrate that these assumptions can be completely relaxed if one considers a slight variation of the algorithm by raising the likelihood to a fractional power. Next, we utilize the theory of dynamical systems to provide convergence guarantees for such algorithms in logistic and multinomial logit regression. In particular, we establish local asymptotic stability of the algorithm without any assumptions on the data-generating process. We explore a special case involving a semi-orthogonal design under which global convergence is obtained. The theory is further illustrated using several numerical studies.