We propose a Bayesian model selection approach for generalized linear mixed models (GLMMs). We consider covariance structures for the random effects that are widely used in areas such as longitudinal studies, genome-wide association studies, and spatial statistics. Since the random effects cannot be integrated out of GLMMs analytically, we approximate the integrated likelihood function using a pseudo likelihood approach. Our Bayesian approach assumes a flat prior for the fixed effects and includes both approximate reference prior and half-Cauchy prior choices for the variances of random effects. Since the flat prior on the fixed effects is improper, we develop a fractional Bayes factor approach to obtain posterior probabilities of the several competing models. Simulation studies with Poisson generalized linear mixed models with spatial random effects and overdispersion random effects show that our approach performs favorably when compared to widely used competing Bayesian methods including DIC and WAIC. We illustrate the usefulness and flexibility of our approach with three case studies including a Poisson longitudinal model, a Poisson spatial model, and a logistic mixed model. Our proposed approach is implemented in the R package GLMMselect that is available on CRAN.
Bayes Factors for Mixed Models: a Discussion
Abstract van Doorn et al. (2021) outlined various questions that arise when conducting Bayesian model comparison for mixed effects models. Seven response articles offered their own perspective on the preferred setup for mixed model comparison, on the most appropriate specification of prior distributions, and on the desirability of default recommendations. This article presents a round-table discussion that aims to clarify outstanding issues, explore common ground, and outline practical considerations for any researcher wishing to conduct a Bayesian mixed effects model comparison.
- Award ID(s):
- 2145308
- PAR ID:
- 10448680
- Journal Name:
- Computational Brain & Behavior
- Volume:
- 6
- Issue:
- 1
- ISSN:
- 2522-0861
- Page Range / eLocation ID:
- 140 to 158
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
We propose a new classified mixed model prediction (CMMP) procedure, called pseudo-Bayesian CMMP, that uses network information in matching the group index between the training data and new data, whose characteristics of interest one wishes to predict. The current CMMP procedures do not incorporate such information; as a result, the methods are not consistent in terms of matching the group index. Although, as the number of training data groups increases, the current CMMP method can predict the mixed effects of interest consistently, its accuracy is not guaranteed when the number of groups is moderate, as is the case in many potential applications. The proposed pseudo-Bayesian CMMP procedure assumes a flexible working probability model for the group index of the new observation to match the index of a training data group, which may be viewed as a pseudo prior. We show that, given any working model satisfying mild conditions, the pseudo-Bayesian CMMP procedure is consistent and asymptotically optimal both in terms of matching the group index and in terms of predicting the mixed effect of interest associated with the new observations. The theoretical results are fully supported by results of empirical studies, including Monte-Carlo simulations and real-data validation.
-
Abstract Data-driven design shows the promise of accelerating materials discovery but is challenging due to the prohibitive cost of searching the vast design space of chemistry, structure, and synthesis methods. Bayesian optimization (BO) employs uncertainty-aware machine learning models to select promising designs to evaluate, hence reducing the cost. However, BO with mixed numerical and categorical variables, which is of particular interest in materials design, has not been well studied. In this work, we survey frequentist and Bayesian approaches to uncertainty quantification of machine learning with mixed variables. We then conduct a systematic comparative study of their performances in BO using a popular representative model from each group, the random forest-based Lolo model (frequentist) and the latent variable Gaussian process model (Bayesian). We examine the efficacy of the two models in the optimization of mathematical functions, as well as properties of structural and functional materials, where we observe performance differences as related to problem dimensionality and complexity. By investigating the machine learning models’ predictive and uncertainty estimation capabilities, we provide interpretations of the observed performance differences. Our results provide practical guidance on choosing between frequentist and Bayesian uncertainty-aware machine learning models for mixed-variable BO in materials design.
-
The reliable detection of the global 21-cm signal, a key tracer of Cosmic Dawn and the Epoch of Reionization, requires meticulous data modelling and robust statistical frameworks for model validation and comparison. In Paper I of this series, we presented the beam-factor-based chromaticity correction (BFCC) model for spectrometer data processed using BFCC to suppress instrumentally induced spectral structure. We demonstrated that the BFCC model, with complexity calibrated by Bayes factor-based model comparison (BFBMC), enables unbiased recovery of a 21-cm signal consistent with the one reported by The Experiment to Detect the Global Epoch of Reionization Signature (EDGES) from simulated data. Here, we extend the evaluation of the BFCC model to lower amplitude 21-cm signal scenarios where deriving reliable conclusions about a model’s capacity to recover unbiased 21-cm signal estimates using BFBMC is more challenging. Using realistic simulations of chromaticity-corrected EDGES-low spectrometer data, we evaluate three signal amplitude regimes – null, moderate, and high. We then conduct a Bayesian comparison between the BFCC model and three alternative models previously applied to 21-cm signal estimation from EDGES data. To mitigate biases introduced by systematics in the 21-cm signal model fit, we incorporate the Bayesian Null-Test-Evidence-Ratio (BaNTER) validation framework and implement a Bayesian inference workflow based on posterior odds of the validated models. The BaNTER-validated posterior-odds-based methodology presented here is general and transferable to other global 21-cm experiments employing Bayesian signal inference. We demonstrate that, unlike BFBMC alone, this approach consistently recovers 21-cm signal estimates that align with the true signal across all amplitude regimes, advancing robust global 21-cm signal detection methodologies.
-
Abstract Background: Genome-wide association studies (GWASes) aim to identify single nucleotide polymorphisms (SNPs) associated with a given phenotype. A common approach for the analysis of GWAS is single marker analysis (SMA) based on linear mixed models (LMMs). However, LMM-based SMA usually yields a large number of false discoveries and cannot be directly applied to non-Gaussian phenotypes such as count data. Results: We present a novel Bayesian method to find SNPs associated with non-Gaussian phenotypes. To that end, we use generalized linear mixed models (GLMMs) and, thus, call our method Bayesian GLMMs for GWAS (BG2). To deal with the high dimensionality of GWAS analysis, we propose novel nonlocal priors specifically tailored for GLMMs. In addition, we develop related fast approximate Bayesian computations. BG2 uses a two-step procedure: first, BG2 screens for candidate SNPs; second, BG2 performs model selection that considers all screened candidate SNPs as possible regressors. A simulation study shows favorable performance of BG2 when compared to GLMM-based SMA. We illustrate the usefulness and flexibility of BG2 with three case studies on cocaine dependence (binary data), alcohol consumption (count data), and number of root-like structures in a model plant (count data).