Abstract We propose a Bayesian model selection approach for generalized linear mixed models (GLMMs). We consider covariance structures for the random effects that are widely used in areas such as longitudinal studies, genome-wide association studies, and spatial statistics. Since the random effects cannot be integrated out of GLMMs analytically, we approximate the integrated likelihood function using a pseudo-likelihood approach. Our Bayesian approach assumes a flat prior for the fixed effects and includes both approximate reference prior and half-Cauchy prior choices for the variances of random effects. Since the flat prior on the fixed effects is improper, we develop a fractional Bayes factor approach to obtain posterior probabilities of the competing models. Simulation studies with Poisson GLMMs with spatial random effects and overdispersion random effects show that our approach performs favorably when compared to widely used competing Bayesian methods, including the deviance information criterion and the Watanabe–Akaike information criterion. We illustrate the usefulness and flexibility of our approach with three case studies: a Poisson longitudinal model, a Poisson spatial model, and a logistic mixed model. Our proposed approach is implemented in the R package GLMMselect, which is available on CRAN.
Multivariate nearest‐neighbors Gaussian processes with random covariance matrices
Abstract We propose a non‐stationary spatial model based on a normal‐inverse‐Wishart framework, conditioning on a set of nearest neighbors. The model, called the nearest‐neighbor Gaussian process with random covariance matrices, is developed for both univariate and multivariate spatial settings and allows for fully flexible covariance structures that impose no stationarity or isotropy restrictions. In addition, the model can handle duplicate observations and missing data. We consider an approach based on integrating out the spatial random effects that allows fast inference for the model parameters. We also consider a full hierarchical approach that leverages the sparse structures induced by the model to perform fast Monte Carlo computations. Strong computational efficiency is achieved by leveraging the adaptive localized structure of the model, which allows for a high level of parallelization. We illustrate the performance of the model with univariate and bivariate simulations, as well as with observations from two stationary satellites consisting of albedo measurements.
- Award ID(s):
- 1953168
- PAR ID:
- 10523586
- Publisher / Repository:
- Environmetrics - Wiley
- Date Published:
- Journal Name:
- Environmetrics
- Volume:
- 35
- Issue:
- 3
- ISSN:
- 1180-4009
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
We consider the problem of inferring the conditional independence graph (CIG) of a sparse, high-dimensional, stationary matrix-variate Gaussian time series. The correlation function of the matrix series is Kronecker-decomposable. Unlike most past work on matrix graphical models, where independent and identically distributed (i.i.d.) matrix-variate observations are assumed to be available, we allow time-dependent observations. We follow a time-delay embedding approach in which we associate with each matrix node a random vector consisting of a scalar series component and its time-delayed copies. A group-lasso penalized negative pseudo log-likelihood (NPLL) objective function is formulated to estimate a Kronecker-decomposable covariance matrix, which allows for inference of the underlying CIG. The NPLL function is bi-convex, and the Kronecker-decomposable covariance matrix is estimated via flip-flop optimization of the NPLL function. Each iteration of flip-flop optimization is solved via an alternating direction method of multipliers (ADMM) approach. Numerical results illustrate the proposed approach, which outperforms an existing i.i.d. modeling based approach as well as an existing frequency-domain approach for dependent data in correctly detecting the graph edges.
Abstract The Gaussian process (GP) is a staple in the toolkit of a spatial statistician. Well‐documented computing roadblocks in the analysis of large geospatial datasets using GPs have now largely been mitigated via several recent statistical innovations. The nearest neighbor Gaussian process (NNGP) has emerged as one of the leading candidates for such massive‐scale geospatial analysis owing to its empirical success. This article reviews the connection of NNGP to sparse Cholesky factors of the spatial precision (inverse‐covariance) matrix. The review focuses on these sparse Cholesky matrices, which are versatile and have recently found many diverse applications beyond the primary usage of NNGP for fast parameter estimation and prediction in spatial (generalized) linear models. In particular, we discuss applications of sparse NNGP Cholesky matrices to address multifaceted computational issues in spatial bootstrapping, simulation of large‐scale realizations of Gaussian random fields, and extensions to nonparametric mean function estimation of a GP using random forests. We also review a sparse‐Cholesky‐based model for areal (geographically aggregated) data that addresses long‐established interpretability issues of existing areal models. Finally, we highlight some yet‐to‐be‐addressed issues of such sparse Cholesky approximations that warrant further research. This article is categorized under: Algorithms and Computational Methods > Algorithms; Algorithms and Computational Methods > Numerical Methods
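The link between NNGP and sparse Cholesky factors mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard nearest-neighbor (Vecchia-type) construction, in which each location is regressed on at most m previously ordered neighbors, giving a precision matrix Q = (I - A)^T D^{-1} (I - A) with A strictly lower triangular and sparse. The exponential covariance, its parameters, and the function names (`exp_cov`, `nngp_factors`, `nngp_precision`) are illustrative assumptions, not taken from the article, and the matrices are stored densely only for readability.

```python
import numpy as np

def exp_cov(a, b, sigma2=1.0, phi=3.0):
    """Exponential covariance between two sets of locations (illustrative choice)."""
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma2 * np.exp(-phi * dist)

def nngp_factors(coords, m=5, cov=exp_cov):
    """Return (A, d): A is strictly lower triangular with at most m nonzeros
    per row, and d holds the conditional variances of each location given
    its neighbors among the previously ordered locations."""
    n = coords.shape[0]
    A = np.zeros((n, n))  # dense for clarity; a sparse format would be used at scale
    d = np.zeros(n)
    d[0] = cov(coords[:1], coords[:1])[0, 0]
    for i in range(1, n):
        # indices of the (at most) m nearest neighbors among locations 0..i-1
        dist = np.linalg.norm(coords[:i] - coords[i], axis=1)
        nb = np.argsort(dist)[:m]
        C_nn = cov(coords[nb], coords[nb])        # neighbor-neighbor covariance
        c_in = cov(coords[i:i + 1], coords[nb])[0]  # location-neighbor covariance
        w = np.linalg.solve(C_nn, c_in)           # kriging weights
        A[i, nb] = w
        d[i] = cov(coords[i:i + 1], coords[i:i + 1])[0, 0] - c_in @ w
    return A, d

def nngp_precision(A, d):
    """Assemble Q = (I - A)^T D^{-1} (I - A) from the factors."""
    I_minus_A = np.eye(A.shape[0]) - A
    return I_minus_A.T @ (I_minus_A / d[:, None])
```

When m is at least n - 1, every location conditions on all earlier ones and Q coincides with the exact inverse covariance; smaller m trades accuracy for sparsity.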
Spatial probit generalized linear mixed models (spGLMM) with a linear fixed effect and a spatial random effect, endowed with a Gaussian Process prior, are widely used for analysis of binary spatial data. However, the canonical Bayesian implementation of this hierarchical mixed model can involve protracted Markov Chain Monte Carlo sampling. Alternate approaches have been proposed that circumvent this by directly representing the marginal likelihood from spGLMM in terms of multivariate normal cumulative distribution functions (cdf). We present a direct and fast rendition of this latter approach for predictions from a spatial probit linear mixed model. We show that the covariance matrix of the cdf characterizing the marginal cdf of binary spatial data from spGLMM is amenable to approximation using Nearest Neighbor Gaussian Processes (NNGP). This facilitates a scalable prediction algorithm for spGLMM using NNGP that only involves sparse or small matrix computations and can be deployed in an embarrassingly parallel manner. We demonstrate the accuracy and scalability of the algorithm via numerous simulation experiments and an analysis of species presence-absence data.
A new type of ensemble Kalman filter is developed, which is based on replacing the sample covariance in the analysis step by its diagonal in a spectral basis. It is proved that this technique improves the approximation of the covariance when the covariance itself is diagonal in the spectral basis, as is the case, e.g., for a second-order stationary random field and the Fourier basis. The method is extended by wavelets to the case when the state variables are random fields which are not spatially homogeneous. Efficient implementations by the fast Fourier transform (FFT) and discrete wavelet transform (DWT) are presented for several types of observations, including high-dimensional data given on a part of the domain, such as radar and satellite images. Computational experiments confirm that the method performs well on the Lorenz 96 problem and the shallow water equations with very small ensembles and over multiple analysis cycles.
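The idea of keeping only the spectral diagonal of the sample covariance can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: a 1-D periodic field, the Fourier basis via a unitary FFT, observation of the full state, and observation error covariance r times the identity, so that the Kalman gain is itself diagonal in the spectral basis. The function names are hypothetical.

```python
import numpy as np

def spectral_diag_variances(ensemble):
    """ensemble: (N, n) array holding N members of an n-point periodic field.
    Returns the diagonal of the sample covariance in the unitary Fourier basis."""
    anomalies = ensemble - ensemble.mean(axis=0)
    coeffs = np.fft.fft(anomalies, axis=1, norm="ortho")
    return (np.abs(coeffs) ** 2).sum(axis=0) / (ensemble.shape[0] - 1)

def apply_covariance(variances, x):
    """Apply C = F^H diag(variances) F to a real vector x using two FFTs."""
    x_hat = np.fft.fft(x, norm="ortho")
    return np.fft.ifft(variances * x_hat, norm="ortho").real

def analysis_step(ensemble, y, r, rng):
    """Perturbed-observation analysis step, assuming the whole state is
    observed with error variance r at every grid point (a strong assumption)."""
    v = spectral_diag_variances(ensemble)
    gain = v / (v + r)  # Kalman gain, diagonal in the spectral basis
    perturbed = y + np.sqrt(r) * rng.standard_normal(ensemble.shape)
    innov_hat = np.fft.fft(perturbed - ensemble, axis=1, norm="ortho")
    return ensemble + np.fft.ifft(gain * innov_hat, axis=1, norm="ortho").real
```

Because the transform is unitary, the spectral variances sum to the trace of the ordinary sample covariance; only the off-diagonal spectral terms, which are dominated by sampling noise for small ensembles, are discarded.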

