Title: A class of generalized linear mixed models adjusted for marginal interpretability

Two popular approaches for relating correlated measurements of a non‐Gaussian response variable to a set of predictors are to fit a marginal model using generalized estimating equations and to fit a generalized linear mixed model (GLMM) by introducing latent random variables. The first approach is effective for parameter estimation, but leaves one without a formal model for the data with which to assess quality of fit or make individual‐level predictions for future observations. The second approach overcomes these deficiencies, but leads to parameter estimates that must be interpreted conditional on the latent variables. To obtain marginal summaries, one needs to evaluate an analytically intractable integral or use attenuation factors as an approximation. Further, we note an unpalatable implication of the standard GLMM. To resolve these issues, we turn to a class of marginally interpretable GLMMs that lead to parameter estimates with a marginal interpretation while maintaining the desirable statistical properties of a conditionally specified model and avoiding problematic implications. We establish the form of these models under the most commonly used link functions and address computational issues. For logistic mixed effects models, we introduce an accurate and efficient method for evaluating the logistic‐normal integral.
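
The relationship between conditional and marginal summaries described above can be illustrated numerically. The sketch below is not the method introduced in the paper, which the abstract does not spell out. It assumes a logistic model with a single Gaussian random intercept of standard deviation sigma. It evaluates the logistic‐normal integral with a plain composite Simpson's rule, compares the result with the widely used attenuation‐factor approximation (constant c = 16√3/(15π)), and then solves by bisection for the conditional linear predictor whose marginal mean matches a target. All function names are hypothetical, and the bisection step is only an assumption about what a marginal adjustment could look like in this simple setting.

```python
import math


def expit(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def marginal_prob_quadrature(eta, sigma, n=2000, width=10.0):
    """Approximate E[expit(eta + sigma * Z)] for Z ~ N(0, 1).

    Uses composite Simpson's rule on [-width, width]; this is a generic
    quadrature chosen for illustration, not the paper's algorithm.
    """
    if n % 2:
        n += 1  # Simpson's rule needs an even number of intervals
    h = 2.0 * width / n
    total = 0.0
    for i in range(n + 1):
        z = -width + i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * expit(eta + sigma * z) * math.exp(-0.5 * z * z)
    return total * h / 3.0 / math.sqrt(2.0 * math.pi)


def marginal_prob_attenuated(eta, sigma):
    """Attenuation-factor approximation to the same marginal probability."""
    c = 16.0 * math.sqrt(3.0) / (15.0 * math.pi)
    return expit(eta / math.sqrt(1.0 + c * c * sigma * sigma))


def adjusted_linear_predictor(eta_marginal, sigma, tol=1e-10):
    """Find a such that E[expit(a + sigma * Z)] = expit(eta_marginal).

    Hypothetical helper: bisection works because the marginal probability
    is strictly increasing in a.
    """
    target = expit(eta_marginal)
    lo, hi = -1.0, 1.0
    while marginal_prob_quadrature(lo, sigma) > target:
        lo *= 2.0  # widen the bracket downwards
    while marginal_prob_quadrature(hi, sigma) < target:
        hi *= 2.0  # widen the bracket upwards
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if marginal_prob_quadrature(mid, sigma) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    eta, sigma = 1.0, 2.0
    print("conditional probability at b = 0:", expit(eta))
    print("marginal probability (quadrature):", marginal_prob_quadrature(eta, sigma))
    print("marginal probability (attenuated):", marginal_prob_attenuated(eta, sigma))
    print("adjusted linear predictor:", adjusted_linear_predictor(eta, sigma))
```

With a positive linear predictor, the marginal probability lies between 0.5 and the conditional probability at a zero random effect, which is the attenuation the abstract refers to. The adjusted predictor is correspondingly larger in magnitude than the marginal one.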

Award ID(s):
1613110 2015552
Author(s) / Creator(s):
 ;  ;  
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Statistics in Medicine
Page Range / eLocation ID:
p. 427-440
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Ecological analyses typically involve many interacting variables. Ecologists often specify lagged interactions in community dynamics (i.e. vector‐autoregressive models) or simultaneous interactions (e.g. structural equation models), but there is less familiarity with dynamic structural equation models (DSEM) that can include any simultaneous or lagged effect in multivariate time‐series analysis.

    We propose a novel approach to parameter estimation for DSEM, which involves constructing a Gaussian Markov random field (GMRF) representing simultaneous and lagged path coefficients, and then fitting this as a generalized linear mixed model to missing and/or non‐normal data. We provide a new R‐package dsem, which extends the ‘arrow interface’ from path analysis to represent user‐specified lags when constructing the GMRF. We also outline how the resulting nonseparable precision matrix can generalize existing separable models, for example, for time‐series and species interactions in a vector‐autoregressive model.

    We first demonstrate dsem by simulating a two‐species vector‐autoregressive model based on wolf–moose interactions on Isle Royale. We show that DSEM has improved precision when data are missing relative to a conventional dynamic linear model. We then demonstrate DSEM via two contrasting case studies. The first identifies a trophic cascade where decreased sunflower starfish has increased urchin and decreased kelp densities, while sea otters have a simultaneous positive effect on kelp in the California Current from 1999 to 2018. The second estimates how declining sea ice has decreased cold‐water habitats, driving a decreased density for fall copepod predation and inhibiting early‐life survival for Alaska pollock from 1963 to 2023.

    We conclude that DSEM can be fitted efficiently as a GLMM involving missing data, while allowing users to specify both simultaneous and lagged effects in a time‐series structural model. DSEM then allows conceptual models (developed with stakeholder input or from ecological expertise) to be fitted to incomplete time series and provides a simple interface for granular control over the number of estimated time‐series parameters. Finally, computational methods are sufficiently simple that DSEM can be embedded as a component within larger (e.g. integrated population) models. We therefore recommend greater exploration and performance testing for DSEM relative to familiar time‐series forecasting methods.

  2. The time-to-event response is commonly thought of as survival analysis, and typically concerns statistical modeling of expected life span. In the example presented here, alfalfa leafcutting bees, Megachile rotundata, were randomly exposed to one of eight experimental thermoprofiles or two control thermoprofiles, for one to eight weeks. The incorporation of these fluctuating thermoprofiles in the management of the bees increases survival and blocks the development of sub-lethal effects, such as delayed emergence. The data collected here address the question of whether any experimental thermoprofile provides better overall survival, with a reduction and delay of sub-lethal effects. The study design incorporates a typical aspect of agricultural research: random blocking effects. All M. rotundata prepupae brood cells were randomly placed in individual wells of 24-well culture plates. Plates were randomly assigned to thermoprofile and exposure duration, with three plate replicates per thermoprofile × exposure time combination. Bees were observed for emergence for 40 days. All bees that had not emerged by the fixed end of the study were considered to be censored observations. We fit a generalized linear mixed model (GLMM) to the censored data, using the SAS® GLIMMIX Procedure, and obtained time-to-emergence function estimates. As opposed to a typical survival analysis approach, such as a Kaplan-Meier curve, in the GLMM we were able to include the random model effects from the study design. This is an important inclusion in the model, because it ensures that correct standard errors and test statistics are generated for mixed models with non-Gaussian data.
  3. Abstract

    Traits differentially adapt plant species to particular conditions generating compositional shifts along environmental gradients. As a result, community‐scale trait values show concomitant shifts, termed trait‒environment relationships. Trait‒environment relationships are often assessed by evaluating community‐weighted mean (CWM) traits observed along environmental gradients. Regression‐based approaches (CWMr) assume that local communities exhibit traits centred at a single optimum value and that traits do not covary meaningfully. Evidence suggests that the shape of trait‒abundance relationships can vary widely along environmental gradients—reflecting complex interactions—and traits are usually interrelated. We used a model that accounts for these factors to explore trait‒environment relationships in herbaceous forest plant communities in Wisconsin (USA).

    We built a generalized linear mixed model (GLMM) to analyse how abundances of 185 species distributed among 189 forested sites vary in response to four functional traits (vegetative height—VH, leaf size—LS, leaf mass per area—LMA and leaf carbon content), six environmental variables describing overstorey, soil and climate conditions, and their interactions. The GLMM allowed us to assess the nature and relative strength of the resulting 24 trait‒environment relationships. We also compared results between GLMM and CWMr to explore how conclusions differ between approaches.

    The GLMM identified five significant trait‒environment relationships that together explain ~40% of variation in species abundances across sites. Temperature appeared as a key environmental driver, with warmer and more seasonal sites favouring taller plants. Soil texture and temperature seasonality affected LS and LMA; seasonality effects on LS and LMA were nonlinear, declining at more seasonal sites. Although often assumed for CWMr, only some traits under certain conditions had centred optimum trait‒abundance relationships. CWMr more liberally identified (13) trait‒environment relationships as significant but failed to detect the temperature seasonality‒LMA relationship identified by the GLMM.

    Synthesis. Although the GLMM represents a more methodologically complex approach than CWMr, it identified a reduced set of trait‒environment relationships still capable of accounting for the responses of forest understorey herbs to environmental gradients. It also identified separate effects of mean and seasonal temperature on LMA that appear important in these forests, generating useful insights and supporting broader application of the GLMM approach to understand trait‒environment relationships.

  4. Abstract

    We consider user retention analytics for online freemium role-playing games (RPGs). RPGs constitute a very popular genre of computer-based games that, along with a player’s gaming actions, focus on the development of the player’s in-game virtual character through a persistent exploration of the gaming environment. Most RPGs follow the freemium business model in which the gamers can play for free but they are charged for premium add-on amenities. As with other freemium products, RPGs suffer from the curse of high dropout rates. This makes retention analysis extremely important for successful operation and survival of their gaming portals. Here, we develop a disciplined statistical framework for retention analysis by modelling multiple in-game player characteristics along with the dropout probabilities. We capture players’ motivations through engagement times, collaboration and achievement score at each level of the game, and jointly model them using a generalized linear mixed model (glmm) framework that further includes a time-to-event variable corresponding to churn. We capture the interdependencies in a player’s level-wise engagement, collaboration, achievement with dropout through a shared parameter model. We illustrate interesting changes in player behaviours as the gaming level progresses. The parameters in our joint model were estimated by a Hamiltonian Monte Carlo algorithm which incorporated a divide-and-recombine approach for increased scalability in glmm estimation that was needed to accommodate our large longitudinal gaming data-set. By incorporating the level-wise changes in a player’s motivations and using them for dropout rate prediction, our method greatly improves on state-of-the-art retention models. Based on data from a popular action-based RPG, we demonstrate the competitive optimality of our proposed joint modelling approach by exhibiting its improved predictive performance over competitors. In particular, we outperform aggregate-statistics-based methods that ignore level-wise progressions as well as progression-tracking non-joint models such as the Cox proportional hazards model. We also display improved predictions of popular marketing retention statistics and discuss how they can be used in managerial decision making.

  5. Recent technological advances in systems neuroscience have led to a shift away from using simple tasks, with low-dimensional, well-controlled stimuli, towards trying to understand neural activity during naturalistic behavior. However, with the increase in number and complexity of task-relevant features, standard analyses such as estimating tuning functions become challenging. Here, we use a Poisson generalized additive model (P-GAM) with spline nonlinearities and an exponential link function to map a large number of task variables (input stimuli, behavioral outputs, or activity of other neurons, modeled as discrete events or continuous variables) into spike counts. We develop efficient procedures for parameter learning by optimizing a generalized cross-validation score and infer marginal confidence bounds for the contribution of each feature to neural responses. This allows us to robustly identify a minimal set of task features that each neuron is responsive to, circumventing computationally demanding model comparison. We show that our estimation procedure outperforms traditional regularized GLMs in terms of both fit quality and computing time. When applied to neural recordings from monkeys performing a virtual reality spatial navigation task, P-GAM reveals mixed selectivity and preferential coupling between neurons with similar tuning. 