

Title: Robust Bayesian growth curve modelling using conditional medians

Growth curve models have been widely used to analyse longitudinal data in social and behavioural sciences. Although growth curve models with normality assumptions are relatively easy to estimate, practical data are rarely normal. Failing to account for non‐normal data may lead to unreliable model estimation and misleading statistical inference. In this work, we propose a robust approach for growth curve modelling using conditional medians that are less sensitive to outlying observations. Bayesian methods are applied for model estimation and inference. Based on the existing work on Bayesian quantile regression using asymmetric Laplace distributions, we use asymmetric Laplace distributions to convert the problem of estimating a median growth curve model into a problem of obtaining the maximum likelihood estimator for a transformed model. Monte Carlo simulation studies have been conducted to evaluate the numerical performance of the proposed approach with data containing outliers or leverage observations. The results show that the proposed approach yields more accurate and efficient parameter estimates than traditional growth curve modelling. We illustrate the application of our robust approach using conditional medians based on a real data set from the Virginia Cognitive Aging Project.
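The link between medians and asymmetric Laplace distributions mentioned in the abstract can be illustrated with a small numerical sketch. It is not the authors' growth curve model or their Bayesian estimation procedure. It only shows, for a single location parameter, that minimizing the asymmetric Laplace negative log-likelihood at quantile level 0.5 recovers the sample median. The data values, the grid search, and the function names `check_loss` and `ald_neg_loglik` are assumptions made for illustration.

```python
import numpy as np

def check_loss(u, tau):
    """Quantile check loss: rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (u < 0))

def ald_neg_loglik(mu, y, tau=0.5, sigma=1.0):
    """Negative log-likelihood of an asymmetric Laplace distribution
    with location mu, scale sigma and quantile level tau."""
    n = y.size
    log_norm = n * np.log(tau * (1.0 - tau) / sigma)
    return -log_norm + check_loss((y - mu) / sigma, tau).sum()

# Hypothetical data with one outlying observation.
y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# Crude grid search over the location parameter (illustration only).
grid = np.linspace(0.0, 100.0, 10001)
nll = np.array([ald_neg_loglik(m, y) for m in grid])
mu_hat = grid[nll.argmin()]

print("ALD location estimate (tau = 0.5):", mu_hat)
print("sample median:", np.median(y))
print("sample mean:", y.mean())
```

In this toy case the location estimate coincides with the sample median and is unaffected by the outlying value, whereas the sample mean is pulled toward it.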

 
Award ID(s):
1951038
NSF-PAR ID:
10452749
Author(s) / Creator(s):
 ;  ;  
Publisher / Repository:
Wiley-Blackwell
Date Published:
Journal Name:
British Journal of Mathematical and Statistical Psychology
Volume:
74
Issue:
2
ISSN:
0007-1102
Page Range / eLocation ID:
p. 286-312
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Growth curve models (GCMs), with their ability to directly investigate within-subject change over time and between-subject differences in change for longitudinal data, are widely used in social and behavioral sciences. While GCMs are typically studied with the normal distribution assumption, empirical data often violate the normality assumption in applications. Failure to account for the deviation from normality in data distribution may lead to unreliable model estimation and misleading statistical inferences. A robust GCM based on conditional medians was recently proposed and outperformed traditional growth curve modeling when outliers are present resulting in nonnormality. However, this robust approach was shown to perform less satisfactorily when leverage observations existed. In this work, we propose a robust double medians growth curve modeling approach (DOME GCM) to thoroughly disentangle the influence of data contamination on model estimation and inferences, where two conditional medians are employed for the distributions of the within-subject measurement errors and of random effects, respectively. Model estimation and inferences are conducted in the Bayesian framework, and Laplace distributions are used to convert the optimization problem of median estimation into a problem of obtaining the maximum likelihood estimator for a transformed model. A Monte Carlo simulation study has been conducted to evaluate the numerical performance of the proposed approach, and showed that the proposed approach yields more accurate and efficient parameter estimates when data contain outliers or leverage observations. The application of the developed robust approach is illustrated using a real dataset from the Virginia Cognitive Aging Project to study the change of memory ability. 
  2. Growth mixture modeling is a popular analytic tool for longitudinal data analysis. It detects latent groups based on the shapes of growth trajectories. Traditional growth mixture modeling assumes that outcome variables are normally distributed within each class. When data violate this normality assumption, however, it is well documented that traditional growth mixture modeling misleads researchers in determining the number of latent classes as well as in estimating parameters. To address nonnormal data in growth mixture modeling, robust methods based on various nonnormal distributions have been developed. As a new robust approach, growth mixture modeling based on conditional medians has been proposed. In this article, we present the results of two simulation studies that evaluate the performance of median-based growth mixture modeling in identifying the correct number of latent classes when data follow the normality assumption or have outliers. We also compared the performance of median-based growth mixture modeling with that of traditional growth mixture modeling and of robust growth mixture modeling based on t distributions. For identifying the number of latent classes in growth mixture modeling, the following three Bayesian model comparison criteria were considered: deviance information criterion, Watanabe-Akaike information criterion, and leave-one-out cross validation. For median-based and t-based growth mixture modeling, our results showed that both maintained quite high model selection accuracy across all conditions in this study (ranging from 87% to 100%). In traditional growth mixture modeling, however, the model selection accuracy was greatly influenced by the proportion of outliers. When sample size was 500 and the proportion of outliers was 0.05, the correct model was preferred in about 90% of the replications, but the percentage dropped to about 40% as the proportion of outliers increased to 0.15.
  3. Abstract

    Structured population models are among the most widely used tools in ecology and evolution. Integral projection models (IPMs) use continuous representations of how survival, reproduction and growth change as functions of state variables such as size, requiring fewer parameters to be estimated than projection matrix models (PPMs). Yet, almost all published IPMs make an important assumption that size‐dependent growth transitions are or can be transformed to be normally distributed. In fact, many organisms exhibit highly skewed size transitions. Small individuals can grow more than they can shrink, and large individuals may often shrink more dramatically than they can grow. Yet, the implications of such skew for inference from IPMs have not been explored, nor have general methods been developed to incorporate skewed size transitions into IPMs, or deal with other aspects of real growth rates, including bounds on possible growth or shrinkage.

    Here, we develop a flexible approach to modelling skewed growth data using a modified beta regression model. We propose that sizes first be converted to a (0,1) interval by estimating size‐dependent minimum and maximum sizes through quantile regression. Transformed data can then be modelled using beta regression with widely available statistical tools. We demonstrate the utility of this approach using demographic data for a long‐lived plant, gorgonians and an epiphytic lichen. Specifically, we compare inferences of population parameters from discrete PPMs to those from IPMs that either assume normality or incorporate skew using beta regression or, alternatively, a skewed normal model.

    The beta and skewed normal distributions accurately capture the mean, variance and skew of real growth distributions. Incorporating skewed growth into IPMs decreases population growth and estimated life span relative to IPMs that assume normally distributed growth, and more closely approximates the parameters of PPMs that do not assume a particular growth distribution. A bounded distribution, such as the beta, also avoids the eviction problem caused by predicting some growth outside the modelled size range.

    Incorporating biologically relevant skew in growth data has important consequences for inference from IPMs. The approaches we outline here are flexible and easy to implement with existing statistical tools.

     
  4. Abstract

    Integrated population models (IPMs) have become increasingly popular for the modelling of populations, as investigators seek to combine survey and demographic data to understand processes governing population dynamics. These models are particularly useful for identifying and exploring knowledge gaps within life histories, because they allow investigators to estimate biologically meaningful parameters, such as immigration or reproduction, that were previously unidentifiable without additional data. As IPMs have been developed relatively recently, there is much to learn about model behaviour. Behaviour of parameters, such as estimates near boundaries, and the consequences of varying degrees of dependency among datasets, has been explored. However, the reliability of parameter estimates remains underexamined, particularly when models include parameters that are not identifiable from one data source, but are indirectly identifiable from multiple datasets and a presumed model structure, such as the estimation of immigration using capture‐recapture, fecundity and count data, combined with a life‐history model.

    To examine the behaviour of model parameter estimates, we simulated stable populations closed to immigration and emigration. We simulated two scenarios that might induce error into survival estimates: marker induced bias in the capture–mark–recapture data and heterogeneity in the mortality process. We subsequently fit capture–mark–recapture, state‐space and fecundity models, as well as IPMs that estimated additional parameters.

    Simulation results suggested that when model assumptions are violated, estimation of additional, previously unidentifiable, parameters using IPMs may be extremely sensitive to these violations of model assumption. For example, when annual marker loss was simulated, estimates of survival rates were low and estimates of immigration rate from an IPM were high. When heterogeneity in the mortality process was induced, there were substantial relative differences between the medians of posterior distributions and truth for juvenile survival and fecundity.

    Our results have important implications for biological inference when using IPMs, as well as future model development and implementation. Specifically, using multiple datasets to identify additional parameters resulted in the posterior distributions of additional parameters directly reflecting the effects of the violations of model assumptions in integrated modelling frameworks. We suggest that investigators interpret posterior distributions of these parameters as a combination of biological process and systematic error.

     
  5. Abstract Aim

    Ecological niche modelling requires robust estimation of model performance and significance, but common evaluation approaches often yield biased estimates. Null models provide a solution but are rarely used in this field. We implemented an important modification to existing null model tests, evaluating null models with the same withheld records that were used to evaluate the real model. We built and evaluated models across a range of modelling scenarios and for various performance measures using the algorithm Maxent and the monk parakeet (Myiopsitta monachus).

    Location

    Native range in Southern America and global invasions predominantly in North/Central America and Europe.

    Methods

    We tested the ability of models built under 15 scenarios (five sets of calibration records and three settings that varied the level of model complexity) to predict spatially independent evaluation data in the invaded range (in effect, testing the models under spatial transfer). We quantified performance with measures of discriminatory ability and overfitting based on area under the receiver operating characteristic curve (AUC) and the omission error rate. We estimated null distributions of these measures and calculated effect size and significance. We determined how these estimates varied across modelling scenarios, comparing with two tests existing in the literature.

    Results

    Performance varied starkly across modelling scenarios. As expected, the measures of overfitting agreed with each other and provided different information than that of discriminatory ability. However, high performance per se did not show strong association with high effect size and significance.

    Main Conclusions

    Ecological niche models should be assessed with measures of effect size and significance based on appropriate null distributions, in contrast to several approaches existing in the literature. The proposed approach using independent evaluation data, implemented with our accompanying code and R package, allows such estimates for either the same or a different region/time period, and it merits use and continued development.

     