

Title: WiSER: Robust and scalable estimation and inference of within‐subject variances from intensive longitudinal data
Abstract

The availability of vast amounts of longitudinal data from electronic health records (EHRs) and personal wearable devices opens the door to numerous new research questions. In many studies, individual variability of a longitudinal outcome is as important as the mean. Blood pressure fluctuations, glycemic variations, and mood swings are prime examples where it is critical to identify factors that affect the within‐individual variability. We propose a scalable method, within‐subject variance estimator by robust regression (WiSER), for the estimation and inference of the effects of both time‐varying and time‐invariant predictors on within‐subject variance. It is robust against the misspecification of the conditional distribution of responses or the distribution of random effects. It performs comparably to correctly specified likelihood methods but is 10^3 to 10^5 times faster. The estimation algorithm scales linearly in the total number of observations, making it applicable to massive longitudinal data sets. The effectiveness of WiSER is evaluated in extensive simulation studies. Its broad applicability is illustrated using the accelerometry data from the Women's Health Study and a clinical trial for longitudinal diabetes care.

 
Award ID(s):
2054253
NSF-PAR ID:
10364313
Publisher / Repository:
Oxford University Press
Journal Name:
Biometrics
Volume:
78
Issue:
4
ISSN:
0006-341X
Format(s):
Medium: X Size: p. 1313-1327
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary

    In many observational longitudinal studies, the outcome of interest presents a skewed distribution, is subject to censoring due to detection limits or other reasons, and is observed at irregular times that may follow an outcome-dependent pattern. In this work, we consider quantile regression modeling of such longitudinal data, because quantile regression is generally robust in handling skewed and censored outcomes and is flexible to accommodate dynamic covariate-outcome relationships. Specifically, we study a longitudinal quantile regression model that specifies covariate effects on the marginal quantiles of the longitudinal outcome. Such a model is easy to interpret and can accommodate dynamic outcome profile changes over time. We propose estimation and inference procedures that can appropriately account for censoring and irregular outcome-dependent follow-up. Our proposals can be readily implemented based on existing software for quantile regression. We establish the asymptotic properties of the proposed estimator, including uniform consistency and weak convergence. Extensive simulations suggest good finite-sample performance of the new method. We also present an analysis of data from a long-term study of a population exposed to polybrominated biphenyls (PBB), which uncovers an inhomogeneous PBB elimination pattern that would not be detected by traditional longitudinal data analysis.

     
  2. Growth curve models (GCMs), with their ability to directly investigate within-subject change over time and between-subject differences in change for longitudinal data, are widely used in social and behavioral sciences. While GCMs are typically studied with the normal distribution assumption, empirical data often violate the normality assumption in applications. Failure to account for the deviation from normality in data distribution may lead to unreliable model estimation and misleading statistical inferences. A robust GCM based on conditional medians was recently proposed and outperformed traditional growth curve modeling when outliers were present, resulting in nonnormality. However, this robust approach was shown to perform less satisfactorily when leverage observations existed. In this work, we propose a robust double medians growth curve modeling approach (DOME GCM) to thoroughly disentangle the influence of data contamination on model estimation and inferences, where two conditional medians are employed for the distributions of the within-subject measurement errors and of random effects, respectively. Model estimation and inferences are conducted in the Bayesian framework, and Laplace distributions are used to convert the optimization problem of median estimation into a problem of obtaining the maximum likelihood estimator for a transformed model. A Monte Carlo simulation study was conducted to evaluate the numerical performance of the proposed approach and showed that it yields more accurate and efficient parameter estimates when data contain outliers or leverage observations. The application of the developed robust approach is illustrated using a real dataset from the Virginia Cognitive Aging Project to study the change of memory ability.
  3. Abstract

    Various aspects of sociality in mammals (e.g., dyadic connectedness) are linked with measures of biological fitness (e.g., longevity). How within- and between-individual variation in relevant social traits arises in uncontrolled wild populations is challenging to determine but is crucial for understanding constraints on the evolution of sociality. We use an advanced statistical method, known as the ‘animal model’, which incorporates pedigree information, to look at social, genetic, and environmental influences on sociality in a long-lived wild primate. We leverage a longitudinal database spanning 20 years of observation on individually recognized white-faced capuchin monkeys (Cebus capucinus imitator), with a multi-generational pedigree. We analyze two measures of spatial association, using repeat sampling of 376 individuals (mean: 53.5 months per subject, range: 6–185 months per subject). Conditioned on the effects of age, sex, group size, seasonality, and El Niño–Southern Oscillation phases, we show low to moderate long-term repeatability (across years) of the proportion of time spent social (posterior mode [95% Highest Posterior Density interval]: 0.207 [0.169, 0.265]) and of average number of partners (0.144 [0.113, 0.181]) (latent scale). Most of this long-term repeatability could be explained by modest heritability (h^2_social: 0.152 [0.094, 0.207]; h^2_partners: 0.113 [0.076, 0.149]) with small long-term maternal effects (m^2_social: 0.000 [0.000, 0.045]; m^2_partners: 0.000 [0.000, 0.041]). Our models capture the majority of variance in our behavioral traits, with much of the variance explained by temporally changing factors, such as group of residence, highlighting potential limits to the evolvability of our trait due to social and environmental constraints.

     
  4. Abstract

    Statistical analysis of longitudinal data often involves modeling treatment effects on clinically relevant longitudinal biomarkers since an initial event (the time origin). In some studies including preventive HIV vaccine efficacy trials, some participants have biomarkers measured starting at the time origin, whereas others have biomarkers measured starting later with the time origin unknown. The semiparametric additive time-varying coefficient model is investigated where the effects of some covariates vary nonparametrically with time while the effects of others remain constant. Weighted profile least squares estimators coupled with kernel smoothing are developed. The method uses the expectation maximization approach to deal with the censored time origin. The Kaplan–Meier estimator and other failure time regression models such as the Cox model can be utilized to estimate the distribution and the conditional distribution of the left-censored event time related to the censored time origin. Asymptotic properties of the parametric and nonparametric estimators and consistent asymptotic variance estimators are derived. A two-stage estimation procedure for choosing the weights is proposed to improve estimation efficiency. Numerical simulations are conducted to examine finite sample properties of the proposed estimators. The simulation results show that the theory and methods work well. The efficiency gain of the two-stage estimation procedure depends on the distribution of the longitudinal error processes. The method is applied to analyze data from the Merck 023/HVTN 502 Step HIV vaccine study.

     
  5. Abstract

    In clinical research and practice, landmark models are commonly used to predict the risk of an adverse future event, using patients' longitudinal biomarker data as predictors. However, these data are often observable only at intermittent visits, making their measurement times irregularly spaced and unsynchronized across different subjects. This poses challenges to conducting dynamic prediction at any post‐baseline time. A simple solution is the last‐value‐carry‐forward method, but this may result in bias for the risk model estimation and prediction. Another option is to jointly model the longitudinal and survival processes with a shared random effects model. However, when dealing with multiple biomarkers, this approach often results in high‐dimensional integrals without a closed‐form solution, and thus the computational burden limits its software development and practical use. In this article, we propose to process the longitudinal data by functional principal component analysis techniques, and then use the processed information as predictors in a class of flexible linear transformation models to predict the distribution of residual time‐to‐event occurrence. The measurement schemes for multiple biomarkers are allowed to be different within subject and across subjects. Dynamic prediction can be performed in a real‐time fashion. The advantages of our proposed method are demonstrated by simulation studies. We apply our approach to the African American Study of Kidney Disease and Hypertension, predicting patients' risk of kidney failure or death by using four important longitudinal biomarkers for renal functions.

     