

Title: Cross‐component registration for multivariate functional data, with application to growth curves
Abstract

Multivariate functional data are becoming ubiquitous with advances in modern technology and are substantially more complex than univariate functional data. We propose and study a novel model for multivariate functional data where the component processes are subject to mutual time warping. That is, the component processes exhibit a similar shape but are subject to systematic phase variation across their time domains. To address this previously unconsidered mode of warping, we propose new registration methodology that is based on a shift‐warping model. Our method differs from all existing registration methods for functional data in a fundamental way. Namely, instead of focusing on the traditional approach to warping, where one aims to recover individual‐specific registration, we focus on shift registration across the components of a multivariate functional data vector on a population‐wide level. Our proposed estimates for these shifts are identifiable, enjoy parametric rates of convergence, and often have intuitive physical interpretations, all in contrast to traditional curve‐specific registration approaches. We demonstrate the implementation and interpretation of the proposed method by applying our methodology to the Zürich Longitudinal Growth data and study its finite sample properties in simulations.

 
Award ID(s):
2014626
NSF-PAR ID:
10449823
Author(s) / Creator(s):
 ;  ;  
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Biometrics
Volume:
77
Issue:
3
ISSN:
0006-341X
Format(s):
Medium: X Size: p. 839-851
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Multivariate functional data present theoretical and practical complications that are not found in univariate functional data. One of these is a situation where the component functions of multivariate functional data are positive and are subject to mutual time warping. That is, the component processes exhibit a common shape but are subject to systematic phase variation across their domains in addition to subject‐specific time warping, where each subject has its own internal clock. This motivates a novel model for multivariate functional data that connects such mutual time warping to a latent‐deformation‐based framework by exploiting a novel time‐warping separability assumption. This separability assumption allows for meaningful interpretation and dimension reduction. The resulting latent deformation model is shown to be well suited to represent commonly encountered functional vector data. The proposed approach combines a random amplitude factor for each component with population‐based registration across the components of a multivariate functional data vector and includes a latent population function, which corresponds to a common underlying trajectory. We propose estimators for all components of the model, enabling implementation of the proposed data‐based representation for multivariate functional data and downstream analyses such as Fréchet regression. Rates of convergence are established when curves are fully observed or observed with measurement error. The usefulness of the model, interpretations, and practical aspects are illustrated in simulations and with application to multivariate human growth curves and multivariate environmental pollution data.

     
  2. Abstract

    In clinical research and practice, landmark models are commonly used to predict the risk of an adverse future event, using patients' longitudinal biomarker data as predictors. However, these data are often observable only at intermittent visits, making their measurement times irregularly spaced and unsynchronized across different subjects. This poses challenges to conducting dynamic prediction at any post‐baseline time. A simple solution is the last‐value‐carry‐forward method, but this may result in bias for the risk model estimation and prediction. Another option is to jointly model the longitudinal and survival processes with a shared random effects model. However, when dealing with multiple biomarkers, this approach often results in high‐dimensional integrals without a closed‐form solution, and thus the computational burden limits its software development and practical use. In this article, we propose to process the longitudinal data by functional principal component analysis techniques, and then use the processed information as predictors in a class of flexible linear transformation models to predict the distribution of residual time‐to‐event occurrence. The measurement schemes for multiple biomarkers are allowed to be different within subject and across subjects. Dynamic prediction can be performed in a real‐time fashion. The advantages of our proposed method are demonstrated by simulation studies. We apply our approach to the African American Study of Kidney Disease and Hypertension, predicting patients' risk of kidney failure or death by using four important longitudinal biomarkers for renal functions.

     
  3. Summary

    Multivariate functional data are increasingly encountered in data analysis, whereas statistical models for such data are not well developed yet. Motivated by a case-study where one aims to quantify the relationship between various longitudinally recorded behaviour intensities for Drosophila flies, we propose a functional linear manifold model. This model reflects the functional dependence between the components of multivariate random processes and is defined through data-determined linear combinations of the multivariate component trajectories, which are characterized by a set of varying-coefficient functions. The time varying linear relationships that govern the components of multivariate random functions yield insights about the underlying processes and also lead to noise-reduced representations of the multivariate component trajectories. The functional linear manifold model proposed is put to the task for an analysis of longitudinally observed behavioural patterns of flying, feeding, walking and resting over the lifespan of Drosophila flies and is also investigated in simulations.

     
  4. Abstract

    Resting‐state functional network connectivity (rsFNC) has shown utility for identifying characteristic functional brain patterns in individuals with psychiatric and mood disorders, providing a promising avenue for biomarker development. However, several factors have precluded widespread clinical adoption of rsFNC diagnostics, namely a lack of standardized approaches for capturing comparable and reproducible imaging markers across individuals, as well as the disagreement on the amount of data required to robustly detect intrinsic connectivity networks (ICNs) and diagnostically relevant patterns of rsFNC at the individual subject level. Recently, spatially constrained independent component analysis (scICA) has been proposed as an automated method for extracting ICNs standardized to a chosen network template while still preserving individual variation. Leveraging the scICA methodology, which solves the former challenge of standardized neuroimaging markers, we investigate the latter challenge of identifying a minimally sufficient data length for clinical applications of resting‐state fMRI (rsfMRI). Using a dataset containing rsfMRI scans of individuals with schizophrenia and controls (M = 310) as well as simulated rsfMRI, we evaluated the robustness of ICN and rsFNC estimates at both the subject‐ and group‐level, as well as the performance of diagnostic classification, with respect to the length of the rsfMRI time course. We found individual estimates of ICNs and rsFNC from the full‐length (5 min) reference time course were sufficiently approximated with just 3–3.5 min of data (r = 0.85, 0.88, respectively), and significant differences in group‐average rsFNC could be sufficiently approximated with even less data, just 2 min (r = 0.86). 
These results from the shorter clinical data were largely consistent with the results from validation experiments using longer time series from both simulated (30 min) and real‐world (14 min) datasets, in which estimates of subject‐level FNC were reliably estimated with 3–5 min of data. Moreover, in the real‐world data we found rsFNC and ICN estimates generated across the full range of data lengths (0.5–14 min) more reliably matched those generated from the first 5 min of scan time than those generated from the last 5 min, suggesting increased influence of “late scan” noise factors such as fatigue or drowsiness may limit the reliability of FNC from data collected after 10+ min of scan time, further supporting the notion of shorter scans. Lastly, a diagnostic classification model trained on just 2 min of data retained 97%–98% classification accuracy relative to that of the full‐length reference model. Our results suggest that, when decomposed with scICA, rsfMRI scans of just 2–5 min show good clinical utility without significant loss of individual FNC information of longer scan lengths.

     
  5. Abstract

    Autism Spectrum Disorder (ASD) is characterized as a neurodevelopmental disorder with a heterogeneous nature, influenced by genetics and exhibiting diverse clinical presentations. In this study, we dissect ASD into its behavioral components, mirroring the diagnostic process used in clinical settings. Morphological features are extracted from magnetic resonance imaging (MRI) scans, found in the publicly available dataset ABIDE II, identifying the most discriminative features that differentiate ASD within various behavioral domains. Then, each subject is categorized as having severe, moderate, or mild ASD, or typical neurodevelopment (TD), based on the behavioral domains of the Social Responsiveness Scale (SRS). Through this study, multiple artificial intelligence (AI) models are utilized for feature selection and for classifying each ASD severity and behavioral group. A multivariate feature selection algorithm, investigating four different classifiers with linear and non-linear hypotheses, is applied iteratively while shuffling the training-validation subjects to find the set of cortical regions with statistically significant association with ASD. A set of six classifiers is optimized and trained on the selected set of features using 5-fold cross-validation for the purpose of severity classification for each behavioral group. Our AI-based model achieved an average accuracy of 96%, computed as the mean accuracy across the top-performing AI models for feature selection and severity classification across the different behavioral groups. The proposed AI model has the ability to accurately differentiate between the functionalities of specific brain regions, such as the left and right caudal middle frontal regions. We propose an AI-based model that dissects ASD into behavioral components. For each behavioral component, the AI-based model is capable of identifying the brain regions which are associated with ASD as well as utilizing those regions for diagnosis. The proposed system can increase the speed and accuracy of the diagnostic process and result in improved outcomes for individuals with ASD, highlighting the potential of AI in this area.

     