skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


This content will become publicly available on June 11, 2026

Title: Longitudinal Omics Data Analysis: A Review on Models, Algorithms, and Tools
Longitudinal omics data (LOD) analysis is essential for understanding the dynamics of biological processes and disease progression over time. This review explores various statistical and computational approaches for analyzing such data, emphasizing their applications and limitations. The main characteristics of longitudinal data, such as imbalancedness, high-dimensionality, and non-Gaussianity are discussed for modeling and hypothesis testing. We discuss the properties of linear mixed models (LMM) and generalized linear mixed models (GLMM) as foundation stones in LOD analyses and highlight their extensions to handle the obstacles in the frequentist and Bayesian frameworks. We differentiate in dynamic data analysis between time-course and longitudinal analyses, covering functional data analysis (FDA) and replication constraints. We explore classification techniques, single-cell as exemplary omics longitudinal studies, survival modeling, and multivariate methods for clinical/biomarker-based applications. Emerging topics, including data integration, clustering, and network-based modeling, are also discussed. We categorized the state-of-the-art approaches applicable to omics data, highlighting how they address the data features. This review serves as a guideline for researchers seeking robust strategies to analyze longitudinal omics data effectively, which is usually complex.  more » « less
Award ID(s):
2109688
PAR ID:
10627427
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
arxiv
Date Published:
Format(s):
Medium: X
Institution:
The George Washington University
Sponsoring Org:
National Science Foundation
More Like this
  1. Coelho, Luis Pedro (Ed.)
    It is challenging to associate features such as human health outcomes, diet, environmental conditions, or other metadata to microbial community measurements, due in part to their quantitative properties. Microbiome multi-omics are typically noisy, sparse (zero-inflated), high-dimensional, extremely non-normal, and often in the form of count or compositional measurements. Here we introduce an optimized combination of novel and established methodology to assess multivariable association of microbial community features with complex metadata in population-scale observational studies. Our approach, MaAsLin 2 (Microbiome Multivariable Associations with Linear Models), uses generalized linear and mixed models to accommodate a wide variety of modern epidemiological studies, including cross-sectional and longitudinal designs, as well as a variety of data types (e.g., counts and relative abundances) with or without covariates and repeated measurements. To construct this method, we conducted a large-scale evaluation of a broad range of scenarios under which straightforward identification of meta-omics associations can be challenging. These simulation studies reveal that MaAsLin 2’s linear model preserves statistical power in the presence of repeated measures and multiple covariates, while accounting for the nuances of meta-omics features and controlling false discovery. We also applied MaAsLin 2 to a microbial multi-omics dataset from the Integrative Human Microbiome (HMP2) project which, in addition to reproducing established results, revealed a unique, integrated landscape of inflammatory bowel diseases (IBD) across multiple time points and omics profiles. 
    more » « less
  2. Martelli, Pier Luigi (Ed.)
    Abstract MotivationThere is a growing interest in longitudinal omics data paired with some longitudinal clinical outcome. Given a large set of continuous omics variables and some continuous clinical outcome, each measured for a few subjects at only a few time points, we seek to identify those variables that co-vary over time with the outcome. To motivate this problem we study a dataset with hundreds of urinary metabolites along with Tuberculosis mycobacterial load as our clinical outcome, with the objective of identifying potential biomarkers for disease progression. For such data clinicians usually apply simple linear mixed effects models which often lack power given the low number of replicates and time points. We propose a penalized regression approach on the first differences of the data that extends the lasso + Laplacian method [Li and Li (Network-constrained regularization and variable selection for analysis of genomic data. Bioinformatics 2008;24:1175–82.)] to a longitudinal group lasso + Laplacian approach. Our method, PROLONG, leverages the first differences of the data to increase power by pairing the consecutive time points. The Laplacian penalty incorporates the dependence structure of the variables, and the group lasso penalty induces sparsity while grouping together all contemporaneous and lag terms for each omic variable in the model. ResultsWith an automated selection of model hyper-parameters, PROLONG correctly selects target metabolites with high specificity and sensitivity across a wide range of scenarios. PROLONG selects a set of metabolites from the real data that includes interesting targets identified during EDA. Availability and implementationAn R package implementing described methods called “prolong” is available at https://github.com/stevebroll/prolong. Code snapshot available at 10.5281/zenodo.14804245. 
    more » « less
  3. While generalized linear mixed models are useful, optimal design questions for such models are challenging due to complexity of the information matrices. For longitudinal data, after comparing three approximations for the information matrices, we propose an approximation based on the penalized quasi-likelihood method.We evaluate this approximation for logistic mixed models with time as the single predictor variable. Assuming that the experimenter controls at which time observations are to be made, the approximation is used to identify locally optimal designs based on the commonly used A- and D-optimality criteria. The method can also be used for models with random block effects. Locally optimal designs found by a Particle Swarm Optimization algorithm are presented and discussed. As an illustration, optimal designs are derived for a study on self-reported disability in olderwomen. Finally,we also study the robustness of the locally optimal designs to mis-specification of the covariance matrix for the random effects. 
    more » « less
  4. In medical studies, time-to-event outcomes such as time to death or relapse of a disease are routinely recorded along with longitudinal data that are observed intermittently during the follow-up period. For various reasons, marginal approaches to model the event time, corresponding to separate approaches for survival data/longitudinal data, tend to induce bias and lose efficiency. Instead, a joint modeling approach that brings the two types of data together can reduce or eliminate the bias and yield a more efficient estimation procedure. A well-established avenue for joint modeling is the joint likelihood approach that often produces semiparametric efficient estimators for the finite-dimensional parameter vectors in both models. Through a transformation survival model with an unspecified baseline hazard function, this review introduces joint modeling that accommodates both baseline covariates and time-varying covariates. The focus is on the major challenges faced by joint modeling and how they can be overcome. A review of available software implementations and a brief discussion of future directions of the field are also included. 
    more » « less
  5. Linear mixed models (LMMs) can be applied in the meta-analyses of responses from individuals across multiple contexts, increasing power to detect associations while accounting for confounding effects arising from within-individual variation. However, traditional approaches to fitting these models can be computationally intractable. Here, we describe an efficient and exact method for fitting a multiple-context linear mixed model. Whereas existing exact methods may be cubic in their time complexity with respect to the number of individuals, our approach for multiple-context LMMs (mcLMM) is linear. These improvements allow for large-scale analyses requiring computing time and memory magnitudes of order less than existing methods. As examples, we apply our approach to identify expression quantitative trait loci from large-scale gene expression data measured across multiple tissues as well as joint analyses of multiple phenotypes in genomewide association studies at biobank scale. 
    more » « less