Predicting patient life expectancy is of great importance for clinicians in making treatment decisions. This prediction needs to be conducted in a dynamic manner, based on longitudinal biomarkers repeatedly measured during the patient's post-treatment follow-up period. The prediction is updated any time a new biomarker measurement is obtained. The heterogeneity of biomarker trajectories across patients requires flexible and powerful approaches to model noisy and irregularly measured longitudinal data. In this article, we use functional principal component analysis (FPCA) to extract the dominant features of the biomarker trajectory of each individual, and use these features as time-dependent predictors (covariates) in a transformed mean residual life (MRL) regression model to conduct dynamic prediction. Simulation studies demonstrate the improved performance of the transformed MRL model that includes longitudinal biomarker information in the prediction. We apply the proposed method to predict the expected remaining time until disease progression for patients with chronic myeloid leukemia, using the transcript levels of an oncogene, BCR-ABL.
In clinical research and practice, landmark models are commonly used to predict the risk of an adverse future event, using patients' longitudinal biomarker data as predictors. However, these data are often observable only at intermittent visits, making their measurement times irregularly spaced and unsynchronized across different subjects. This poses challenges to conducting dynamic prediction at any post‐baseline time. A simple solution is the last‐value‐carry‐forward method, but this may bias both the estimation of the risk model and its predictions. Another option is to jointly model the longitudinal and survival processes with a shared random effects model. However, when dealing with multiple biomarkers, this approach often results in high‐dimensional integrals without a closed‐form solution, and the resulting computational burden limits its software development and practical use. In this article, we propose to process the longitudinal data by functional principal component analysis techniques, and then use the processed information as predictors in a class of flexible linear transformation models to predict the distribution of residual time‐to‐event occurrence. The measurement schemes for multiple biomarkers are allowed to differ both within and across subjects. Dynamic prediction can be performed in a real‐time fashion. The advantages of our proposed method are demonstrated by simulation studies. We apply our approach to the African American Study of Kidney Disease and Hypertension, predicting patients' risk of kidney failure or death by using four important longitudinal biomarkers of renal function.
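The last‐value‐carry‐forward step mentioned above can be illustrated with a minimal Python sketch. This is not the authors' code: the function name, the subject identifiers, and all visit times and biomarker values are hypothetical, chosen only to show how irregular, unsynchronized visits are reduced to one predictor per subject at a landmark time.

```python
from bisect import bisect_right


def last_value_carried_forward(times, values, landmark):
    """Return the most recent measurement taken at or before `landmark`.

    `times` must be sorted in increasing order. Returns None when the
    subject has no measurement by the landmark time.
    """
    idx = bisect_right(times, landmark)
    return values[idx - 1] if idx > 0 else None


# Hypothetical visit schedules (months since baseline) and biomarker values.
# The visits are irregular and not synchronized across subjects.
subjects = {
    "A": ([0.0, 2.5, 7.0], [1.2, 1.5, 2.1]),
    "B": ([1.0, 9.0], [0.8, 1.9]),
}

landmark = 6.0
predictors = {
    sid: last_value_carried_forward(t, v, landmark)
    for sid, (t, v) in subjects.items()
}
print(predictors)  # {'A': 1.5, 'B': 0.8}
```

In this toy example, subject B's predictor at month 6 is a value measured at month 1. A stale value of this kind is one way the carried-forward predictor can differ from the subject's current biomarker level.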
- NSF-PAR ID:
- 10456220
- Publisher / Repository:
- Wiley Blackwell (John Wiley & Sons)
- Date Published:
- Journal Name:
- Biometrical Journal
- Volume:
- 62
- Issue:
- 6
- ISSN:
- 0323-3847
- Page Range / eLocation ID:
- p. 1371-1393
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Patients' patterns of longitudinal biomarker change are crucial factors in their disease progression. In this research, we apply functional principal component analysis techniques to extract these patterns and use them as predictors in landmark models for dynamic prediction. The time‐varying effects of risk factors along a sequence of landmark times are smoothed by a supermodel to borrow information from neighboring time intervals. This results in more stable estimation and a clearer demonstration of the time‐varying effects. Compared with the traditional landmark analysis, simulation studies show that our proposed approach results in lower prediction error rates and higher area under the receiver operating characteristic curve (AUC) values, which indicate a better ability to discriminate between subjects with different risk levels. We apply our method to data from the Framingham Heart Study, using longitudinal total cholesterol (TC) levels to predict future coronary heart disease (CHD) risk profiles. Our approach not only obtains the overall trend of biomarker‐related risk profiles, but also reveals different risk patterns that are not available from the traditional landmark analyses. Our results show that high cholesterol levels at younger ages are more harmful than those at older ages. This demonstrates the importance of analyzing the age‐dependent effects of TC on CHD risk.
-
Most studies characterize longitudinal biomarker trajectories by looking forward at them from a commonly used time origin, such as the initial treatment time. For a better understanding of the relationship between biomarkers and disease progression, we propose to align all subjects by using their disease progression time as the origin and then looking backward at the biomarker distributions prior to that event. We demonstrate that such backward‐looking plots are much more informative than forward‐looking plots when the research goal is to understand the shape of the trajectory leading up to the event of interest. Such backward‐looking plotting is an easy task if disease progression is observed for all the subjects. However, when these events are censored for a significant proportion of subjects in the study cohort, their time origins cannot be identified, and the task of aligning them cannot be performed. We propose a new method to tackle this problem by considering the distributions of longitudinal biomarker data conditional on the failure time. We use landmark analysis models to estimate these distributions. Compared to a naïve method, our new method greatly reduces estimation bias. We apply our method to a study for chronic myeloid leukemia patients whose BCR‐ABL transcript expression levels after treatment are good indicators of residual disease. Our proposed method provides a good visualization tool for longitudinal biomarker studies for the early detection of disease.
-
Accurate prediction of suicide risk among children and adolescents within an actionable time frame is an important but challenging task. Very few studies have comprehensively considered the available clinical risk factors to produce quantifiable risk scores for estimating short- and long-term suicide risk in the pediatric population. In this paper, we built machine learning models to predict suicidal behavior among children and adolescents based on their longitudinal clinical records, and to determine short- and long-term risk factors. This retrospective study used deidentified structured electronic health records (EHR) from the Connecticut Children’s Medical Center covering the period from 1 October 2011 to 30 September 2016. Clinical records of 41,721 young patients (10–18 years old) were included for analysis. Candidate predictors included demographics, diagnoses, laboratory tests, and medications. Different prediction windows ranging from 0 to 365 days were adopted. For each prediction window, candidate predictors were first screened by univariate statistical tests, and then a predictive model was built via a sequential forward feature selection procedure. We grouped the selected predictors and estimated their contributions to risk prediction at different prediction window lengths. The developed predictive models predicted suicidal behavior across all prediction windows with AUCs varying from 0.81 to 0.86. For all prediction windows, the models detected 53–62% of suicide-positive subjects with 90% specificity. The models performed better with shorter prediction windows, and predictor importance varied across prediction windows, illustrating short- and long-term risks. Our findings demonstrated that routinely collected EHRs can be used to create accurate predictive models for suicide risk among children and adolescents.
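The detection rate at 90% specificity reported above is a sensitivity measured at a fixed specificity. A minimal Python sketch of that calculation follows. It is not the study's code: the function name, the thresholding rule (flag a subject when the score exceeds the threshold), and all scores and labels are assumptions made for illustration.

```python
import math


def sensitivity_at_specificity(scores, labels, target_specificity=0.90):
    """Sensitivity at the lowest threshold whose specificity meets the target.

    A subject is flagged when score > threshold. `labels` are 1 (event)
    or 0 (no event).
    """
    negatives = sorted(s for s, y in zip(scores, labels) if y == 0)
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not negatives or not positives:
        raise ValueError("need at least one positive and one negative")
    # Number of negatives that must fall at or below the threshold.
    k = math.ceil(target_specificity * len(negatives))
    threshold = negatives[k - 1] if k > 0 else float("-inf")
    flagged = sum(s > threshold for s in positives)
    return flagged / len(positives)


# Hypothetical risk scores: ten event-free subjects, then four with the event.
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.90,
          0.30, 0.50, 0.60, 0.80]
labels = [0] * 10 + [1] * 4

print(sensitivity_at_specificity(scores, labels))  # 0.75
```

Here nine of the ten event-free subjects score at or below the threshold of 0.45, which gives 90% specificity, and three of the four subjects with the event score above it.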
-
Alzheimer's Disease (AD) is a chronic neurodegenerative disease that severely impacts patients' thinking, memory and behavior. To aid automatic AD diagnosis, many longitudinal learning models have been proposed to predict clinical outcomes and/or disease status. However, these models often fail to account for patients' missing temporal phenotypic records, which can convey valuable information about AD progression. Another challenge in AD studies is how to integrate heterogeneous genotypic and phenotypic biomarkers to improve diagnosis prediction. To cope with these challenges, in this paper we propose a longitudinal multi-modal method that learns enriched genotypic and phenotypic biomarker representations as fixed-length vectors. These vectors simultaneously capture the baseline neuroimaging measurements of the entire dataset and, for every participant, the progressive variations over a varying number of follow-up measurements from different biomarker sources. The learned global and local projections are aligned by a soft constraint, and the structured-sparsity norm is used to uncover the multi-modal structure of heterogeneous biomarker measurements. While the proposed objective is clearly motivated by the goal of characterizing the progression of AD, it is nonsmooth and therefore difficult to optimize efficiently in general. Thus, we derive an efficient iterative algorithm whose convergence is rigorously proven. We have conducted extensive experiments on the Alzheimer's Disease Neuroimaging Initiative (ADNI) data using one genotypic and two phenotypic biomarkers. Empirical results demonstrate that the learned enriched biomarker representations are more effective in predicting the outcomes of various cognitive assessments. Moreover, our model has successfully identified disease-relevant biomarkers supported by existing medical findings, which further supports the validity of our method from a clinical perspective.