

Title: Mean Residual Life Regression with Functional Principal Component Analysis on Longitudinal Data for Dynamic Prediction
Summary

Predicting patient life expectancy is of great importance for clinicians in making treatment decisions. This prediction needs to be conducted dynamically, based on longitudinal biomarkers measured repeatedly during the patient's post-treatment follow-up period, and updated whenever a new biomarker measurement is obtained. Because biomarker trajectories over time are heterogeneous across patients, flexible and powerful approaches are needed to model noisy and irregularly measured longitudinal data. In this article, we use functional principal component analysis (FPCA) to extract the dominant features of each individual's biomarker trajectory, and use these features as time-dependent predictors (covariates) in a transformed mean residual life (MRL) regression model to conduct dynamic prediction. Simulation studies demonstrate the improved performance of the transformed MRL model when longitudinal biomarker information is included in the prediction. We apply the proposed method to predict the expected remaining time until disease progression for patients with chronic myeloid leukemia, using the transcript levels of an oncogene, BCR-ABL.
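The two quantities the summary relies on can be illustrated in isolation. The sketch below is not the authors' estimator and makes two simplifying assumptions that the paper does not: each biomarker trajectory is observed on a common regular grid without measurement error, and event times are uncensored. Under the first assumption, FPCA reduces (up to a grid-spacing scale factor) to ordinary PCA of the discretized curves; the hypothetical helper `first_pc_scores` finds the leading eigenvector of the sample covariance by power iteration and projects each centered curve onto it. The hypothetical helper `empirical_mrl` computes the sample analogue of the mean residual life m(t) = E[T - t | T > t]. Neither implements the transformed MRL regression itself.

```python
from statistics import fmean


def first_pc_scores(curves, n_iter=200):
    """Mean curve, leading eigenvector and first PC score for each subject.

    Hypothetical assumption: every trajectory is observed on the same regular
    time grid without noise; sparse, irregular data would need an extra
    smoothing step that this sketch omits.
    """
    n, p = len(curves), len(curves[0])
    if n < 2:
        raise ValueError("need at least two trajectories")
    mean = [fmean(c[j] for c in curves) for j in range(p)]
    centered = [[c[j] - mean[j] for j in range(p)] for c in curves]
    # Sample covariance of the centered curves on the grid (p x p).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Power iteration for the leading eigenvector; its sign is arbitrary and
    # the start vector must not be orthogonal to that eigenvector.
    v = [1.0] * p
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            raise ValueError("degenerate covariance or start vector")
        v = [x / norm for x in w]
    # Score: projection of each centered curve onto the eigenvector, a
    # discrete stand-in for the integral of (X_i(t) - mu(t)) * phi_1(t) dt.
    scores = [sum(r[j] * v[j] for j in range(p)) for r in centered]
    return mean, v, scores


def empirical_mrl(event_times, t):
    """Empirical mean residual life m(t) = E[T - t | T > t], no censoring."""
    remaining = [x - t for x in event_times if x > t]
    if not remaining:
        raise ValueError("no event times exceed t")
    return fmean(remaining)


# Usage: three trajectories that differ only by a vertical shift.
curves = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [0.0, 1.0, 2.0]]
mean, phi, scores = first_pc_scores(curves)
print(mean)    # [1.0, 2.0, 3.0]
print(scores)  # approximately [0.0, 1.732, -1.732]
print(empirical_mrl([2.0, 4.0, 6.0, 8.0], 3.0))  # 3.0
```

In the paper's setting, scores of this kind, updated as new measurements arrive, play the role of the time-dependent covariates in the MRL model; with sparse, irregular measurements the plain projection used here would typically be replaced by a smoothing-based score estimate.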

 
NSF-PAR ID: 10485922
Author(s) / Creator(s):
Publisher / Repository: Oxford University Press
Date Published:
Journal Name: Biometrics
Volume: 74
Issue: 4
ISSN: 0006-341X
Size(s): p. 1482-1491
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract

    In clinical research and practice, landmark models are commonly used to predict the risk of an adverse future event, using patients' longitudinal biomarker data as predictors. However, these data are often observable only at intermittent visits, making their measurement times irregularly spaced and unsynchronized across subjects. This poses challenges to conducting dynamic prediction at any post‐baseline time. A simple solution is the last‐value‐carry‐forward method, but this may bias risk model estimation and prediction. Another option is to jointly model the longitudinal and survival processes with a shared random effects model. However, when dealing with multiple biomarkers, this approach often results in high‐dimensional integrals without a closed‐form solution, and the resulting computational burden limits its software development and practical use. In this article, we propose to process the longitudinal data by functional principal component analysis techniques, and then use the processed information as predictors in a class of flexible linear transformation models to predict the distribution of the residual time to event. The measurement schedules of multiple biomarkers may differ both within and across subjects. Dynamic prediction can be performed in real time. The advantages of our proposed method are demonstrated by simulation studies. We apply our approach to the African American Study of Kidney Disease and Hypertension, predicting patients' risk of kidney failure or death by using four important longitudinal biomarkers of renal function.

     
  2. Objective

    To develop and validate an accurate, usable prediction model for other‐cause mortality (OCM) in patients with prostate cancer diagnosed in the United States.

    Materials and Methods

    Model training was performed using the National Health and Nutrition Examination Survey 1999–2010 including men aged >40 years with follow‐up to the year 2014. The model was validated in the Prostate, Lung, Colon, and Ovarian Cancer Screening Trial prostate cancer cohort, which enrolled patients between 1993 and 2001 with follow‐up to the year 2015. Time‐dependent area under the curve (AUC) and calibration were assessed in the validation cohort. Analyses were performed to assess algorithmic bias.

    Results

    The 2420‐patient training cohort had 459 deaths over a median follow‐up of 8.8 years among survivors. The final model included eight predictors: age; education; marital status; diabetes; hypertension; stroke; body mass index; and smoking. It had an AUC of 0.75 at 10 years for predicting OCM in the validation cohort of 8220 patients. The final model significantly outperformed the Social Security Administration life tables and showed adequate predictive performance across race, educational attainment, and marital status subgroups. There is evidence of major variability in life expectancy that is not captured by age, with life expectancy predictions differing by 10 or more years among patients of the same age.

    Conclusion

    Using two national cohorts, we have developed and validated a simple and useful prediction model for OCM for patients with prostate cancer treated in the United States, which will allow for more personalized treatment in accordance with guidelines.

     
  3. The patterns of change in patients' longitudinal biomarkers are crucial factors in their disease progression. In this research, we apply functional principal component analysis techniques to extract these patterns and use them as predictors in landmark models for dynamic prediction. The time‐varying effects of risk factors along a sequence of landmark times are smoothed by a supermodel to borrow information from neighboring time intervals. This results in more stable estimation and a clearer demonstration of the time‐varying effects. Compared with traditional landmark analysis, simulation studies show that our proposed approach yields lower prediction error rates and higher area under the receiver operating characteristic curve (AUC) values, which indicate a better ability to discriminate between subjects with different risk levels. We apply our method to data from the Framingham Heart Study, using longitudinal total cholesterol (TC) levels to predict future coronary heart disease (CHD) risk profiles. Our approach not only captures the overall trend of biomarker‐related risk profiles, but also reveals different risk patterns that are not available from traditional landmark analyses. Our results show that high cholesterol levels at younger ages are more harmful than those at older ages. This demonstrates the importance of analyzing the age‐dependent effects of TC on CHD risk.

     
  4. Most studies characterize longitudinal biomarker trajectories by looking forward at them from a commonly used time origin, such as the initial treatment time. For a better understanding of the relationship between biomarkers and disease progression, we propose to align all subjects by using their disease progression time as the origin and then looking backward at the biomarker distributions prior to that event. We demonstrate that such backward‐looking plots are much more informative than forward‐looking plots when the research goal is to understand the shape of the trajectory leading up to the event of interest. Such backward‐looking plots are easy to produce if disease progression is observed for all subjects. However, when these events are censored for a significant proportion of subjects in the study cohort, their time origins cannot be identified, so the subjects cannot be aligned. We propose a new method to tackle this problem by considering the distributions of longitudinal biomarker data conditional on the failure time. We use landmark analysis models to estimate these distributions. Compared to a naïve method, our new method greatly reduces estimation bias. We apply our method to a study of patients with chronic myeloid leukemia, whose BCR‐ABL transcript expression levels after treatment are good indicators of residual disease. Our proposed method provides a good visualization tool for longitudinal biomarker studies for the early detection of disease.

     
  5. Objective

    This study aims to establish an informative dynamic prediction model of treatment outcomes using follow-up records of tuberculosis (TB) patients, capable of promptly detecting cases in which the current treatment plan may not be effective.

    Materials and Methods

    We used 122 267 follow-up records from 17 958 new cases of pulmonary TB in the Republic of Moldova. A dynamic prediction framework integrating landmark modeling and machine learning algorithms was designed to predict patient outcomes during the course of treatment. Sensitivity and positive predictive value (PPV) were calculated to evaluate the performance of the model at critical time points. New measures were defined to determine when follow-up laboratory tests should be conducted to obtain the most informative results.

    Results

    The random-forest algorithm performed better than support vector machine and penalized multinomial logistic regression models for predicting TB treatment outcomes. For all 3 outcome classes (ie, cured, not cured, and died after 24 months following treatment initiation), sensitivity and PPV of prediction models improved as more follow-up information was collected. Specifically, sensitivity and PPV increased from 0.55 to 0.84 and from 0.32 to 0.88, respectively, for the not cured class.

    Conclusion

    The dynamic prediction framework utilizes longitudinal laboratory test results to predict patient outcomes at various landmarks. Sputum culture and smear results are among the important variables for prediction; however, the most recent sputum result is not always the most informative one. This framework can potentially facilitate a more effective treatment monitoring program and provide insights for policymakers toward improved guidelines on follow-up tests.
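Items 1, 3, 4 and 5 above all rely on landmark analysis. As a rough sketch of the data step behind such models, assuming a single biomarker and right-censored follow-up, a landmark dataset at time s keeps only subjects still at risk at s, summarizes each subject's biomarker history up to s, and measures residual follow-up from s, censored at a prediction horizon s + horizon. `Subject`, `landmark_rows` and `last_value` are hypothetical names. The default summary is the last observed value, i.e. the simple last-value-carry-forward approach that the first abstract notes can introduce bias; the FPCA-based approaches described above would instead supply a different `summarize` function.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Subject:
    sid: str
    obs_time: float                    # event or censoring time
    event: bool                        # True if the event was observed
    visits: List[Tuple[float, float]]  # (measurement time, biomarker value)


def last_value(history: List[Tuple[float, float]]) -> float:
    """Most recent biomarker value (last value carried forward)."""
    return max(history, key=lambda tv: tv[0])[1]


def landmark_rows(subjects, s, horizon,
                  summarize: Callable[[List[Tuple[float, float]]], float] = last_value):
    """Rows (sid, covariate, residual time, status) for landmark time s.

    Only subjects still at risk at s are kept, only measurements taken at or
    before s are used, and follow-up is administratively censored at s + horizon.
    """
    rows = []
    for subj in subjects:
        if subj.obs_time <= s:
            continue  # event or censoring before the landmark: not at risk
        history = [(t, y) for t, y in subj.visits if t <= s]
        if not history:
            continue  # no biomarker information available yet
        end = min(subj.obs_time, s + horizon)
        status = subj.event and subj.obs_time <= s + horizon
        rows.append((subj.sid, summarize(history), end - s, status))
    return rows


subjects = [
    Subject("a", 5.0, True, [(0.0, 1.0), (1.0, 1.5), (3.0, 2.0)]),
    Subject("b", 1.5, True, [(0.0, 0.5)]),
    Subject("c", 10.0, False, [(0.0, 0.2), (2.5, 0.4)]),
]
for row in landmark_rows(subjects, s=2.0, horizon=4.0):
    print(row)
# ('a', 1.5, 3.0, True)
# ('c', 0.2, 4.0, False)
```

The resulting rows would then be fitted with a survival or classification model at each landmark; that fitting step, and the supermodel smoothing across landmarks described in item 3, are outside the scope of this sketch.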

     