

Title: Functional principal component based landmark analysis for the effects of longitudinal cholesterol profiles on the risk of coronary heart disease

The changing patterns of patients' longitudinal biomarkers are crucial factors in their disease progression. In this research, we apply functional principal component analysis techniques to extract these changing patterns and use them as predictors in landmark models for dynamic prediction. The time‐varying effects of risk factors along a sequence of landmark times are smoothed by a supermodel to borrow information from neighboring time intervals. This results in more stable estimation and a clearer demonstration of the time‐varying effects. Simulation studies show that, compared with traditional landmark analysis, our proposed approach yields lower prediction error rates and higher area under the receiver operating characteristic curve (AUC) values, indicating a better ability to discriminate between subjects with different risk levels. We apply our method to data from the Framingham Heart Study, using longitudinal total cholesterol (TC) levels to predict future coronary heart disease (CHD) risk profiles. Our approach not only obtains the overall trend of biomarker‐related risk profiles, but also reveals different risk patterns that are not available from traditional landmark analyses. Our results show that high cholesterol levels at younger ages are more harmful than those at older ages. This demonstrates the importance of analyzing the age‐dependent effects of TC on CHD risk.
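
As a rough illustration of the two ingredients named in the abstract, the sketch below computes functional principal component scores from biomarker histories and assembles a landmark data set from them. It is not the authors' implementation. It assumes every subject is measured on a common dense time grid (the paper's setting may involve sparse, irregular visits), it ignores censoring, and it omits the supermodel smoothing across landmark times. The names `fpca_scores` and `landmark_dataset` and all simulated values are hypothetical.

```python
import numpy as np


def fpca_scores(curves, n_components=2):
    """FPCA on a common dense grid (assumption): eigendecompose the sample
    covariance of the curves and project each centered curve on the leading
    eigenvectors. Returns (mean curve, eigenfunctions, scores)."""
    mean = curves.mean(axis=0)
    centered = curves - mean
    cov = centered.T @ centered / (curves.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)            # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest ones
    phi = eigvecs[:, order]
    scores = centered @ phi
    return mean, phi, scores


def landmark_dataset(times, curves, event_time, landmark, n_components=2):
    """Keep subjects still event-free at the landmark time, summarize their
    biomarker history up to the landmark by FPCA scores, and pair the scores
    with the residual time to event (censoring is ignored in this sketch)."""
    at_risk = event_time > landmark
    history = times <= landmark
    _, _, scores = fpca_scores(curves[at_risk][:, history], n_components)
    residual_time = event_time[at_risk] - landmark
    return scores, residual_time


# Simulated example: 50 subjects, cholesterol-like curves on a 21-point grid.
rng = np.random.default_rng(0)
times = np.linspace(0.0, 10.0, 21)
level = rng.normal(200.0, 20.0, size=(50, 1))   # subject-specific level
slope = rng.normal(0.0, 2.0, size=(50, 1))      # subject-specific trend
curves = level + slope * times + rng.normal(0.0, 3.0, size=(50, 21))
event_time = rng.uniform(1.0, 20.0, size=50)

scores, residual_time = landmark_dataset(times, curves, event_time, landmark=5.0)
print(scores.shape, residual_time.shape)
```

The scores returned for each landmark time could then serve as covariates in whatever survival model is fitted at that landmark. Repeating the call over a sequence of landmark times gives the stacked data that a supermodel would smooth over.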

 
NSF-PAR ID:
10454288
Author(s) / Creator(s):
 ;  ;  
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Statistics in Medicine
Volume:
40
Issue:
3
ISSN:
0277-6715
Format(s):
Medium: X Size: p. 650-667
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    In clinical research and practice, landmark models are commonly used to predict the risk of an adverse future event, using patients' longitudinal biomarker data as predictors. However, these data are often observable only at intermittent visits, making their measurement times irregularly spaced and unsynchronized across different subjects. This poses challenges to conducting dynamic prediction at any post‐baseline time. A simple solution is the last‐value‐carry‐forward method, but this may result in bias for the risk model estimation and prediction. Another option is to jointly model the longitudinal and survival processes with a shared random effects model. However, when dealing with multiple biomarkers, this approach often results in high‐dimensional integrals without a closed‐form solution, and thus the computational burden limits its software development and practical use. In this article, we propose to process the longitudinal data by functional principal component analysis techniques, and then use the processed information as predictors in a class of flexible linear transformation models to predict the distribution of residual time‐to‐event occurrence. The measurement schemes for multiple biomarkers are allowed to be different within subject and across subjects. Dynamic prediction can be performed in a real‐time fashion. The advantages of our proposed method are demonstrated by simulation studies. We apply our approach to the African American Study of Kidney Disease and Hypertension, predicting patients' risk of kidney failure or death by using four important longitudinal biomarkers for renal functions.

     
  2. Most studies characterize longitudinal biomarker trajectories by looking forward at them from a commonly used time origin, such as the initial treatment time. For a better understanding of the relationship between biomarkers and disease progression, we propose to align all subjects by using their disease progression time as the origin and then looking backward at the biomarker distributions prior to that event. We demonstrate that such backward‐looking plots are much more informative than forward‐looking plots when the research goal is to understand the shape of the trajectory leading up to the event of interest. Such backward‐looking plotting is an easy task if disease progression is observed for all the subjects. However, when these events are censored for a significant proportion of subjects in the study cohort, their time origins cannot be identified, and the task of aligning them cannot be performed. We propose a new method to tackle this problem by considering the distributions of longitudinal biomarker data conditional on the failure time. We use landmark analysis models to estimate these distributions. Compared to a naïve method, our new method greatly reduces estimation bias. We apply our method to a study for chronic myeloid leukemia patients whose BCR‐ABL transcript expression levels after treatment are good indicators of residual disease. Our proposed method provides a good visualization tool for longitudinal biomarker studies for the early detection of disease.

     
  3. Summary

    Predicting patient life expectancy is of great importance for clinicians in making treatment decisions. This prediction needs to be conducted in a dynamic manner, based on longitudinal biomarkers repeatedly measured during the patient's post-treatment follow-up period. The prediction is updated any time a new biomarker measurement is obtained. The heterogeneity across patients of biomarker trajectories over time requires flexible and powerful approaches to model noisy and irregularly measured longitudinal data. In this article, we use functional principal component analysis (FPCA) to extract the dominant features of the biomarker trajectory of each individual, and use these features as time-dependent predictors (covariates) in a transformed mean residual life (MRL) regression model to conduct dynamic prediction. Simulation studies demonstrate the improved performance of the transformed MRL model that includes longitudinal biomarker information in the prediction. We apply the proposed method to predict the remaining time expectancy until disease progression for patients with chronic myeloid leukemia, using the transcript levels of an oncogene, BCR-ABL.

     
  4. The lack of sex-specific cardiovascular disease criteria contributes to the underdiagnosis of women compared to that of men. For more than half a century, the Framingham Risk Score has been the gold standard for estimating an individual’s risk of developing cardiovascular disease based on age, sex, cholesterol levels, blood pressure, diabetes status, and smoking status. Now, machine learning can offer a much more nuanced insight into predicting the risk of cardiovascular diseases. The UK Biobank is a large database that includes traditional risk factors and tests related to the cardiovascular system: magnetic resonance imaging, pulse wave analysis, electrocardiograms, and carotid ultrasounds. Here, we leverage 20,542 datasets from the UK Biobank to build more accurate cardiovascular risk models than the Framingham Risk Score and quantify the underdiagnosis of women compared to that of men. Strikingly, for first-degree atrioventricular block and dilated cardiomyopathy, two conditions with non-sex-specific diagnostic criteria, our study shows that women are underdiagnosed 2× and 1.4× more than men. Similarly, our results demonstrate the need for sex-specific criteria in essential primary hypertension and hypertrophic cardiomyopathy. Our feature importance analysis reveals that out of the top 10 features across three sexes and four disease categories, traditional Framingham factors made up between 40% and 50%; electrocardiogram, 30%–33%; pulse wave analysis, 13%–23%; and magnetic resonance imaging and carotid ultrasound, 0%–10%. Improving the Framingham Risk Score by leveraging big data and machine learning allows us to incorporate a wider range of biomedical data and prediction features, enhance personalization and accuracy, and continuously integrate new data and knowledge, with the ultimate goal of improving accurate prediction, early detection, and early intervention in cardiovascular disease management. Our analysis pipeline and trained classifiers are freely available at https://github.com/LivingMatterLab/CardiovascularDiseaseClassification.
  5. Congenital heart disease (CHD) affects about 1 in 100 newborns and its causes are multifactorial. In the embryo, blood flow within the heart and vasculature is essential for proper heart development, with abnormal blood flow leading to CHD. Here, we discuss how blood flow (hemodynamics) affects heart development from embryonic to fetal stages, and how abnormal blood flow alone can lead to CHD. We emphasize studies performed using avian models of heart development, because those models allow for hemodynamic interventions, in vivo imaging, and follow-up, while closely recapitulating heart defects observed in humans. We conclude with recommendations on investigations that must be performed to bridge the gaps in understanding how blood flow alone, or together with other factors, contributes to CHD.