Title: A platform for phenotyping disease progression and associated longitudinal risk factors in large-scale EHRs, with application to incident diabetes complications in the UK Biobank
Abstract

Objective

Modern healthcare data reflect massive multi-level and multi-scale information collected over many years. The majority of the existing phenotyping algorithms use case–control definitions of disease. This paper aims to study the time to disease onset and progression and identify the time-varying risk factors that drive them.

Materials and Methods

We developed an algorithmic approach to phenotyping the incidence of diseases by consolidating data sources from the UK Biobank (UKB), including primary care electronic health records (EHRs). We focused on defining events, event dates, and their censoring time, including relevant terms and existing phenotypes, excluding generic, rare, or semantically distant terms, forward-mapping terminology terms, and expert review. We applied our approach to phenotyping diabetes complications, including a composite cardiovascular disease (CVD) outcome, diabetic kidney disease (DKD), and diabetic retinopathy (DR), in the UKB study.

Results

We identified 49 049 participants with diabetes. Among them, 1023 had type 1 diabetes (T1D), and 40 193 had type 2 diabetes (T2D). A total of 23 833 diabetes subjects had linked primary care records. There were 3237, 3113, and 4922 patients with CVD, DKD, and DR events, respectively. The risk prediction performance for each outcome was assessed, and our results are consistent with the prediction area under the ROC (receiver operating characteristic) curve (AUC) of standard risk prediction models using cohort studies.

Discussion and Conclusion

Our publicly available pipeline and platform enable streamlined curation of incidence events, identification of time-varying risk factors underlying disease progression, and the definition of a relevant cohort for time-to-event analyses. These important steps need to be considered simultaneously to study disease progression.
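To make the idea of "events, event dates, and their censoring time" concrete, the following is a minimal sketch of how an incident event and its follow-up time might be derived for one participant. It is not the authors' pipeline: the names `Outcome` and `time_to_event`, the choice of baseline date, and the decision to exclude participants whose event precedes baseline are all assumptions made for illustration.

```python
from datetime import date
from typing import NamedTuple, Optional


class Outcome(NamedTuple):
    """Hypothetical container for one participant's time-to-event record."""
    event: bool          # True if an incident event was observed before censoring
    followup_days: int   # days from baseline to the event or to censoring


def time_to_event(baseline: date,
                  event_date: Optional[date],
                  censor_date: date) -> Optional[Outcome]:
    """Derive an incident-event indicator and follow-up time.

    Assumption: an event on or before baseline is treated as prevalent,
    and the participant is excluded (None is returned).
    """
    if event_date is not None and event_date <= baseline:
        return None  # prevalent case, not an incident event
    if event_date is not None and event_date <= censor_date:
        return Outcome(True, (event_date - baseline).days)
    # No event recorded, or it falls after the end of follow-up: censored
    return Outcome(False, (censor_date - baseline).days)


# Example: event two years after baseline, follow-up ending in 2020
print(time_to_event(date(2010, 1, 1), date(2012, 1, 1), date(2020, 1, 1)))
```

Records in this shape (an event flag plus a follow-up duration) are the usual input to time-to-event methods such as Cox proportional hazards models.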
Award ID(s):
2054253 2205441
NSF-PAR ID:
10404732
Journal Name:
JAMIA Open
Volume:
6
Issue:
1
ISSN:
2574-2531
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Importance

    Body mass index (BMI; calculated as weight in kilograms divided by height in meters squared) is a commonly used estimate of obesity, which is a complex trait affected by genetic and lifestyle factors. Marked weight gain and loss could be associated with adverse biological processes.

    Objective

    To evaluate the association between BMI variability and incident cardiovascular disease (CVD) events in 2 distinct cohorts.

    Design, Setting, and Participants

    This cohort study used data from the Million Veteran Program (MVP) between 2011 and 2018 and participants in the UK Biobank (UKB) enrolled between 2006 and 2010. Participants were followed up for a median of 3.8 (5th-95th percentile, 3.5) years. Participants with baseline CVD or cancer were excluded. Data were analyzed from September 2022 to September 2023.

    Exposure

    BMI variability was calculated by the retrospective SD and coefficient of variation (CV) using multiple clinical BMI measurements up to the baseline.

    Main Outcomes and Measures

    The main outcome was incident composite CVD events (incident nonfatal myocardial infarction, acute ischemic stroke, and cardiovascular death), assessed using Cox proportional hazards modeling after adjustment for CVD risk factors, including age, sex, mean BMI, systolic blood pressure, total cholesterol, high-density lipoprotein cholesterol, smoking status, diabetes status, and statin use. Secondary analysis assessed whether associations were dependent on the polygenic score of BMI.

    Results

    Among 92 363 US veterans in the MVP cohort (81 675 [88%] male; mean [SD] age, 56.7 [14.1] years), there were 9695 Hispanic participants, 22 488 non-Hispanic Black participants, and 60 180 non-Hispanic White participants. A total of 4811 composite CVD events were observed from 2011 to 2018. The CV of BMI was associated with 16% higher risk for composite CVD across all groups (hazard ratio [HR], 1.16; 95% CI, 1.13-1.19). These associations were unchanged among subgroups and after adjustment for the polygenic score of BMI. The UKB cohort included 65 047 individuals (mean [SD] age, 57.30 [7.77] years; 38 065 [59%] female) and had 6934 composite CVD events. Each 1-SD increase in BMI variability in the UKB cohort was associated with 8% increased risk of cardiovascular death (HR, 1.08; 95% CI, 1.04-1.11).

    Conclusions and Relevance

    This cohort study found that among US veterans, higher BMI variability was a significant risk marker associated with adverse cardiovascular events independent of mean BMI across major racial and ethnic groups. Results were consistent in the UKB for the cardiovascular death end point. Further studies should investigate the phenotype of high BMI variability.

     
  2. Geographically-based screening policies for diabetic retinopathy (DR) can be effective in developing teleretinal imaging (TRI) guidelines while identifying patients with limited geographic access to eye care. This study conducts cost-effectiveness analysis of different screening policies for urban and rural diabetic patients in Western Pennsylvania. A Monte Carlo simulation model was used to evaluate the cost-effectiveness of 2 standardized screening policies (annual clinic-based screening (ACS) and annual TRI-based screening (ATRI)) and a personalized TRI-based screening policy (PTRI) for both urban and rural cohorts. PTRI was generated by a previously developed mathematical model that autonomously makes semi-annual screening recommendations based on each patient’s disease progression and compliance (Dorali et al. IOVS 2022; 63(7)). For each policy, hypothetical urban and rural cohorts of 50,000 patients were simulated and lifetime QALYs and costs were collected for each patient. TRI compliance rates were derived from electronic medical records. Compliance with clinic-based screening was selected from literature-based values (12-45% for rural patients and 50-65% for urban patients). For a base case urban cohort with an A1C level of 7% and entering age of 40, costs per QALY gain (CPQ) for ACS, ATRI, and PTRI were $744.93±1.57, $792.38±1.64, and $714.60±1.56, respectively; PTRI produced more cost saving than ACS with the same QALY gain (See Fig 1). For a base case rural cohort, CPQ for ACS, ATRI, and PTRI were $869.15±1.80, $819.24±1.88, and $761.51±1.42, respectively; both ATRI and PTRI dominated ACS in QALY gains and cost saving (Fig 1). PTRI recommended TRI more to rural patients (94.13±0.01%) than to urban patients (87.20±0.02%). For the rural cohort, the minimum average TRI compliance rate such that ATRI is more cost-effective than ACS was 56% (Fig 2). TRI-based screening was found more beneficial for rural patients. PTRI was found dominant in QALY gain and cost saving for both urban and rural cohorts against standardized policies. These findings suggest that TRI is best utilized when location-specific factors such as geographic access to care or TRI compliance are considered.
  3. Glaucoma is a multifactorial disease and a leading cause of irreversible blindness worldwide. Current data has demonstrated the approximate distribution of primary open-angle glaucoma (POAG) in patients of European, African, Hispanic, and Eastern Asian descent. However, a significant gap in the literature exists regarding the prevalence of POAG in Middle Eastern (ME) populations. Current studies estimate ME POAG prevalence based on a European model. Herein we screened 65 total publications on ME prevalence of POAG and specific risk factors using keywords: “glaucoma”, “prevalence”, “incidence”, “risk factor”, “Middle East”, “Mideast”, “Persian”, “Far East”, as well as searching by individual ME countries through PubMed, Embase, Ovid, Scopus, and Trip searches with additional reference list searches from relevant articles published up to and including March 1, 2021. Fifty qualifying records were included after 15 studies identified with low statistical power, confounding co-morbid ophthalmic diseases, and funding bias were excluded. Studies of ME glaucoma risk factors that identify chromosomes, familial trend, age/gender, socioeconomic status, lifestyle, intraocular pressure, vascular influences, optic disc hemorrhage, cup-to-disc ratio, blood pressure, obstructive sleep apnea, and diabetes mellitus were included in this systematic review. We conclude that the prevalence of POAG in the ME is likely higher than the prevalence rate that European models suggest, with ME specific risk factors likely playing a role. However, these findings are severely limited by the paucity of population-level data in the ME. Well-designed, longitudinal population-based studies with rigorous inclusion and exclusion criteria are ultimately needed to accurately assess the epidemiology and specific mechanistic risk factors of glaucoma in ME populations.
  4. Abstract STUDY QUESTION

    To what extent is preconception maternal or paternal coronavirus disease 2019 (COVID-19) vaccination associated with miscarriage incidence?

    SUMMARY ANSWER

    COVID-19 vaccination in either partner at any time before conception is not associated with an increased rate of miscarriage.

    WHAT IS KNOWN ALREADY

    Several observational studies have evaluated the safety of COVID-19 vaccination during pregnancy and found no association with miscarriage, though no study prospectively evaluated the risk of early miscarriage (gestational weeks [GW] <8) in relation to COVID-19 vaccination. Moreover, no study has evaluated the role of preconception vaccination in both male and female partners.

    STUDY DESIGN, SIZE, DURATION

    An Internet-based, prospective preconception cohort study of couples residing in the USA and Canada. We analyzed data from 1815 female participants who conceived during December 2020–November 2022, including 1570 couples with data on male partner vaccination.

    PARTICIPANTS/MATERIALS, SETTING, METHODS

    Eligible female participants were aged 21–45 years and were trying to conceive without use of fertility treatment at enrollment. Female participants completed questionnaires at baseline, every 8 weeks until pregnancy, and during early and late pregnancy; they could also invite their male partners to complete a baseline questionnaire. We collected data on COVID-19 vaccination (brand and date of doses), history of SARS-CoV-2 infection (yes/no and date of positive test), potential confounders (demographic, reproductive, and lifestyle characteristics), and pregnancy status on all questionnaires. Vaccination status was categorized as never (0 doses before conception), ever (≥1 dose before conception), having a full primary sequence before conception, and completing the full primary sequence ≤3 months before conception. These categories were not mutually exclusive. Participants were followed up from their first positive pregnancy test until miscarriage or a censoring event (induced abortion, ectopic pregnancy, loss to follow-up, 20 weeks’ gestation), whichever occurred first. We estimated incidence rate ratios (IRRs) for miscarriage and corresponding 95% CIs using Cox proportional hazards models with GW as the time scale. We used propensity score fine stratification weights to adjust for confounding.

    MAIN RESULTS AND THE ROLE OF CHANCE

    Among 1815 eligible female participants, 75% had received at least one dose of a COVID-19 vaccine by the time of conception. Almost one-quarter of pregnancies resulted in miscarriage, and 75% of miscarriages occurred <8 weeks’ gestation. The propensity score-weighted IRR comparing female participants who received at least one dose any time before conception versus those who had not been vaccinated was 0.85 (95% CI: 0.63, 1.14). COVID-19 vaccination was not associated with increased risk of either early miscarriage (GW: <8) or late miscarriage (GW: 8–19). There was no indication of an increased risk of miscarriage associated with male partner vaccination (IRR = 0.90; 95% CI: 0.56, 1.44).

    LIMITATIONS, REASONS FOR CAUTION

    The present study relied on self-reported vaccination status and infection history. Thus, there may be some non-differential misclassification of exposure status. While misclassification of miscarriage is also possible, the preconception cohort design and high prevalence of home pregnancy testing in this cohort reduced the potential for under-ascertainment of miscarriage. As in all observational studies, residual or unmeasured confounding is possible.

    WIDER IMPLICATIONS OF THE FINDINGS

    This is the first study to evaluate prospectively the relation between preconception COVID-19 vaccination in both partners and miscarriage, with more complete ascertainment of early miscarriages than earlier studies of vaccination. The findings are informative for individuals planning a pregnancy and their healthcare providers.

    STUDY FUNDING/COMPETING INTEREST(S)

    This work was supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development, the National Institutes of Health [R01-HD086742 (PI: L.A.W.); R01-HD105863S1 (PI: L.A.W. and M.L.E.)], the National Institute of Allergy and Infectious Diseases (R03-AI154544; PI: A.K.R.), and the National Science Foundation (NSF-1914792; PI: L.A.W.). The funders had no role in the study design, data collection, analysis and interpretation of data, writing of the report, or the decision to submit the paper for publication. L.A.W. is a fibroid consultant for AbbVie, Inc. She also receives in-kind donations from Swiss Precision Diagnostics (Clearblue home pregnancy tests) and Kindara.com (fertility apps). M.L.E. received consulting fees from Ro, Hannah, Dadi, VSeat, and Underdog, holds stock in Ro, Hannah, Dadi, and Underdog, is a past president of SSMR, and is a board member of SMRU. K.F.H. reports being an investigator on grants to her institution from UCB and Takeda, unrelated to this study. S.H.-D. reports being an investigator on grants to her institution from Takeda, unrelated to this study, and a methods consultant for UCB and Roche for unrelated drugs. The authors report no other relationships or activities that could appear to have influenced the submitted work.

    TRIAL REGISTRATION NUMBER

    N/A.

     
  5. Abstract INTRODUCTION

    Identifying mild cognitive impairment (MCI) patients at risk for dementia could facilitate early interventions. Using electronic health records (EHRs), we developed a model to predict MCI to all‐cause dementia (ACD) conversion at 5 years.

    METHODS

    Cox proportional hazards model was used to identify predictors of ACD conversion from EHR data in veterans with MCI. Model performance (area under the receiver operating characteristic curve [AUC] and Brier score) was evaluated on a held‐out data subset.

    RESULTS

    Of 59,782 MCI patients, 15,420 (25.8%) converted to ACD. The model had good discriminative performance (AUC 0.73 [95% confidence interval (CI) 0.72–0.74]), and calibration (Brier score 0.18 [95% CI 0.17–0.18]). Age, stroke, cerebrovascular disease, myocardial infarction, hypertension, and diabetes were risk factors, while body mass index, alcohol abuse, and sleep apnea were protective factors.

    DISCUSSION

    EHR‐based prediction model had good performance in identifying 5‐year MCI to ACD conversion and has potential to assist triaging of at‐risk patients.

    Highlights

    Of 59,782 veterans with mild cognitive impairment (MCI), 15,420 (25.8%) converted to all‐cause dementia within 5 years.

    Electronic health record prediction models demonstrated good performance (area under the receiver operating characteristic curve 0.73; Brier 0.18).

    Age and vascular‐related morbidities were predictors of dementia conversion.

    Synthetic data was comparable to real data in modeling MCI to dementia conversion.

    Key Points

    An electronic health record–based model using demographic and co‐morbidity data had good performance in identifying veterans who convert from mild cognitive impairment (MCI) to all‐cause dementia (ACD) within 5 years.

    Increased age, stroke, cerebrovascular disease, myocardial infarction, hypertension, and diabetes were risk factors for 5‐year conversion from MCI to ACD.

    High body mass index, alcohol abuse, and sleep apnea were protective factors for 5‐year conversion from MCI to ACD.

    Models using synthetic data, analogs of real patient data that retain the distribution, density, and covariance between variables of real patient data but are not attributable to any specific patient, performed just as well as models using real patient data. This could have significant implications in facilitating widely distributed computing of health‐care data with minimized patient privacy concern that could accelerate scientific discoveries.

     
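Several of the abstracts above report model performance as an area under the ROC curve (AUC) and a Brier score. As a rough illustration of what those two numbers measure, the sketch below computes both for a binary outcome in plain Python. The function names `auc` and `brier_score` and the toy data are assumptions for illustration, not any study's code, and the sketch ignores censoring, which time-to-event analyses would need to handle (for example, with time-dependent versions of these metrics).

```python
from typing import Sequence


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Probability that a randomly chosen case is scored above a randomly
    chosen non-case, counting ties as one half (the rank-based AUC)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome;
    lower values indicate better-calibrated, sharper predictions."""
    if len(y_true) == 0:
        raise ValueError("Brier score needs at least one observation")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy data: two non-cases followed by two cases
outcomes = [0, 0, 1, 1]
predicted = [0.1, 0.4, 0.35, 0.8]
print(auc(outcomes, predicted), brier_score(outcomes, predicted))
```

AUC reflects discrimination (ranking cases above non-cases), while the Brier score also penalizes predicted probabilities that are far from the observed outcome, which is why the two are often reported together.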