Title: A platform for phenotyping disease progression and associated longitudinal risk factors in large-scale EHRs, with application to incident diabetes complications in the UK Biobank
Abstract
Objective: Modern healthcare data contain massive multi-level and multi-scale information collected over many years. Most existing phenotyping algorithms use case–control definitions of disease. This paper aims to study the time to disease onset and progression and to identify the time-varying risk factors that drive them.
Materials and Methods: We developed an algorithmic approach to phenotyping disease incidence by consolidating data sources from the UK Biobank (UKB), including primary care electronic health records (EHRs). We focused on defining events, event dates, and censoring times. The approach involved including relevant terms and existing phenotypes; excluding generic, rare, or semantically distant terms; forward-mapping terminology terms; and expert review. We applied it to phenotyping diabetes complications in the UKB: a composite cardiovascular disease (CVD) outcome, diabetic kidney disease (DKD), and diabetic retinopathy (DR).
Results: We identified 49 049 participants with diabetes. Among them, 1023 had type 1 diabetes (T1D) and 40 193 had type 2 diabetes (T2D). A total of 23 833 participants with diabetes had linked primary care records. There were 3237, 3113, and 4922 patients with CVD, DKD, and DR events, respectively. We assessed risk prediction performance for each outcome; the resulting area under the receiver operating characteristic (ROC) curve (AUC) was consistent with that of standard risk prediction models developed in cohort studies.
Discussion and Conclusion: Our publicly available pipeline and platform enable streamlined curation of incidence events, identification of time-varying risk factors underlying disease progression, and definition of a relevant cohort for time-to-event analyses. These important steps need to be considered simultaneously when studying disease progression.
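The event, event-date, and censoring definitions described under Materials and Methods can be illustrated with a small sketch. This is not the authors' published pipeline: the record layout (participant ID, code, date), the `outcome_codes` set standing in for the curated and forward-mapped term list, and the per-participant `baseline` and `censor` dates are all hypothetical inputs, and prevalent cases (an outcome code on or before baseline) are simply dropped.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Set, Tuple


@dataclass
class TimeToEvent:
    participant_id: str
    event: bool      # True if an incident outcome code was observed by the censoring date
    time_days: int   # days from baseline to the first event, or to censoring if no event


def derive_time_to_event(
    records: Iterable[Tuple[str, str, date]],  # (participant_id, code, record_date); hypothetical layout
    outcome_codes: Set[str],                   # hypothetical stand-in for the curated term list
    baseline: Dict[str, date],                 # start of follow-up per participant
    censor: Dict[str, date],                   # end of follow-up per participant
) -> Dict[str, TimeToEvent]:
    # Earliest date of any outcome code per participant, before or after baseline.
    first_hit: Dict[str, date] = {}
    for pid, code, when in records:
        if code in outcome_codes and pid in baseline:
            if pid not in first_hit or when < first_hit[pid]:
                first_hit[pid] = when

    cohort: Dict[str, TimeToEvent] = {}
    for pid, start in baseline.items():
        end = censor[pid]
        hit = first_hit.get(pid)
        if hit is not None and hit <= start:
            continue  # prevalent case: outcome already recorded at baseline, so excluded
        if end <= start:
            continue  # no follow-up after baseline
        if hit is not None and hit <= end:
            cohort[pid] = TimeToEvent(pid, True, (hit - start).days)
        else:
            # No outcome code, or the first one falls after censoring: censored at `end`.
            cohort[pid] = TimeToEvent(pid, False, (end - start).days)
    return cohort


# Illustrative data only; "X1" and "Y9" are placeholder codes, not real terminology.
records = [
    ("p1", "X1", date(2012, 5, 1)),
    ("p2", "X1", date(2005, 1, 1)),
    ("p3", "Y9", date(2013, 1, 1)),
]
baseline = {pid: date(2010, 1, 1) for pid in ("p1", "p2", "p3")}
censor = {"p1": date(2015, 1, 1), "p2": date(2015, 1, 1), "p3": date(2011, 1, 1)}
cohort = derive_time_to_event(records, {"X1"}, baseline, censor)
# p1: event after 851 days; p2: excluded as prevalent; p3: censored after 365 days
```

Treating an outcome code that appears after the censoring date as censored, rather than as an event, keeps follow-up within the data coverage window; whether the actual platform handles boundary dates this way is an assumption.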
Award ID(s): 2054253, 2205441
PAR ID: 10404732
Journal Name: JAMIA Open
Volume: 6
Issue: 1
ISSN: 2574-2531
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Hearing loss has been associated with individual cardiovascular disease (CVD) risk factors and, to a lesser extent, with CVD risk metrics. However, these relationships are understudied in clinical populations. We conducted a retrospective study of electronic health records to evaluate the relationship between hearing loss and CVD risk burden. Hearing loss was defined as a pure-tone average at 0.5, 1, 2, and 4 kHz (PTA0.5,1,2,4) > 20 dB hearing level (HL). Optimal CVD risk was defined as nondiabetic, nonsmoking, systolic blood pressure (SBP) < 120 mm Hg and diastolic blood pressure (DBP) < 80 mm Hg, and total cholesterol < 180 mg/dL. Major CVD risk factors were diabetes, smoking, hypertension, and total cholesterol ≥ 240 mg/dL or statin use. We identified 6332 patients (mean age = 62.96 years; 45.5% male); 64.0% had hearing loss. Sex-stratified logistic regression adjusted for age, noise exposure, hearing aid use, and body mass index examined associations between hearing loss and CVD risk. For males, diabetes, hypertension, smoking, and ≥ 2 major CVD risk factors were associated with hearing loss. For females, diabetes, smoking, and ≥ 2 major CVD risk factors were significantly associated with hearing loss. Compared with patients with no CVD risk factors, patients with ≥ 2 major CVD risk factors had a higher likelihood of hearing loss. Future research to better understand sex dependence in the hearing loss-hypertension relationship is indicated.
  2. Importance: Body mass index (BMI; calculated as weight in kilograms divided by height in meters squared) is a commonly used estimate of obesity, which is a complex trait affected by genetic and lifestyle factors. Marked weight gain and loss could be associated with adverse biological processes. Objective: To evaluate the association between BMI variability and incident cardiovascular disease (CVD) events in 2 distinct cohorts. Design, Setting, and Participants: This cohort study used data from the Million Veteran Program (MVP) between 2011 and 2018 and from participants in the UK Biobank (UKB) enrolled between 2006 and 2010. Participants were followed up for a median of 3.8 (5th-95th percentile, 3.5) years. Participants with baseline CVD or cancer were excluded. Data were analyzed from September 2022 to September 2023. Exposure: BMI variability was calculated as the retrospective SD and coefficient of variation (CV) of multiple clinical BMI measurements up to baseline. Main Outcomes and Measures: The main outcome was incident composite CVD events (incident nonfatal myocardial infarction, acute ischemic stroke, and cardiovascular death), assessed using Cox proportional hazards modeling after adjustment for CVD risk factors, including age, sex, mean BMI, systolic blood pressure, total cholesterol, high-density lipoprotein cholesterol, smoking status, diabetes status, and statin use. Secondary analysis assessed whether associations were dependent on the polygenic score of BMI. Results: Among 92 363 US veterans in the MVP cohort (81 675 [88%] male; mean [SD] age, 56.7 [14.1] years), there were 9695 Hispanic participants, 22 488 non-Hispanic Black participants, and 60 180 non-Hispanic White participants. A total of 4811 composite CVD events were observed from 2011 to 2018. The CV of BMI was associated with a 16% higher risk for composite CVD across all groups (hazard ratio [HR], 1.16; 95% CI, 1.13-1.19).
These associations were unchanged among subgroups and after adjustment for the polygenic score of BMI. The UKB cohort included 65 047 individuals (mean [SD] age, 57.30 [7.77] years; 38 065 [59%] female) and had 6934 composite CVD events. Each 1-SD increase in BMI variability in the UKB cohort was associated with an 8% increased risk of cardiovascular death (HR, 1.08; 95% CI, 1.04-1.11). Conclusions and Relevance: This cohort study found that among US veterans, higher BMI variability was a significant risk marker associated with adverse cardiovascular events independent of mean BMI across major racial and ethnic groups. Results were consistent in the UKB for the cardiovascular death end point. Further studies should investigate the phenotype of high BMI variability.
  3. Diabetes-related complications reflect longstanding damage to small and large vessels throughout the body. In addition to diabetes duration and poor glycemic control, genetic factors are important contributors to the variability in the development of vascular complications. Early heritability studies found strong familial clustering of both macrovascular and microvascular complications. However, they were limited by small sample sizes and large phenotypic heterogeneity, leading to less accurate estimates. We take advantage of two independent studies, the UK Biobank and the Action to Control Cardiovascular Risk in Diabetes trial, to estimate the single nucleotide polymorphism (SNP) heritability of diabetes microvascular (diabetic kidney disease and diabetic retinopathy) and macrovascular (cardiovascular events) complications. Heritability for diabetic kidney disease was estimated at 29%. The heritability estimate for microalbuminuria ranged from 24 to 60% and was 41% for macroalbuminuria. Heritability estimates of diabetic retinopathy ranged from 6 to 33%, depending on the phenotype definition. More severe diabetic retinopathy showed a higher genetic contribution. We show, for the first time, that rare variants account for much of the heritability of diabetic retinopathy. This study suggests that a large portion of the genetic risk of diabetes complications is yet to be discovered and emphasizes the need for additional genetic studies of diabetes complications.
  4. Geographically based screening policies for diabetic retinopathy (DR) can be effective in developing teleretinal imaging (TRI) guidelines while identifying patients with limited geographic access to eye care. This study conducts a cost-effectiveness analysis of different screening policies for urban and rural patients with diabetes in Western Pennsylvania. A Monte Carlo simulation model was used to evaluate the cost-effectiveness of 2 standardized screening policies (annual clinic-based screening [ACS] and annual TRI-based screening [ATRI]) and a personalized TRI-based screening policy (PTRI) for both urban and rural cohorts. PTRI was generated by a previously developed mathematical model that autonomously makes semi-annual screening recommendations based on each patient's disease progression and compliance (Dorali et al. IOVS 2022; 63(7)). For each policy, hypothetical urban and rural cohorts of 50,000 patients were simulated, and lifetime QALYs and costs were collected for each patient. TRI compliance rates were derived from electronic medical records. Compliance with clinic-based screening was drawn from literature-based values (12-45% for rural patients and 50-65% for urban patients). For a base-case urban cohort with an A1C level of 7% and an entry age of 40, costs per QALY gained (CPQ) for ACS, ATRI, and PTRI were $744.93±1.57, $792.38±1.64, and $714.60±1.56, respectively; PTRI produced greater cost savings than ACS with the same QALY gain (see Fig 1). For a base-case rural cohort, CPQ for ACS, ATRI, and PTRI were $869.15±1.80, $819.24±1.88, and $761.51±1.42, respectively; both ATRI and PTRI dominated ACS in QALY gains and cost savings (Fig 1). PTRI recommended TRI more often to rural patients (94.13±0.01%) than to urban patients (87.20±0.02%). For the rural cohort, the minimum average TRI compliance rate at which ATRI is more cost-effective than ACS was 56% (Fig 2). TRI-based screening was found to be more beneficial for rural patients.
PTRI was found to dominate the standardized policies in QALY gain and cost savings for both urban and rural cohorts. These findings suggest that TRI is best utilized when location-specific factors such as geographic access to care or TRI compliance are considered.
  5. Accurate prediction and monitoring of patient health in the intensive care unit can inform shared decisions regarding the appropriateness of care delivery, risk-reduction strategies, and intensive care resource use. Traditionally, algorithmic solutions for patient outcome prediction rely solely on data available from electronic health records (EHRs). In this pilot study, we explore the benefits of augmenting existing EHR data with novel measurements from wrist-worn activity sensors as part of a clinical environment known as the Intelligent ICU. We implemented temporal deep learning models based on two distinct sources of patient data: (1) routinely measured vital signs from electronic health records and (2) activity data collected from wearable sensors. As a proxy for illness severity, our models predicted whether patients leaving the intensive care unit would be successfully or unsuccessfully discharged from the hospital. We overcame the challenge of a small sample size in our prospective cohort by applying deep transfer learning using EHR data from a much larger cohort of traditional ICU patients. Our experiments quantify the added utility of non-traditional measurements for predicting patient health, especially when applying a transfer learning procedure to small, novel Intelligent ICU cohorts of critically ill patients.
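The exposure and risk-group definitions in the hearing-loss study (item 1 under "More Like this") are concrete enough to express in code. This is a minimal sketch, not the study's implementation: it assumes thresholds for a single ear in dB HL keyed by frequency in kHz and a pre-derived `hypertension` flag, because the abstract does not say which ear is used or how hypertension is defined. All function and field names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict

# Frequencies (kHz) averaged for the PTA described in the abstract.
PTA_FREQS_KHZ = (0.5, 1.0, 2.0, 4.0)


def pure_tone_average(thresholds_db_hl: Dict[float, float]) -> float:
    """Mean hearing threshold (dB HL) over 0.5, 1, 2, and 4 kHz for one ear."""
    return mean(thresholds_db_hl[f] for f in PTA_FREQS_KHZ)


def has_hearing_loss(thresholds_db_hl: Dict[float, float]) -> bool:
    # Hearing loss: PTA strictly greater than 20 dB HL.
    return pure_tone_average(thresholds_db_hl) > 20.0


@dataclass
class RiskProfile:
    diabetes: bool
    smoker: bool
    hypertension: bool      # assumed pre-derived; the abstract gives no definition
    sbp_mm_hg: float
    dbp_mm_hg: float
    total_chol_mg_dl: float
    statin_use: bool


def is_optimal_cvd_risk(p: RiskProfile) -> bool:
    # Nondiabetic, nonsmoking, SBP < 120 and DBP < 80 mm Hg, total cholesterol < 180 mg/dL.
    return (
        not p.diabetes
        and not p.smoker
        and p.sbp_mm_hg < 120
        and p.dbp_mm_hg < 80
        and p.total_chol_mg_dl < 180
    )


def count_major_risk_factors(p: RiskProfile) -> int:
    # Diabetes, smoking, hypertension, and (total cholesterol >= 240 mg/dL or statin use).
    return sum([
        p.diabetes,
        p.smoker,
        p.hypertension,
        p.total_chol_mg_dl >= 240 or p.statin_use,
    ])


ears = {0.5: 15, 1.0: 20, 2.0: 25, 4.0: 40}
patient = RiskProfile(diabetes=True, smoker=False, hypertension=True,
                      sbp_mm_hg=145, dbp_mm_hg=90, total_chol_mg_dl=200, statin_use=True)
print(has_hearing_loss(ears))                 # True (PTA = 25 dB HL)
print(count_major_risk_factors(patient) >= 2)  # True: diabetes, hypertension, statin use
```

The cholesterol criterion is counted once whether it is met by the threshold, by statin use, or by both, matching the abstract's "total cholesterol ≥ 240 mg/dL or statin use".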
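The BMI-variability exposure in item 2 (the retrospective SD and coefficient of variation of clinical BMI measurements up to baseline) can be sketched as follows. The use of the sample SD, the requirement of at least two measurements, and expressing CV as a ratio rather than a percentage are assumptions, not details given in the abstract; `bmi_variability` is a hypothetical helper.

```python
from datetime import date
from statistics import mean, stdev
from typing import List, Optional, Tuple


def bmi(weight_kg: float, height_m: float) -> float:
    # BMI = weight in kilograms divided by height in meters squared.
    return weight_kg / height_m ** 2


def bmi_variability(
    measurements: List[Tuple[date, float]],  # (measurement_date, BMI) pairs
    baseline: date,
) -> Optional[Tuple[float, float, float]]:
    """Return (mean, SD, CV) of BMI values recorded on or before baseline.

    Uses the sample SD and expresses CV as SD / mean (a ratio, not a percentage);
    both choices are assumptions. Returns None if fewer than two values qualify.
    """
    values = [v for d, v in measurements if d <= baseline]
    if len(values) < 2:
        return None
    m = mean(values)
    sd = stdev(values)
    return m, sd, sd / m


history = [
    (date(2008, 3, 1), 25.0),
    (date(2009, 3, 1), 27.0),
    (date(2010, 3, 1), 29.0),
    (date(2012, 3, 1), 40.0),  # after baseline, so ignored
]
result = bmi_variability(history, baseline=date(2010, 6, 1))
# mean 27.0, SD 2.0, CV = 2.0 / 27.0 (about 0.074)
```

Because the CV divides the SD by the mean, it scales variability to each participant's typical BMI, which is one reason the study can adjust for mean BMI separately in its Cox models.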