Title: Opportunistic detection of type 2 diabetes using deep learning from frontal chest radiographs
Abstract: Deep learning (DL) models can harness electronic health records (EHRs) to predict diseases and extract radiologic findings for diagnosis. Because ambulatory chest radiographs (CXRs) are frequently ordered, we investigated detecting type 2 diabetes (T2D) by combining radiographic and EHR data using a DL model. Our model, developed from 271,065 CXRs and 160,244 patients, was tested on a prospective dataset of 9,943 CXRs. Here we show the model effectively detected T2D with a ROC AUC of 0.84 at a 16% prevalence. The algorithm flagged 1,381 cases (14%) as suspicious for T2D. External validation at a distinct institution yielded a ROC AUC of 0.77, with 5% of patients subsequently diagnosed with T2D. Explainable AI techniques revealed correlations between specific adiposity measures and high predictivity, suggesting CXRs’ potential for enhanced T2D screening.
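For readers unfamiliar with the headline metric, the following is a minimal sketch of how a ROC AUC can be computed from binary labels and model scores, using the pairwise-ranking definition (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half). The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not the authors' code or data. The last lines only re-derive the flagged fraction reported in the abstract (1,381 of 9,943 CXRs).

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 ground-truth values (1 = disease present).
    scores: iterable of model scores, higher meaning more suspicious.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy, made-up data for illustration only (not from the study).
toy_labels = [0, 0, 1, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90]
toy_auc = roc_auc(toy_labels, toy_scores)

# Arithmetic check on the abstract's numbers: 1,381 flagged of 9,943 CXRs.
flagged_fraction = 1381 / 9943
print(f"toy AUC = {toy_auc:.3f}, flagged = {flagged_fraction:.1%}")
```

The quadratic loop is chosen for clarity; for large test sets a rank-based implementation such as a library routine would be the practical choice.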
Award ID(s):
2205329 2046795
PAR ID:
10616575
Author(s) / Creator(s):
Publisher / Repository:
Nature Communications
Date Published:
Journal Name:
Nature Communications
Volume:
14
Issue:
1
ISSN:
2041-1723
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract: In this study, we present a method based on Monte Carlo Dropout (MCD) as a Bayesian neural network (BNN) approximation for confidence-aware severity classification of lung diseases in COVID-19 patients using chest X-rays (CXRs). Trained and tested on 1208 CXRs from Hospital 1 in the USA, the model categorizes severity into four levels (i.e., normal, mild, moderate, and severe) based on lung consolidation and opacity. Severity labels, determined by the median consensus of five radiologists, serve as the reference standard. The model’s performance is internally validated against evaluations from an additional radiologist and two residents who were excluded from the median. The performance of the model is further evaluated on additional internal and external datasets comprising 2200 CXRs from the same hospital and 1300 CXRs from Hospital 2 in South Korea. The model achieves an average area under the curve (AUC) of 0.94 ± 0.01 across all classes in the primary dataset, surpassing human readers in each severity class, and achieves a higher Kendall correlation coefficient (KCC) of 0.80 ± 0.03. The performance of the model is consistent across varied datasets, highlighting its generalization. A key aspect of the model is its predictive uncertainty (PU), which is inversely related to the level of agreement among radiologists, particularly in mild and moderate cases. The study concludes that the model outperforms human readers in severity assessment and maintains consistent accuracy across diverse datasets. Its ability to provide confidence measures in predictions is pivotal for potential clinical use, underscoring the BNN’s role in enhancing diagnostic precision in lung disease analysis through CXR.
  2. Background: At the time of cancer diagnosis, it is crucial to accurately classify malignant gastric tumors and to estimate the likelihood that patients will survive. Objective: This study aims to investigate the feasibility of identifying and applying a new feature extraction technique to predict the survival of gastric cancer patients. Methods: A retrospective dataset including the computed tomography (CT) images of 135 patients was assembled. Among them, 68 patients survived longer than three years. Several sets of radiomics features were extracted and incorporated into a machine learning model, and their classification performance was characterized. To improve the classification performance, we further extracted another 27 texture and roughness parameters with 2484 superficial and spatial features to propose a new feature pool. This new feature set was added into the machine learning model and its performance was analyzed. To determine the best model for our experiment, four of the most popular machine learning models were utilized: Random Forest (RF) classifier, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Naïve Bayes (NB). The models were trained and tested using the five-fold cross-validation method. Results: Using the area under the ROC curve (AUC) as an evaluation index, the model that was generated using the new feature pool yields AUC = 0.98 ± 0.01, which was significantly higher than the models created using the traditional radiomics feature set (p < 0.04). The RF classifier performed better than the other machine learning models. Conclusions: This study demonstrated that although radiomics features produced good classification performance, creating new feature sets significantly improved the model performance.
  3. OBJECTIVE: To determine the benefit of starting continuous glucose monitoring (CGM) in adult-onset type 1 diabetes (T1D) and type 2 diabetes (T2D) with regard to longer-term glucose control and serious clinical events. RESEARCH DESIGN AND METHODS: A retrospective observational cohort study within the Veterans Affairs Health Care System was used to compare glucose control and hypoglycemia- or hyperglycemia-related admission to an emergency room or hospital and all-cause hospitalization between propensity score overlap weighted initiators of CGM and nonusers over 12 months. RESULTS: CGM users receiving insulin (n = 5,015 with T1D and n = 15,706 with T2D) and similar numbers of nonusers were identified from 1 January 2015 to 31 December 2020. Declines in HbA1c were significantly greater in CGM users with T1D (−0.26%; 95% CI −0.33, −0.19%) and T2D (−0.35%; 95% CI −0.40, −0.31%) than in nonusers at 12 months. Percentages of patients achieving HbA1c <8 and <9% after 12 months were greater in CGM users. In T1D, CGM initiation was associated with significantly reduced risk of hypoglycemia (hazard ratio [HR] 0.69; 95% CI 0.48, 0.98) and all-cause hospitalization (HR 0.75; 95% CI 0.63, 0.90). In patients with T2D, there was a reduction in risk of hyperglycemia in CGM users (HR 0.87; 95% CI 0.77, 0.99) and all-cause hospitalization (HR 0.89; 95% CI 0.83, 0.97). Several subgroups (based on baseline age, HbA1c, hypoglycemic risk, or follow-up CGM use) had even greater responses. CONCLUSIONS: In a large national cohort, initiation of CGM was associated with sustained improvement in HbA1c in patients with later-onset T1D and patients with T2D using insulin. This was accompanied by a clear pattern of reduced risk of admission to an emergency room or hospital for hypoglycemia or hyperglycemia and of all-cause hospitalization.
  4. Objective: Modern healthcare data reflect massive multi-level and multi-scale information collected over many years. The majority of the existing phenotyping algorithms use case–control definitions of disease. This paper aims to study the time to disease onset and progression and identify the time-varying risk factors that drive them. Materials and Methods: We developed an algorithmic approach to phenotyping the incidence of diseases by consolidating data sources from the UK Biobank (UKB), including primary care electronic health records (EHRs). We focused on defining events, event dates, and their censoring time, including relevant terms and existing phenotypes, excluding generic, rare, or semantically distant terms, forward-mapping terminology terms, and expert review. We applied our approach to phenotyping diabetes complications, including a composite cardiovascular disease (CVD) outcome, diabetic kidney disease (DKD), and diabetic retinopathy (DR), in the UKB study. Results: We identified 49 049 participants with diabetes. Among them, 1023 had type 1 diabetes (T1D), and 40 193 had type 2 diabetes (T2D). A total of 23 833 diabetes subjects had linked primary care records. There were 3237, 3113, and 4922 patients with CVD, DKD, and DR events, respectively. The risk prediction performance for each outcome was assessed, and our results are consistent with the prediction area under the ROC (receiver operating characteristic) curve (AUC) of standard risk prediction models using cohort studies. Discussion and Conclusion: Our publicly available pipeline and platform enable streamlined curation of incidence events, identification of time-varying risk factors underlying disease progression, and the definition of a relevant cohort for time-to-event analyses. These important steps need to be considered simultaneously to study disease progression.
  5. Background: Pull Request (PR) integrators often face challenges in handling multiple concurrent PRs, so the ability to gauge which of the PRs will get accepted can help them balance their workload. PR creators would benefit from knowing if certain characteristics of their PRs may increase the chances of acceptance. Aim: We modeled the probability that a PR will be accepted within a month after creation using a Random Forest model utilizing 50 predictors representing properties of the author, the PR, and the project to which the PR is submitted. Method: 483,988 PRs from 4218 popular NPM packages were analysed, and we selected a subset of 14 predictors sufficient for a tuned Random Forest model to reach high accuracy. Result: An AUC-ROC value of 0.95 was achieved predicting PR acceptance. The model excluding PR properties that change after submission gave an AUC-ROC value of 0.89. We tested the utility of our model in practical scenarios by training it with historical data for the NPM package bootstrap and predicting whether PRs submitted in the future will be accepted. This gave us an AUC-ROC value of 0.94 with all 14 predictors, and 0.77 excluding PR properties that change after its creation. Conclusion: PR integrators can use our model for a highly accurate assessment of the quality of open PRs, and PR creators may benefit from the model by understanding which characteristics of their PRs may be undesirable from the integrators' perspective. The model can be implemented as a tool, which we plan to do as future work.