Title: Fair prediction of 2-year stroke risk in patients with atrial fibrillation
Abstract
Objective: This study aims to develop machine learning models that provide both accurate and equitable predictions of 2-year stroke risk for patients with atrial fibrillation across diverse racial groups.
Materials and Methods: Our study used structured electronic health record (EHR) data from the All of Us Research Program. Machine learning models (LightGBM) were used to capture the relationships between stroke risk and the predictors used by the widely recognized CHADS2 and CHA2DS2-VASc scores. We mitigated racial disparity by creating a representative tuning set, customizing tuning criteria, and setting binary thresholds separately for subgroups. We constructed a hold-out test set that not only supports temporal validation but also includes a larger proportion of Black/African American patients for fairness validation.
Results: Compared with the original CHADS2 and CHA2DS2-VASc scores, modeling their predictors with machine learning models yielded significant improvements (the area under the receiver operating characteristic curve rose from about 0.70 to above 0.80). Furthermore, our disparity mitigation strategies effectively enhanced model fairness compared with the conventional cross-validation approach.
Discussion: Modeling CHADS2 and CHA2DS2-VASc risk factors with LightGBM, combined with our disparity mitigation strategies, achieved decent discriminative performance and excellent fairness performance. In addition, this approach can provide a complete interpretation of each predictor. Together, these results highlight its potential utility in clinical practice.
Conclusions: Our research presents a practical example of addressing clinical challenges with All of Us Research Program data. The disparity mitigation framework we propose is adaptable across various models and data modalities, demonstrating broad potential in clinical informatics.
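The abstract lists "setting binary thresholds separately for subgroups" as one of the disparity mitigation steps but does not state the selection rule. The sketch below is a minimal illustration that assumes one plausible rule: for each subgroup, pick the highest score cutoff that still reaches a target sensitivity within that subgroup. This is an equal-opportunity-style criterion. It is not confirmed to be the authors' criterion. The function names (`sensitivity`, `group_thresholds`, `predict`) and the toy data are hypothetical.

```python
from typing import Dict, List, Sequence


def sensitivity(scores: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Fraction of true positives (label 1) whose score is >= threshold."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not positives:
        return float("nan")  # sensitivity is undefined without positives
    return sum(s >= threshold for s in positives) / len(positives)


def group_thresholds(
    scores: Sequence[float],
    labels: Sequence[int],
    groups: Sequence[str],
    target_sensitivity: float = 0.8,
) -> Dict[str, float]:
    """Hypothetical rule: per subgroup, the highest cutoff whose within-group
    sensitivity is at least `target_sensitivity` (an assumed criterion)."""
    thresholds: Dict[str, float] = {}
    for g in sorted(set(groups)):
        g_scores = [s for s, gg in zip(scores, groups) if gg == g]
        g_labels = [y for y, gg in zip(labels, groups) if gg == g]
        # Fallback: the lowest observed score flags every member of the group.
        best = min(g_scores)
        # Try observed scores from highest to lowest; lowering the cutoff can
        # only keep or raise sensitivity, so the first hit is the highest valid cutoff.
        for t in sorted(set(g_scores), reverse=True):
            if sensitivity(g_scores, g_labels, t) >= target_sensitivity:
                best = t
                break
        thresholds[g] = best
    return thresholds


def predict(scores: Sequence[float], groups: Sequence[str], thresholds: Dict[str, float]) -> List[int]:
    """Binary predictions using each record's own subgroup cutoff."""
    return [int(s >= thresholds[g]) for s, g in zip(scores, groups)]


if __name__ == "__main__":
    # Toy risk scores: group B's scores run lower overall than group A's.
    scores = [0.9, 0.7, 0.4, 0.2, 0.6, 0.5, 0.3, 0.1]
    labels = [1, 1, 0, 0, 1, 1, 0, 0]
    groups = ["A"] * 4 + ["B"] * 4
    cutoffs = group_thresholds(scores, labels, groups, target_sensitivity=1.0)
    print(cutoffs)  # {'A': 0.7, 'B': 0.5}
    # A single global cutoff of 0.7 would miss every positive in group B:
    print(sensitivity(scores[4:], labels[4:], 0.7))  # 0.0
```

In the toy data, one shared cutoff tuned on group A would leave group B with zero sensitivity. Group-specific cutoffs restore it. In practice, such cutoffs would be chosen on a tuning set, not on the test set.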
Award ID(s):
2054346
PAR ID:
10520703
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Journal of the American Medical Informatics Association
Volume:
31
Issue:
12
ISSN:
1067-5027
Size(s):
p. 2820-2828
Sponsoring Org:
National Science Foundation
More Like this
  1. Aims: To develop machine-learning algorithms for predicting the risk of a hospitalization or emergency department (ED) visit for opioid use disorder (OUD) (i.e. OUD acute events) among Pennsylvania Medicaid enrollees in the Opioid Use Disorder Centers of Excellence (COE) program, and to evaluate the fairness of model performance across racial groups.
Methods: We studied 20 983 United States Medicaid enrollees aged 18 years or older who had COE visits between April 2019 and March 2021. We applied multivariate logistic regression, least absolute shrinkage and selection operator models, random forests, and eXtreme Gradient Boosting (XGB) to predict OUD acute events following the initial COE visit. Our models included predictors at the system, patient, and regional levels. We assessed model performance across racial groups using multiple metrics. Individuals were divided into low-, medium-, and high-risk groups based on predicted risk scores.
Results: The training (n = 13 990) and testing (n = 6993) samples displayed similar characteristics (mean age 38.1 ± 9.3 years, 58% male, 80% White enrollees), with 4% experiencing OUD acute events at baseline. XGB demonstrated the best prediction performance (C-statistic = 76.6% [95% confidence interval = 75.6%–77.7%] vs. 72.8%–74.7% for other methods). At the balanced cutoff, XGB achieved a sensitivity of 68.2%, a specificity of 70.0%, and a positive predictive value of 8.3%. The XGB model classified the testing sample into high-risk (6%), medium-risk (30%), and low-risk (63%) groups. In the high-risk group, 40.7% had OUD acute events, vs. 16.5% and 5.0% in the medium- and low-risk groups. The high- and medium-risk groups captured 44% and 26% of individuals with OUD events, respectively. The XGB model exhibited lower false negative rates and higher false positive rates in racial/ethnic minority groups than in White enrollees.
Conclusions: New machine-learning algorithms perform well in predicting the risk of OUD acute care use among United States Medicaid enrollees and improve the fairness of prediction across racial and ethnic groups compared with previous OUD-related models.
  2. Introduction: Predictive models have been used to aid early diagnosis of polycystic ovary syndrome (PCOS), though existing models are based on small sample sizes and limited to fertility clinic populations. We built a predictive model using machine learning algorithms, based on an outpatient population at risk for PCOS, to predict risk and facilitate earlier diagnosis, particularly among those who meet diagnostic criteria but have not received a diagnosis.
Methods: This is a retrospective cohort study using electronic health record (EHR) data from a safety-net hospital from 2003 to 2016. The study population included 30,601 women aged 18-45 years without concurrent endocrinopathy who had any visit to Boston Medical Center for primary care, obstetrics and gynecology, endocrinology, family medicine, or general internal medicine. Four PCOS prediction outcomes were assessed. The first was an ICD-9 diagnosis of PCOS, and the additional outcomes were algorithm-defined PCOS. Algorithm-defined PCOS was based on the Rotterdam criteria, merging laboratory values, radiographic imaging, and ICD data from the EHR to define irregular menstruation, hyperandrogenism, and polycystic ovarian morphology on ultrasound.
Results: We developed predictive models using four machine learning methods: logistic regression, support vector machine, gradient boosted trees, and random forests. Hormone values (follicle-stimulating hormone, luteinizing hormone, estradiol, and sex hormone binding globulin) were combined to create a multilayer perceptron score using a neural network classifier. Prediction of PCOS prior to clinical diagnosis in an out-of-sample test set of patients achieved average AUCs of 85%, 81%, 80%, and 82% in Models I, II, III, and IV, respectively. Significant positive predictors of PCOS diagnosis across models included hormone levels and obesity; negative predictors included gravidity and positive β-hCG.
Conclusion: Machine learning algorithms were used to predict PCOS based on a large at-risk population. This approach may guide early detection of PCOS within EHR-interfaced populations to facilitate counseling and interventions that may reduce long-term health consequences. Our model illustrates the potential benefits of an artificial intelligence-enabled provider assistance tool that can be integrated into the EHR to reduce delays in diagnosis. However, model validation in other hospital-based populations is necessary.
  3. Background: Risk-based screening for lung cancer is currently being considered in several countries; however, the optimal approach to determining eligibility remains unclear. Ensemble machine learning could support the development of highly parsimonious prediction models that maintain the performance of more complex models while maximising simplicity and generalisability, supporting the widespread adoption of personalised screening. In this work, we aimed to develop and validate ensemble machine learning models to determine eligibility for risk-based lung cancer screening.
Methods and findings: For model development, we used data from 216,714 ever-smokers recruited between 2006 and 2010 to the UK Biobank prospective cohort and 26,616 high-risk ever-smokers recruited between 2002 and 2004 to the control arm of the US National Lung Screening Trial (NLST), a randomised controlled trial. The NLST randomised high-risk smokers from 33 US centres with at least a 30 pack-year smoking history and fewer than 15 quit-years to annual CT or chest radiography screening for lung cancer. We externally validated our models in the US Prostate, Lung, Colorectal and Ovarian (PLCO) Screening Trial, among the 49,593 participants in its chest radiography arm and among all 80,659 ever-smoking participants. The PLCO trial, recruiting from 1993 to 2001, compared chest radiography with no chest radiography for lung cancer screening. We primarily validated in the PLCO chest radiography arm so that we could benchmark against comparator models developed within the PLCO control arm. Models were developed to predict the risk of 2 outcomes within 5 years from baseline: diagnosis of lung cancer and death from lung cancer. We assessed model discrimination (area under the receiver operating characteristic curve, AUC), calibration (calibration curves and expected/observed ratio), overall performance (Brier scores), and net benefit with decision curve analysis. Models predicting lung cancer death (UCL-D) and incidence (UCL-I) using 3 variables (age, smoking duration, and pack-years) achieved or exceeded parity in discrimination, overall performance, and net benefit with comparators currently in use, despite requiring only one-quarter of the predictors. In external validation in the PLCO trial, UCL-D had an AUC of 0.803 (95% CI: 0.783, 0.824) and was well calibrated, with an expected/observed (E/O) ratio of 1.05 (95% CI: 0.95, 1.19). UCL-I had an AUC of 0.787 (95% CI: 0.771, 0.802) and an E/O ratio of 1.0 (95% CI: 0.92, 1.07). The sensitivity of UCL-D was 85.5% and that of UCL-I was 83.9%, at 5-year risk thresholds of 0.68% and 1.17%, respectively; these were 7.9% and 6.2% higher than the USPSTF-2021 criteria at the same specificity. The main limitation of this study is that the models have not been validated outside of UK and US cohorts.
Conclusions: We present parsimonious ensemble machine learning models to predict the risk of lung cancer in ever-smokers, demonstrating a novel approach that could simplify the implementation of risk-based lung cancer screening in multiple settings.
  4. Background: Stroke therapy is essential to reduce impairments and improve motor movements by engaging autogenous neuroplasticity. Traditionally, stroke rehabilitation occurs in inpatient and outpatient rehabilitation facilities. However, recent literature increasingly explores moving the recovery process into the home and integrating technology-based interventions. This study advances this goal by promoting in-home, autonomous recovery for patients who have experienced a stroke through robotics-assisted rehabilitation, and by classifying stroke residual severity using machine learning methods.
Objective: Our main objective is to use kinematics data collected during in-home, self-guided therapy sessions to develop supervised machine learning methods that autonomously classify clinician-labeled stroke residual severity, toward improving in-home, robotics-assisted stroke rehabilitation.
Methods: In total, 33 patients who had experienced a stroke participated in in-home therapy sessions using Motus Nova robotics rehabilitation technology to capture upper and lower body motion. During each therapy session, the Motus Hand and Motus Foot devices collected movement data, assistance data, and activity-specific data. We then synthesized, processed, and summarized these data. Next, the therapy session data were paired with clinician-informed, discrete stroke residual severity labels: “no range of motion (ROM),” “low ROM,” and “high ROM.” Afterward, an 80%:20% split was performed to divide the dataset into a training set and a holdout test set. We used 4 machine learning algorithms to classify stroke residual severity: light gradient boosting (LGB), extra trees classifier, deep feed-forward neural network, and classical logistic regression. We selected models based on 10-fold cross-validation and measured their performance on the holdout test set using the F1-score to identify which model maximizes stroke residual severity classification accuracy.
Results: We demonstrated that the LGB method provides the most reliable autonomous detection of stroke severity. The trained model is a consensus model consisting of 139 decision trees with up to 115 leaves each. This LGB model achieved an F1-score of 96.70%, compared with logistic regression (55.82%), the extra trees classifier (94.81%), and the deep feed-forward neural network (70.11%).
Conclusions: We showed how objectively measured rehabilitation training paired with machine learning methods can be used to identify the residual stroke severity class, with the aim of enhancing in-home, self-guided, individualized stroke rehabilitation. The model we trained relies only on session summary statistics, meaning it could potentially be integrated into similar settings, such as outpatient rehabilitation facilities, for real-time classification.
  5. Background: Maternal loneliness is associated with adverse physical and mental health outcomes for both the mother and her child. Detecting maternal loneliness noninvasively through wearable devices and passive sensing provides opportunities to prevent or reduce the impact of loneliness on the health and well-being of the mother and her child.
Objective: The aim of this study is to use objective health data collected passively by a wearable device to predict maternal (social) loneliness during pregnancy and the postpartum period, and to identify the physiological parameters that are important for loneliness detection.
Methods: We conducted a longitudinal study using smartwatches to continuously collect physiological data from 31 women during pregnancy and the postpartum period. The participants completed the University of California, Los Angeles (UCLA) loneliness questionnaire in gestational week 36 and again at 12 weeks postpartum. Responses to this questionnaire and the participants' background information were collected through our customized cross-platform mobile app. For loneliness prediction, we used participants' smartwatch data from the 7 days before, and the day of, their completion of the UCLA questionnaire. We categorized the UCLA loneliness scores as loneliness (score ≥12) or nonloneliness (score <12). We developed decision tree and gradient-boosting models to predict loneliness and evaluated them using leave-one-participant-out cross-validation. Moreover, we discussed the importance of the extracted health parameters in our models for loneliness prediction.
Results: The gradient boosting and decision tree models predicted maternal social loneliness with weighted F1-scores of 0.897 and 0.872, respectively. Our results also show that loneliness is highly associated with activity intensity and activity distribution during the day. In addition, resting heart rate (HR) and resting HR variability (HRV) were correlated with loneliness.
Conclusions: Our results show the potential benefit and feasibility of using passive sensing with a smartwatch to predict maternal loneliness. Our machine learning models achieved a high F1-score for loneliness prediction. We also show that activity intensity, activity pattern, and resting HR and HRV are good predictors of loneliness. These results indicate the intervention opportunities that wearable devices and predictive models offer for improving maternal well-being through early detection of loneliness.
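Item 1 above reports that its XGB model had lower false negative rates and higher false positive rates in racial/ethnic minority groups than in White enrollees. A per-group audit of that kind can be computed from binary predictions at a fixed cutoff. The sketch below assumes predictions are already thresholded to 0/1. The function name `error_rates_by_group`, the group labels "A" and "B", and the toy data are hypothetical and not taken from any of the studies listed.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def error_rates_by_group(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    groups: Sequence[str],
) -> Dict[str, Tuple[float, float]]:
    """Return {group: (false_negative_rate, false_positive_rate)}.

    FNR = FN / (FN + TP): share of true events the model missed.
    FPR = FP / (FP + TN): share of non-events the model flagged.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for y, p, g in zip(y_true, y_pred, groups):
        c = counts[g]
        if y == 1:
            c["tp" if p == 1 else "fn"] += 1
        else:
            c["fp" if p == 1 else "tn"] += 1
    rates: Dict[str, Tuple[float, float]] = {}
    for g, c in counts.items():
        pos = c["tp"] + c["fn"]
        neg = c["fp"] + c["tn"]
        # A rate is undefined (NaN) when the group has no positives or no negatives.
        fnr = c["fn"] / pos if pos else float("nan")
        fpr = c["fp"] / neg if neg else float("nan")
        rates[g] = (fnr, fpr)
    return rates


if __name__ == "__main__":
    # Hypothetical toy data: the same cutoff misses events in group A
    # and over-flags non-events in group B.
    y_true = [1, 1, 0, 0, 1, 1, 0, 0]
    y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
    groups = ["A"] * 4 + ["B"] * 4
    print(error_rates_by_group(y_true, y_pred, groups))
    # {'A': (0.5, 0.0), 'B': (0.0, 0.5)}
```

Reporting FNR and FPR side by side, instead of a single accuracy figure, makes the trade-off in item 1 visible. A group can be under-missed and over-flagged at the same time, which one pooled metric would hide.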