

Title: Social determinants of health and the prediction of missed breast imaging appointments
Abstract

Background

Predictive models utilizing social determinants of health (SDH), demographic data, and local weather data were trained to predict missed imaging appointments (MIA) among breast imaging patients at Boston Medical Center (BMC). Patients were characterized by many variables, including social needs, demographics, imaging utilization, appointment features, and weather conditions on the date of the appointment.

Methods

This HIPAA-compliant retrospective cohort study was IRB-approved, and informed consent was waived. After data preprocessing, the dataset contained 9,970 patients and 36,606 appointments from 1/1/2015 to 12/31/2019. We identified 57 potentially impactful variables for the initial prediction model and assessed each patient for MIA. We then developed a parsimonious model via recursive feature elimination, which identified the 25 most predictive variables. We trained linear and non-linear models, including support vector machines (SVM), logistic regression (LR), and random forest (RF), to predict MIA and compared their performance.

Results

The highest-performing full model was the non-linear RF, achieving an area under the ROC curve (AUC) of 76% and an average F1 score of 85%. Models limited to the most predictive variables attained AUC and F1 scores comparable to models with all variables included. The variables most predictive of missed appointments included timing, prior appointment history, referral department of origin, and socioeconomic factors such as household income and access to caregiving services.

Conclusions

Prediction of MIA with the data available is inherently limited by the complex, multifactorial nature of MIA. However, the algorithms presented achieved acceptable performance and demonstrated that socioeconomic factors were useful predictors of MIA. In contrast to non-modifiable demographic factors, SDH can be addressed to decrease the incidence of MIA.
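As a rough illustration of the modeling workflow described in the Methods (recursive feature elimination followed by a comparison of LR, SVM, and RF by AUC and F1), the following is a minimal scikit-learn sketch. The file name and the outcome column `missed` are hypothetical placeholders, not the study's actual data schema.

```python
# Sketch: select the 25 most predictive variables with RFE, then compare
# logistic regression, SVM, and random forest on AUC and F1.
# "appointments.csv" and the "missed" column are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

df = pd.read_csv("appointments.csv")                   # preprocessed appointment-level data
X, y = df.drop(columns=["missed"]), df["missed"]       # missed = 1 if the appointment was missed

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

# Parsimonious feature set: keep the 25 most predictive variables.
selector = RFE(RandomForestClassifier(n_estimators=200, random_state=0),
               n_features_to_select=25).fit(X_tr, y_tr)
X_tr_sel, X_te_sel = selector.transform(X_tr), selector.transform(X_te)

models = {
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC(probability=True)),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr_sel, y_tr)
    prob = model.predict_proba(X_te_sel)[:, 1]
    pred = model.predict(X_te_sel)
    print(name, f"AUC={roc_auc_score(y_te, prob):.3f}", f"F1={f1_score(y_te, pred):.3f}")
```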
Award ID(s):
2200052 1914792 1664644
NSF-PAR ID:
10424876
Date Published:
Journal Name:
BMC Health Services Research
Volume:
22
Issue:
1
ISSN:
1472-6963
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Objective

    To develop predictive models of coronavirus disease 2019 (COVID-19) outcomes, elucidate the influence of socioeconomic factors, and assess algorithmic racial fairness using a racially diverse patient population with high social needs.

    Materials and Methods

    Data included 7,102 patients with positive (RT-PCR) severe acute respiratory syndrome coronavirus 2 test at a safety-net system in Massachusetts. Linear and nonlinear classification methods were applied. A score based on a recurrent neural network and a transformer architecture was developed to capture the dynamic evolution of vital signs. Combined with patient characteristics, clinical variables, and hospital occupancy measures, this dynamic vital score was used to train predictive models.
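    To make the "dynamic vital score" idea concrete, here is a small PyTorch sketch of one plausible design: a recurrent network summarizes each patient's vital-sign time series into a scalar score, which is then concatenated with static features for outcome prediction. This is an assumption-laden illustration, not the authors' exact architecture, and all dimensions and names are hypothetical.

    ```python
    # Illustrative sketch: LSTM-derived scalar "dynamic vital score" combined
    # with static patient features (not the study's exact architecture).
    import torch
    import torch.nn as nn

    class DynamicVitalScore(nn.Module):
        def __init__(self, n_vitals: int, hidden: int = 32):
            super().__init__()
            self.rnn = nn.LSTM(n_vitals, hidden, batch_first=True)
            self.score = nn.Linear(hidden, 1)            # one scalar score per patient

        def forward(self, vitals):                        # vitals: (batch, time, n_vitals)
            _, (h, _) = self.rnn(vitals)
            return self.score(h[-1])                      # (batch, 1)

    class OutcomeModel(nn.Module):
        def __init__(self, n_vitals: int, n_static: int):
            super().__init__()
            self.vital_score = DynamicVitalScore(n_vitals)
            self.head = nn.Linear(1 + n_static, 1)        # combine score with static features

        def forward(self, vitals, static):
            s = self.vital_score(vitals)
            return torch.sigmoid(self.head(torch.cat([s, static], dim=1)))

    model = OutcomeModel(n_vitals=6, n_static=20)         # e.g., 6 vital signs, 20 static features
    probs = model(torch.randn(4, 24, 6), torch.randn(4, 20))  # dummy batch of 4 patients
    ```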

    Results

    Hospitalizations can be predicted with an area under the receiver-operating characteristic curve (AUC) of 92% using symptoms, hospital occupancy, and patient characteristics, including social determinants of health. Parsimonious models to predict intensive care, mechanical ventilation, and mortality that used the most recent labs and vitals exhibited AUCs of 92.7%, 91.2%, and 94%, respectively. Early predictive models, using labs and vital signs closer to admission, had AUCs of 81.1%, 84.9%, and 92%, respectively.

    Discussion

    The most accurate models exhibited racial bias, being more likely to falsely predict that Black patients would be hospitalized. Models based only on the dynamic vital score achieved accuracies close to those of the best parsimonious models, although the latter also used laboratory results.

    Conclusions

    This large study demonstrates that COVID-19 severity may accurately be predicted using a score that accounts for the dynamic evolution of vital signs. Further, race, social determinants of health, and hospital occupancy play an important role.

     
  2. Abstract

    The strain on healthcare resources brought forth by the recent COVID-19 pandemic has highlighted the need for efficient resource planning and allocation through the prediction of future consumption. Machine learning can predict resource utilization such as the need for hospitalization based on past medical data stored in electronic medical records (EMR). We conducted this study on 3194 patients (46% male, mean age 56.7 (±16.8), 56% African American, 7% Hispanic) flagged as COVID-19 positive cases across 12 centers in the Emory Healthcare network from February 2020 to September 2020, to assess whether a COVID-19 positive patient's need for hospitalization can be predicted at the time of RT-PCR test using the EMR data prior to the test. Five main modalities of EMR, i.e., demographics, medication, past medical procedures, comorbidities, and laboratory results, were used as features for predictive modeling, both individually and fused together using late, middle, and early fusion. Models were evaluated in terms of precision, recall, and F1-score (with 95% confidence intervals). The early fusion model is the most effective predictor with 84% overall F1-score [CI 82.1–86.1]. The predictive performance of the model drops by 6% when using recent clinical data while omitting the long-term medical history. Feature importance analysis indicates that history of cardiovascular disease, emergency room visits in the past year prior to testing, and demographic factors are predictive of the disease trajectory. We conclude that fusion modeling using medical history and current treatment data can forecast the need for hospitalization for patients infected with COVID-19 at the time of the RT-PCR test.
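    The contrast between early and late fusion can be sketched in a few lines: early fusion concatenates all modality feature matrices before training one model, while late fusion trains one model per modality and combines their predicted probabilities. The sketch below uses synthetic matrices and a generic classifier; the study's own feature sets and model choices may differ.

    ```python
    # Sketch of early vs. late fusion of EMR modalities (synthetic data,
    # generic classifier); illustrative only.
    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import cross_val_predict

    rng = np.random.default_rng(0)
    n = 500
    modalities = {                      # one feature matrix per EMR modality
        "demographics": rng.normal(size=(n, 5)),
        "medication": rng.normal(size=(n, 30)),
        "procedures": rng.normal(size=(n, 20)),
        "comorbidities": rng.normal(size=(n, 15)),
        "labs": rng.normal(size=(n, 25)),
    }
    y = rng.integers(0, 2, size=n)      # 1 = hospitalized

    # Early fusion: concatenate all modalities into a single feature matrix.
    X_early = np.hstack(list(modalities.values()))
    p_early = cross_val_predict(GradientBoostingClassifier(), X_early, y,
                                cv=5, method="predict_proba")[:, 1]

    # Late fusion: one model per modality, then average their probabilities.
    p_late = np.mean([cross_val_predict(GradientBoostingClassifier(), X, y,
                                        cv=5, method="predict_proba")[:, 1]
                      for X in modalities.values()], axis=0)
    ```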

     
  3. Abstract INTRODUCTION

    Identifying mild cognitive impairment (MCI) patients at risk for dementia could facilitate early interventions. Using electronic health records (EHRs), we developed a model to predict MCI to all‐cause dementia (ACD) conversion at 5 years.

    METHODS

    Cox proportional hazards model was used to identify predictors of ACD conversion from EHR data in veterans with MCI. Model performance (area under the receiver operating characteristic curve [AUC] and Brier score) was evaluated on a held‐out data subset.
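    For readers unfamiliar with the approach, the following is a minimal sketch of fitting a Cox proportional hazards model and checking 5-year discrimination and calibration on a held-out set, using the lifelines package. The file and column names are hypothetical, and the evaluation is simplified relative to the study (for example, censoring before 5 years is ignored).

    ```python
    # Minimal Cox proportional hazards sketch for MCI-to-dementia conversion.
    # "mci_cohort.csv", "years_to_event", and "converted" are hypothetical names.
    import pandas as pd
    from lifelines import CoxPHFitter
    from sklearn.metrics import brier_score_loss, roc_auc_score
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("mci_cohort.csv")   # one row per patient: covariates,
                                         # years_to_event, converted (0/1)
    train, test = train_test_split(df, test_size=0.2, random_state=0)

    cph = CoxPHFitter()
    cph.fit(train, duration_col="years_to_event", event_col="converted")
    cph.print_summary()                  # hazard ratios for each covariate

    # Simplified 5-year discrimination and calibration check on the held-out set.
    covars = test.drop(columns=["years_to_event", "converted"])
    risk_5y = 1 - cph.predict_survival_function(covars, times=[5.0]).T[5.0]
    label_5y = ((test["converted"] == 1) & (test["years_to_event"] <= 5)).astype(int)
    print("AUC:", roc_auc_score(label_5y, risk_5y))
    print("Brier:", brier_score_loss(label_5y, risk_5y))
    ```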

    RESULTS

    Of 59,782 MCI patients, 15,420 (25.8%) converted to ACD. The model had good discriminative performance (AUC 0.73 [95% confidence interval (CI) 0.72–0.74]), and calibration (Brier score 0.18 [95% CI 0.17–0.18]). Age, stroke, cerebrovascular disease, myocardial infarction, hypertension, and diabetes were risk factors, while body mass index, alcohol abuse, and sleep apnea were protective factors.

    DISCUSSION

    EHR‐based prediction model had good performance in identifying 5‐year MCI to ACD conversion and has potential to assist triaging of at‐risk patients.

    Highlights

    Of 59,782 veterans with mild cognitive impairment (MCI), 15,420 (25.8%) converted to all‐cause dementia within 5 years.

    Electronic health record prediction models demonstrated good performance (area under the receiver operating characteristic curve 0.73; Brier 0.18).

    Age and vascular‐related morbidities were predictors of dementia conversion.

    Synthetic data was comparable to real data in modeling MCI to dementia conversion.

    Key Points

    An electronic health record–based model using demographic and co‐morbidity data had good performance in identifying veterans who convert from mild cognitive impairment (MCI) to all‐cause dementia (ACD) within 5 years.

    Increased age, stroke, cerebrovascular disease, myocardial infarction, hypertension, and diabetes were risk factors for 5‐year conversion from MCI to ACD.

    High body mass index, alcohol abuse, and sleep apnea were protective factors for 5‐year conversion from MCI to ACD.

    Models using synthetic data, analogs of real patient data that retain the distribution, density, and covariance between variables of real patient data but are not attributable to any specific patient, performed just as well as models using real patient data. This could have significant implications in facilitating widely distributed computing of health‐care data with minimized patient privacy concern that could accelerate scientific discoveries.

     
  4. Background

    Although conventional prediction models for surgical patients often ignore intraoperative time-series data, deep learning approaches are well-suited to incorporate time-varying and non-linear data with complex interactions. Blood lactate concentration is one important clinical marker that can reflect the adequacy of systemic perfusion during cardiac surgery. During cardiac surgery and cardiopulmonary bypass, minute-level data is available on key parameters that affect perfusion. The goal of this study was to use machine learning and deep learning approaches to predict maximum blood lactate concentrations after cardiac surgery. We hypothesized that models using minute-level intraoperative data as inputs would have the best predictive performance.

    Methods

    Adults who underwent cardiac surgery with cardiopulmonary bypass were eligible. The primary outcome was maximum lactate concentration within 24 h postoperatively. We considered three classes of predictive models, using the performance metric of mean absolute error across testing folds: (1) static models using baseline preoperative variables, (2) augmentation of the static models with intraoperative statistics, and (3) a dynamic approach that integrates preoperative variables with intraoperative time series data.
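    The first two model classes can be illustrated with a short sketch: a regressor trained on baseline preoperative variables alone, and the same regressor trained on baseline variables augmented with per-case summary statistics of the intraoperative time series, compared by mean absolute error across folds. File and column names are hypothetical, and the sketch assumes every case appears in both files.

    ```python
    # Sketch of model classes (1) and (2): baseline variables only vs.
    # baseline variables plus intraoperative summary statistics.
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score

    static = pd.read_csv("preop_variables.csv", index_col="case_id")   # one row per case
    intraop = pd.read_csv("intraop_minute_data.csv")                   # minute-level rows
    y = pd.read_csv("outcomes.csv", index_col="case_id")["max_lactate_24h"]

    # (2) Summarize each intraoperative signal (mean, min, max, last value) per case.
    summary = intraop.groupby("case_id").agg(["mean", "min", "max", "last"])
    summary.columns = ["_".join(c) for c in summary.columns]
    augmented = static.join(summary)

    for name, X in {"static only": static, "static + intraop stats": augmented}.items():
        mae = -cross_val_score(RandomForestRegressor(n_estimators=300, random_state=0),
                               X.loc[y.index], y, cv=5,
                               scoring="neg_mean_absolute_error")
        print(f"{name}: MAE {mae.mean():.2f} mmol/L")
    ```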

    Results

    A total of 2,187 patients were included. For three models that only used baseline characteristics (linear regression, random forest, artificial neural network) to predict maximum postoperative lactate concentration, the prediction error ranged from a median of 2.52 mmol/L (IQR 2.46, 2.56) to 2.58 mmol/L (IQR 2.54, 2.60). The inclusion of intraoperative summary statistics (including intraoperative lactate concentration) improved model performance, with the prediction error ranging from a median of 2.09 mmol/L (IQR 2.04, 2.14) to 2.12 mmol/L (IQR 2.06, 2.16). For the two modelling approaches (recurrent neural network, transformer) that can utilize intraoperative time-series data, the prediction error was lowest, ranging from a median of 1.96 mmol/L (IQR 1.87, 2.05) to 1.97 mmol/L (IQR 1.92, 2.05). Intraoperative lactate concentration was the most important predictive feature based on Shapley additive values. Anemia and weight were also important predictors, but there was heterogeneity in the importance of other features.

    Conclusion

    Postoperative lactate concentrations can be predicted using baseline and intraoperative data with moderate accuracy. These results reflect the value of intraoperative data in the prediction of clinically relevant outcomes to guide perioperative management.

     
  5. Abstract Objective

    Anterior temporal lobectomy (ATL) is a widely performed and successful intervention for drug‐resistant temporal lobe epilepsy (TLE). However, up to one third of patients experience seizure recurrence within 1 year after ATL. Despite the extensive literature on presurgical electroencephalography (EEG) and magnetic resonance imaging (MRI) abnormalities to prognosticate seizure freedom following ATL, the value of quantitative analysis of visually reviewed normal interictal EEG in such prognostication remains unclear. In this retrospective multicenter study, we investigate whether machine learning analysis of normal interictal scalp EEG studies can inform the prediction of postoperative seizure freedom outcomes in patients who have undergone ATL.

    Methods

    We analyzed normal presurgical scalp EEG recordings from 41 Mayo Clinic (MC) and 23 Cleveland Clinic (CC) patients. We used an unbiased automated algorithm to extract eyes closed awake epochs from scalp EEG studies that were free of any epileptiform activity and then extracted spectral EEG features representing (a) spectral power and (b) interhemispheric spectral coherence in frequencies between 1 and 25 Hz across several brain regions. We analyzed the differences between the seizure‐free and non–seizure‐free patients and employed a Naïve Bayes classifier using multiple spectral features to predict surgery outcomes. We trained the classifier using a leave‐one‐patient‐out cross‐validation scheme within the MC data set and then tested using the out‐of‐sample CC data set. Finally, we compared the predictive performance of normal scalp EEG‐derived features against MRI abnormalities.
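    As a rough illustration of the classification step, the sketch below applies a Gaussian Naive Bayes classifier to precomputed spectral power and coherence features, with leave-one-patient-out cross-validation on the training site and evaluation on an external site. The feature files and column names are hypothetical, and the feature-extraction stage is assumed to have been done upstream.

    ```python
    # Sketch: Naive Bayes over precomputed spectral features with
    # leave-one-patient-out CV (MC) and external validation (CC).
    import pandas as pd
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import LeaveOneGroupOut, cross_val_predict
    from sklearn.naive_bayes import GaussianNB

    mc = pd.read_csv("mc_features.csv")   # one row per patient: patient_id,
    cc = pd.read_csv("cc_features.csv")   # spectral features, seizure_free label
    feat_cols = [c for c in mc.columns if c not in ("patient_id", "seizure_free")]

    # Leave-one-patient-out cross-validation within the Mayo Clinic data set.
    p_mc = cross_val_predict(GaussianNB(), mc[feat_cols], mc["seizure_free"],
                             groups=mc["patient_id"], cv=LeaveOneGroupOut(),
                             method="predict_proba")[:, 1]
    print("MC (leave-one-patient-out) AUC:", roc_auc_score(mc["seizure_free"], p_mc))

    # Train on all MC data, test on the out-of-sample Cleveland Clinic data set.
    clf = GaussianNB().fit(mc[feat_cols], mc["seizure_free"])
    p_cc = clf.predict_proba(cc[feat_cols])[:, 1]
    print("CC (external) AUC:", roc_auc_score(cc["seizure_free"], p_cc))
    ```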

    Results

    We found that several spectral power and coherence features showed significant differences correlated with surgical outcomes and that they were most pronounced in the 10–25 Hz range. The Naïve Bayes classification based on those features predicted 1‐year seizure freedom following ATL with area under the curve (AUC) values of 0.78 and 0.76 for the MC and CC data sets, respectively. Subsequent analyses revealed that (a) interhemispheric spectral coherence features in the 10–25 Hz range provided better predictability than other combinations and (b) normal scalp EEG‐derived features provided superior and potentially distinct predictive value when compared with MRI abnormalities (>10% higher F1 score).

    Significance

    These results support that quantitative analysis of even a normal presurgical scalp EEG may help prognosticate seizure freedom following ATL in patients with drug‐resistant TLE. Although the mechanism for this result is not known, the scalp EEG spectral and coherence properties predicting seizure freedom may represent activity arising from the neocortex or the networks responsible for temporal lobe seizure generation within vs outside the margins of an ATL.

     