

Title: Predictive Modeling of an Unbalanced Binary Outcome in Food Insecurity Data
Predictive modeling of a rare event using an unbalanced data set leads to poor prediction sensitivity. Although this obstacle is often accompanied by other analytical issues such as a large number of predictors and multicollinearity, little has been done to address these issues simultaneously. The objective of this study is to compare several predictive modeling techniques in this setting. The unbalanced data set is addressed using four resampling methods: undersampling, oversampling, hybrid sampling, and ROSE synthetic data generation. The large number of predictors is addressed using penalized regression methods and ensemble methods. The predictive models are evaluated in terms of sensitivity and F1 score via simulation studies and applied to the prediction of food deserts in North Carolina. Our results show that balancing the data via resampling methods leads to improved prediction sensitivity for every classifier. The application analysis shows that resampling also increases the F1 score for every classifier, whereas in the simulation studies the F1 score tended to decrease slightly in most cases. Our findings may help improve classification performance for unbalanced rare event data in many other applications.
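
The two simplest resampling methods named in the abstract, and the two evaluation metrics, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the 0/1 label coding, and the toy data are all hypothetical, both classes are assumed to be present, and hybrid sampling and ROSE are not shown.

```python
import random
from collections import Counter

def _split_by_class(y):
    """Return (minority_indices, majority_indices) for 0/1 labels."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    return (pos, neg) if len(pos) <= len(neg) else (neg, pos)

def random_undersample(X, y, seed=0):
    """Drop majority-class rows at random until both classes are the same size."""
    rng = random.Random(seed)
    minority, majority = _split_by_class(y)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in keep], [y[i] for i in keep]

def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows (with replacement) until both classes are the same size."""
    rng = random.Random(seed)
    minority, majority = _split_by_class(y)
    extra = rng.choices(minority, k=len(majority) - len(minority))
    keep = minority + majority + extra
    return [X[i] for i in keep], [y[i] for i in keep]

def sensitivity_and_f1(y_true, y_pred):
    """Sensitivity (recall on the positive class) and F1 score for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else 0.0
    prec = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    return sens, f1

# Hypothetical toy data: 5 rare events among 100 observations.
X = [[i] for i in range(100)]
y = [1] * 5 + [0] * 95

X_under, y_under = random_undersample(X, y)
X_over, y_over = random_oversample(X, y)
print(Counter(y_under), Counter(y_over))
```

Resampling would normally be applied to the training split only, so that sensitivity and F1 are measured on data with the original class balance.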
Award ID(s):
1735258
NSF-PAR ID:
10109275
Publisher / Repository:
Proceedings of the 15th International Conference on Data Science (2019)
Journal Name:
Predictive Modeling of an Unbalanced Binary Outcome in Food Insecurity Data
Sponsoring Org:
National Science Foundation
More Like this
  1. Introduction

    Predictive models have been used to aid early diagnosis of PCOS, though existing models are based on small sample sizes and limited to fertility clinic populations. We built a predictive model using machine learning algorithms based on an outpatient population at risk for PCOS to predict risk and facilitate earlier diagnosis, particularly among those who meet diagnostic criteria but have not received a diagnosis.

    Methods

    This is a retrospective cohort study using a safety-net hospital’s electronic health records (EHR) from 2003-2016. The study population included 30,601 women aged 18-45 years without concurrent endocrinopathy who had any visit to Boston Medical Center for primary care, obstetrics and gynecology, endocrinology, family medicine, or general internal medicine. Four prediction outcomes were assessed for PCOS. The first outcome was PCOS ICD-9 diagnosis, with additional model outcomes of algorithm-defined PCOS. The latter was based on Rotterdam criteria and on merging laboratory values, radiographic imaging, and ICD data from the EHR to define irregular menstruation, hyperandrogenism, and polycystic ovarian morphology on ultrasound.

    Results

    We developed predictive models using four machine learning methods: logistic regression, support vector machine, gradient boosted trees, and random forests. Hormone values (follicle-stimulating hormone, luteinizing hormone, estradiol, and sex hormone binding globulin) were combined to create a multilayer perceptron score using a neural network classifier. Prediction of PCOS prior to clinical diagnosis in an out-of-sample test set of patients achieved an average AUC of 85%, 81%, 80%, and 82% in Models I, II, III, and IV, respectively. Significant positive predictors of PCOS diagnosis across models included hormone levels and obesity; negative predictors included gravidity and positive bHCG.

    Conclusion

    Machine learning algorithms were used to predict PCOS based on a large at-risk population. This approach may guide early detection of PCOS within EHR-interfaced populations to facilitate counseling and interventions that may reduce long-term health consequences. Our model illustrates the potential benefits of an artificial intelligence-enabled provider assistance tool that can be integrated into the EHR to reduce delays in diagnosis. However, model validation in other hospital-based populations is necessary.

     
  2. Objective: To build models for early prediction of the risk of developing multiple organ dysfunction (MOD) in pediatric intensive care unit (PICU) patients. Design: Retrospective observational cohort study. Setting: A single academic PICU at the Johns Hopkins Hospital, Baltimore, MD. Patients: Patients <18 years of age admitted to the PICU between July 2014 and October 2015. Measurements and main results: Organ dysfunction labels were generated every minute from preceding 24-h time windows using the International Pediatric Sepsis Consensus Conference (IPSCC) and Proulx et al. MOD criteria. Early MOD prediction models were built using four machine learning methods: random forest, XGBoost, GLMBoost, and Lasso-GLM. An optimal threshold learned from training data was used to detect high-risk alert events (HRAs). The early prediction models from all methods achieved an area under the receiver operating characteristic curve ≥0.91 for both IPSCC and Proulx criteria. The best performance in terms of maximum F1-score was achieved with random forest (sensitivity: 0.72, positive predictive value: 0.70, F1-score: 0.71) and XGBoost (sensitivity: 0.8, positive predictive value: 0.81, F1-score: 0.81) for IPSCC and Proulx criteria, respectively. The median early warning time was 22.7 h for random forest and 37 h for XGBoost models for IPSCC and Proulx criteria, respectively. Applying spectral clustering on risk-score trajectories over 24 h following early warning provided a high-risk group with ≥0.93 positive predictive value. Conclusions: Early predictions from risk-based patient monitoring could provide more than 22 h of lead time for MOD onset, with ≥0.93 positive predictive value for a high-risk group identified pre-MOD.
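
    The F1-score reported above is the harmonic mean of sensitivity and positive predictive value (PPV). As a worked check using the standard definition, the random forest figures for the IPSCC criteria give:

    ```latex
    F_1 = \frac{2 \cdot \mathrm{PPV} \cdot \mathrm{Sens}}{\mathrm{PPV} + \mathrm{Sens}}
        = \frac{2 (0.70)(0.72)}{0.70 + 0.72}
        = \frac{1.008}{1.42}
        \approx 0.71
    ```

    This matches the reported F1-score of 0.71. Because the inputs are rounded to two decimals, the same check on other rows may differ from the reported value in the last digit.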
  3. Abstract Objective

    Anterior temporal lobectomy (ATL) is a widely performed and successful intervention for drug‐resistant temporal lobe epilepsy (TLE). However, up to one third of patients experience seizure recurrence within 1 year after ATL. Despite the extensive literature on presurgical electroencephalography (EEG) and magnetic resonance imaging (MRI) abnormalities to prognosticate seizure freedom following ATL, the value of quantitative analysis of visually reviewed normal interictal EEG in such prognostication remains unclear. In this retrospective multicenter study, we investigate whether machine learning analysis of normal interictal scalp EEG studies can inform the prediction of postoperative seizure freedom outcomes in patients who have undergone ATL.

    Methods

    We analyzed normal presurgical scalp EEG recordings from 41 Mayo Clinic (MC) and 23 Cleveland Clinic (CC) patients. We used an unbiased automated algorithm to extract eyes closed awake epochs from scalp EEG studies that were free of any epileptiform activity and then extracted spectral EEG features representing (a) spectral power and (b) interhemispheric spectral coherence in frequencies between 1 and 25 Hz across several brain regions. We analyzed the differences between the seizure‐free and non–seizure‐free patients and employed a Naïve Bayes classifier using multiple spectral features to predict surgery outcomes. We trained the classifier using a leave‐one‐patient‐out cross‐validation scheme within the MC data set and then tested using the out‐of‐sample CC data set. Finally, we compared the predictive performance of normal scalp EEG‐derived features against MRI abnormalities.

    Results

    We found that several spectral power and coherence features showed significant differences correlated with surgical outcomes and that they were most pronounced in the 10–25 Hz range. The Naïve Bayes classification based on those features predicted 1‐year seizure freedom following ATL with area under the curve (AUC) values of 0.78 and 0.76 for the MC and CC data sets, respectively. Subsequent analyses revealed that (a) interhemispheric spectral coherence features in the 10–25 Hz range provided better predictability than other combinations and (b) normal scalp EEG‐derived features provided superior and potentially distinct predictive value when compared with MRI abnormalities (>10% higher F1 score).

    Significance

    These results support that quantitative analysis of even a normal presurgical scalp EEG may help prognosticate seizure freedom following ATL in patients with drug‐resistant TLE. Although the mechanism for this result is not known, the scalp EEG spectral and coherence properties predicting seizure freedom may represent activity arising from the neocortex or the networks responsible for temporal lobe seizure generation within vs outside the margins of an ATL.

     
  4. Abstract STUDY QUESTION

    Can we derive adequate models to predict the probability of conception among couples actively trying to conceive?

    SUMMARY ANSWER

    Leveraging data collected from female participants in a North American preconception cohort study, we developed models to predict pregnancy with performance of ∼70% in the area under the receiver operating characteristic curve (AUC).

    WHAT IS KNOWN ALREADY

    Earlier work has focused primarily on identifying individual risk factors for infertility. Several predictive models have been developed in subfertile populations, with relatively low discrimination (AUC: 59–64%).

    STUDY DESIGN, SIZE, DURATION

    Study participants were female, aged 21–45 years, residents of the USA or Canada, not using fertility treatment, and actively trying to conceive at enrollment (2013–2019). Participants completed a baseline questionnaire at enrollment and follow-up questionnaires every 2 months for up to 12 months or until conception. We used data from 4133 participants with no more than one menstrual cycle of pregnancy attempt at study entry.

    PARTICIPANTS/MATERIALS, SETTING, METHODS

    On the baseline questionnaire, participants reported data on sociodemographic factors, lifestyle and behavioral factors, diet quality, medical history and selected male partner characteristics. A total of 163 predictors were considered in this study. We implemented regularized logistic regression, support vector machines, neural networks and gradient boosted decision trees to derive models predicting the probability of pregnancy: (i) within fewer than 12 menstrual cycles of pregnancy attempt time (Model I), and (ii) within 6 menstrual cycles of pregnancy attempt time (Model II). Cox models were used to predict the probability of pregnancy within each menstrual cycle for up to 12 cycles of follow-up (Model III). We assessed model performance using the AUC and the weighted-F1 score for Models I and II, and the concordance index for Model III.

    MAIN RESULTS AND THE ROLE OF CHANCE

    Model I and II AUCs were 70% and 66%, respectively, in parsimonious models, and the concordance index for Model III was 63%. The predictors that were positively associated with pregnancy in all models were: having previously breastfed an infant and using multivitamins or folic acid supplements. The predictors that were inversely associated with pregnancy in all models were: female age, female BMI and history of infertility. Among nulligravid women with no history of infertility, the most important predictors were: female age, female BMI, male BMI, use of a fertility app, attempt time at study entry and perceived stress.

    LIMITATIONS, REASONS FOR CAUTION

    Reliance on self-reported predictor data could have introduced misclassification, which would likely be non-differential with respect to the pregnancy outcome given the prospective design. In addition, we cannot be certain that all relevant predictor variables were considered. Finally, though we validated the models using split-sample replication techniques, we did not conduct an external validation study.

    WIDER IMPLICATIONS OF THE FINDINGS

    Given a wide range of predictor data, machine learning algorithms can be leveraged to analyze epidemiologic data and predict the probability of conception with discrimination that exceeds earlier work.

    STUDY FUNDING/COMPETING INTEREST(S)

    The research was partially supported by the U.S. National Science Foundation (under grants DMS-1664644, CNS-1645681 and IIS-1914792) and the National Institutes of Health (under grants R01 GM135930 and UL54 TR004130). In the last 3 years, L.A.W. has received in-kind donations for primary data collection in PRESTO from FertilityFriend.com, Kindara.com, Sandstone Diagnostics and Swiss Precision Diagnostics. L.A.W. also serves as a fibroid consultant to AbbVie, Inc. The other authors declare no competing interests.

    TRIAL REGISTRATION NUMBER

    N/A.

     
  5. Abstract

    A hybrid two-stage machine-learning architecture that addresses the problem of excessive false positives (false alarms) in solar flare prediction systems is investigated. The first stage is a convolutional neural network (CNN) model based on the VGG-16 architecture that extracts features from a temporal stack of consecutive Solar Dynamics Observatory Helioseismic and Magnetic Imager magnetogram images to produce a flaring probability. The probability of flaring is added to a feature vector derived from the magnetograms to train an extremely randomized trees (ERT) model in the second stage to produce a binary deterministic prediction (flare/no-flare) in a 12 hr forecast window. To tune the hyperparameters of the architecture, a new evaluation metric is introduced: the “scaled True Skill Statistic.” It specifically addresses the large discrepancy between the true positive rate and the false positive rate in the highly unbalanced solar flare event training data sets. Through hyperparameter tuning to maximize this new metric, our two-stage architecture drastically reduces false positives by ≈48% without significantly affecting the true positives (reduction by ≈12%), when compared with predictions from the first-stage CNN alone. This, in turn, improves various traditional binary classification metrics sensitive to false positives, such as the precision, F1, and the Heidke Skill Score. The end result is a more robust 12 hr flare prediction system that could be combined with current operational flare-forecasting methods. Additionally, using the ERT-based feature-ranking mechanism, we show that the CNN output probability is highly ranked in terms of flare prediction relevance.
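
The metrics named in the solar flare abstract above can be computed from a 2x2 confusion matrix. The sketch below uses the standard definitions of the True Skill Statistic (TSS), the Heidke Skill Score (HSS), precision, and F1; the "scaled True Skill Statistic" is not defined in the abstract and is therefore not implemented. The confusion-matrix counts are hypothetical, chosen only so that the second stage removes 48% of the false positives and 12% of the true positives, as reported.

```python
def skill_scores(tp, fp, fn, tn):
    """Standard binary forecast metrics from confusion-matrix counts."""
    tpr = tp / (tp + fn)                      # true positive rate (sensitivity)
    fpr = fp / (fp + tn)                      # false positive rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * tpr / (precision + tpr)
    tss = tpr - fpr                           # True Skill Statistic
    hss = (2 * (tp * tn - fn * fp)            # Heidke Skill Score (2x2 form)
           / ((tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)))
    return {"precision": precision, "f1": f1, "tss": tss, "hss": hss}

# Hypothetical first-stage (CNN-only) counts on a highly unbalanced test set.
stage1 = dict(tp=100, fp=200, fn=20, tn=9680)

# Second stage: 48% fewer false positives, 12% fewer true positives.
# Removed false positives become true negatives; lost true positives become false negatives.
stage2 = dict(tp=88, fp=104, fn=32, tn=9776)

before = skill_scores(**stage1)
after = skill_scores(**stage2)
for name in before:
    print(f"{name}: {before[name]:.3f} -> {after[name]:.3f}")
```

With these illustrative counts, precision, F1, and HSS all increase after the second stage, which is the direction of change the abstract describes for metrics sensitive to false positives.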

     