Predictive Modeling of an Unbalanced Binary Outcome in Food Insecurity Data

Fabish, J.; Davis, L.; Kim, S.

Predictive modeling of a rare event using an unbalanced data set leads to poor prediction sensitivity. Although this obstacle is often accompanied by other analytical issues such as a large number of predictors and multicollinearity, little has been done to address these issues simultaneously. The objective of this study is to compare several predictive modeling techniques in this setting. The unbalanced data set is addressed using four resampling methods: undersampling, oversampling, hybrid sampling, and ROSE synthetic data generation. The large number of predictors is addressed using penalized regression methods and ensemble methods. The predictive models are evaluated in terms of sensitivity and F1 score via simulation studies and applied to the prediction of food deserts in North Carolina. Our results show that balancing the data via resampling methods leads to improved prediction sensitivity for every classifier. In the application analysis, resampling also increases the F1 score for every classifier, whereas in the simulation studies the F1 score tended to decrease slightly in most cases. Our findings may help improve classification performance for unbalanced rare event data in many other applications.
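
To make the resampling and evaluation ideas in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a binary outcome coded 0/1 with both classes present, and it illustrates only random undersampling, random oversampling, and the two reported metrics; hybrid sampling and ROSE synthetic data generation are omitted. The function names `undersample`, `oversample`, and `sensitivity_and_f1` are hypothetical and chosen for illustration.

```python
import random


def _split_by_class(labels):
    """Return (minority_indices, majority_indices) for 0/1 labels."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    return (pos, neg) if len(pos) <= len(neg) else (neg, pos)


def undersample(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    minority, majority = _split_by_class(labels)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in keep], [labels[i] for i in keep]


def oversample(rows, labels, seed=0):
    """Randomly duplicate minority-class rows (with replacement) until both
    classes have equal size. Assumes the minority class is not empty."""
    rng = random.Random(seed)
    minority, majority = _split_by_class(labels)
    extra = rng.choices(minority, k=len(majority) - len(minority))
    keep = sorted(minority + majority + extra)
    return [rows[i] for i in keep], [labels[i] for i in keep]


def sensitivity_and_f1(y_true, y_pred):
    """Sensitivity (recall of the positive class) and F1 score for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return sensitivity, f1
```

In this sketch, resampling would be applied to the training data only, with sensitivity and F1 computed on untouched held-out data; that split is an assumption of the example rather than a detail stated in the abstract.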

Award ID(s): 1735258

PAR ID: 10109275

Author(s) / Creator(s): Fabish, J.; Davis, L.; Kim, S.

Publisher / Repository: Proceedings of the 15th International Conference on Data Science (2019)

Date Published: 2019-07-01

Journal Name: Predictive Modeling of an Unbalanced Binary Outcome in Food Insecurity Data

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
The DOI is not currently available.
