Predictive modeling often ignores interaction effects among predictors in high-dimensional data because of analytical and computational challenges. Research on interaction selection has been galvanized by methodological and computational advances. In this study, we investigate the performance of two types of predictive algorithms that can perform interaction selection. Specifically, we compare the predictive performance and interaction selection accuracy of penalty-based and tree-based predictive algorithms. The penalty-based algorithms included in our comparative study are the regularization path algorithm under the marginality principle (RAMP), the least absolute shrinkage and selection operator (LASSO), the smoothly clipped absolute deviation (SCAD), and the minimax concave penalty (MCP). The tree-based algorithms considered are random forest (RF) and iterative random forest (iRF). We evaluate the effectiveness of these algorithms under various regression and classification models with varying structures and dimensions. We assess predictive performance using the mean squared error for regression and accuracy, sensitivity, specificity, balanced accuracy, and F1 score for classification. We use interaction coverage to judge each algorithm’s efficacy for interaction selection. Our findings reveal that the effectiveness of the selected algorithms varies depending on the number of predictors (data dimension) and the structure of the data-generating model, i.e., linear or nonlinear, hierarchical or non-hierarchical. Each algorithm included in this study was favored in at least one scenario. However, the general pattern allows us to recommend specific algorithms for particular scenarios. Our analysis helps clarify each algorithm’s strengths and limitations, offering guidance to researchers and data analysts in choosing an appropriate algorithm for their predictive modeling task based on their data structure.
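The performance measures named in the abstract above can be illustrated with a minimal sketch. It uses the standard textbook definitions for binary labels (1 = positive class), which may differ in detail from the study's own implementation. The function names `classification_metrics` and `mean_squared_error` are hypothetical and chosen only for this illustration. Interaction coverage is omitted because the abstract does not define it.

```python
def classification_metrics(y_true, y_pred):
    """Standard binary-classification metrics (assumes labels are 0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Guard each ratio against an empty denominator.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0

    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1": f1,
    }


def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```

Balanced accuracy averages sensitivity and specificity, so it is less affected by class imbalance than plain accuracy, which is presumably why the abstract reports both.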
- Author / Contributor
Kim, Seongtae (3)
Black, Derrick (1)
Davis, Lauren (1)
Lehavi, Adam (1)
Liu, Liping (1)
Mostafa, Sayed A (1)
Nzekwe, Chinedu J (1)
Free, publicly-accessible full text available May 1, 2025
Lehavi, Adam ; Kim, Seongtae ( , 21st IEEE ICMLA (International Conference on Machine Learning and Applications))
In the realm of cybersecurity, intrusion detection systems (IDS) detect and prevent attacks based on collected computer and network data. In recent research, IDS models have been constructed using machine learning (ML) and deep learning (DL) methods such as Random Forest (RF) and deep neural networks (DNN). Feature selection (FS) can be used to construct faster, more interpretable, and more accurate models. We examine three FS techniques: RF information gain (RF-IG), correlation feature selection using the Bat Algorithm (CFS-BA), and CFS using the Aquila Optimizer (CFS-AO). Our results show CFS-BA to be the most efficient of the FS methods, building in 55% of the time of the best RF-IG model while achieving 99.99% of its accuracy. This reinforces prior contributions attesting to CFS-BA’s accuracy while building upon the relationship between subset size, CFS score, and RF-IG score in final results.
Black, Derrick ; Liu, Liping ; Kim, Seongtae ; Davis, Lauren ( , A prediction model for backpack programs)
To help address child food insecurity, school backpack programs supply schoolchildren with food to take home on weekends and holiday breaks, when school cafeterias are unavailable. It is important to assess and identify the true needs of the children in schools in order to avoid any potential negative effects. This study applies linear regression analysis to data from a backpack program and from the schools it serves. The study reveals that the percentage of low income is a significant factor. Through various feature selection methods, a prediction model is obtained, which is then used to create a backpack needs ranking system for schools in the county that are not currently served by the backpack program.