FOLD-R is an automated inductive learning algorithm for learning default rules for mixed (numerical and categorical) data. It generates an (explainable) normal logic program (NLP) rule set for classification tasks. We present an improved FOLD-R algorithm, called FOLD-R++, that increases the efficiency and scalability of FOLD-R by orders of magnitude. FOLD-R++ improves upon FOLD-R without compromising or losing information in the input training data during the encoding or feature selection phase. The FOLD-R++ algorithm is competitive in performance with the widely used XGBoost algorithm; however, unlike XGBoost, FOLD-R++ produces an explainable model. FOLD-R++ is also competitive in performance with the RIPPER system, and on large datasets it outperforms RIPPER. We also create a powerful toolset by combining FOLD-R++ with s(CASP), a goal-directed answer set programming (ASP) execution engine, to make predictions on new data samples using the normal logic program generated by FOLD-R++. The s(CASP) system also produces a justification for each prediction. Experiments presented in this paper show that FOLD-R++ is a significant improvement over the original FOLD-R design and that the s(CASP) system can make predictions efficiently.
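For readers unfamiliar with default rules, the toy Python sketch below hand-codes the kind of rule-plus-exception structure described above; the rule, feature names, and thresholds are invented for illustration and are not output of FOLD-R++ or s(CASP).

```python
# Hand-written illustration (not actual FOLD-R++ output): a default rule with
# an exception, in the spirit of the normal logic programs FOLD-R++ learns.
# ASP-like reading:  class(X) :- petal_length(X, L), L =< 1.9, not ab1(X).
#                    ab1(X)   :- petal_width(X, W), W > 0.6.

def ab1(sample):
    # Hypothetical "abnormality" (exception) predicate.
    return sample.get("petal_width", 0.0) > 0.6

def classify(sample):
    # The default fires only when its body holds and no exception applies;
    # negation as failure is modeled here as plain Boolean negation.
    return sample.get("petal_length", 0.0) <= 1.9 and not ab1(sample)

print(classify({"petal_length": 1.4, "petal_width": 0.2}))  # True
print(classify({"petal_length": 1.4, "petal_width": 0.9}))  # False (exception fires)
```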
FOLD-RM: A Scalable, Efficient, and Explainable Inductive Learning Algorithm for Multi-Category Classification of Mixed Data
Abstract: FOLD-RM is an automated inductive learning algorithm for learning default rules for mixed (numerical and categorical) data. It generates an (explainable) answer set programming (ASP) rule set for multi-category classification tasks while maintaining efficiency and scalability. The FOLD-RM algorithm is competitive in performance with widely used, state-of-the-art algorithms such as XGBoost and multi-layer perceptrons; however, unlike these algorithms, FOLD-RM produces an explainable model. FOLD-RM outperforms XGBoost on some datasets, particularly large ones. FOLD-RM also provides human-friendly explanations for its predictions.
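The abstract does not spell out the learning loop, so the following is only a schematic sketch of the sequential-covering idea behind multi-category rule learning: repeatedly target the most common remaining class, learn one rule for it, and remove the examples it covers. The single-literal rule learner and toy data below are placeholders, not FOLD-RM's actual literal-selection or exception-learning procedure.

```python
# Schematic sketch only (not FOLD-RM): sequential covering for multi-category
# classification. Each rule is a single "feature <= threshold" test chosen by
# precision; FOLD-RM's real literal selection and exception handling are richer.
from collections import Counter

def learn_one_rule(rows, target):
    best = None  # (precision, feature, threshold)
    for feature in rows[0][0]:
        for x, _ in rows:
            thr = x[feature]
            covered = [y for xx, y in rows if xx[feature] <= thr]
            precision = sum(y == target for y in covered) / len(covered)
            if best is None or precision > best[0]:
                best = (precision, feature, thr)
    return best[1], best[2]

def fit(rows):
    rules = []
    while rows:
        # Target the most common class among the remaining examples.
        target = Counter(y for _, y in rows).most_common(1)[0][0]
        feature, thr = learn_one_rule(rows, target)
        rules.append((target, feature, thr))
        # Drop the examples this rule covers and continue with the rest.
        rows = [(x, y) for x, y in rows if not x[feature] <= thr]
    return rules

data = [({"len": 1.4}, "a"), ({"len": 1.3}, "a"),
        ({"len": 5.1}, "b"), ({"len": 4.8}, "b"), ({"len": 6.0}, "c")]
print(fit(data))  # an ordered list of (class, feature, threshold) rules
```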
- Award ID(s): 1910131
- PAR ID: 10376659
- Date Published:
- Journal Name: Theory and Practice of Logic Programming
- Volume: 22
- Issue: 5
- ISSN: 1471-0684
- Page Range / eLocation ID: 658–677
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- In this paper we present a web-based prototype for an explainable ranking algorithm in multi-layered networks, incorporating both network topology and knowledge information. While traditional ranking algorithms such as PageRank and HITS are important tools for exploring the underlying structure of networks, they have two fundamental limitations when it comes to generating high-accuracy rankings. First, they focus primarily on network topology, leaving out additional sources of information (e.g., attributes, knowledge). Second, most algorithms do not explain to end-users why they produce a specific ranking, hindering the usability of the ranking information. We developed Xrank, an explainable ranking tool, to address these drawbacks. Empirical results indicate that our explainable ranking method not only improves ranking accuracy but also facilitates user understanding of the ranking by exploring the top influential elements in multi-layered networks. The web-based prototype (Xrank: http://www.x-rank.net) is currently online; we believe it will assist both researchers and practitioners looking to explore and exploit multi-layered network data. (An illustrative PageRank baseline is sketched after this list.)
- Explaining automatically generated recommendations allows users to make more informed and accurate decisions about which results to utilize, and therefore improves their satisfaction. In this work, we develop a multi-task learning solution for explainable recommendation. Two companion learning tasks, user preference modeling for recommendation and opinionated content modeling for explanation, are integrated via a joint tensor factorization. As a result, the algorithm predicts not only a user's preference over a list of items, i.e., recommendation, but also how the user would appreciate a particular item at the feature level, i.e., opinionated textual explanation. Extensive experiments on two large collections of Amazon and Yelp reviews confirmed the effectiveness of our solution in both recommendation and explanation tasks, compared with several existing recommendation algorithms. Our extensive user study also clearly demonstrates the practical value of the explainable recommendations generated by our algorithm. (A toy shared-factor sketch follows this list.)
- In this paper, we aim to address a relevant estimation problem that aviation professionals encounter in their daily operations. Specifically, aircraft load planners require information on the expected number of checked bags for a flight several hours prior to its scheduled departure in order to properly palletize and load the aircraft. However, the checked baggage prediction problem has not been sufficiently studied in the literature, particularly at the flight level. Existing prediction approaches have not properly accounted for the different impacts of overestimating and underestimating checked baggage volumes on airline operations. Therefore, we propose a custom loss function, in the form of a piecewise quadratic function, which aligns with airline operations practice, and we use machine learning algorithms to optimize checked baggage predictions under the new loss function. We consider multiple linear regression, LightGBM, and XGBoost as supervised learning algorithms. We apply our proposed methods to baggage data from a major airline and additional data from various U.S. government agencies. We compare the performance of the three customized supervised learning algorithms and find that the two gradient boosting methods (i.e., LightGBM and XGBoost) yield higher accuracy than multiple linear regression; XGBoost outperforms LightGBM, while LightGBM requires much less training time than XGBoost. We also investigate the performance of XGBoost on samples from different categories and provide insights for selecting an appropriate prediction algorithm to improve baggage prediction practices. Our modeling framework can be adapted to address other prediction challenges in aviation, such as predicting the number of standby passengers or no-shows. (An illustrative custom-objective sketch follows this list.)
- Background: Metastatic cancer remains one of the leading causes of cancer-related mortality worldwide. Yet, the prediction of survivability in this population remains limited by heterogeneous clinical presentations and high-dimensional molecular features. Advances in machine learning (ML) provide an opportunity to integrate diverse patient- and tumor-level factors into explainable predictive ML models. Leveraging large real-world datasets and modern ML techniques can enable improved risk stratification and precision oncology. Objective: This study aimed to develop and interpret ML models for predicting overall survival in patients with metastatic cancer using the Memorial Sloan Kettering-Metastatic (MSK-MET) dataset and to identify key prognostic biomarkers through explainable artificial intelligence techniques. Methods: We performed a retrospective analysis of the MSK-MET cohort, comprising 25,775 patients across 27 tumor types. After data cleaning and balancing, 20,338 patients were included. Overall survival was defined as deceased versus living at last follow-up. Five classifiers (extreme gradient boosting [XGBoost], logistic regression, random forest, decision tree, and naive Bayes) were trained using an 80/20 stratified split and optimized via grid search with 5-fold cross-validation. Model performance was assessed using accuracy, area under the curve (AUC), precision, recall, and F1-score. Model explainability was achieved using Shapley additive explanations (SHAP). Survival analyses included Kaplan-Meier estimates, Cox proportional hazards models, and an XGBoost-Cox model for time-to-event prediction. The positive predictive value and negative predictive value were calculated at the Youden index–optimal threshold. Results: XGBoost achieved the highest performance (accuracy=0.74; AUC=0.82), outperforming other classifiers. In survival analyses, the XGBoost-Cox model with a concordance index (C-index) of 0.70 exceeded the traditional Cox model (C-index=0.66). SHAP analysis and Cox models consistently identified metastatic site count, tumor mutational burden, fraction of genome altered, and the presence of distant liver and bone metastases as among the strongest prognostic factors, a pattern that held at both the pan-cancer level and recurrently across cancer-specific models. At the cancer-specific level, performance varied; prostate cancer achieved the highest predictive accuracy (AUC=0.88), while pancreatic cancer was notably more challenging (AUC=0.68). Kaplan-Meier analyses demonstrated marked survival separation between patients with and without metastases (80-month survival: approximately 0.80 vs 0.30). At the Youden-optimal threshold, positive predictive value and negative predictive value were approximately 70% and 80%, respectively, supporting clinical use for risk stratification. Conclusions: Explainable ML models, particularly XGBoost combined with SHAP, can strongly predict survivability in metastatic cancers while highlighting clinically meaningful features. These findings support the use of ML-based tools for patient counseling, treatment planning, and integration into precision oncology workflows. Future work should include external validation on independent cohorts, integration with electronic health records via Fast Healthcare Interoperability Resources–based dashboards, and prospective clinician-in-the-loop evaluation to assess real-world use. (A minimal XGBoost-plus-SHAP sketch follows this list.)
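Referenced from the Xrank item above: a bare-bones power-iteration PageRank, shown only to make concrete the topology-only baselines that abstract contrasts itself with. It is not the Xrank method, and the tiny adjacency list is made up.

```python
# Power-iteration PageRank over a plain adjacency list; a topology-only
# baseline of the kind the Xrank item mentions, not Xrank itself.
def pagerank(adj, damping=0.85, iters=100):
    nodes = list(adj)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            out = adj[n]
            if not out:                      # dangling node: spread rank evenly
                for m in nodes:
                    new[m] += damping * rank[n] / len(nodes)
            else:
                for m in out:
                    new[m] += damping * rank[n] / len(out)
        rank = new
    return rank

print(pagerank({"a": ["b"], "b": ["a", "c"], "c": ["a"]}))
```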
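Referenced from the explainable-recommendation item above: that paper couples recommendation and explanation through a joint tensor factorization. The toy NumPy sketch below only illustrates the shared-latent-factor coupling idea with two matrices and random data; the shapes, learning rate, and objective are assumptions, not the paper's model.

```python
import numpy as np

# Toy coupling of two tasks through shared user factors U:
# ratings R ~ U @ V.T (recommendation) and opinions O ~ U @ F.T (explanation).
# A simplified stand-in for the paper's joint tensor factorization.
rng = np.random.default_rng(0)
n_users, n_items, n_feats, k = 5, 6, 4, 3
R = rng.random((n_users, n_items))      # synthetic user-item ratings
O = rng.random((n_users, n_feats))      # synthetic user-feature opinion scores
U = rng.normal(scale=0.1, size=(n_users, k))
V = rng.normal(scale=0.1, size=(n_items, k))
F = rng.normal(scale=0.1, size=(n_feats, k))

lr, reg = 0.05, 0.01
for _ in range(300):
    eR = R - U @ V.T                        # reconstruction error, rating task
    eO = O - U @ F.T                        # reconstruction error, opinion task
    U += lr * (eR @ V + eO @ F - reg * U)   # U is shared: both tasks update it
    V += lr * (eR.T @ U - reg * V)
    F += lr * (eO.T @ U - reg * F)

print(float(np.mean((R - U @ V.T) ** 2)), float(np.mean((O - U @ F.T) ** 2)))
```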
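Referenced from the checked-baggage item above: XGBoost accepts a user-supplied objective as a (gradient, hessian) pair, which is the natural way to plug in an asymmetric piecewise quadratic loss. The weights below, which side is penalized more, and the synthetic data are placeholders, not the paper's calibrated loss or features.

```python
import numpy as np
import xgboost as xgb

# Sketch of the general idea only: an asymmetric piecewise-quadratic loss
# supplied to XGBoost as a custom objective (gradient and hessian).
W_UNDER, W_OVER = 4.0, 1.0   # hypothetical penalty weights

def asymmetric_quadratic(predt, dtrain):
    err = predt - dtrain.get_label()          # > 0 over-, < 0 under-prediction
    w = np.where(err < 0, W_UNDER, W_OVER)
    grad = 2.0 * w * err
    hess = 2.0 * w
    return grad, hess

# Toy data standing in for flight-level features and checked-bag counts.
rng = np.random.default_rng(0)
X = rng.random((200, 5))
y = 50 + 30 * X[:, 0] + rng.normal(0, 3, 200)
dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({"max_depth": 3, "eta": 0.1}, dtrain,
                    num_boost_round=50, obj=asymmetric_quadratic)
print(booster.predict(dtrain)[:3])
```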
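Referenced from the metastatic-cancer item above: a minimal sketch of the XGBoost-plus-SHAP workflow that abstract describes, run on synthetic stand-ins. The column names, synthetic outcome, and hyperparameters are assumptions, not the MSK-MET data or the study's tuned models.

```python
import numpy as np
import pandas as pd
import xgboost as xgb
import shap

# Synthetic stand-ins for a few MSK-MET-style features (names are hypothetical).
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "metastatic_site_count": rng.integers(0, 8, n),
    "tumor_mutational_burden": rng.gamma(2.0, 3.0, n),
    "fraction_genome_altered": rng.random(n),
    "liver_metastasis": rng.integers(0, 2, n),
})
# Synthetic binary outcome loosely driven by two features, just so the demo runs.
logit = 0.4 * X["metastatic_site_count"] + 1.0 * X["liver_metastasis"] - 2.0
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

model = xgb.XGBClassifier(n_estimators=100, max_depth=3)
model.fit(X, y)

explainer = shap.TreeExplainer(model)          # per-feature attributions
shap_values = explainer.shap_values(X)
mean_abs = np.abs(shap_values).mean(axis=0)    # global importance ranking
print(sorted(zip(X.columns, mean_abs), key=lambda t: -t[1]))
```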