The abundance of post-earthquake data from the Canterbury, New Zealand (NZ), area can be leveraged to explore machine learning (ML) opportunities for geotechnical earthquake engineering. Herein, random forest (RF) is chosen as the ML model because it is a powerful non-parametric classification model that can also compute global feature importance after model building. The procedure and results of building a multiclass liquefaction manifestation classification RF model, with features engineered to preserve spatial relationships, are presented. The RF model hyperparameters are optimized with a two-step fivefold cross-validation grid search to avoid overfitting. The overall model accuracy is 96% over six ordinal categories, predicting the Canterbury earthquake sequence measurements from 2010, 2011, and 2016. The resultant RF model can serve as a blueprint for incorporating other sources of physical data, such as geological maps, to widen the bounds of model usability.
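A minimal sketch of the kind of two-step, five-fold cross-validated grid search described in the abstract above, written with scikit-learn. The data file, feature names, and grid values are illustrative assumptions, not the study's actual configuration.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Hypothetical tabular dataset: engineered site/CPT features plus an ordinal
# liquefaction-manifestation label with six categories (file name is assumed).
df = pd.read_csv("canterbury_features.csv")
X, y = df.drop(columns=["manifestation"]), df["manifestation"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Step 1: coarse five-fold cross-validated search over the main hyperparameters.
coarse = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300, 500],
                "max_depth": [None, 10, 30]},
    cv=5, scoring="accuracy", n_jobs=-1,
).fit(X_train, y_train)

# Step 2: finer search around the coarse optimum.
fine = GridSearchCV(
    RandomForestClassifier(
        n_estimators=coarse.best_params_["n_estimators"], random_state=0
    ),
    param_grid={"max_depth": [coarse.best_params_["max_depth"]],
                "min_samples_leaf": [1, 2, 5],
                "max_features": ["sqrt", 0.5]},
    cv=5, scoring="accuracy", n_jobs=-1,
).fit(X_train, y_train)

rf = fine.best_estimator_
print("held-out accuracy:", rf.score(X_test, y_test))
```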
Explainable Machine Learning Interpretations on New Zealand Random Forest Liquefaction Manifestation Predictions
The abundant post-earthquake data from the Canterbury, New Zealand (NZ) area is poised for use with machine learning (ML) to advance our ability to predict and understand the effects of liquefaction. Liquefaction manifestation is one of the identifiable effects of liquefaction, a nonlinear phenomenon that is still not well understood. ML algorithms are often termed "black-box" models that offer little to no explainability for their predictions, making them difficult to use in practice. With the SHapley Additive exPlanations (SHAP) algorithm wrapper, mathematically backed explanations can be fit to the model to track the influence of input features on the final prediction. In this paper, Random Forest (RF) is chosen as the ML model because it is a powerful non-parametric classification model; SHAP is then applied to calculate explanations for the predictions at global and local feature scales. The RF model hyperparameters are optimized with a two-step grid search and five-fold cross-validation to avoid overfitting. The overall model accuracy is 71% over six ordinal categories, predicting the Canterbury Earthquake Sequence measurements from 2010, 2011, and 2016. Insights from applying SHAP to the RF model include the influences of PGA, GWT depths, and SBTs on each ordinal class prediction. This preliminary exploration using SHAP can pave the way for reinforcing the performance of current ML models by comparing them against previous knowledge, and for using SHAP as a discovery tool to identify which research areas are pertinent to unlocking more understanding of liquefaction mechanics.
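The abstract's pairing of a random forest with SHAP can be sketched as follows. The fitted classifier `rf` and the feature frame `X_test` are assumptions carried over from the training sketch above, and the exact array shapes returned by SHAP depend on the installed shap version.

```python
import shap

# TreeExplainer provides fast, exact SHAP values for tree ensembles.
explainer = shap.TreeExplainer(rf)
sv = explainer(X_test)        # Explanation with shape (samples, features, classes)

# Global view: mean |SHAP| per feature for one ordinal manifestation class.
shap.plots.bar(sv[:, :, 0])

# Local view: how each feature pushed the prediction for a single sounding.
shap.plots.waterfall(sv[0, :, 0])
```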
- Award ID(s): 2047838
- PAR ID: 10553631
- Publisher / Repository: Japanese Geotechnical Society
- Date Published:
- Journal Name: Japanese Geotechnical Society Special Publication
- Volume: 10
- Issue: 37
- ISSN: 2188-8027
- Page Range / eLocation ID: 1424 to 1429
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
With increasing interest in explaining machine learning (ML) models, this paper synthesizes many topics related to ML explainability. We distinguish explainability from interpretability, local from global explainability, and feature importance versus feature relevance. We demonstrate and visualize different explanation methods, how to interpret them, and provide a complete Python package (scikit-explain) to allow future researchers and model developers to explore these explainability methods. The explainability methods include Shapley additive explanations (SHAP), Shapley additive global explanation (SAGE), and accumulated local effects (ALE). Our focus is primarily on Shapley-based techniques, which serve as a unifying framework for various existing methods to enhance model explainability. For example, SHAP unifies methods like local interpretable model-agnostic explanations (LIME) and tree interpreter for local explainability, while SAGE unifies the different variations of permutation importance for global explainability. We provide a short tutorial for explaining ML models using three disparate datasets: a convection-allowing model dataset for severe weather prediction, a nowcasting dataset for subfreezing road surface prediction, and satellite-based data for lightning prediction. In addition, we showcase the adverse effects that correlated features can have on the explainability of a model. Finally, we demonstrate the notion of evaluating model impacts of feature groups instead of individual features. Evaluating the feature groups mitigates the impacts of feature correlations and can provide a more holistic understanding of the model. All code, models, and data used in this study are freely available to accelerate the adoption of machine learning explainability in the atmospheric and other environmental sciences.
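As a small illustration of the global-importance family this abstract mentions (the permutation-importance variants that SAGE unifies), a hedged sketch using scikit-learn rather than the scikit-explain package itself; the fitted model `rf` and held-out data are assumptions carried over from the earlier sketches, not part of the cited study.

```python
from sklearn.inspection import permutation_importance

# Permute each feature on held-out data and measure the drop in accuracy.
result = permutation_importance(
    rf, X_test, y_test, scoring="accuracy",
    n_repeats=20, random_state=0, n_jobs=-1,
)

# Rank features by mean importance across the repeats.
for name, mean, std in sorted(
    zip(X_test.columns, result.importances_mean, result.importances_std),
    key=lambda t: -t[1],
):
    print(f"{name:>24s}: {mean:.3f} +/- {std:.3f}")
```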
Soil mixing is a ground improvement method that consists of mixing cementitious binders with soil in-situ to create soilcrete. A key parameter in the design and construction of this method is the Unconfined Compressive Strength (UCS) of the soilcrete after a given curing time. This paper explores the intersection of Machine Learning (ML) with geotechnical engineering and soilcrete applications. A database of soilcrete UCS and site/soil/means/methods metadata is compiled from recent projects in the western United States and leveraged to explore UCS prediction with the eXtreme Gradient Boosting (XGBoost) ML algorithm, which resulted in an ML model with an R2 value of 88%. To obtain insights from the ML model, the explainable ML method SHapley Additive exPlanations (SHAP) was then applied to the XGBoost model to explain variable importances and influences on the final UCS prediction. From this ML application, a blueprint of how to scaffold, feature engineer, and prepare soilcrete data for ML is showcased. Furthermore, the insights obtained from the SHAP model can be further pursued with traditional geotechnical research approaches to expand soil mixing knowledge.
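A hedged sketch of the kind of pipeline this abstract describes: fit an XGBoost regressor to predict soilcrete UCS, score it with R2, then explain it with SHAP. The data file, target column, and hyperparameters are illustrative assumptions, not the project database or published model.

```python
import pandas as pd
import shap
import xgboost as xgb
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical database of soilcrete UCS with site/soil/means/methods features.
df = pd.read_csv("soilcrete_ucs.csv")                 # assumed file name
X, y = df.drop(columns=["ucs_psi"]), df["ucs_psi"]    # assumed target column
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Gradient-boosted trees for the UCS regression task.
model = xgb.XGBRegressor(n_estimators=500, learning_rate=0.05, max_depth=6)
model.fit(X_tr, y_tr)
print("R2 on held-out data:", r2_score(y_te, model.predict(X_te)))

# SHAP attributions for the UCS predictions (regression: one output).
sv = shap.TreeExplainer(model)(X_te)
shap.plots.beeswarm(sv)
```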
SHAP explanations are a popular feature-attribution mechanism for explainable AI. They use game-theoretic notions to measure the influence of individual features on the prediction of a machine learning model. Despite a lot of recent interest from both academia and industry, it is not known whether SHAP explanations of common machine learning models can be computed efficiently. In this paper, we establish the complexity of computing the SHAP explanation in three important settings. First, we consider fully-factorized data distributions, and show that the complexity of computing the SHAP explanation is the same as the complexity of computing the expected value of the model. This fully-factorized setting is often used to simplify the SHAP computation, yet our results show that the computation can be intractable for commonly used models such as logistic regression. Going beyond fully-factorized distributions, we show that computing SHAP explanations is already intractable for a very simple setting: computing SHAP explanations of trivial classifiers over naive Bayes distributions. Finally, we show that even computing SHAP over the empirical distribution is #P-hard.
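To make the object of these complexity results concrete, the following sketch computes exact Shapley values by brute force over all feature subsets. The toy value function is a placeholder, not a real model; the point is only that the exact formula requires exponentially many subset evaluations, which is why the hardness questions above matter.

```python
from itertools import combinations
from math import factorial

def shapley(value, features):
    """Exact Shapley value of each feature for a set-valued payoff `value`."""
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for k in range(n):                       # subset sizes 0 .. n-1
            for S in combinations(others, k):    # 2^(n-1) subsets per feature
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(set(S) | {i}) - value(set(S)))
        phi[i] = total
    return phi

# Toy value function: payoff is the number of features present, plus a bonus
# when features "a" and "b" appear together (a simple interaction).
v = lambda S: len(S) + (2 if {"a", "b"} <= S else 0)
print(shapley(v, ["a", "b", "c"]))   # {'a': 2.0, 'b': 2.0, 'c': 1.0}
```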