The abundant post-earthquake data from the Canterbury, New Zealand (NZ) area is poised for use with machine learning (ML) to advance our ability to predict and understand the effects of liquefaction. Liquefaction manifestation is one of the identifiable effects of liquefaction, a nonlinear phenomenon that is still not well understood. ML algorithms are often termed "black-box" models because they offer little to no explanation for their predictions, which makes them difficult to use in practice. With the SHapley Additive exPlanations (SHAP) algorithm wrapper, mathematically backed explanations can be fit to the model to track how each input feature influences the final prediction. In this paper, Random Forest (RF) is chosen as the ML model because it is a powerful non-parametric classification model; SHAP is then applied to explain the predictions at both the global and local feature scale. The RF hyperparameters are optimized with a two-step grid search and five-fold cross-validation to avoid overfitting. The overall model accuracy is 71% across six ordinal categories predicting the Canterbury Earthquake Sequence measurements from 2010, 2011, and 2016. Insights from applying SHAP to the RF model include the influences of peak ground acceleration (PGA), groundwater table (GWT) depths, and soil behavior types (SBTs) on each ordinal class prediction. This preliminary exploration with SHAP can pave the way both for corroborating current ML models by comparing their explanations to previous knowledge and for using SHAP as a discovery tool to identify which research areas are pertinent to unlocking more understanding of liquefaction mechanics.
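A minimal sketch of the workflow described in the abstract above, assuming standard scikit-learn and shap APIs: a Random Forest classifier tuned with a two-step grid search under five-fold cross-validation, then explained with SHAP. The feature names (PGA, GWT depth, SBT index), the synthetic data, and the grid values are hypothetical placeholders, not the authors' dataset or settings.

```python
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "PGA_g": rng.uniform(0.05, 0.6, 500),       # peak ground acceleration (g)
    "GWT_depth_m": rng.uniform(0.5, 5.0, 500),  # groundwater table depth (m)
    "SBT_index": rng.uniform(1.0, 4.0, 500),    # soil behavior type index
})
y = rng.integers(0, 6, 500)                     # six ordinal manifestation classes (synthetic)

# Step 1: coarse grid; Step 2: refine around the best coarse parameters.
coarse = GridSearchCV(RandomForestClassifier(random_state=0),
                      {"n_estimators": [100, 300], "max_depth": [5, 15, None]},
                      cv=5).fit(X, y)
fine = GridSearchCV(RandomForestClassifier(random_state=0),
                    {"n_estimators": [coarse.best_params_["n_estimators"]],
                     "max_depth": [coarse.best_params_["max_depth"]],
                     "min_samples_leaf": [1, 2, 5]},
                    cv=5).fit(X, y)

# Global and local explanations; TreeSHAP returns per-class attributions here.
explainer = shap.TreeExplainer(fine.best_estimator_)
shap_values = explainer.shap_values(X)
```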
A Machine Learning Explainability Tutorial for Atmospheric Sciences
                        
                    
    
Abstract: With increasing interest in explaining machine learning (ML) models, this paper synthesizes many topics related to ML explainability. We distinguish explainability from interpretability, local from global explainability, and feature importance from feature relevance. We demonstrate and visualize different explanation methods, show how to interpret them, and provide a complete Python package (scikit-explain) to allow future researchers and model developers to explore these explainability methods. The explainability methods include Shapley additive explanations (SHAP), Shapley additive global explanation (SAGE), and accumulated local effects (ALE). Our focus is primarily on Shapley-based techniques, which serve as a unifying framework for various existing methods to enhance model explainability. For example, SHAP unifies methods like local interpretable model-agnostic explanations (LIME) and tree interpreter for local explainability, while SAGE unifies the different variations of permutation importance for global explainability. We provide a short tutorial for explaining ML models using three disparate datasets: a convection-allowing model dataset for severe weather prediction, a nowcasting dataset for subfreezing road surface prediction, and satellite-based data for lightning prediction. In addition, we showcase the adverse effects that correlated features can have on the explainability of a model. Finally, we demonstrate the notion of evaluating the model impacts of feature groups instead of individual features. Evaluating feature groups mitigates the impacts of feature correlations and can provide a more holistic understanding of the model. All code, models, and data used in this study are freely available to accelerate the adoption of machine learning explainability in the atmospheric and other environmental sciences.
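As an illustration of the feature-group idea mentioned above, here is a minimal sketch using the general-purpose shap package rather than scikit-explain itself. The model, feature names, and grouping are hypothetical, and summing mean absolute SHAP values within a group is only a simple proxy for the grouped-importance methods the paper discusses.

```python
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
X = pd.DataFrame(rng.normal(size=(400, 4)),
                 columns=["temp_2m", "dewpoint_2m", "wind_speed", "cape"])
y = X["temp_2m"] - X["dewpoint_2m"] + 0.5 * rng.normal(size=400)  # synthetic target

model = GradientBoostingRegressor(random_state=1).fit(X, y)
shap_values = shap.TreeExplainer(model).shap_values(X)  # shape: (n_samples, n_features)

# Hypothetical grouping of related features into physical categories.
groups = {"thermodynamic": ["temp_2m", "dewpoint_2m", "cape"],
          "kinematic": ["wind_speed"]}
mean_abs = np.abs(shap_values).mean(axis=0)
group_importance = {name: mean_abs[[X.columns.get_loc(c) for c in cols]].sum()
                    for name, cols in groups.items()}
print(group_importance)
```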
- PAR ID: 10485041
- Publisher / Repository: American Meteorological Society
- Date Published:
- Journal Name: Artificial Intelligence for the Earth Systems
- Volume: 3
- Issue: 1
- ISSN: 2769-7525
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Soil mixing is a ground improvement method that consists of mixing cementitious binders with soil in-situ to create soilcrete. A key parameter in the design and construction of this method is the Unconfined Compressive Strength (UCS) of the soilcrete after a given curing time. This paper explores the intersection of Machine Learning (ML) with geotechnical engineering and soilcrete applications. A database of soilcrete UCS and site/soil/means/methods metadata is compiled from recent projects in the western United States and leveraged to explore UCS prediction with the eXtreme Gradient Boosting (XGBoost) ML algorithm, which resulted in an ML model with an R2 value of 88%. To gain insights from the ML model, the explainability method SHapley Additive exPlanations (SHAP) was then applied to the XGBoost model to explain the importance and influence of each variable on the final UCS prediction. From this ML application, a blueprint for how to scaffold, feature engineer, and prepare soilcrete data for ML is showcased. Furthermore, the insights obtained from the SHAP model can be pursued with traditional geotechnical research approaches to expand soil-mixing knowledge. (A hedged code sketch of this XGBoost-plus-SHAP workflow appears after this list.)
- The ability to determine whether a robot's grasp has a high chance of failing, before it actually does, can save significant time and avoid failures by planning for re-grasping or changing the strategy for that special case. Machine Learning (ML) offers one way to learn to predict grasp failure from historic data consisting of a robot's attempted grasps alongside labels of success or failure. Unfortunately, most powerful ML models are black-box models that do not explain the reasons behind their predictions. In this paper, we investigate how ML can be used to predict robot grasp failure and study the tradeoff between accuracy and interpretability by comparing interpretable (white-box) ML models that are inherently explainable with more accurate black-box ML models that are inherently opaque. Our results show that one does not necessarily have to compromise accuracy for interpretability if we use an explanation generation method, such as SHapley Additive exPlanations (SHAP), to add explainability to the accurate predictions made by black-box models. An explanation of a predicted fault can lead to an efficient choice of corrective action in the robot's design that can be taken to avoid future failures. (A small sketch of this accuracy-versus-interpretability comparison appears after this list.)
- The complex nature of artificial neural networks raises concerns about their reliability, trustworthiness, and fairness in real-world scenarios. The Shapley value, a solution concept from game theory, is one of the most popular explanation methods for machine learning models. More traditionally, from a statistical perspective, feature importance is defined in terms of conditional independence. So far, these two approaches to interpretability and feature importance have been considered separate and distinct. In this work, we show that Shapley-based explanation methods and conditional independence testing are closely related. We introduce the SHAPley EXplanation Randomization Test (SHAP-XRT), a testing procedure inspired by the Conditional Randomization Test (CRT) for a specific notion of local (i.e., on a sample) conditional independence. With it, we prove that for binary classification problems, the marginal contributions in the Shapley value provide lower and upper bounds on the expected p-values of their respective tests. Furthermore, we show that the Shapley value itself provides an upper bound on the expected p-value of a global (i.e., overall) null hypothesis. As a result, we further our understanding of Shapley-based explanation methods from a novel perspective and characterize the conditions under which one can make statistically valid claims about feature importance via the Shapley value. (The standard Shapley value definition is restated after this list for reference.)
- Abstract: Characterization of material structure with X-ray or neutron scattering using, e.g., Pair Distribution Function (PDF) analysis most often relies on refining a structure model against an experimental dataset. However, identifying a suitable model is often a bottleneck. Recently, automated approaches have made it possible to test thousands of models for each dataset, but these methods are computationally expensive, and analysing the output, i.e. extracting structural information from the resulting fits in a meaningful way, is challenging. Our Machine Learning based Motif Extractor (ML-MotEx) trains an ML algorithm on thousands of fits and uses SHAP (SHapley Additive exPlanation) values to identify which model features are important for the fit quality. We use the method for four different chemical systems, including disordered nanomaterials and clusters. ML-MotEx opens up a type of modelling where each feature in a model is assigned an importance value for the fit quality based on explainable ML. (A hedged sketch of this fit-then-explain idea follows this list.)
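For the soil-mixing item above, a minimal sketch of an XGBoost-plus-SHAP regression workflow is shown below. The features (cement content, water-to-cement ratio, curing time, fines content), the synthetic UCS values, and the hyperparameters are hypothetical stand-ins for the paper's database, not its actual contents.

```python
import numpy as np
import pandas as pd
import shap
from xgboost import XGBRegressor

rng = np.random.default_rng(2)
X = pd.DataFrame({
    "cement_content_kg_m3": rng.uniform(150, 400, 300),
    "water_cement_ratio": rng.uniform(0.8, 1.5, 300),
    "curing_days": rng.choice([7.0, 28.0, 56.0], 300),
    "fines_content_pct": rng.uniform(5, 60, 300),
})
# Synthetic UCS (kPa), loosely increasing with cement content and curing time.
y = (2.0 * X["cement_content_kg_m3"] + 10.0 * X["curing_days"]
     - 200.0 * X["water_cement_ratio"] + rng.normal(0.0, 50.0, 300))

model = XGBRegressor(n_estimators=300, max_depth=4, random_state=2).fit(X, y)
shap_values = shap.TreeExplainer(model).shap_values(X)
shap.summary_plot(shap_values, X, show=False)  # global feature importances and influences
```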
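For the grasp-failure item, the sketch below reproduces the accuracy-versus-interpretability comparison in miniature: a shallow decision tree (white box) against a random forest (black box), with SHAP then applied to the black box. The grasp features and the synthetic failure rule are hypothetical, not the paper's robot data.

```python
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
X = pd.DataFrame({
    "grip_force_N": rng.uniform(1, 20, 600),
    "contact_points": rng.integers(1, 5, 600).astype(float),
    "object_width_mm": rng.uniform(10, 120, 600),
})
y = (X["grip_force_N"] / X["object_width_mm"] < 0.08).astype(int)  # 1 = grasp failure (synthetic rule)

white_box = DecisionTreeClassifier(max_depth=3, random_state=3)
black_box = RandomForestClassifier(n_estimators=200, random_state=3)
print("decision tree accuracy:", cross_val_score(white_box, X, y, cv=5).mean())
print("random forest accuracy:", cross_val_score(black_box, X, y, cv=5).mean())

# SHAP adds per-prediction explanations to the (typically more accurate) black box.
shap_values = shap.TreeExplainer(black_box.fit(X, y)).shap_values(X)
```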
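For reference alongside the SHAP-XRT item, the standard Shapley value of feature $j$ under a set function $v$ on the feature set $N$ is the weighted average of its marginal contributions; it is these marginal contributions that the abstract relates to expected p-values of conditional randomization tests.

```latex
\phi_j(v) \;=\; \sum_{S \subseteq N \setminus \{j\}}
    \frac{|S|!\,(|N|-|S|-1)!}{|N|!}\,
    \underbrace{\bigl( v(S \cup \{j\}) - v(S) \bigr)}_{\text{marginal contribution } \Delta_j(S)}
```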
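Finally, a minimal sketch of the ML-MotEx fit-then-explain idea: train a regressor that maps candidate-model features (here, hypothetical motif on/off flags) to a fit-quality value, then rank the features by mean absolute SHAP value. The column names, data, and fit-quality formula are all illustrative, not the published pipeline.

```python
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(4)
# Each row is one candidate structure; columns flag whether a structural motif is included.
motifs = pd.DataFrame(rng.integers(0, 2, size=(1000, 5)),
                      columns=[f"motif_{i}" for i in range(5)])
fit_quality = 1.0 - 0.3 * motifs["motif_2"] + 0.1 * rng.normal(size=1000)  # lower = better fit

model = GradientBoostingRegressor(random_state=4).fit(motifs, fit_quality)
shap_values = shap.TreeExplainer(model).shap_values(motifs)
importance = np.abs(shap_values).mean(axis=0)  # how strongly each motif drives fit quality
print(dict(zip(motifs.columns, importance.round(3))))
```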