Abstract With increasing interest in explaining machine learning (ML) models, this paper synthesizes many topics related to ML explainability. We distinguish explainability from interpretability, local from global explainability, and feature importance versus feature relevance. We demonstrate and visualize different explanation methods, show how to interpret them, and provide a complete Python package (scikit-explain) to allow future researchers and model developers to explore these explainability methods. The explainability methods include Shapley additive explanations (SHAP), Shapley additive global explanation (SAGE), and accumulated local effects (ALE). Our focus is primarily on Shapley-based techniques, which serve as a unifying framework for various existing methods to enhance model explainability. For example, SHAP unifies methods like local interpretable model-agnostic explanations (LIME) and tree interpreter for local explainability, while SAGE unifies the different variations of permutation importance for global explainability. We provide a short tutorial for explaining ML models using three disparate datasets: a convection-allowing model dataset for severe weather prediction, a nowcasting dataset for subfreezing road surface prediction, and satellite-based data for lightning prediction. In addition, we showcase the adverse effects that correlated features can have on the explainability of a model. Finally, we demonstrate the notion of evaluating the model impact of feature groups rather than individual features. Evaluating feature groups mitigates the impact of feature correlations and can provide a more holistic understanding of the model. All code, models, and data used in this study are freely available to accelerate the adoption of machine learning explainability in the atmospheric and other environmental sciences.
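The Shapley values underlying both SHAP and SAGE can be computed exactly for small feature counts. As a purely illustrative sketch (this is not the scikit-explain API; the function names and the toy linear model are hypothetical), the brute-force definition sums each feature's marginal contribution over all coalitions:

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values for a coalition value function v(S).

    phi_i = sum over subsets S not containing i of
            |S|! (n - |S| - 1)! / n! * [v(S u {i}) - v(S)]
    """
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                weight = (factorial(len(S)) * factorial(n_features - len(S) - 1)
                          / factorial(n_features))
                phi[i] += weight * (value_fn(set(S) | {i}) - value_fn(set(S)))
    return phi

# Toy additive model f(x) = 2*x0 + 3*x1 - x2 with a zero baseline:
# "absent" features are simply dropped from the sum.
x = [1.0, 2.0, 4.0]
w = [2.0, 3.0, -1.0]

def v(S):
    # Coalition value: model output using only the features in S.
    return sum(w[j] * x[j] for j in S)

print(shapley_values(v, 3))  # approximately [2.0, 6.0, -4.0]
```

For an additive model the attributions reduce to each term's contribution, and they always satisfy the efficiency property: the values sum to v(all features) minus v(empty set). Practical SHAP/SAGE implementations approximate this exponential-cost sum.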
PyGRF: An Improved Python Geographical Random Forest Model and Case Studies in Public Health and Natural Disasters
Abstract Geographical random forest (GRF) is a recently developed and spatially explicit machine learning model. With the ability to provide more accurate predictions and local interpretations, GRF has already been used in many studies. The current GRF model, however, has limitations in its determination of the local model weight and bandwidth hyperparameters, potentially insufficient numbers of local training samples, and sometimes high local prediction errors. Also, GRF is currently implemented only as an R package, which limits its adoption among machine learning practitioners who prefer Python. This work addresses these limitations by introducing theory‐informed hyperparameter determination, local training sample expansion, and spatially weighted local prediction. We also develop a Python‐based GRF model and package, PyGRF, to facilitate the use of the model. We evaluate the performance of PyGRF on an example dataset and further demonstrate its use in two case studies in public health and natural disasters.
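The spatially weighted local prediction idea can be sketched in a few lines. This is a minimal illustration, not PyGRF's implementation: the kernel choice, the stand-in local "model" (a distance-weighted mean of nearby targets rather than a local random forest), and all names are assumptions.

```python
from math import hypot

def grf_predict(train_xy, train_y, query_xy, bandwidth, global_pred, local_weight):
    """Blend a global model's prediction with a local one fitted on nearby samples."""
    # Local "model": distance-weighted mean of targets within the bandwidth
    # (a simplified stand-in for the local random forest in GRF).
    weights, values = [], []
    for (x, y), t in zip(train_xy, train_y):
        d = hypot(x - query_xy[0], y - query_xy[1])
        if d <= bandwidth:
            weights.append(1.0 - (d / bandwidth) ** 2)  # bisquare-style kernel
            values.append(t)
    if not weights:
        return global_pred  # no local support: fall back to the global model
    local_pred = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return (1 - local_weight) * global_pred + local_weight * local_pred

# A query point near two training samples borrows mostly from them,
# tempered by the global prediction according to local_weight.
pred = grf_predict([(0, 0), (1, 0), (10, 10)], [1.0, 3.0, 100.0],
                   (0, 0), bandwidth=2.0, global_pred=10.0, local_weight=0.5)
```

The `local_weight` hyperparameter is exactly the kind of quantity the paper's theory-informed determination targets: at 0 the model is purely global, at 1 purely local.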
- PAR ID: 10542493
- Publisher / Repository: Wiley-Blackwell
- Date Published:
- Journal Name: Transactions in GIS
- ISSN: 1361-1682
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Vanschoren, J. (Ed.) As data are generated more and more from multiple disparate sources, multiview data sets, where each sample has features in distinct views, have grown in recent years. However, no comprehensive package exists that enables non-specialists to use multiview learning methods easily. mvlearn is a Python library which implements the leading multiview machine learning methods. Its simple API closely follows that of scikit-learn for increased ease of use. The package can be installed from the Python Package Index (PyPI) and the conda package manager and is released under the MIT open-source license. The documentation, detailed examples, and all releases are available at https://mvlearn.github.io/.
- EPViz: A flexible and lightweight visualizer to facilitate predictive modeling for multi-channel EEG. Murugappan, M. (Ed.) Scalp Electroencephalography (EEG) is one of the most popular noninvasive modalities for studying real-time neural phenomena. While traditional EEG studies have focused on identifying group-level statistical effects, the rise of machine learning has prompted a shift in computational neuroscience towards spatio-temporal predictive analyses. We introduce a novel open-source viewer, the EEG Prediction Visualizer (EPViz), to aid researchers in developing, validating, and reporting their predictive modeling outputs. EPViz is a lightweight and standalone software package developed in Python. Beyond viewing and manipulating the EEG data, EPViz allows researchers to load a PyTorch deep learning model, apply it to EEG features, and overlay the output channel-wise or subject-level temporal predictions on top of the original time series. These results can be saved as high-resolution images for use in manuscripts and presentations. EPViz also provides valuable tools for clinician-scientists, including spectrum visualization, computation of basic data statistics, and annotation editing. Finally, we have included a built-in EDF anonymization module to facilitate sharing of clinical data. Taken together, EPViz fills a much-needed gap in EEG visualization. Our user-friendly interface and rich collection of features may also help to promote collaboration between engineers and clinicians.
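The overlay step described above, applying a model window-by-window and mapping its outputs back onto the original time axis, can be sketched generically. This is an illustration of the workflow only, not EPViz code; the function name and the simple "last window wins" overwrite policy are assumptions.

```python
def sliding_window_predictions(signal, window, step, predict):
    """Apply a model to sliding windows of a 1-D signal and project the
    per-window outputs back onto the time axis as a temporal overlay."""
    overlay = [None] * len(signal)
    for start in range(0, len(signal) - window + 1, step):
        p = predict(signal[start:start + window])
        for t in range(start, start + window):
            overlay[t] = p  # overlapping windows: later output overwrites earlier
    return overlay

# Toy example: a mean "model" over non-overlapping windows of length 3.
sig = [0.0, 0.0, 0.0, 3.0, 3.0, 3.0]
overlay = sliding_window_predictions(sig, window=3, step=3,
                                     predict=lambda w: sum(w) / len(w))
```

In a real pipeline `predict` would wrap a PyTorch model's forward pass per channel, and the resulting overlay would be plotted on top of the raw trace.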
- Gu, Yaodong (Ed.) Traditional gait event detection methods for heel strike and toe-off utilize thresholding with ground reaction force (GRF) or kinematic data, while recent methods tend to use neural networks. However, when subjects' walking behaviors are significantly altered by an assistive walking device, these detection methods tend to fail. Therefore, this paper introduces a new long short-term memory (LSTM)-based model for detecting gait events in subjects walking with a pair of custom ankle exoskeletons. This new model was developed by multiplying the weighted outputs of two LSTM models, one with GRF data as input and one with heel marker height as input. The gait events were found using peak detection on the final model output. Compared to other machine learning algorithms, which use roughly an 8:1 training-to-testing data ratio, this new model required only a 1:79 training-to-testing data ratio. The algorithm successfully detected over 98% of events within 16 ms of manually identified events, which is greater than the 65% to 98% detection rate of previous LSTM algorithms. The high robustness and low training requirements of the model make it an excellent tool for automated gait event detection for both exoskeleton-assisted and unassisted walking of healthy human subjects.
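The fusion-then-peak-detection structure described above can be sketched independently of the LSTMs themselves. This is a simplified stand-in, not the paper's model: the two input sequences here would in practice be the two LSTM output streams, and the weights and threshold are hypothetical.

```python
def fuse_and_detect(out_grf, out_marker, w_grf, w_marker, threshold):
    """Multiply the weighted outputs of two detectors, then return local
    peaks above a threshold as candidate gait-event frames."""
    fused = [(w_grf * a) * (w_marker * b) for a, b in zip(out_grf, out_marker)]
    peaks = [i for i in range(1, len(fused) - 1)
             if fused[i] > threshold
             and fused[i] >= fused[i - 1]
             and fused[i] > fused[i + 1]]
    return fused, peaks

# An event is only kept where BOTH streams respond: multiplying the outputs
# suppresses frames where one detector fires spuriously.
fused, peaks = fuse_and_detect([0, 1, 0, 0, 1, 0],
                               [0, 1, 0, 0, 0.5, 0],
                               w_grf=1.0, w_marker=1.0, threshold=0.3)
```

The multiplicative fusion is the key design choice: a frame scores high only when both the GRF-driven and marker-driven streams agree, which is what makes the combined detector robust when one modality is distorted by the exoskeleton.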
- Abstract Remote sensing of forested landscapes can transform the speed, scale and cost of forest research. The delineation of individual trees in remote sensing images is an essential task in forest analysis. Here we introduce a new Python package, DeepForest, that detects individual trees in high-resolution RGB imagery using deep learning. While deep learning has proven highly effective in a range of computer vision tasks, it requires large amounts of training data that are typically difficult to obtain in ecological studies. DeepForest overcomes this limitation by including a model pretrained on over 30 million algorithmically generated crowns from 22 forests and fine-tuned using 10,000 hand-labelled crowns from six forests. The package supports the application of this general model to new data, fine tuning the model to new datasets with user labelled crowns, training new models and evaluating model predictions. This simplifies the process of using and retraining deep learning models for a range of forests, sensors and spatial resolutions. We illustrate the workflow of DeepForest using data from the National Ecological Observatory Network, a tropical forest in French Guiana, and street trees from Portland, Oregon.
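Evaluating crown predictions against hand-labelled crowns, as the package's evaluation step does, typically reduces to intersection-over-union between bounding boxes. As a self-contained sketch of that standard metric (not DeepForest's own implementation; the function name and box convention are assumptions):

```python
def box_iou(a, b):
    """Intersection-over-union of two axis-aligned boxes given as
    (xmin, ymin, xmax, ymax), the usual score for matching a predicted
    tree crown against a hand-labelled one."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

# Two unit-offset 2x2 crowns overlap in a 1x1 square: IoU = 1 / 7.
score = box_iou((0, 0, 2, 2), (1, 1, 3, 3))
```

A predicted crown is usually counted as a true positive when its IoU with some labelled crown exceeds a threshold such as 0.4 or 0.5, the exact cutoff being an evaluation choice.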