Title: Investigating the fidelity of explainable artificial intelligence methods for applications of convolutional neural networks in geoscience
Abstract: Convolutional neural networks (CNNs) have recently attracted great attention in geoscience due to their ability to capture non-linear system behavior and extract predictive spatiotemporal patterns. Given their black-box nature, however, and the importance of prediction explainability, methods of explainable artificial intelligence (XAI) are gaining popularity as a means to explain the CNN decision-making strategy. Here, we establish an intercomparison of some of the most popular XAI methods and investigate their fidelity in explaining CNN decisions for geoscientific applications. Our goal is to raise awareness of the theoretical limitations of these methods and to gain insight into their relative strengths and weaknesses to help guide best practices. The considered XAI methods are first applied to an idealized attribution benchmark, where the ground truth of the network's explanation is known a priori, to help objectively assess their performance. Second, we apply XAI to a climate-related prediction setting, namely to explain a CNN that is trained to predict the number of atmospheric rivers in daily snapshots of climate simulations. Our results highlight several important issues of XAI methods (e.g., gradient shattering, inability to distinguish the sign of attribution, and ignorance to zero input) that have previously been overlooked in our field and, if not considered cautiously, may lead to a distorted picture of the CNN decision-making strategy. We envision that our analysis will motivate further investigation into XAI fidelity and will help toward a cautious implementation of XAI in geoscience, which can lead to further exploitation of CNNs and deep learning for prediction problems.
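To make concrete what such an attribution looks like in practice, here is a minimal sketch of one of the simplest XAI methods in this family, a vanilla gradient (saliency) map, applied to a toy CNN in PyTorch. The network architecture, grid size, and variable names are illustrative assumptions, not the authors' model or code.

```python
# Hedged sketch: vanilla gradient (saliency) attribution for a scalar-output CNN.
# The toy network below merely stands in for a model like the atmospheric-river
# counter described in the abstract; it is not the authors' architecture.
import torch
import torch.nn as nn

def saliency_map(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Return |d(prediction)/d(input)| as a heatmap with the same shape as x."""
    model.eval()
    x = x.clone().requires_grad_(True)   # track gradients with respect to the input
    model(x).sum().backward()            # backpropagate the scalar prediction
    return x.grad.abs()                  # pixel-wise attribution magnitude

# Toy CNN: one lat/lon channel in, one scalar prediction out.
toy_cnn = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1),
)
heatmap = saliency_map(toy_cnn, torch.randn(1, 1, 64, 128))  # (batch, channel, lat, lon)
```

Gradient-based maps like this one are exactly the kind of explanation whose fidelity (e.g., under gradient shattering) the paper puts to the test.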
Award ID(s): 2019758, 1934668
PAR ID: 10352708
Author(s) / Creator(s):
Date Published:
Journal Name: Artificial Intelligence for the Earth Systems
ISSN: 2769-7525
Page Range / eLocation ID: 1 to 42
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract: Complex machine learning architectures and high-dimensional gridded input data are increasingly used to develop high-performance geoscience models, but model complexity obfuscates their decision-making strategies. Understanding the learned patterns is useful for model improvement or scientific investigation, motivating research into eXplainable artificial intelligence (XAI) methods. XAI methods often struggle to produce meaningful explanations of correlated features, and gridded geospatial data tend to have extensive autocorrelation, so it is difficult to obtain meaningful explanations of geoscience models. A common recommendation, already widespread when applying XAI to tabular data, is to group correlated features and explain those groups. Here, we demonstrate that XAI algorithms are highly sensitive to the choice of how raster elements are grouped and that reliance on a single partition scheme yields misleading explanations. We propose comparing explanations from multiple grouping schemes to extract more accurate insights from XAI. We argue that each grouping scheme probes the model in a different way, so each asks a different question of the model; by analyzing where the explanations agree and disagree, we can learn about the scale of the learned features. FogNet, a complex three-dimensional convolutional neural network for coastal fog prediction, is used as a case study for investigating the influence of feature grouping schemes on XAI. Our results demonstrate that careful consideration of how each grouping scheme probes the model is key to extracting insights and avoiding misleading interpretations.
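As a rough illustration of the grouping idea in the abstract above (a hedged sketch with assumed array shapes, not FogNet's actual grids or channels), the code below aggregates a single pixel-level attribution map under two different raster partitions; comparing where the coarse and fine explanations agree or disagree hints at the scale of the features the model learned.

```python
# Hedged sketch: summing a pixel-level attribution map over two different
# raster grouping schemes. Shapes and values are placeholders, not FogNet data.
import numpy as np

rng = np.random.default_rng(0)
attr = rng.normal(size=(64, 64))        # stand-in pixel-level attribution map

def group_attribution(attr_map: np.ndarray, block: int) -> np.ndarray:
    """Sum attributions inside non-overlapping block x block raster groups."""
    h, w = attr_map.shape
    return attr_map.reshape(h // block, block, w // block, block).sum(axis=(1, 3))

coarse = group_attribution(attr, 16)    # 4 x 4 groups probe large-scale structure
fine = group_attribution(attr, 4)       # 16 x 16 groups probe small-scale structure
```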
  2. Abstract: Methods of explainable artificial intelligence (XAI) are used in geoscientific applications to gain insights into the decision-making strategy of neural networks (NNs), highlighting which features in the input contribute the most to a NN prediction. Here, we discuss our “lesson learned” that the task of attributing a prediction to the input does not have a single solution. Instead, the attribution results depend greatly on the considered baseline that the XAI method utilizes, a fact that has been overlooked in the geoscientific literature. The baseline is a reference point to which the prediction is compared so that the prediction can be understood. This baseline can be chosen by the user or is set by construction in the method's algorithm, often without the user being aware of that choice. We highlight that different baselines can lead to different insights for different science questions and, thus, should be chosen accordingly. To illustrate the impact of the baseline, we use a large ensemble of historical and future climate simulations forced with the shared socioeconomic pathway 3-7.0 (SSP3-7.0) scenario and train a fully connected NN to predict the ensemble- and global-mean temperature (i.e., the forced global warming signal) given an annual temperature map from an individual ensemble member. We then use various XAI methods and different baselines to attribute the network predictions to the input. We show that attributions differ substantially when considering different baselines, because they correspond to answering different science questions. We conclude by discussing important implications and considerations about the use of baselines in XAI research. Significance Statement: In recent years, methods of explainable artificial intelligence (XAI) have found wide use in geoscientific applications, because they can be used to attribute the predictions of neural networks (NNs) to the input and interpret them physically. Here, we highlight that the attributions, and the physical interpretation, depend greatly on the choice of the baseline, a fact that has been overlooked in the geoscientific literature. We illustrate this dependence for a specific climate task, in which a NN is trained to predict the ensemble- and global-mean temperature (i.e., the forced global warming signal) given an annual temperature map from an individual ensemble member. We show that attributions differ substantially when considering different baselines, because they correspond to answering different science questions.
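To see concretely how the baseline enters an attribution, here is a minimal, self-contained sketch of integrated gradients evaluated against two different reference points. The network, data, and baseline values are assumptions for illustration, not the paper's experimental setup.

```python
# Hedged sketch: integrated gradients against two baselines. Different baselines
# yield different attribution maps for the same prediction, i.e., they answer
# different science questions. All names and values are illustrative placeholders.
import torch
import torch.nn as nn

def integrated_gradients(model, x, baseline, steps=64):
    """Approximate integrated gradients for a scalar-output model."""
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1)
    path = baseline + alphas * (x - baseline)       # straight-line path in input space
    path.requires_grad_(True)
    model(path).sum().backward()                    # gradients at every path point
    avg_grad = path.grad.mean(dim=0, keepdim=True)  # average gradient along the path
    return (x - baseline) * avg_grad                # attribution per input feature

# Toy stand-in for a fully connected NN mapping a flattened temperature map
# to the forced global-mean warming signal.
net = nn.Sequential(nn.Linear(100, 32), nn.ReLU(), nn.Linear(32, 1))
x = torch.randn(1, 100)                             # one flattened temperature map
zero_baseline = torch.zeros_like(x)                 # "zero input" reference
clim_baseline = torch.full_like(x, 0.5)             # assumed climatology-like reference
attr_zero = integrated_gradients(net, x, zero_baseline)
attr_clim = integrated_gradients(net, x, clim_baseline)
```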
  3. Abstract: Despite the increasingly successful application of neural networks to many problems in the geosciences, their complex and nonlinear structure makes the interpretation of their predictions difficult, which limits model trust and prevents scientists from gaining physical insights about the problem at hand. Many different methods have been introduced in the emerging field of eXplainable Artificial Intelligence (XAI), which aims at attributing the network's prediction to specific features in the input domain. XAI methods are usually assessed using benchmark datasets (such as MNIST or ImageNet for image classification). However, an objective, theoretically derived ground truth for the attribution is lacking for most of these datasets, making the assessment of XAI in many cases subjective. Also, benchmark datasets specifically designed for problems in the geosciences are rare. Here, we provide a framework, based on the use of additively separable functions, to generate attribution benchmark datasets for regression problems for which the ground truth of the attribution is known a priori. We generate a large benchmark dataset and train a fully connected network to learn the underlying function that was used for simulation. We then compare estimated heatmaps from different XAI methods to the ground truth in order to identify examples where specific XAI methods perform well or poorly. We believe that attribution benchmarks such as the ones introduced herein are of great importance for further application of neural networks in the geosciences and for more objective assessment and accurate implementation of XAI methods, which will increase model trust and assist in discovering new science.
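The additively separable construction above can be sketched in a few lines. The component functions below are arbitrary illustrative choices, not the authors' exact benchmark; the key point is that because the target is a sum of per-feature terms, the ground-truth attribution of each input value is simply its own term.

```python
# Hedged sketch of an additively separable attribution benchmark:
# F(x) = sum_i f_i(x_i), so the ground-truth attribution of x_i is f_i(x_i).
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 10_000, 16

# Random nonlinear but additively separable component functions f_i.
coeffs = rng.uniform(-1.0, 1.0, size=(n_features, 3))

def components(X):
    """Per-feature contributions f_i(x_i); their sum is the regression target."""
    return (coeffs[:, 0] * X            # linear term
            + coeffs[:, 1] * X**2       # quadratic term
            + coeffs[:, 2] * np.sin(X)) # oscillatory term

X = rng.normal(size=(n_samples, n_features))
ground_truth_attr = components(X)       # known attribution for every sample
y = ground_truth_attr.sum(axis=1)       # target a network would be trained to learn

# A NN trained on (X, y) can then be explained with any XAI method and the
# resulting heatmaps compared directly against ground_truth_attr.
```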
  4. Pham, Tien; Solomon, Latasha; Hohil, Myron E. (Ed.)
    Explainable Artificial Intelligence (XAI) is the capability of explaining the reasoning behind the choices made by a machine learning (ML) algorithm, which helps maintain the transparency of the algorithm's decision-making. Humans make thousands of decisions every day, and for each decision an individual makes, they can explain the reasons behind their choice. The same is not true of ML and AI systems. Furthermore, XAI was not widely researched until recently, when the topic was brought to the fore and became one of the most relevant topics in AI for trustworthy and transparent outcomes. XAI tries to provide maximum transparency for an ML algorithm by answering questions about how the model arrived at its output. ML models with XAI can explain the rationale behind their results, expose the strengths and weaknesses of the learning models, and indicate how the models will behave in the future. In this paper, we investigate XAI for algorithmic trustworthiness and transparency. We evaluate XAI on several example use cases using the SHAP (SHapley Additive exPlanations) library, visualizing the effect of features individually and cumulatively in the prediction process.
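A minimal example of the SHAP workflow described above might look like the following; the dataset and model are placeholders rather than the paper's actual use cases.

```python
# Hedged sketch: explaining a tree-ensemble regressor with the SHAP library.
# The dataset and model are stand-ins, not the use cases evaluated in the paper.
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

data = load_diabetes(as_frame=True)                 # small tabular example dataset
X, y = data.data, data.target
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)               # Shapley values for tree ensembles
shap_values = explainer(X)                          # one explanation row per sample

shap.plots.waterfall(shap_values[0])                # individual: one prediction, feature by feature
shap.plots.beeswarm(shap_values)                    # cumulative: feature effects across samples
```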
  5. The local explanation method provides heatmaps on images to explain how Convolutional Neural Networks (CNNs) derive their output. Due to its visual straightforwardness, the method has become one of the most popular explainable AI (XAI) approaches for diagnosing CNNs. Through our formative study (S1), however, we captured ML engineers' ambivalent perspective on local explanation: it is a valuable and indispensable aid in building CNNs, yet the process exhausts them due to the heuristic nature of detecting vulnerabilities. Moreover, steering CNNs based on the vulnerabilities learned from the diagnosis seemed highly challenging. To mitigate this gap, we designed DeepFuse, the first interactive design that realizes a direct feedback loop between a user and CNNs in diagnosing and revising a CNN's vulnerabilities using local explanations. DeepFuse helps CNN engineers systematically search for unreasonable local explanations and annotate new boundaries for those identified as unreasonable in a labor-efficient manner. It then steers the model based on the given annotations so that the model does not make similar mistakes. We conducted a two-day study (S2) with 12 experienced CNN engineers. Using DeepFuse, participants built a more accurate and reasonable model than the current state of the art. Participants also found that the way DeepFuse guides case-based reasoning can practically improve their current practice. We provide implications for design that explain how future HCI-driven design can move our practice forward to make XAI-driven insights more actionable.