- PAR ID:
- 10352703
- Date Published:
- Journal Name:
- Environmental Data Science
- Volume:
- 1
- ISSN:
- 2634-4602
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Abstract: Convolutional neural networks (CNNs) have recently attracted great attention in geoscience due to their ability to capture non-linear system behavior and extract predictive spatiotemporal patterns. Given their black-box nature, however, and the importance of prediction explainability, methods of explainable artificial intelligence (XAI) are gaining popularity as a means to explain the CNN decision-making strategy. Here, we establish an intercomparison of some of the most popular XAI methods and investigate their fidelity in explaining CNN decisions for geoscientific applications. Our goal is to raise awareness of the theoretical limitations of these methods and gain insight into their relative strengths and weaknesses to help guide best practices. The considered XAI methods are first applied to an idealized attribution benchmark, where the ground-truth explanation of the network is known a priori, to help objectively assess their performance. Secondly, we apply XAI to a climate-related prediction setting, namely to explain a CNN that is trained to predict the number of atmospheric rivers in daily snapshots of climate simulations. Our results highlight several important issues of XAI methods (e.g., gradient shattering, inability to distinguish the sign of attribution, ignorance of zero input) that have previously been overlooked in our field and, if not considered cautiously, may lead to a distorted picture of the CNN decision-making strategy. We envision that our analysis will motivate further investigation into XAI fidelity and will help towards a cautious implementation of XAI in geoscience, which can lead to further exploitation of CNNs and deep learning for prediction problems.
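To make two of the attribution issues named above concrete, here is a minimal, hypothetical sketch (PyTorch; the toy model, input shape, and class index are placeholders, not the paper's trained CNN) of two common XAI methods: raw-gradient saliency, whose sign is ambiguous in ReLU networks, and gradient-times-input, which is blind to zero-valued inputs.

```python
# Illustrative sketch, not the authors' code: saliency and gradient-times-input
# attributions for a CNN classifier. Model and shapes are hypothetical.
import torch
import torch.nn as nn

model = nn.Sequential(                      # toy stand-in for a trained CNN
    nn.Conv2d(1, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 2),
)
model.eval()

x = torch.randn(1, 1, 64, 64, requires_grad=True)  # e.g. one climate-field snapshot
score = model(x)[0, 1]                     # logit of the class of interest
score.backward()                           # populate x.grad

saliency = x.grad.detach()                 # raw gradient: sign-ambiguous in ReLU nets
grad_x_input = saliency * x.detach()       # gradient-times-input: zero wherever the input is zero
```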
-
Unexplainable black-box models create scenarios where anomalies cause deleterious responses, thus creating unacceptable risks. These risks have motivated the field of eXplainable Artificial Intelligence (XAI), which improves trust by evaluating local interpretability in black-box neural networks. Unfortunately, the ground truth is unavailable for the model's decision, so evaluation is limited to qualitative assessment. Further, interpretability may lead to inaccurate conclusions about the model or a false sense of trust. We propose to improve XAI from the vantage point of the user's trust by exploring a black-box model's latent feature space. We present an approach, ProtoShotXAI, that uses a prototypical few-shot network to explore the contrastive manifold between nonlinear features of different classes. A user explores the manifold by perturbing the input features of a query sample and recording the response for a subset of exemplars from any class. Our approach is a locally interpretable XAI model that can be extended to, and demonstrated on, few-shot networks. We compare ProtoShotXAI to state-of-the-art XAI approaches on MNIST, Omniglot, and ImageNet to demonstrate, both quantitatively and qualitatively, that ProtoShotXAI provides more flexibility for model exploration. Finally, ProtoShotXAI also demonstrates novel explainability and detectability on adversarial samples.
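The perturb-and-probe idea can be sketched as follows. This is an illustrative reconstruction under stated assumptions (NumPy, a stand-in tanh embedding, simple zero-occlusion perturbation, Euclidean distance to a class prototype), not the authors' ProtoShotXAI implementation.

```python
# Hedged sketch of perturbing a query and recording the response relative to a
# class prototype built from exemplars. All names and shapes are illustrative.
import numpy as np

def embed(x):
    # placeholder feature extractor; in practice a trained few-shot network
    return np.tanh(x)

def prototype(exemplars):
    return embed(exemplars).mean(axis=0)              # class prototype in feature space

def attribution_map(query, exemplars):
    proto = prototype(exemplars)
    base = -np.linalg.norm(embed(query) - proto)      # baseline similarity to the class
    scores = np.zeros_like(query)
    for i in range(query.size):                       # perturb each input feature in turn
        perturbed = query.copy()
        perturbed.flat[i] = 0.0                       # simple occlusion perturbation
        sim = -np.linalg.norm(embed(perturbed) - proto)
        scores.flat[i] = base - sim                   # positive = feature supported the class
    return scores

rng = np.random.default_rng(0)
exemplars = rng.normal(size=(5, 16))                  # 5-shot support set for one class
query = rng.normal(size=16)
print(attribution_map(query, exemplars).round(3))
```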
-
IDC: quantitative evaluation benchmark of interpretation methods for deep text classification models
Abstract: Recent advances in deep neural networks have achieved outstanding success in natural language processing tasks. Interpretation methods that provide insight into the decision-making process of these models have received an influx of research attention because of the success, and the black-box nature, of deep text classification models. Evaluation of these methods has been based on changes in classification accuracy or prediction confidence when removing important words identified by these methods. There are no measurements of the actual difference between the predicted important words and humans' interpretation of the ground truth, because no interpretation ground truth exists. A large, publicly available interpretation ground truth has the potential to advance the development of interpretation methods. Manually labeling important words for each document to create a large interpretation ground truth is very time-consuming. This paper presents (1) IDC, a new benchmark for quantitative evaluation of interpretation methods for deep text classification models, and (2) an evaluation of six interpretation methods using the benchmark. The IDC benchmark consists of: (1) three methods that generate three pseudo-interpretation ground-truth datasets, and (2) three performance metrics: interpretation recall, interpretation precision, and Cohen's kappa inter-rater agreement. Findings: IDC-generated interpretation ground truth agrees with human annotators on sampled movie reviews. IDC identifies Layer-wise Relevance Propagation and the gradient-by-input methods as the winning interpretation methods in this study.
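A minimal sketch of the three metrics as the abstract describes them, computed for one document over predicted versus ground-truth important words; the token sets are hypothetical and the exact IDC formulation may differ.

```python
# Illustrative sketch of interpretation recall, interpretation precision, and
# Cohen's kappa over per-token important/not-important labels for one document.
def interpretation_metrics(predicted, ground_truth, all_tokens):
    pred, truth = set(predicted), set(ground_truth)
    tp = len(pred & truth)                            # words both agree are important
    recall = tp / len(truth) if truth else 0.0
    precision = tp / len(pred) if pred else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement
    n = len(all_tokens)
    agree = sum((t in pred) == (t in truth) for t in all_tokens)
    p_o = agree / n
    p_e = (len(pred) / n) * (len(truth) / n) \
        + (1 - len(pred) / n) * (1 - len(truth) / n)
    kappa = (p_o - p_e) / (1 - p_e) if p_e != 1 else 1.0
    return precision, recall, kappa

tokens = ["the", "movie", "was", "surprisingly", "brilliant", "overall"]
print(interpretation_metrics(["brilliant", "overall"],
                             ["surprisingly", "brilliant"], tokens))
```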
-
The massive surge in the amount of observational field data demands richer and more meaningful collaboration between data scientists and geoscientists. This document was written by members of the Working Group on Case Studies of the NSF-funded RCN on Intelligent Systems Research To Support Geosciences (IS-GEO, https://is-geo.org/) to describe our vision to build and enhance such collaboration through the use of specially designed benchmark datasets. Benchmark datasets serve as summary descriptions of problem areas, providing a simple interface between disciplines without requiring extensive background knowledge. Benchmark data are intended to address a number of overarching goals. First, they are concrete, identifiable, and public, which results in a natural coordination of research efforts across multiple disciplines and institutions. Second, they provide multi-fold opportunities for objective comparison of various algorithms in terms of computational costs, accuracy, utility, and other measurable standards, to address a particular question in geoscience. Third, as materials for education, benchmark data cultivate future human capital and interest in geoscience problems and data science methods. Finally, a concerted effort to produce and publish benchmarks has the potential to spur the development of new data science methods, while providing deeper insights into many fundamental problems in modern geosciences. That is, similarly to the critical role the genomic and molecular biology data archives serve in facilitating the field of bioinformatics, we expect that the proposed geosciences data repository will serve as a “catalyst” for the new discipline of geoinformatics. We describe specifications of a high-quality geoscience benchmark dataset and discuss some of our first benchmark efforts. We invite the Climate Informatics community to join us in creating additional benchmarks that aim to address important climate science problems.