skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: HUMAN-CENTERED CONCEPT EXPLANATIONS FOR NEURAL NETWORKS
Understanding complex machine learning models such as deep neural networks with explanations is crucial in various applications. Many explanations stem from the model perspective, and may not necessarily effectively communicate why the model is making its predictions at the right level of abstraction. For example, providing importance weights to individual pixels in an image can only express which parts of that particular image are important to the model, but humans may prefer an explanation which explains the prediction by concept-based thinking. In this work, we review the emerging area of concept based explanations. We start by introducing concept explanations including the class of Concept Activation Vectors (CAV) which characterize concepts using vectors in appropriate spaces of neural activations, and discuss different properties of useful concepts, and approaches to measure the usefulness of concept vectors. We then discuss approaches to automatically extract concepts, and approaches to address some of their caveats. Finally, we discuss some case studies that showcase the utility of such concept-based explanations in synthetic settings and real world applications.  more » « less
Award ID(s):
1955532
PAR ID:
10321436
Author(s) / Creator(s):
; ;
Editor(s):
Hitzler, Pascal; Sarker, Md Kamruzzaman
Date Published:
Journal Name:
Frontiers in artificial intelligence and applications
Volume:
342
ISSN:
0922-6389
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abbott, Derek (Ed.)
    Abstract Human vision, thought, and planning involve parsing and representing objects and scenes using structured representations based on part-whole hierarchies. Computer vision and machine learning researchers have recently sought to emulate this capability using neural networks, but a generative model formulation has been lacking. Generative models that leverage compositionality, recursion, and part-whole hierarchies are thought to underlie human concept learning and the ability to construct and represent flexible mental concepts. We introduce Recursive Neural Programs (RNPs), a neural generative model that addresses the part-whole hierarchy learning problem by modeling images as hierarchical trees of probabilistic sensory-motor programs. These programs recursively reuse learned sensory-motor primitives to model an image within different spatial reference frames, enabling hierarchical composition of objects from parts and implementing a grammar for images. We show that RNPs can learn part-whole hierarchies for a variety of image datasets, allowing rich compositionality and intuitive parts-based explanations of objects. Our model also suggests a cognitive framework for understanding how human brains can potentially learn and represent concepts in terms of recursively defined primitives and their relations with each other. 
    more » « less
  2. Out-of-distribution (OOD) detection plays a crucial role in ensuring the safe deployment of deep neural network (DNN) classifiers. While a myriad of methods have focused on improving the performance of OOD detectors, a critical gap remains in interpreting their decisions. We help bridge this gap by providing explanations for OOD detectors based on learned high-level concepts. We first propose two new metrics for assessing the effectiveness of a particular set of concepts for explaining OOD detectors: 1) detection completeness, which quantifies the sufficiency of concepts for explaining an OOD-detector’s decisions, and 2) concept separability, which captures the distributional separation between in-distribution and OOD data in the concept space. Based on these metrics, we propose an unsupervised framework for learning a set of concepts that satisfy the desired properties of high detection completeness and concept separability, and demonstrate its effectiveness in providing concept-based explanations for diverse off-the-shelf OOD detectors. We also show how to identify prominent concepts contributing to the detection results, and provide further reasoning about their decisions. 
    more » « less
  3. Providing model explanations has gained significant popularity recently. In contrast with the traditional feature-level model explanations, concept-based explanations can provide explanations in the form of high-level human concepts. However, existing concept-based explanation methods implicitly follow a two-step procedure that involves human intervention. Specifically, they first need the human to be involved to define (or extract) the high-level concepts, and then manually compute the importance scores of these identified concepts in a post-hoc way. This laborious process requires significant human effort and resource expenditure due to manual work, which hinders their large-scale deployability. In practice, it is challenging to automatically generate the concept-based explanations without human intervention due to the subjectivity of defining the units of concept-based interpretability. In addition, due to its data-driven nature, the interpretability itself is also potentially susceptible to malicious manipulations. Hence, our goal in this paper is to free human from this tedious process, while ensuring that the generated explanations are provably robust to adversarial perturbations. We propose a novel concept-based interpretation method, which can not only automatically provide the prototype-based concept explanations but also provide certified robustness guarantees for the generated prototype-based explanations. We also conduct extensive experiments on real-world datasets to verify the desirable properties of the proposed method. 
    more » « less
  4. Abstract Atmospheric processes involve both space and time. Thus, humans looking at atmospheric imagery can often spot important signals in an animated loop of an image sequence not apparent in an individual (static) image. Utilizing such signals with automated algorithms requires the ability to identify complex spatiotemporal patterns in image sequences. That is a very challenging task due to the endless possibilities of patterns in both space and time. Here, we review different concepts and techniques that are useful to extract spatiotemporal signals from meteorological image sequences to expand the effectiveness of AI algorithms for classification and prediction tasks. We first present two applications that motivate the need for these approaches in meteorology, namely the detection of convection from satellite imagery and solar forecasting. Then we provide an overview of concepts and techniques that are helpful for the interpretation of meteorological image sequences, such as (a) feature engineering methods using (i) meteorological knowledge, (ii) classic image processing, (iii) harmonic analysis, and (iv) topological data analysis; (b) ways to use convolutional neural networks for this purpose with emphasis on discussing different convolution filters (2D/3D/LSTM-convolution); and (c) a brief survey of several other concepts, including the concept of “attention” in neural networks and its utility for the interpretation of image sequences and strategies from self-supervised and transfer learning to reduce the need for large labeled datasets. We hope that presenting an overview of these tools—many of which are not new but underutilized in this context—will accelerate progress in this area. 
    more » « less
  5. End-to-end deep learning models are increasingly applied to safety-critical human activity recognition (HAR) applications, e.g., healthcare monitoring and smart home control, to reduce developer burden and increase the performance and robustness of prediction models. However, integrating HAR models in safety-critical applications requires trust, and recent approaches have aimed to balance the performance of deep learning models with explainable decision-making for complex activity recognition. Prior works have exploited the compositionality of complex HAR (i.e., higher-level activities composed of lower-level activities) to form models with symbolic interfaces, such as concept-bottleneck architectures, that facilitate inherently interpretable models. However, feature engineering for symbolic concepts-as well as the relationship between the concepts-requires precise annotation of lower-level activities by domain experts, usually with fixed time windows, all of which induce a heavy and error-prone workload on the domain expert. In this paper, we introduce X-CHAR, an eXplainable Complex Human Activity Recognition model that doesn't require precise annotation of low-level activities, offers explanations in the form of human-understandable, high-level concepts, while maintaining the robust performance of end-to-end deep learning models for time series data. X-CHAR learns to model complex activity recognition in the form of a sequence of concepts. For each classification, X-CHAR outputs a sequence of concepts and a counterfactual example as the explanation. We show that the sequence information of the concepts can be modeled using Connectionist Temporal Classification (CTC) loss without having accurate start and end times of low-level annotations in the training dataset-significantly reducing developer burden. We evaluate our model on several complex activity datasets and demonstrate that our model offers explanations without compromising the prediction accuracy in comparison to baseline models. Finally, we conducted a mechanical Turk study to show that the explanations provided by our model are more understandable than the explanations from existing methods for complex activity recognition. 
    more » « less