Concept Gradient: Concept-based Interpretation Without Linear Assumption

Bai, Andrew; Ravikumar, Pradeep; Yeh, Chih-Kuan; Lin, Neil; Hsieh, Cho-Jui


Concept-based interpretations of black-box models are often more intuitive for humans to understand. The most widely adopted approach for concept-based interpretation is Concept Activation Vector (CAV). CAV relies on learning a linear relation between some latent representation of a given model and concepts. This linear separability is usually assumed implicitly but does not hold in general. In this work, we started from the original intent of concept-based interpretation and proposed Concept Gradient (CG), which extends concept-based interpretation beyond linear concept functions. We showed that for a general (potentially non-linear) concept, we can mathematically evaluate how a small change in the concept affects the model’s prediction, which leads to an extension of gradient-based interpretation to the concept space. We demonstrated empirically that CG outperforms CAV on both toy examples and real-world datasets.
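The idea of measuring how a small change in a concept affects the prediction can be illustrated with a toy sketch. The abstract does not give a formula, so the code below assumes one chain-rule reading for a single scalar concept: project the model gradient onto the concept gradient and divide by the squared norm of the concept gradient (a pseudo-inverse of the concept gradient). The functions `concept`, `model`, `numerical_gradient`, and `concept_gradient` are hypothetical names chosen for illustration, and finite differences stand in for automatic differentiation.

```python
import math


def concept(x):
    # Hypothetical non-linear concept g(x) = x0^2 + x1^2 (an assumption for this toy).
    return x[0] ** 2 + x[1] ** 2


def model(x):
    # Hypothetical model f(x) = sin(g(x)); it depends on x only through the concept,
    # so the true sensitivity of f to the concept is cos(g(x)).
    return math.sin(concept(x))


def numerical_gradient(fn, x, eps=1e-6):
    # Central finite-difference gradient of a scalar function fn at point x.
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((fn(up) - fn(down)) / (2 * eps))
    return grad


def concept_gradient(model_fn, concept_fn, x):
    # Assumed scalar-concept form: <grad f, grad g> / ||grad g||^2.
    grad_f = numerical_gradient(model_fn, x)
    grad_g = numerical_gradient(concept_fn, x)
    norm_sq = sum(g * g for g in grad_g)
    if norm_sq == 0.0:
        raise ValueError("concept gradient is zero at x; sensitivity is undefined")
    return sum(a * b for a, b in zip(grad_f, grad_g)) / norm_sq


x = [0.5, -1.0]
cg = concept_gradient(model, concept, x)
expected = math.cos(concept(x))  # analytic df/dg for this toy model
print(cg, expected)
```

In this toy case the concept is quadratic in the input, so no single linear direction describes it everywhere, yet the gradient-based quantity still recovers the analytic sensitivity of the model to the concept at the chosen point.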

Award ID(s): 2211907

PAR ID: 10450481

Author(s) / Creator(s): Bai, Andrew; Ravikumar, Pradeep; Yeh, Chih-Kuan; Lin, Neil; Hsieh, Cho-Jui

Date Published: 2023-05-01

Journal Name: International Conference on Learning Representations (ICLR)

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text: Accepted Manuscript 1.0
Conference Paper: The DOI is not currently available.
