Interpretable Failure Detection with Human-Level Concepts

Nguyen, Kien X; Li, Tang; Peng, Xi

doi:10.1609/aaai.v39i25.34831


This content will become publicly available on April 11, 2026


Reliable failure detection holds paramount importance in safety-critical applications. Yet, neural networks are known to produce overconfident predictions for misclassified samples. As a result, it remains a problematic matter as existing confidence score functions rely on category-level signals, the logits, to detect failures. This research introduces an innovative strategy, leveraging human-level concepts for a dual purpose: to reliably detect when a model fails and to transparently interpret why. By integrating a nuanced array of signals for each category, our method enables a finer-grained assessment of the model's confidence. We present a simple yet highly effective approach based on the ordinal ranking of concept activation to the input image. Without bells and whistles, our method is able to significantly reduce the false positive rate across diverse real-world image classification benchmarks, specifically by 3.7% on ImageNet and 9.0% on EuroSAT.

Award ID(s): 2340074

PAR ID: 10617646

Author(s) / Creator(s): Nguyen, Kien X; Li, Tang; Peng, Xi

Publisher / Repository: Proceedings of the AAAI Conference on Artificial Intelligence

Date Published: 2025-04-11

Journal Name: Proceedings of the AAAI Conference on Artificial Intelligence

Volume: 39

Issue: 25

ISSN: 2159-5399

Page Range / eLocation ID: 26326 to 26334

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Journal Article:
https://doi.org/10.1609/aaai.v39i25.34831
