Regularizing Black-box Models for Improved Interpretability

Gregory Plumb, Maruan Al-Shedivat

Citation Details

Most of the work on interpretable machine learning has focused on designing either inherently interpretable models, which typically trade off accuracy for interpretability, or post-hoc explanation systems, whose explanation quality can be unpredictable. Our method, ExpO, is a hybridization of these approaches that regularizes a model for explanation quality at training time. Importantly, these regularizers are differentiable, model agnostic, and require no domain knowledge to define. We demonstrate that post-hoc explanations for ExpO-regularized models have better explanation quality, as measured by the common fidelity and stability metrics. With a user study on a realistic task, we verify that improving these metrics leads to significantly more useful explanations.
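To make the fidelity idea concrete, the following is a minimal sketch of one way a neighborhood fidelity score could be computed for a black-box function. It is not the authors' implementation: the Gaussian neighborhood, the least-squares linear surrogate, and the names `neighborhood_fidelity`, `sigma`, and `n_samples` are all assumptions made for illustration.

```python
import numpy as np


def neighborhood_fidelity(f, x, sigma=0.1, n_samples=200, seed=0):
    """Mean squared error between f and a local linear fit around x.

    Lower values mean a linear explanation matches f more closely near x.
    The Gaussian neighborhood and the OLS surrogate are assumed choices.
    """
    rng = np.random.default_rng(seed)
    # Sample points from a Gaussian neighborhood centered at x.
    X = x + sigma * rng.standard_normal((n_samples, x.shape[0]))
    y = np.array([f(row) for row in X])
    # Fit a linear surrogate with an intercept by ordinary least squares.
    A = np.hstack([X, np.ones((n_samples, 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    # Score the surrogate on the same neighborhood samples.
    residual = A @ coef - y
    return float(np.mean(residual ** 2))


# Two hypothetical black-box functions: one exactly linear, one oscillating.
linear = lambda z: 2.0 * z[0] - z[1] + 0.5
wiggly = lambda z: np.sin(25.0 * z[0]) + z[1]

x0 = np.array([0.3, -0.2])
fid_linear = neighborhood_fidelity(linear, x0)
fid_wiggly = neighborhood_fidelity(wiggly, x0)
```

Under these assumptions, the linear function scores essentially zero while the oscillating one scores much higher, which is the kind of gap a training-time regularizer on explanation quality would aim to shrink.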

Award ID(s): 1705121

PAR ID: 10377581

Author(s) / Creator(s): Gregory Plumb, Maruan Al-Shedivat

Date Published: 2020-12-01

Journal Name: NIPS'20: Proceedings of the 34th International Conference on Neural Information Processing Systems

Volume: 883

Page Range / eLocation ID: 10526-10536

Format(s): Medium: X

Sponsoring Org: National Science Foundation

