LEXPLAIN: Improving Model Explanations via Lexicon Supervision

Ahia, Orevaoghene; Gonen, Hila; Balachandran, Vidhisha; Tsvetkov, Yulia; Smith, Noah

doi:10.18653/v1/2023.starsem-1.19

Model explanations that shed light on the model’s predictions are becoming a desired additional output of NLP models, alongside their predictions. Challenges in creating these explanations include making them trustworthy and faithful to the model’s predictions. In this work, we propose a novel framework for guiding model explanations by supervising them explicitly. To this end, our method, LEXPLAIN, uses task-related lexicons to directly supervise model explanations. This approach consistently improves the plausibility of the model’s explanations without sacrificing performance on the task, as we demonstrate on sentiment analysis and toxicity detection. Our analyses show that our method also demotes spurious correlations (i.e., with respect to African American English dialect) on toxicity detection, improving fairness.
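The abstract does not say how the lexicon supervision is computed. As a minimal sketch, assuming the model emits one explanation score in (0, 1) per token and that membership in a task lexicon defines a binary target for each token, the supervision can be written as an auxiliary binary cross-entropy term added to the task loss. The function names `lexicon_targets`, `explanation_loss`, and `joint_loss`, the weight `lam`, and the toy lexicon are all hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List, Sequence, Set


def lexicon_targets(tokens: Sequence[str], lexicon: Set[str]) -> List[float]:
    """Hypothetical labeling: 1.0 for tokens in the task lexicon, else 0.0."""
    return [1.0 if tok.lower() in lexicon else 0.0 for tok in tokens]


def explanation_loss(scores: Sequence[float], targets: Sequence[float],
                     eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between per-token explanation scores and lexicon targets."""
    if len(scores) != len(targets) or not scores:
        raise ValueError("scores and targets must be non-empty and aligned token-by-token")
    total = 0.0
    for s, t in zip(scores, targets):
        s = min(max(s, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(s) + (1.0 - t) * math.log(1.0 - s))
    return total / len(scores)


def joint_loss(task_loss: float, scores: Sequence[float], targets: Sequence[float],
               lam: float = 0.5) -> float:
    """Task loss plus a weighted explanation-supervision term (`lam` is an assumed weight)."""
    return task_loss + lam * explanation_loss(scores, targets)


if __name__ == "__main__":
    lexicon = {"great", "terrible", "love"}  # toy sentiment lexicon, illustrative only
    tokens = ["I", "love", "this", "great", "movie"]
    targets = lexicon_targets(tokens, lexicon)
    scores = [0.1, 0.8, 0.2, 0.7, 0.1]  # hypothetical per-token explanation scores
    print(targets)
    print(round(joint_loss(0.3, scores, targets), 4))
```

In this sketch, lowering the combined loss pushes explanation scores up on lexicon words and down elsewhere, while the task loss term keeps the classifier's predictions in play; how the real method weights or defines these terms would need to be checked against the paper.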

Award ID(s): 2203097, 2125201

PAR ID: 10467897

Publisher / Repository: Proceedings of the 12th Joint Conference on Lexical and Computational Semantics (*SEM 2023)

Date Published: 2023-07-13

Sponsoring Org: National Science Foundation

Free publicly accessible full text (accepted manuscript, conference paper): https://doi.org/10.18653/v1/2023.starsem-1.19