Feature-Based Explanations Don't Help People Detect Misclassifications of Online Toxicity

Carton, Samuel; Mei, Qiaozhu; Resnick, Paul

We present an experimental assessment of the impact of feature attribution-style explanations on human performance in predicting the consensus toxicity of social media posts with advice from an unreliable machine learning model. By doing so, we add to a small but growing body of literature inspecting the utility of interpretable machine learning in terms of human outcomes. We also evaluate interpretable machine learning for the first time in the important domain of online toxicity, where fully automated methods have faced criticism as being inadequate as a measure of toxic behavior. We find that, contrary to expectations, explanations have no significant impact on accuracy or agreement with model predictions, though they do change the distribution of subject error somewhat while reducing the cognitive burden of the task for subjects. Our results contribute to the recognition of an intriguing expectation gap in the field of interpretable machine learning between the general excitement the field has engendered and the ambiguous results of recent experimental work, including this study.

Award ID(s): 1717688

PAR ID: 10211065

Author(s) / Creator(s): Carton, Samuel; Mei, Qiaozhu; Resnick, Paul

Date Published: 2020-05-26

Journal Name: Proceedings of the Fourteenth International AAAI Conference on Web and Social Media

Volume: 14

Issue: 1

Page Range / eLocation ID: 95-106

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
The DOI is not currently available.
