
Title: Errudite: Scalable, Reproducible, and Testable Error Analysis
Though error analysis is crucial to understanding and improving NLP models, the common practice of manual, subjective categorization of a small sample of errors can yield biased and incomplete conclusions. This paper codifies model- and task-agnostic principles for informative error analysis, and presents Errudite, an interactive tool for better supporting this process. First, error groups should be precisely defined for reproducibility; Errudite supports this with an expressive domain-specific language. Second, to avoid spurious conclusions, a large set of instances should be analyzed, including both positive and negative examples; Errudite enables systematic grouping of relevant instances with filtering queries. Third, hypotheses about the cause of errors should be explicitly tested; Errudite supports this via automated counterfactual rewriting. We validate our approach with a user study, finding that Errudite (1) enables users to perform high-quality, reproducible error analyses with less effort, (2) reveals substantial ambiguities in previously published error analysis practices, and (3) enhances the error analysis experience by allowing users to test and revise prior beliefs.
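Below is a minimal Python sketch, not the actual Errudite DSL or implementation, of the three principles the abstract describes: groups defined by precise, reusable predicates rather than ad-hoc manual labels; analysis applied to every instance, correct and incorrect alike; and counterfactual rewriting to test a hypothesized error cause. The Instance class, the long_question predicate, and the rewrite helper are invented for illustration.

    # Illustrative sketch only -- not Errudite's real DSL or API.
    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class Instance:                      # hypothetical stand-in for one QA example
        question: str
        prediction: str
        gold: str

        def is_correct(self) -> bool:
            return self.prediction == self.gold

    # Principle 1: a group is a named, precisely defined predicate, so the
    # same group can be recomputed exactly by other researchers.
    def long_question(inst: Instance, min_tokens: int = 20) -> bool:
        return len(inst.question.split()) >= min_tokens

    # Principle 2: apply the predicate to *all* instances, keeping both
    # correct and incorrect predictions, so error rates are not cherry-picked.
    def build_group(instances: List[Instance], predicate: Callable) -> List[Instance]:
        return [inst for inst in instances if predicate(inst)]

    def error_rate(group: List[Instance]) -> float:
        return sum(not inst.is_correct() for inst in group) / max(len(group), 1)

    # Principle 3: test a hypothesized cause by rewriting instances
    # (a crude string substitution here) and re-running the model on them.
    def rewrite(inst: Instance, old: str, new: str) -> Instance:
        # prediction left empty: it must be filled by re-running the model
        return Instance(inst.question.replace(old, new), prediction="", gold=inst.gold)

In the real tool, the grouping predicate would be expressed in Errudite's domain-specific language and the rewritten instances would be re-predicted by the model under analysis.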
Authors: Tongshuang Wu; Marco Tulio Ribeiro; Jeffrey Heer; Daniel S. Weld
Award ID(s): 1901386
Publication Date:
NSF-PAR ID: 10172006
Journal Name: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Page Range or eLocation-ID: 747 to 763
Sponsoring Org: National Science Foundation
More Like this
  1. Network configuration remains time-consuming and error-prone under the current command-based configuration system. Creating access control lists (ACLs) with commands that contain many options is still considered a difficult task. In light of this, we aim to develop a comprehensible approach to ACL construction. Building on Eliza, an early artificial-intelligence prototype, we propose a new design called EASYACL that synthesizes ACL rules automatically from natural language descriptions. EASYACL demonstrates the effectiveness of domain-specific program synthesis. Through the use of natural language, ACL rules can be constructed without an excessive number of options or rigid syntax. By introducing batch processing, we make it possible for users to apply configurations to a range of IP addresses rather than tediously repeating commands. EASYACL supports multiple platforms through an intermediate representation that can be translated into commands for both Cisco and Juniper devices (a simplified sketch of this idea appears after this list). The comprehensible commands are friendly for encapsulation as well as reuse. EASYACL enables end users with no prior programming experience to construct ACLs in a natural way, which lowers the bar for security-management training and also reduces errors in network administration.
  2. Thomson, Robert (Ed.)
    Abstract Genome sequencing projects routinely generate haploid consensus sequences from diploid genomes; these are effectively chimeric sequences with the phase at heterozygous sites resolved at random. The impact of phasing errors on phylogenomic analyses under the multispecies coalescent (MSC) model is largely unknown. Here, we conduct a computer simulation to evaluate the performance of four phase-resolution strategies (the true phase resolution, the diploid analytical integration algorithm which averages over all phase resolutions, computational phase resolution using the program PHASE, and random resolution) on estimation of the species tree and evolutionary parameters in analysis of multilocus genomic data under the MSC model. We found that species tree estimation is robust to phasing errors when species divergences are much older than average coalescent times, but may be affected by phasing errors when the species tree is shallow. Estimation of parameters under the MSC model, with and without introgression, is affected by phasing errors. In particular, random phase resolution causes serious overestimation of population sizes for modern species and biased estimation of cross-species introgression probability. In general, the impact of phasing errors is greater when the mutation rate is higher, the data include more samples per species, and the species tree is shallower with recent divergences. Use of phased sequences inferred by the PHASE program produced small biases in parameter estimates. We analyze two real data sets, one of East Asian brown frogs and another of Rocky Mountains chipmunks, to demonstrate that heterozygote phase-resolution strategies have similar impacts on practical data analyses. We suggest that genome sequencing projects should produce unphased diploid genotype sequences if fully phased data are too challenging to generate, and avoid haploid consensus sequences, which have heterozygous sites phased at random (a toy illustration of random phasing appears after this list). In case the analytical integration algorithm is computationally unfeasible, computational phasing prior to population genomic analyses is an acceptable alternative. [BPP; introgression; multispecies coalescent; phase; species tree.]
  3. Long analysis times are a key bottleneck for the widespread adoption of whole-program static analysis tools. Fortunately, however, a user is often only interested in finding errors in the application code, which constitutes a small fraction of the whole program. Current application-focused analysis tools overapproximate the effect of the library and hence reduce the precision of the analysis results. However, empirical studies have shown that users have high expectations of precision and will ignore tool results that do not meet these expectations. In this paper, we introduce QueryMax, the first tool that significantly speeds up an application-code analysis without dropping any precision. QueryMax acts as a pre-processor to an existing analysis tool, selecting the partial library that is most relevant to the analysis queries in the application code. The selected partial library plus the application is given as input to the existing static analysis tool, with the remaining library pointers treated as the bottom element in the abstract domain (a schematic sketch of this selection step appears after this list). This achieves a significant speedup over a whole-program analysis, at the cost of a few lost errors and with no loss in precision. We instantiate and run experiments on QueryMax for a cast-check analysis and a null-pointer analysis. For a particular configuration, QueryMax enables these two analyses to achieve, relative to a whole-program analysis, an average recall of 87%, a precision of 100%, and a geometric mean speedup of 10x.
  4. Sports broadcasters inject drama into play-by-play commentary by building team and player narratives through subjective analyses and anecdotes. Prior studies based on small datasets and manual coding show that such theatrics evince commentator bias in sports broadcasts. To examine this phenomenon, we assemble FOOTBALL, which contains 1,455 broadcast transcripts from American football games across six decades that are automatically annotated with 250K player mentions and linked with racial metadata. We identify major confounding factors for researchers examining racial bias in FOOTBALL, and perform a computational analysis that supports conclusions from prior social science studies.
  5. Abstract Target enrichment (such as Hyb-Seq) is a well-established high-throughput sequencing method that has been increasingly used for phylogenomic studies. Unfortunately, current widely used pipelines for analysis of target enrichment data do not have a rigorous procedure to remove paralogs. In this study, we develop a pipeline we call Putative Paralogs Detection (PPD) to better address putative paralogs in enrichment data. The new pipeline is an add-on to the existing HybPiper pipeline, and the entire pipeline applies criteria of both sequence similarity and heterozygous sites at each locus to identify paralogs (a simplified sketch of these two criteria appears after this list). Users may adjust the thresholds of sequence identity and heterozygous sites to identify and remove paralogs according to the level of phylogenetic divergence of their group of interest. The new pipeline also removes highly polymorphic sites attributed to errors in sequence assembly and gappy regions in the alignment. We demonstrate the value of the new pipeline using empirical data generated from Hyb-Seq and the Angiosperm 353 kit for two woody genera, Castanea (Fagaceae, Fagales) and Hamamelis (Hamamelidaceae, Saxifragales). Comparisons of datasets showed that PPD identified many more putative paralogs than the popular method HybPiper. Comparisons of tree topologies and divergence times showed evident differences between data from HybPiper and data from our new PPD pipeline. We further evaluated the accuracy and error rates of PPD by BLAST mapping of putative paralogous and orthologous sequences to a reference genome sequence of Castanea mollissima. Compared to HybPiper alone, PPD identified substantially more paralogous gene sequences that mapped to multiple regions of the reference genome (31 genes for PPD compared with 4 genes for HybPiper alone). In conjunction with HybPiper, paralogous genes identified by both pipelines can be removed, resulting in the construction of more robust orthologous gene datasets for phylogenomic and divergence time analyses. Our study demonstrates the value of Hyb-Seq with data derived from the Angiosperm 353 probe set for elucidating species relationships within a genus, and argues for the importance of additional steps to filter paralogous genes and poorly aligned regions (e.g., those arising through assembly errors), such as our new PPD pipeline described in this study.
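Regarding item 1, the following Python sketch illustrates, under heavy simplification, the multi-platform idea described there: a platform-neutral rule record rendered into vendor-flavoured command text, plus batch application of one rule template to a range of addresses. The AclRule fields and the emitted command strings are illustrative approximations, not EASYACL's actual intermediate representation or real Cisco/Juniper syntax.

    # Simplified illustration of the intermediate-representation idea --
    # not EASYACL's real IR; the emitted strings only approximate
    # Cisco IOS / Junos syntax.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class AclRule:
        action: str                 # "permit" or "deny"
        protocol: str               # e.g. "tcp"
        source: str                 # e.g. "10.0.0.5/32"
        destination: str            # e.g. "any"
        port: Optional[int] = None

    def to_cisco_like(name: str, rule: AclRule) -> str:
        port = f" eq {rule.port}" if rule.port else ""
        return f"access-list {name} {rule.action} {rule.protocol} {rule.source} {rule.destination}{port}"

    def to_juniper_like(name: str, rule: AclRule) -> str:
        then = "accept" if rule.action == "permit" else "discard"
        return (f"set firewall filter {name} term t1 from source-address {rule.source}\n"
                f"set firewall filter {name} term t1 then {then}")

    # Batch processing: apply one rule template to a range of addresses
    # instead of retyping the command for every host.
    rules = [AclRule("deny", "tcp", f"10.0.0.{i}/32", "any", port=23) for i in range(1, 4)]
    print("\n".join(to_cisco_like("101", r) for r in rules))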
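For item 2, this toy Python sketch shows why haploid consensus sequences are problematic: each heterozygous site is phased independently at random, so the resulting haplotypes are chimeras of the two parental haplotypes, whereas keeping heterozygotes as ambiguous diploid genotype calls preserves that uncertainty. The five-site genotype and the use of 'N' for ambiguity are invented for illustration.

    import random

    # Toy diploid genotype: one (allele_1, allele_2) pair per site.
    genotype = [("A", "A"), ("C", "T"), ("G", "G"), ("A", "G"), ("T", "T")]

    def random_phase(genotype, seed=None):
        """Resolve every heterozygous site independently at random -- the
        'haploid consensus' behaviour criticised in the abstract."""
        rng = random.Random(seed)
        h1, h2 = [], []
        for a1, a2 in genotype:
            if a1 != a2 and rng.random() < 0.5:
                a1, a2 = a2, a1      # phase flipped at this site only
            h1.append(a1)
            h2.append(a2)
        return "".join(h1), "".join(h2)   # chimeric haplotypes

    def unphased_diploid(genotype):
        """Keep heterozygous sites ambiguous (crudely shown as 'N') instead of
        guessing the phase, as the abstract recommends when phasing is hard."""
        return "".join(a1 if a1 == a2 else "N" for a1, a2 in genotype)

    print(random_phase(genotype))      # phase mixed at random across sites
    print(unphased_diploid(genotype))  # 'ANGNT'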
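For item 3, here is a schematic Python sketch of the pre-processing step described there: starting from the analysis queries in the application code, walk a call graph into the library and keep only the library methods that are reachable; everything left out would be modelled as the bottom element by the downstream analysis. The toy call graph, the "app."/"lib." naming, and the breadth-first selection are invented for illustration and are much simpler than QueryMax's actual selection algorithm.

    from collections import deque

    # Toy call graph: caller -> callees. Names starting with "app." are
    # application code; everything else is library code.
    CALL_GRAPH = {
        "app.main":        ["app.parse", "lib.io.read"],
        "app.parse":       ["lib.json.decode"],
        "lib.io.read":     ["lib.io.buffer"],
        "lib.json.decode": ["lib.json.lexer"],
        "lib.xml.parse":   ["lib.xml.lexer"],   # never reached from the queries
    }

    def select_partial_library(query_sites):
        """Keep only library methods reachable from the application's query
        sites; the rest would be treated as bottom by the downstream analysis."""
        reachable, worklist = set(), deque(query_sites)
        while worklist:
            method = worklist.popleft()
            if method in reachable:
                continue
            reachable.add(method)
            worklist.extend(CALL_GRAPH.get(method, []))
        return {m for m in reachable if not m.startswith("app.")}

    # lib.xml.* is excluded; only the reachable io and json methods are kept.
    print(select_partial_library(["app.main"]))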
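For item 5, a minimal Python sketch of the two per-locus criteria the abstract describes: a locus is flagged as a putative paralog when its sequence identity to the target falls below a user-adjustable threshold, or when the proportion of heterozygous (IUPAC-ambiguous) sites exceeds one. The thresholds, the identity calculation, and the example sequences are illustrative and are not PPD's actual defaults or implementation.

    IUPAC_AMBIGUOUS = set("RYSWKMBDHVN")   # degenerate bases marking heterozygous sites

    def percent_identity(seq: str, ref: str) -> float:
        """Column-wise identity between two aligned, equal-length sequences,
        ignoring gap columns."""
        pairs = [(a, b) for a, b in zip(seq.upper(), ref.upper()) if a != "-" and b != "-"]
        return sum(a == b for a, b in pairs) / max(len(pairs), 1)

    def heterozygous_fraction(seq: str) -> float:
        bases = [c for c in seq.upper() if c != "-"]
        return sum(c in IUPAC_AMBIGUOUS for c in bases) / max(len(bases), 1)

    def is_putative_paralog(seq: str, ref: str,
                            min_identity: float = 0.80,
                            max_het: float = 0.05) -> bool:
        """Flag a locus that is too divergent from the target sequence or
        carries an excess of heterozygous sites (illustrative thresholds)."""
        return (percent_identity(seq, ref) < min_identity
                or heterozygous_fraction(seq) > max_het)

    # A sequence riddled with ambiguity codes gets flagged.
    print(is_putative_paralog("ACGTRYGTSWACGT", "ACGTACGTACACGT"))  # True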