Evaluating BERT for natural language inference: A case study on the CommitmentBank

Jiang, Nanjiang; de Marneffe, Marie-Catherine

doi:10.18653/v1/D19-1630

Natural language inference (NLI) datasets (e.g., MultiNLI) were collected by soliciting hypotheses for a given premise from annotators. Such data collection led to annotation artifacts: systems can identify the premise-hypothesis relationship without observing the premise (e.g., negation in hypothesis being indicative of contradiction). We address this problem by recasting the CommitmentBank for NLI, which contains items involving reasoning over the extent to which a speaker is committed to complements of clause-embedding verbs under entailment-canceling environments (conditional, negation, modal and question). Instead of being constructed to stand in certain relationships with the premise, hypotheses in the recast CommitmentBank are the complements of the clause-embedding verb in each premise, leading to no annotation artifacts in the hypothesis. A state-of-the-art BERT-based model performs well on the CommitmentBank with 85% F1. However, analysis of model behavior shows that the BERT models still do not capture the full complexity of pragmatic reasoning, nor encode some of the linguistic generalizations, highlighting room for improvement.

Award ID(s): 1845122

PAR ID: 10158557

Author(s) / Creator(s): Jiang, Nanjiang; de Marneffe, Marie-Catherine

Date Published: 2019-01-01

Journal Name: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Page Range / eLocation ID: 6085 to 6090

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.18653/v1/D19-1630
