NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Investigating Reasons for Disagreement in Natural Language Inference

https://doi.org/10.1162/tacl_a_00523

Jiang, Nan-Jiang; de Marneffe, Marie-Catherine (December 2022, Transactions of the Association for Computational Linguistics)

Abstract We investigate how disagreement in natural language inference (NLI) annotation arises. We developed a taxonomy of disagreement sources with 10 categories spanning 3 high- level classes. We found that some disagreements are due to uncertainty in the sentence meaning, others to annotator biases and task artifacts, leading to different interpretations of the label distribution. We explore two modeling approaches for detecting items with potential disagreement: a 4-way classification with a “Complicated” label in addition to the three standard NLI labels, and a multilabel classification approach. We found that the multilabel classification is more expressive and gives better recall of the possible interpretations in the data.
more » « less
Full Text Available
Ecologically Valid Explanations for Label Variation in NLI

https://doi.org/10.18653/v1/2023.findings-emnlp.712

Jiang, Nan-Jiang; Tan, Chenhao; de Marneffe, Marie-Catherine (November 2023, Findings of EMNLP)

Full Text Available
Identifying inherent disagreement in natural language inference

https://doi.org/10.18653/v1/2021.naacl-main.390

Zhang, Xinliang Frederick; de Marneffe, Marie-Catherine (January 2021, Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies)
null (Ed.)
Natural language inference (NLI) is the task of determining whether a piece of text is entailed, contradicted by or unrelated to another piece of text. In this paper, we investigate how to tease systematic inferences (i.e., items for which people agree on the NLI label) apart from disagreement items (i.e., items which lead to different annotations), which most prior work has overlooked. To distinguish systematic inferences from disagreement items, we propose Artificial Annotators (AAs) to simulate the uncertainty in the annotation process by capturing the modes in annotations. Results on the CommitmentBank, a corpus of naturally occurring discourses in English, confirm that our approach performs statistically significantly better than all baselines. We further show that AAs learn linguistic patterns and context-dependent reasoning.
more » « less
Full Text Available
He Thinks He Knows Better than the Doctors: BERT for Event Factuality Fails on Pragmatics

https://doi.org/10.1162/tacl_a_00414

Jiang, Nanjiang; de Marneffe, Marie-Catherine (January 2021, Transactions of the Association for Computational Linguistics)

Abstract We investigate how well BERT performs on predicting factuality in several existing English datasets, encompassing various linguistic constructions. Although BERT obtains a strong performance on most datasets, it does so by exploiting common surface patterns that correlate with certain factuality labels, and it fails on instances where pragmatic reasoning is necessary. Contrary to what the high performance suggests, we are still far from having a robust system for factuality prediction.
more » « less
Full Text Available
The prosody of presupposition projection in naturally-occurring utterances

https://doi.org/10.18148/sub/2020.v24i2.884

Mahler, Taylor; de Marneffe, Marie-Catherine; Lai, Catherine (January 2020, Proceedings of Sinn und Bedeutung 24)
null (Ed.)
In experimental studies, prosodically-marked pragmatic focus has been found to influence the projection of factive presuppositions of utterances like "these parents didn’t know the kid was gone" (Cummins and Rohde, 2015; Tonhauser, 2016; Djarv and Bacovcin, 2017), supporting question-based analyses of projection (i.a., Abrusan, 2011; Abrusan, 2016; Simonset al., 2017; Beaver et al., 2017). However, no prior work has explored whether this effect extends to naturally-occurring utterances. In a large set of naturally-occurring utterances, we find that prosodically-marked focus influences projection in utterances with factive embeddingpredicates, but not those with non-factive predicates. We argue that our findings support an account where lexical semantics of the predicate contributes to projection to the extent that they admit QUD alternatives that can be assumed to entail the content of the complement.
more » « less
Full Text Available
Evaluating BERT for natural language inference: A case study on the CommitmentBank

https://doi.org/10.18653/v1/D19-1630

Jiang, Nanjiang; de Marneffe, Marie-Catherine (January 2019, Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP))

Natural language inference (NLI) datasets (e.g., MultiNLI) were collected by soliciting hypotheses for a given premise from annotators. Such data collection led to annotation artifacts: systems can identify the premise-hypothesis relationship without observing the premise (e.g., negation in hypothesis being indicative of contradiction). We address this problem by recasting the CommitmentBank for NLI, which contains items involving reasoning over the extent to which a speaker is committed to complements of clause-embedding verbs under entailment-canceling environments (conditional, negation, modal and question). Instead of being constructed to stand in certain relationships with the premise, hypotheses in the recast CommitmentBank are the complements of the clause-embedding verb in each premise, leading to no annotation artifacts in the hypothesis. A state-of-the-art BERT-based model performs well on the CommitmentBank with 85% F1. However analysis of model behavior shows that the BERT models still do not capture the full complexity of pragmatic reasoning, nor encode some of the linguistic generalizations, highlighting room for improvement.
more » « less
Full Text Available

Search for: All records