Textual entailment models are increasingly applied in settings like fact-checking, presupposition verification in question answering, or summary evaluation. However, these represent a significant domain shift from existing entailment datasets, and models underperform as a result. We propose WiCE, a new fine-grained textual entailment dataset built on natural claim and evidence pairs extracted from Wikipedia. In addition to standard claim-level entailment, WiCE provides entailment judgments over sub-sentence units of the claim, and a minimal subset of evidence sentences that support each subclaim. To support this, we propose an automatic claim decomposition strategy using GPT-3.5 which we show is also effective at improving entailment models’ performance on multiple datasets at test time. Finally, we show that real claims in our dataset involve challenging verification and retrieval problems that existing models fail to address. 
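The claim decomposition step described above can be sketched with an instruction-following model. The snippet below is a minimal illustration assuming the OpenAI Python client and the gpt-3.5-turbo chat model; the prompt wording and model name are assumptions for illustration, not the paper's exact setup.

```python
# A minimal sketch of sub-claim decomposition with an instruction-following model.
# The prompt wording and model name are illustrative assumptions, not the paper's
# exact setup. Requires the openai package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def decompose_claim(claim: str) -> list[str]:
    """Split a claim into short, independently verifiable sub-claims."""
    prompt = (
        "Break the following claim into short, independently checkable facts, "
        "one per line:\n\n" + claim
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    lines = response.choices[0].message.content.splitlines()
    return [line.lstrip("-* ").strip() for line in lines if line.strip()]

# Each sub-claim can then be checked against retrieved evidence with an
# entailment model, and the claim-level label aggregated over the sub-claims.
subclaims = decompose_claim(
    "The film, shot over six weeks in 2007, was the director's first feature."
)
print(subclaims)
```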
Fool Me Twice: Entailment from Wikipedia Gamification
We release FOOLMETWICE (FM2 for short), a large dataset of challenging entailment pairs collected through a fun multi-player game. Gamification encourages adversarial examples, drastically lowering the number of examples that can be solved using “shortcuts” compared to other entailment datasets. Players are presented with two tasks. The first task asks the player to write a plausible claim based on the evidence from a Wikipedia page. The second one shows two plausible claims written by other players, one of which is false, and the goal is to identify it before the time runs out. Players “pay” to see clues retrieved from the evidence pool: the more evidence the player needs, the harder the claim. Game-play between motivated players leads to diverse strategies for crafting claims, such as temporal inference and diverting to unrelated evidence, and results in higher quality data for the entailment and evidence retrieval tasks. We open source the dataset and game code.
- Award ID(s): 1822494
- PAR ID: 10309825
- Date Published:
- Journal Name: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- In this paper, we propose Task-Adversarial co-Generative Nets (TAGN) for learning from multiple tasks. It aims to address the two fundamental issues of multi-task learning, i.e., domain shift and limited labeled data, in a principled way. To this end, TAGN first learns the task-invariant representations of features to bridge the domain shift among tasks. Based on the task-invariant features, TAGN generates plausible examples for each task to tackle the data scarcity issue. In TAGN, we leverage multiple game players to gradually improve the quality of the co-generation of features and examples by using an adversarial strategy. It simultaneously learns the marginal distribution of task-invariant features across different tasks and the joint distributions of examples with labels for each task. The theoretical study shows the desired results: at the equilibrium point of the multi-player game, the feature extractor exactly produces the task-invariant features for different tasks, while both the generator and the classifier perfectly replicate the joint distribution for each task. The experimental results on the benchmark data sets demonstrate the effectiveness of the proposed approach. (A minimal sketch of the adversarial feature-alignment idea appears after this list.)
- Claim verification is challenging because it requires first finding textual evidence and then applying claim-evidence entailment to verify a claim. Previous works evaluate the entailment step based on the retrieved evidence, whereas we hypothesize that the entailment prediction can provide useful signals for evidence retrieval, in the sense that if a sentence supports or refutes a claim, the sentence must be relevant. We propose a novel model that uses the entailment score to express relevancy. Our experiments verify that leveraging entailment prediction improves the ranking of multiple pieces of evidence. (A minimal sketch of entailment-based evidence ranking appears after this list.)
- We study the following problem, which to our knowledge has been addressed only partially in the literature and not in full generality. An agent observes two players play a zero-sum game that is known to the players but not the agent. The agent observes the actions and state transitions of their game play, but not rewards. The players may play either optimally (according to some Nash equilibrium) or according to any other solution concept, such as a quantal response equilibrium. Following these observations, the agent must recommend a policy for one player, say Player 1. The goal is to recommend a policy that is minimally exploitable under the true, but unknown, game. We take a Bayesian approach. We establish a likelihood function based on observations and the specified solution concept. We then propose an approach based on Markov chain Monte Carlo (MCMC), which allows us to approximately sample games from the agent's posterior belief distribution. Once we have a batch of independent samples from the posterior, we use linear programming and backward induction to compute a policy for Player 1 that minimizes the sum of exploitabilities over these games. This approximates the policy that minimizes the expected exploitability under the full distribution. Our approach is also capable of handling counterfactuals, where known modifications are applied to the unknown game. We show that our Bayesian MCMC-based technique outperforms two other techniques, one based on the equilibrium policy of the maximum-probability game and the other based on imitation of observed behavior, on all the tested stochastic game environments. (A minimal sketch of the exploitability-minimization step appears after this list.)
- Understanding players' mental models is crucial for game designers who wish to successfully integrate player-AI interactions into their game. However, game designers face the difficult challenge of anticipating how players model these AI agents during gameplay and how they may change their mental models with experience. In this work, we conduct a qualitative study to examine how a pair of players develops mental models of an adversarial AI player during gameplay in the multiplayer drawing game iNNk. We conducted ten gameplay sessions in which two players (n = 20, 10 pairs) worked together to defeat an AI player. As a result of our analysis, we uncovered two dominant dimensions that describe players' mental model development (i.e., focus and style). The first dimension describes the focus of development, which refers to what players pay attention to for the development of their mental model (i.e., top-down vs. bottom-up focus). The second dimension describes the differences in the style of development, which refers to how players integrate new information into their mental model (i.e., systematic vs. reactive style). In our preliminary framework, we further note how players process a change when a discrepancy occurs, which we observed occurring through comparisons (i.e., compare to other systems, compare to gameplay, compare to self). We offer these results as a preliminary framework for player mental model development to help game designers anticipate how different players may model adversarial AI players during gameplay.
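A minimal sketch of the adversarial feature-alignment idea from the TAGN abstract above, written in PyTorch: a shared extractor is trained to classify each task well while fooling a task discriminator, pushing the learned features toward task invariance. Dimensions, optimizers, and the toy data are assumptions for illustration, the per-task example generator from the paper is omitted, and this is not the authors' implementation.

```python
# A minimal sketch (not the authors' TAGN code): adversarially align features
# across tasks so the extractor produces task-invariant representations while
# per-task classifiers stay accurate. Dimensions and the toy data are assumptions.
import torch
import torch.nn as nn

FEAT_DIM, HIDDEN, NUM_TASKS, NUM_CLASSES, BATCH = 32, 64, 2, 3, 16

extractor = nn.Sequential(nn.Linear(FEAT_DIM, HIDDEN), nn.ReLU())
task_disc = nn.Linear(HIDDEN, NUM_TASKS)  # tries to tell which task a feature came from
classifiers = nn.ModuleList(nn.Linear(HIDDEN, NUM_CLASSES) for _ in range(NUM_TASKS))

opt_disc = torch.optim.Adam(task_disc.parameters(), lr=1e-3)
opt_main = torch.optim.Adam(
    list(extractor.parameters()) + list(classifiers.parameters()), lr=1e-3)
ce = nn.CrossEntropyLoss()

for step in range(200):
    # Toy batches, one per task; real data would come from each task's loader.
    xs = [torch.randn(BATCH, FEAT_DIM) for _ in range(NUM_TASKS)]
    ys = [torch.randint(0, NUM_CLASSES, (BATCH,)) for _ in range(NUM_TASKS)]
    task_ids = [torch.full((BATCH,), t, dtype=torch.long) for t in range(NUM_TASKS)]

    # 1) Discriminator step: predict the task identity of (detached) features.
    feats = [extractor(x).detach() for x in xs]
    d_loss = sum(ce(task_disc(f), task_ids[t]) for t, f in enumerate(feats))
    opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()

    # 2) Main step: classify each task well while fooling the discriminator,
    #    which encourages the extractor to produce task-invariant features.
    feats = [extractor(x) for x in xs]
    cls_loss = sum(ce(classifiers[t](f), ys[t]) for t, f in enumerate(feats))
    adv_loss = -sum(ce(task_disc(f), task_ids[t]) for t, f in enumerate(feats))
    opt_main.zero_grad(); (cls_loss + adv_loss).backward(); opt_main.step()
```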
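A minimal sketch of entailment-based evidence ranking from the claim-verification abstract above: each candidate sentence is scored by an off-the-shelf NLI model and ranked by how strongly it supports or refutes the claim. The model name (roberta-large-mnli) and the max(entail, contradict) scoring rule are assumptions, not the authors' system.

```python
# A minimal sketch of ranking evidence by entailment signal. The NLI model name
# and the max(entail, contradict) scoring rule are illustrative assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "roberta-large-mnli"  # any NLI model exposing entailment/contradiction labels
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL).eval()
label2id = {label.lower(): i for i, label in model.config.id2label.items()}

def rank_evidence(claim: str, sentences: list[str]) -> list[tuple[float, str]]:
    """Score each sentence by how strongly it supports or refutes the claim."""
    scored = []
    for sent in sentences:
        inputs = tokenizer(sent, claim, return_tensors="pt", truncation=True)
        with torch.no_grad():
            probs = model(**inputs).logits.softmax(dim=-1)[0]
        score = max(probs[label2id["entailment"]],
                    probs[label2id["contradiction"]]).item()
        scored.append((score, sent))
    return sorted(scored, reverse=True)

claim = "The Eiffel Tower was completed in 1889."
evidence = [
    "The tower was finished in 1889 as the entrance arch to the World's Fair.",
    "Paris is the capital and most populous city of France.",
]
print(rank_evidence(claim, evidence))
```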
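A minimal sketch of the exploitability-minimization step from the Bayesian MCMC abstract above, under a simplifying assumption: if the posterior samples are small zero-sum matrix games rather than stochastic games, minimizing Player 1's summed exploitability over the sampled games reduces to a single linear program. The MCMC sampling itself is omitted, and the games below are random stand-ins for posterior samples.

```python
# A minimal sketch of the final step under a simplifying assumption: posterior
# samples are zero-sum matrix games, so minimizing Player 1's summed
# exploitability is a single linear program. The MCMC sampling is omitted and
# the games below are random stand-ins for posterior samples.
import numpy as np
from scipy.optimize import linprog

def min_total_exploitability(games: list[np.ndarray]) -> np.ndarray:
    """Return a mixed strategy x for the row player (who maximizes x^T A y).

    Exploitability of x in game A_i is v_i - min_j (x^T A_i)_j with v_i constant,
    so minimizing the sum is the LP: maximize sum_i z_i s.t. z_i <= (x^T A_i)_j,
    with x on the probability simplex.
    """
    n, k = games[0].shape[0], len(games)
    c = np.concatenate([np.zeros(n), -np.ones(k)])        # minimize -sum z_i
    A_ub, b_ub = [], []
    for i, A in enumerate(games):
        for j in range(A.shape[1]):                       # z_i - (x^T A)_j <= 0
            row = np.zeros(n + k)
            row[:n] = -A[:, j]
            row[n + i] = 1.0
            A_ub.append(row)
            b_ub.append(0.0)
    A_eq = np.concatenate([np.ones(n), np.zeros(k)]).reshape(1, -1)  # sum(x) == 1
    bounds = [(0, None)] * n + [(None, None)] * k
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    return res.x[:n]

rng = np.random.default_rng(0)
sampled_games = [rng.normal(size=(3, 4)) for _ in range(5)]  # stand-in posterior samples
print(min_total_exploitability(sampled_games))
```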