Explainable Video Entailment with Grounded Visual Evidence

Chen, Junwen; Kong, Yu

doi:10.1109/ICCV48922.2021.00203

Citation Details

Video entailment aims to determine whether a textual hypothesis statement is entailed or contradicted by a premise video. The main challenge is that it requires fine-grained reasoning over long, complex, story-based videos. To this end, we propose to incorporate visual grounding into entailment by explicitly linking the entities described in the statement to evidence in the video. If the entities are grounded in the video, we enhance the entailment judgment by focusing on the frames where the entities occur. In addition, the entailed/contradictory (also called real/fake) statements in the entailment dataset come in pairs with subtle discrepancies, which allows an add-on explanation module to predict which words or phrases make a statement contradict the video and to regularize the training of the entailment judgment. Experimental results demonstrate that our approach outperforms state-of-the-art methods.
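The idea of focusing the entailment judgment on frames where the statement's entities are grounded can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `grounded_pooling`, the `threshold` and `temperature` parameters, and the use of softmax weighting with a uniform-average fallback are all assumptions made for illustration. It assumes per-frame entailment scores and per-frame grounding confidences have already been computed by some upstream model.

```python
import math
from typing import List


def grounded_pooling(
    frame_scores: List[float],
    grounding: List[float],
    threshold: float = 0.5,    # hypothetical cutoff for "entities are grounded"
    temperature: float = 0.1,  # hypothetical sharpness of the frame weighting
) -> float:
    """Pool per-frame entailment scores into one video-level score.

    Frames with higher grounding confidence receive more weight. If no
    frame reaches the threshold (entities not grounded anywhere), fall
    back to a plain average over all frames.
    """
    if not frame_scores or len(frame_scores) != len(grounding):
        raise ValueError("frame_scores and grounding must be non-empty and equal length")

    # No grounded evidence: treat every frame equally.
    if max(grounding) < threshold:
        return sum(frame_scores) / len(frame_scores)

    # Numerically stable softmax over grounding confidences.
    peak = max(grounding)
    exps = [math.exp((g - peak) / temperature) for g in grounding]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum emphasizes frames where the entities occur.
    return sum(w * s for w, s in zip(weights, frame_scores))


if __name__ == "__main__":
    scores = [0.2, 0.9, 0.1]       # illustrative per-frame entailment scores
    grounded = [0.0, 0.95, 0.05]   # entities clearly visible in frame 2
    ungrounded = [0.1, 0.2, 0.1]   # entities not confidently found
    print(grounded_pooling(scores, grounded))
    print(grounded_pooling(scores, ungrounded))
```

With the grounded input, the pooled score is dominated by the second frame; with the ungrounded input, the result is the plain average of the frame scores.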

Award ID(s): 1949694

PAR ID: 10337269

Author(s) / Creator(s): Chen, Junwen; Kong, Yu

Date Published: 2021-10-01

Journal Name: 2021 IEEE/CVF International Conference on Computer Vision (ICCV)

Page Range / eLocation ID: 2001 to 2010

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.1109/ICCV48922.2021.00203