Detecting Fine-Grained Cross-Lingual Semantic Divergences without Supervision by Learning to Rank

Briakou, Eleftheria; Carpuat, Marine

doi:10.18653/v1/2020.emnlp-main.121

Citation Details

Detecting fine-grained differences in content conveyed in different languages matters for cross-lingual NLP and multilingual corpora analysis, but it is a challenging machine learning problem since annotation is expensive and hard to scale. This work improves the prediction and annotation of fine-grained semantic divergences. We introduce a training strategy for multilingual BERT models by learning to rank synthetic divergent examples of varying granularity. We evaluate our models on the Rationalized English-French Semantic Divergences, a new dataset released with this work, consisting of English-French sentence-pairs annotated with semantic divergence classes and token-level rationales. Learning to rank helps detect fine-grained sentence-level divergences more accurately than a strong sentence-level similarity model, while token-level predictions have the potential of further distinguishing between coarse and fine-grained divergences.

Award ID(s): 1750695

PAR ID: 10206137

Author(s) / Creator(s): Briakou, Eleftheria; Carpuat, Marine

Date Published: 2020-11-01

Journal Name: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Page Range / eLocation ID: 1563 to 1580

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.18653/v1/2020.emnlp-main.121
