Neural Machine Translation of Text from Non-Native Speakers

Anastasopoulos, Antonios; Lui, Alison; Nguyen, Toan Q.; Chiang, David

Neural Machine Translation (NMT) systems are known to degrade when confronted with noisy data, especially when the system is trained only on clean data. In this paper, we show that augmenting training data with sentences containing artificially-introduced grammatical errors can make the system more robust to such errors. In combination with an automatic grammar error correction system, we can recover 1.0 BLEU out of 2.4 BLEU lost due to grammatical errors. We also present a set of Spanish translations of the JFLEG grammar error correction corpus, which allows for testing NMT robustness to real grammatical errors.

Award ID(s): 1761548

PAR ID: 10105104

Author(s) / Creator(s): Anastasopoulos, Antonios; Lui, Alison; Nguyen, Toan Q.; Chiang, David

Date Published: 2019-06-03

Journal Name: Proceedings of the 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Sponsoring Org: National Science Foundation
