

Title: Can Synthetic Translations Improve Bitext Quality?
Synthetic translations have been used for a wide range of NLP tasks, primarily as a means of data augmentation. This work explores, instead, how synthetic translations can be used to revise potentially imperfect reference translations in mined bitext. We find that synthetic samples can improve bitext quality without any additional bilingual supervision when they replace the originals based on a semantic equivalence classifier that helps mitigate noise from neural machine translation (NMT). The improved quality of the revised bitext is confirmed intrinsically via human evaluation and extrinsically through bilingual lexicon induction and MT tasks.
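The replacement step described in the abstract can be sketched as a per-pair decision rule. The Python below is a minimal illustration under stated assumptions, not the authors' method: `translate` and `is_equivalent` are hypothetical stand-ins for an NMT system and a semantic equivalence classifier, and the rule itself (keep an original reference the classifier accepts; otherwise swap in the synthetic translation only if the classifier accepts it) is an assumption about how classifier-based replacement might work. The toy dictionaries exist only to make the sketch runnable.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, reference translation)


def revise_bitext(
    bitext: List[Pair],
    translate: Callable[[str], str],            # hypothetical NMT system
    is_equivalent: Callable[[str, str], bool],  # hypothetical equivalence classifier
) -> List[Pair]:
    """Return a copy of `bitext` in which some references are replaced by
    synthetic translations. The decision rule is an illustrative assumption."""
    revised: List[Pair] = []
    for src, ref in bitext:
        if is_equivalent(src, ref):
            revised.append((src, ref))        # original judged equivalent: keep it
            continue
        synthetic = translate(src)
        if is_equivalent(src, synthetic):
            revised.append((src, synthetic))  # synthetic passes the check: replace
        else:
            revised.append((src, ref))        # both fail: keep original, avoid MT noise
    return revised


# Toy stand-ins (hypothetical) so the sketch runs without real models.
GOLD = {
    "hola mundo": {"hello world"},
    "buenas noches": {"good night"},
    "gracias": {"thanks", "thank you"},
}
NOISY_MT = {"hola mundo": "hello world", "buenas noches": "good night", "gracias": "grace"}


def toy_translate(src: str) -> str:
    return NOISY_MT[src]


def toy_is_equivalent(src: str, tgt: str) -> bool:
    return tgt in GOLD[src]


TOY_BITEXT = [
    ("hola mundo", "hello world"),
    ("buenas noches", "good morning"),
    ("gracias", "you're welcome"),
]

if __name__ == "__main__":
    for src, tgt in revise_bitext(TOY_BITEXT, toy_translate, toy_is_equivalent):
        print(f"{src} -> {tgt}")
```

With the toy data, the first pair keeps its original reference, the second is revised to "good night", and the third keeps its original because the noisy synthetic output "grace" also fails the check. Falling back to the original in that last case is one way a classifier can keep MT noise out of the revised bitext.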
Award ID(s): 1750695
PAR ID: 10399217
Journal Name: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Page Range / eLocation ID: 4753 to 4766
Sponsoring Org: National Science Foundation
More Like this
  1. Neural Machine Translation (NMT) trains a neural network with an encoder-decoder architecture. However, the quality of neural translations depends predominantly on the availability of a large bilingual training dataset. In this paper, we explore the performance of translations produced by attention-based NMT systems for the low-resource Spanish-to-Persian language pair. We analyze the errors that NMT systems make in Persian and provide an in-depth comparison of system performance across variations in sentence length and training-dataset size. We evaluate our translation results using BLEU and human evaluation of adequacy, fluency, and overall rating.
  2. This paper describes a systematic study of an approach to low-resource Farsi-Spanish Neural Machine Translation (NMT) that leverages monolingual data for joint learning of forward and backward translation models. As is standard for NMT systems, training begins with two pre-trained translation models, which are then iteratively updated to reduce translation cost. In each iteration, each translation model translates monolingual text from one language into the other, generating a synthetic dataset for the opposite model. Two new translation models are then trained on the bilingual data together with the synthetic texts. What distinguishes our approach from standard NMT is this iterative learning process, which improves both translation models while producing a higher-quality synthetic training dataset at each iteration. Our empirical results show that this approach outperforms the baselines.
  3. Bateiha, S.; Cobbs, G. (Ed.)
    This study highlights parents’ linguistic capital and how they use specific languaging practices to facilitate their child’s learning. One bilingual family used multiple languages to support their son’s learning during two mathematical tasks. Using Dominguez’ conceptual framework of bilingualism, we analyzed these conversations for natural units of communication and their relation to the family’s problem-solving goals. The data show that the family switched from English to Spanish to help their child overcome several barriers during the mathematical activities. Leveraging bilingual languaging practices can counter the deficit lens through which minoritized students are typically viewed.
  4. Recent findings demonstrate a bilingual advantage for voice processing in children, but the mechanism supporting this advantage is unknown. Here we examined whether a bilingual advantage for voice processing is observed in adults and, if so, whether it reflects enhanced pitch perception or inhibitory control. Voice processing was assessed in monolingual and bilingual adults using an associative-learning identification task and a discrimination task in English (a familiar language) and French (an unfamiliar language). Participants also completed pitch perception, flanker, and auditory Stroop tasks. Voice processing was better in the familiar than in the unfamiliar language and reflected individual differences in pitch perception (both tasks) and inhibitory control (identification task). However, no bilingual advantage was observed for either voice task, suggesting that the bilingual advantage for voice processing is attenuated with maturation and that adult performance reflects knowledge of linguistic structure in addition to general auditory and inhibitory control abilities.
  5. The effects of bilingual language experience on cognitive control are still debated. A recent proposal is that being bilingual enhances attentional control, based on studies showing that the nature of the preceding trial has a smaller effect on the current trial in bilinguals (Grundy et al., 2017). However, performance on such tasks can also be explained by lower-level processes such as the binding and unbinding of stimulus and response features. The current study used a Partial Repetition Cost paradigm to test explicitly whether language experience affects such processes. Bilinguals and monolinguals did not differ in their responses when the stimulus features were task-relevant, but bilinguals showed smaller partial repetition costs when the features were task-irrelevant. These findings suggest that language experience does not affect lower-level processes and support the view that bilinguals exhibit enhanced attentional disengagement.
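Item 2 in the list above describes a loop of the kind commonly called iterative back-translation. The Python sketch below illustrates that general loop under stated assumptions and is not the cited paper's implementation: models are plain callables, `train` is a hypothetical training routine, and `memorize` is a toy lookup-table trainer included only so the loop runs end to end.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]                   # (input sentence, output sentence)
Model = Callable[[str], str]             # a translation model as a plain function
Trainer = Callable[[List[Pair]], Model]  # hypothetical: pairs in, model out


def iterative_back_translation(
    bitext: List[Pair],    # (source, target) pairs, e.g. Farsi-Spanish
    mono_src: List[str],   # monolingual source-language text
    mono_tgt: List[str],   # monolingual target-language text
    fwd: Model,            # initial pre-trained source->target model
    bwd: Model,            # initial pre-trained target->source model
    train: Trainer,        # hypothetical training routine
    iterations: int = 3,
) -> Tuple[Model, Model]:
    for _ in range(iterations):
        # Each model translates monolingual text to create synthetic data for the
        # other model: synthetic text on the input side, real text on the output side.
        synth_for_fwd = [(bwd(t), t) for t in mono_tgt]
        synth_for_bwd = [(fwd(s), s) for s in mono_src]
        reversed_bitext = [(t, s) for s, t in bitext]
        # Retrain both directions on genuine bitext plus the new synthetic pairs.
        fwd = train(bitext + synth_for_fwd)
        bwd = train(reversed_bitext + synth_for_bwd)
    return fwd, bwd


def memorize(pairs: List[Pair]) -> Model:
    """Toy trainer (hypothetical): a lookup table that copies unknown inputs."""
    table = dict(pairs)
    return lambda text: table.get(text, text)


if __name__ == "__main__":
    bitext = [("salam", "hola")]
    fwd, bwd = iterative_back_translation(
        bitext,
        mono_src=["salam", "merci"],
        mono_tgt=["hola", "gracias"],
        fwd=memorize(bitext),
        bwd=memorize([(t, s) for s, t in bitext]),
        train=memorize,
        iterations=2,
    )
    print(fwd("salam"), bwd("hola"))
```

Both synthetic sets are built before either model is retrained, so each direction learns from data produced by the previous iteration's models. In a real system, `train` would retrain or fine-tune an NMT model rather than build a lookup table.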