Translating culture-related content is vital for effective cross-cultural communication. However, many culture-specific items (CSIs) lack viable translations across languages, making it challenging to collect high-quality, diverse parallel corpora with CSI annotations. This difficulty hinders the analysis of the cultural awareness of machine translation (MT) systems, including traditional neural MT and the emerging MT paradigm using large language models (LLMs). To address this gap, we introduce a novel parallel corpus, enriched with CSI annotations in 6 language pairs, for investigating Culturally-Aware Machine Translation (CAMT). Furthermore, we design two evaluation metrics to assess CSI translations, focusing on their pragmatic translation quality. Our findings show the superior ability of LLMs over neural MT systems in leveraging external cultural knowledge for translating CSIs, especially those lacking translations in the target culture.
Cultural and Geographical Influences on Image Translatability of Words across Languages
Neural Machine Translation (NMT) models have been observed to produce poor translations when there are few or no parallel sentences to train them on. In the absence of parallel data, several approaches have turned to images to learn translations. Since images of words (e.g., horse) may be unchanged across languages, translations can be identified via images associated with words in different languages that have a high degree of visual similarity. However, translating via images has been shown to improve upon text-only models only marginally. To better understand when images are useful for translation, we study image translatability of words, which we define as the translatability of words via images, by measuring intra- and inter-cluster similarities of image representations of words that are translations of each other. We find that images of words are not always invariant across languages, and that language pairs with shared culture, meaning a common language family, ethnicity, or religion, have improved image translatability (i.e., have more similar images for similar words) compared to language pairs without shared culture, regardless of their geographic proximity. In addition, in line with previous works showing that images help more in translating concrete words, we find that concrete words have improved image translatability compared to abstract ones.
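The intra- and inter-cluster comparison mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything here is an assumption: each word is represented by a small cluster of image feature vectors, similarity is cosine similarity, and `translatability_score` is a hypothetical ratio of inter-cluster to mean intra-cluster similarity, not the paper's metric.

```python
import math
from itertools import combinations, product


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_intra_similarity(cluster):
    """Average pairwise similarity among the images of one word."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def mean_inter_similarity(cluster_a, cluster_b):
    """Average similarity between images of a word and images of its translation."""
    pairs = list(product(cluster_a, cluster_b))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def translatability_score(cluster_a, cluster_b):
    """Hypothetical score: inter-cluster similarity relative to the
    mean intra-cluster similarity of the two clusters (assumed form)."""
    intra = (mean_intra_similarity(cluster_a) + mean_intra_similarity(cluster_b)) / 2
    inter = mean_inter_similarity(cluster_a, cluster_b)
    return inter / intra if intra else 0.0


# Toy 2-D "image features" (made up for illustration only).
horse_en = [[1.0, 0.0], [0.9, 0.1]]
horse_de = [[1.0, 0.05], [0.95, 0.0]]
unrelated = [[0.0, 1.0], [0.1, 0.9]]

print(translatability_score(horse_en, horse_de))
print(translatability_score(horse_en, unrelated))
```

Under these assumptions, a word whose images look alike in both languages scores close to 1, while a pair with visually dissimilar image clusters scores much lower.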
- Award ID(s): 1838193
- PAR ID: 10291480
- Date Published:
- Journal Name: The 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
- Page Range / eLocation ID: 198 to 209
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
People are able to describe images using thousands of languages, but languages share only one visual world. The aim of this work is to use the learned intermediate visual representations from a deep convolutional neural network to transfer information across languages for which paired data is not available in any form. Our work proposes using backpropagation-based decoding coupled with transformer-based multilingual-multimodal language models in order to obtain translations between any languages used during training. We particularly show the capabilities of this approach in the translation of German-Japanese and Japanese-German sentence pairs, given training data of images freely associated with text in English, German, and Japanese but for which no single image contains annotations in both Japanese and German. Moreover, we demonstrate that our approach is also generally useful in the multilingual image captioning task when sentences in a second language are available at test time. The results of our method also compare favorably on the Multi30k dataset against recently proposed methods that also aim to leverage images as an intermediate source of translations.
-
Current multilingual vision-language models either require a large number of additional parameters for each supported language, or suffer performance degradation as languages are added. In this paper, we propose a Scalable Multilingual Aligned Language Representation (SMALR) that supports many languages with few model parameters without sacrificing downstream task performance. SMALR learns a fixed size language-agnostic representation for most words in a multilingual vocabulary, keeping language-specific features for just a few. We use a masked cross-language modeling loss to align features with context from other languages. Additionally, we propose a cross-lingual consistency module that ensures predictions made for a query and its machine translation are comparable. The effectiveness of SMALR is demonstrated with ten diverse languages, over twice the number supported in vision-language tasks to date. We evaluate on multilingual image-sentence retrieval and outperform prior work by 3–4% with less than 1/5th the training parameters compared to other word embedding methods.
-
What makes a word easy to learn? Early‐learned words are frequent and tend to name concrete referents. But words typically do not occur in isolation. Some words are predictable from their contexts; others are less so. Here, we investigate whether predictability relates to when children start producing different words (age of acquisition; AoA). We operationalized predictability in terms of a word's surprisal in child‐directed speech, computed using n‐gram and long short‐term memory (LSTM) language models. Predictability derived from LSTMs was generally a better predictor than predictability derived from n‐gram models. Across five languages, average surprisal was positively correlated with the AoA of predicates and function words but not nouns. Controlling for concreteness and word frequency, more predictable predicates and function words were learned earlier. Differences in predictability between languages were associated with cross‐linguistic differences in AoA: the same word (when it was a predicate) was produced earlier in languages where the word was more predictable.
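To make the notion of average surprisal concrete, here is a minimal sketch using a bigram model, where the surprisal of a word is -log2 P(word | previous word). The abstract does not specify the n-gram order, smoothing, or corpus handling, so the bigram order, the add-alpha smoothing, the `<s>` start token, and scoring on the same toy utterances used for counting are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(utterance):
    """Lowercase whitespace tokenization with an assumed start-of-utterance token."""
    return ["<s>"] + utterance.lower().split()


def average_surprisal(utterances, alpha=1.0):
    """Mean bigram surprisal (in bits) of each word over all its occurrences.

    Uses add-alpha smoothing (an assumption, not the source's setup) and
    estimates probabilities on the same utterances it scores.
    """
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for utt in utterances:
        tokens = tokenize(utt)
        vocab.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1

    vocab_size = len(vocab)
    totals = defaultdict(float)
    occurrences = Counter()
    for utt in utterances:
        tokens = tokenize(utt)
        for prev, word in zip(tokens, tokens[1:]):
            prob = (bigram_counts[(prev, word)] + alpha) / (
                context_counts[prev] + alpha * vocab_size
            )
            totals[word] += -math.log2(prob)
            occurrences[word] += 1
    return {word: totals[word] / occurrences[word] for word in totals}


# Toy utterances (made up; not child-directed speech data from the study).
toy_speech = ["the dog runs", "the dog sleeps", "the cat runs"]
surprisal = average_surprisal(toy_speech)
for word in sorted(surprisal, key=surprisal.get):
    print(f"{word}: {surprisal[word]:.3f} bits")
```

In this toy setup, words that follow their context more consistently receive lower average surprisal, which is the sense of "more predictable" used above.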

