We present a simple approach to improve direct speech-to-text translation (ST) when the source language is low-resource: we pre-train the model on a high-resource automatic speech recognition (ASR) task, and then fine-tune its parameters for ST. We demonstrate that our approach is effective by pre-training on 300 hours of English ASR data to improve Spanish-English ST from 10.8 to 20.2 BLEU when only 20 hours of Spanish-English ST training data are available. Through an ablation study, we find that the pre-trained encoder (acoustic model) accounts for most of the improvement, despite the fact that the shared language in these tasks is the target language text, not the source language audio. Applying this insight, we show that pre-training on ASR helps ST even when the ASR language differs from both source and target ST languages: pre-training on French ASR also improves Spanish-English ST. Finally, we show that the approach improves performance on a true low-resource task: pre-training on a combination of English ASR and French ASR improves Mboshi-French ST, where only 4 hours of data are available, from 3.5 to 7.1 BLEU.
Pragmatic Markers in Kwéyòl Donmnik, French, & English: Language Contact & Creole Emergence through the Lens of Powerful Little Words
Pragmatic markers (PMs) are multifunctional elements that allow language users to organize and coordinate discourse and to express their attitudes and cognitive states. This study compares the discourse-pragmatic functions and distributional features of four PMs in Kwéyòl Donmnik (konsa ‘so’, èben ‘well’, papa/Bondyé ‘papa/God’, la ‘there’) with those of their etyma in French, Kwéyòl’s lexifier ((ou) comme ça, (eh) ben, bon Dieu, là), and with their counterparts in English, the colonial source language with which it has been in contact for more than two centuries (so, well, oh my God, there). The properties of the Kwéyòl PMs are determined through a corpus analysis and are then compared to descriptions of the French and English PMs in previous studies. Each of the four Kwéyòl PMs has functions in common with its French etymon and its English counterpart as well as its own unique functions. In addition, English so performs functions in the Kwéyòl data that are unique to Kwéyòl konsa ‘so’, suggesting that so is being integrated into Kwéyòl. This study expands the limited body of work on Kwéyòl and deepens our understanding of language contact and Creole emergence at the discourse-pragmatic level, particularly in cases involving a second, non-lexifier colonial source language.
- PAR ID: 10559591
- Publisher / Repository: Laboratoire Parole et Langage
- Date Published:
- Journal Name: Études créoles
- Volume: 41 | 1-2
- ISSN: 0708-2398
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- This study examines whether second language (L2) learners' processing of an intonationally cued lexical contrast is facilitated when intonational cues signal a segmental contrast in the native language (L1). It does so by investigating Seoul Korean and French listeners' processing of intonationally cued lexical-stress contrasts in English. Neither Seoul Korean nor French has lexical stress; instead, the two languages have similar intonational systems where prominence is realized at the level of the Accentual Phrase. A critical difference between the two systems is that French has only one tonal pattern underlying the realization of the Accentual Phrase, whereas Korean has two underlying tonal patterns that depend on the laryngeal feature of the phrase-initial segment. The L and H tonal cues thus serve to distinguish segments at the lexical level in Korean but not in French; Seoul Korean listeners are therefore hypothesized to outperform French listeners when processing English lexical stress realized with only tonal cues (H* on the stressed syllable). Seoul Korean and French listeners completed a sequence-recall task with four-item sequences of English words that differed in intonationally cued lexical stress (experimental condition) or in word-initial segment (control condition). The results showed higher accuracy for Seoul Korean listeners than for French listeners only when processing English lexical stress, suggesting that the processing of an intonationally cued lexical contrast in the L2 is facilitated when intonational cues signal a segmental contrast in the L1. These results are interpreted within the scope of the cue-based transfer approach to L2 prosodic processing.
- Expectation is a powerful mechanism in native-language processing. Less is known about its role in non-native language processing, especially for expectations at the discourse level. This study presents evidence from a story-continuation task, adapted from previous work with native speakers (Rohde et al., 2006), probing next-mention and coherence expectations among Japanese- and Korean-speaking learners of English. As in previous work, verbal aspect (perfective/imperfective) in a context sentence describing a transfer-of-possession event (e.g., Ron gave/was giving a towel to Patrick) modulated participants’ choices of next referents in their continuations. However, this effect was diminished in the non-native compared to the native-speaker group, despite comparable performance on an independent task assessing knowledge of verbal aspect in English, and previous evidence for significant effects of aspect on referential patterns in native Japanese and Korean processing (Ueno & Kehler, 2010; Kim et al., 2013). The two groups of speakers were equally sensitive to a cue that does not require predictive processing – the referential form of the story-continuation prompt – in that both groups were significantly more likely to establish reference to the discourse topic/Source of the transfer event for pronoun-initial continuations than for name-initial ones. Moreover, recency played a stronger role in non-native speakers’ referential choices than in those of native speakers. These results suggest that while native speakers engage in proactive discourse processing, non-native speakers are less able to do so, being sufficiently burdened by reactive processes required for information integration that they have only Reduced Ability to Generate Expectations (RAGE).
- Research in the social sciences and psychology has shown that the persuasiveness of an argument depends not only on the language employed, but also on attributes of the source/communicator, the audience, and the appropriateness and strength of the argument’s claims given the pragmatic and discourse context of the argument. Among these characteristics of persuasive arguments, prior work in NLP does not explicitly investigate the effect of the pragmatic and discourse context when determining argument quality. This paper presents a new dataset to initiate the study of this aspect of argumentation: it consists of a diverse collection of arguments covering 741 controversial topics and comprising over 47,000 claims. We further propose predictive models that incorporate the pragmatic and discourse context of argumentative claims and show that they outperform models that rely only on claim-specific linguistic features for predicting the perceived impact of individual claims within a particular line of argument.
- Applied linguistic work claims that multilinguals’ non-native languages interfere with one another based on similarities in cognitive factors like proficiency or age of acquisition. Two experiments explored how trilinguals regulate control of native- and non-native-language words. Experiment 1 tested 46 Dutch–English–French trilinguals in a monitoring task. Participants decided if phonemes were present in the target language name of a picture; phonemes of non-target language translations resulted in longer response times and more false alarms compared to phonemes not present in any translation (Colomé, 2001). The second language (English) interfered more than the first (Dutch) when trilinguals monitored in their third language (French). In Experiment 2, 95 bilinguals learned an artificial language to explore the possibility that the language from which a bilingual learns a third language provides practice managing known-language interference. Language of instruction modulated results, suggesting that learning conditions may reduce interference effects previously attributed to cognitive factors.

