Task-oriented dialogue research has mainly focused
on a few popular languages like English
and Chinese, due to the high dataset creation
cost for a new language. To reduce the cost,
we apply manual editing to automatically translated
data. We create a new multilingual benchmark,
X-RiSAWOZ, by translating the Chinese
RiSAWOZ to 4 languages: English, French,
Hindi, Korean; and a code-mixed English-
Hindi language. X-RiSAWOZ has more than
18,000 human-verified dialogue utterances for
each language, and unlike most multilingual
prior work, is an end-to-end dataset for building
fully-functioning agents.
The many difficulties we encountered in creating
X-RiSAWOZ led us to develop a toolset
to accelerate the post-editing of a new language
dataset after translation. This toolset improves
machine translation with a hybrid entity
alignment technique that combines neural with
dictionary-based methods, along with many automated
and semi-automated validation checks.
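The dictionary-based half of this hybrid alignment can be sketched roughly as follows. This is an illustrative reconstruction, not the toolset's actual code: the fuzzy n-gram fallback here stands in for the neural matcher, and all function and parameter names are assumptions.

```python
from difflib import SequenceMatcher

def align_entities(translated, entities, bilingual_dict, fuzzy_threshold=0.75):
    """Map each source-language entity to its likely span in the translation.

    Tries an exact bilingual-dictionary hit first; otherwise fuzzy-matches
    against word n-grams of the translated utterance (a stand-in for the
    neural matcher). Entities that cannot be aligned map to None, flagging
    the example for manual post-editing.
    """
    alignment = {}
    words = translated.split()
    for entity in entities:
        target = bilingual_dict.get(entity)
        if target is not None and target in translated:
            alignment[entity] = target  # exact dictionary hit
            continue
        query = target or entity
        best, best_score = None, 0.0
        for n in range(1, min(4, len(words)) + 1):  # candidate spans up to 4 words
            for i in range(len(words) - n + 1):
                span = " ".join(words[i:i + n])
                score = SequenceMatcher(None, query.lower(), span.lower()).ratio()
                if score > best_score:
                    best, best_score = span, score
        alignment[entity] = best if best_score >= fuzzy_threshold else None
    return alignment
```

For instance, aligning the entity "金龙饭店" in "Book a table at the Golden Dragon restaurant" with the dictionary entry {"金龙饭店": "Golden Dragon"} recovers the English span, which can then be kept consistent with the dialogue annotation.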
We establish strong baselines for X-RiSAWOZ
by training dialogue agents in the zero- and
few-shot settings where limited gold data is
available in the target language. Our results
suggest that our translation and post-editing
methodology and toolset can be used to create
new high-quality multilingual dialogue agents
cost-effectively. Our dataset, …
Zero and Few-Shot Localization of Task-Oriented Dialogue Agents with a Distilled Representation
Task-oriented Dialogue (ToD) agents are
mostly limited to a few widely-spoken languages,
mainly due to the high cost of acquiring
training data for each language. Existing
low-cost approaches that rely on cross-lingual
embeddings or naive machine translation sacrifice
a lot of accuracy for data efficiency, and
largely fail in creating a usable dialogue agent.
We propose automatic methods that use ToD
training data in a source language to build a
high-quality functioning dialogue agent in another
target language that has no training data
(i.e. zero-shot) or a small training set (i.e. few-shot).
Unlike most prior work in cross-lingual
ToD that only focuses on Dialogue State Tracking
(DST), we build an end-to-end agent.
We show that our approach closes the accuracy
gap between few-shot and existing full-shot
methods for ToD agents. We achieve
this by (1) improving the dialogue data representation,
(2) improving entity-aware machine
translation, and (3) automatic filtering of noisy
translations.
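Step (3) can be illustrated with a minimal, assumed heuristic: discard any translated example in which an annotated entity no longer appears in the utterance. The paper's actual filter may differ; the function name and data layout here are hypothetical.

```python
def filter_noisy_translations(examples):
    """Keep only examples whose annotated entities survive translation.

    `examples` is an iterable of (translated_utterance, expected_entities)
    pairs; an example is dropped if any expected entity string is missing
    from the translated utterance.
    """
    kept = []
    for utterance, entities in examples:
        if all(entity in utterance for entity in entities):
            kept.append((utterance, entities))
    return kept
```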
We evaluate our approach on the recent bilingual
dialogue dataset BiToD. In Chinese to
English transfer, in the zero-shot setting, our
method achieves 46.7% and 22.0% in Task
Success Rate (TSR) and Dialogue Success
Rate (DSR) respectively. In the few-shot setting
where 10% of the data in the target language
is used, we improve the state-of-the-art
by 15.2% and 14.0%, coming within 5% of
full-shot training.
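For reference, the two metrics can be computed from per-task success flags roughly as follows. This is a simplified sketch assuming the evaluation harness already provides a boolean success flag per task; the harness's own success criteria (correct entity, all requested information provided) are not reimplemented here.

```python
def task_success_rate(task_flags):
    """TSR: fraction of individual tasks completed successfully.

    `task_flags` is a flat list of booleans, one per task across all dialogues.
    """
    return sum(task_flags) / len(task_flags)

def dialogue_success_rate(dialogues):
    """DSR: fraction of dialogues in which *every* constituent task succeeded.

    `dialogues` is a list of per-dialogue lists of task-success booleans.
    """
    return sum(all(tasks) for tasks in dialogues) / len(dialogues)
```

For example, three dialogues with task outcomes [[True, True], [True, False], [True]] give a TSR of 4/5 and a DSR of 2/3, since one dialogue contains a failed task.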
- Award ID(s):
- 1900638
- PAR ID:
- 10427011
- Date Published:
- Journal Name:
- Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics (EACL)
- ISSN:
- 1525-2450
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- Cross-lingual language tasks typically require a substantial amount of annotated data or parallel translation data. We explore whether language representations that capture relationships among languages can be learned and subsequently leveraged in cross-lingual tasks without the use of parallel data. We generate dense embeddings for 29 languages using a denoising autoencoder, and evaluate the embeddings using the World Atlas of Language Structures (WALS) and two extrinsic tasks in a zero-shot setting: cross-lingual dependency parsing and cross-lingual natural language inference.
- In this work, we induce character-level noise in various forms when fine-tuning BERT to enable zero-shot cross-lingual transfer to unseen dialects and languages. We fine-tune BERT on three sentence-level classification tasks and evaluate our approach on an assortment of unseen dialects and languages. We find that character-level noise can be an extremely effective agent of cross-lingual transfer under certain conditions, while it is not as helpful in others. Specifically, we explore these differences in terms of the nature of the task and the relationships between source and target languages, finding that introduction of character-level noise during fine-tuning is particularly helpful when a task draws on surface-level cues and the source-target cross-lingual pair has a relatively high lexical overlap with shorter (i.e., less meaningful) unseen tokens on average.
- Modern NLP applications have enjoyed a great boost from neural network models. Such deep neural models, however, are not applicable to most human languages due to the lack of annotated training data for various NLP tasks. Cross-lingual transfer learning (CLTL) is a viable method for building NLP models for a low-resource target language by leveraging labeled data from other (source) languages. In this work, we focus on the multilingual transfer setting where training data in multiple source languages is leveraged to further boost target language performance. Unlike most existing methods that rely only on language-invariant features for CLTL, our approach coherently utilizes both language-invariant and language-specific features at the instance level. Our model leverages adversarial networks to learn language-invariant features, and mixture-of-experts models to dynamically exploit the similarity between the target language and each individual source language. This enables our model to learn effectively what to share between various languages in the multilingual setup. Moreover, when coupled with unsupervised multilingual embeddings, our model can operate in a zero-resource setting where neither target language training data nor cross-lingual resources are available. Our model achieves significant performance gains over prior art, as shown in an extensive set of experiments over multiple text classification and sequence tagging tasks.
- The ability to quickly learn a new task with minimal instruction - known as few-shot learning - is a central aspect of intelligent agents. Classical few-shot benchmarks make use of few-shot samples from a single modality, but such samples may not be sufficient to characterize an entire concept class. In contrast, humans use cross-modal information to learn new concepts efficiently. In this work, we demonstrate that one can indeed build a better visual dog classifier by reading about dogs and listening to them bark. To do so, we exploit the fact that recent multimodal foundation models such as CLIP are inherently cross-modal, mapping different modalities to the same representation space. Specifically, we propose a simple cross-modal adaptation approach that learns from few-shot examples spanning different modalities. By repurposing class names as additional one-shot training samples, we achieve SOTA results with an embarrassingly simple linear classifier for vision-language adaptation. Furthermore, we show that our approach can benefit existing methods such as prefix tuning, adapters, and classifier ensembling. Finally, to explore other modalities beyond vision and language, we construct the first (to our knowledge) audiovisual few-shot benchmark and use cross-modal training to improve the performance of both image and audio classification.
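The class-names-as-one-shot-samples idea in the last abstract above can be sketched in a few lines. Here random vectors stand in for embeddings from a frozen cross-modal encoder such as CLIP, and all function names are illustrative, not the paper's implementation.

```python
import numpy as np

def build_training_set(image_embs, image_labels, class_name_embs):
    """Pool few-shot image embeddings with one text embedding per class.

    Because images and class-name text share one embedding space, each
    embedded class name acts as an extra one-shot training sample.
    """
    X = list(image_embs)
    y = list(image_labels)
    for label, text_emb in enumerate(class_name_embs):
        X.append(text_emb)
        y.append(label)
    return np.stack(X), np.array(y)

def train_linear_classifier(X, y, n_classes, lr=0.5, epochs=500):
    """Multinomial logistic regression trained by batch gradient descent."""
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.01, size=(X.shape[1], n_classes))
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        logits = X @ W
        probs = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs /= probs.sum(axis=1, keepdims=True)
        W -= lr * X.T @ (probs - onehot) / len(X)  # cross-entropy gradient
    return W
```

The design point is that nothing changes in the classifier itself: the cross-modal information enters purely through extra rows in the pooled training matrix.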