

Search for: All records

Creators/Authors contains: "Lam, Monica"


  1. Task-oriented Dialogue (ToD) agents are mostly limited to a few widely spoken languages, mainly due to the high cost of acquiring training data for each language. Existing low-cost approaches that rely on cross-lingual embeddings or naive machine translation sacrifice a lot of accuracy for data efficiency, and largely fail to produce a usable dialogue agent. We propose automatic methods that use ToD training data in a source language to build a high-quality, functioning dialogue agent in another target language that has no training data (i.e., zero-shot) or only a small training set (i.e., few-shot). Unlike most prior work in cross-lingual ToD that focuses only on Dialogue State Tracking (DST), we build an end-to-end agent. We show that our approach closes the accuracy gap between few-shot and existing full-shot methods for ToD agents. We achieve this by (1) improving the dialogue data representation, (2) improving entity-aware machine translation, and (3) automatically filtering noisy translations. We evaluate our approach on the recent bilingual dialogue dataset BiToD. In Chinese-to-English transfer in the zero-shot setting, our method achieves 46.7% Task Success Rate (TSR) and 22.0% Dialogue Success Rate (DSR). In the few-shot setting, where 10% of the data in the target language is used, we improve the state of the art by 15.2% and 14.0%, coming within 5% of full-shot training.
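    A minimal Python sketch, not the paper's code, of ideas (2) and (3) above: slot values are masked with placeholder tokens so the translator copies them verbatim, translations that lose a placeholder are filtered out as noisy, and the values are restored afterwards. The translate_fn callable is a hypothetical stand-in for any MT system.

        from typing import Callable, Dict, Optional

        def translate_with_entities(utterance: str,
                                    slot_values: Dict[str, str],
                                    translate_fn: Callable[[str], str]) -> Optional[str]:
            # Mask each slot value with an opaque token the MT system should copy verbatim.
            masked, placeholders = utterance, {}
            for i, value in enumerate(slot_values.values()):
                token = f"ENT{i}"
                placeholders[token] = value
                masked = masked.replace(value, token)

            translated = translate_fn(masked)

            # Filter: drop the example if the translation lost or altered any placeholder.
            if not all(token in translated for token in placeholders):
                return None

            # Restore the original (or separately localized) entity values.
            for token, value in placeholders.items():
                translated = translated.replace(token, value)
            return translated

        # Demo with an identity "translator" standing in for a real MT system.
        print(translate_with_entities("I want to book Hotel Nikko for 2 nights.",
                                      {"hotel_name": "Hotel Nikko"},
                                      lambda s: s))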
  2. Robust state tracking for task-oriented dialogue systems currently remains restricted to a few popular languages. This paper shows that, given a large-scale dialogue dataset in one language, we can automatically produce an effective semantic parser for other languages using machine translation. We propose automatic translation of dialogue datasets with alignment to ensure faithful translation of slot values and to eliminate the costly human supervision used in previous benchmarks. We also propose a new contextual semantic parsing model that encodes the formal slots and values, and only the last agent and user utterances. We show that this succinct representation reduces the compounding effect of translation errors without harming accuracy in practice. We evaluate our approach on several dialogue state tracking benchmarks. On the RiSAWOZ, CrossWOZ, CrossWOZ-EN, and MultiWOZ-ZH datasets we improve the state of the art by 11%, 17%, 20%, and 0.3% in joint goal accuracy, respectively. We present a comprehensive error analysis for these datasets, showing that erroneous annotations can lead to misguided judgments of model quality. Finally, we present RiSAWOZ English and German datasets, created using our translation methodology. On these datasets, accuracy is within 11% of the original, showing that high-accuracy multilingual dialogue datasets are possible without relying on expensive human annotations. We release our datasets and software open source.
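    A minimal sketch (the serialization format is assumed, not taken from the paper) of the succinct contextual input described above: the formal dialogue state plus only the last agent and user utterances, rather than the full dialogue history.

        from typing import Dict

        def build_parser_input(belief_state: Dict[str, str],
                               last_agent_utterance: str,
                               last_user_utterance: str) -> str:
            # Serialize the formal state as slot=value pairs in a stable order.
            state = " ; ".join(f"{slot}={value}" for slot, value in sorted(belief_state.items()))
            # Only the last agent and user turns are included, not the full history.
            return f"<state> {state} <agent> {last_agent_utterance} <user> {last_user_utterance}"

        print(build_parser_input({"hotel-area": "centre", "hotel-stars": "4"},
                                 "I found 3 hotels in the centre.",
                                 "Book the cheapest one for Friday."))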
  3. Task-oriented dialogue research has mainly focused on a few popular languages like English and Chinese, due to the high cost of creating a dataset for a new language. To reduce this cost, we apply manual editing to automatically translated data. We create a new multilingual benchmark, X-RiSAWOZ, by translating the Chinese RiSAWOZ to four languages: English, French, Hindi, and Korean, as well as a code-mixed English-Hindi variant. X-RiSAWOZ has more than 18,000 human-verified dialogue utterances for each language, and unlike most prior multilingual work, is an end-to-end dataset for building fully functioning agents. The many difficulties we encountered in creating X-RiSAWOZ led us to develop a toolset to accelerate the post-editing of a new language dataset after translation. This toolset improves machine translation with a hybrid entity alignment technique that combines neural and dictionary-based methods, along with many automated and semi-automated validation checks. We establish strong baselines for X-RiSAWOZ by training dialogue agents in the zero- and few-shot settings where limited gold data is available in the target language. Our results suggest that our translation and post-editing methodology and toolset can be used to create new high-quality multilingual dialogue agents cost-effectively. Our dataset, code, and toolkit are released open source.
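    A minimal sketch (hypothetical interfaces, not the released toolset) of the hybrid entity alignment idea: a bilingual entity dictionary is consulted first, and a neural aligner callable is used as a fallback for entities the dictionary misses.

        from typing import Callable, Dict, Optional

        def align_entity(source_entity: str,
                         translated_sentence: str,
                         entity_dict: Dict[str, str],
                         neural_align_fn: Callable[[str, str], Optional[str]]) -> Optional[str]:
            # Dictionary-based alignment first: look up a known translation of the entity.
            target = entity_dict.get(source_entity)
            if target is not None and target in translated_sentence:
                return target
            # Neural fallback: ask an alignment model to locate the entity span.
            return neural_align_fn(source_entity, translated_sentence)

        # The dictionary resolves this entity; the (stubbed) neural aligner is the fallback.
        print(align_entity("故宫", "Book a table near the Forbidden City.",
                           {"故宫": "Forbidden City"}, lambda e, s: None))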
  4. Previous attempts to build effective semantic parsers for Wizard-of-Oz (WOZ) conversations suffer from the difficulty of acquiring a high-quality, manually annotated training set. Approaches based only on dialogue synthesis are insufficient, as dialogues generated from state-machine-based models are poor approximations of real-life conversations. Furthermore, previously proposed dialogue state representations are ambiguous and lack the precision necessary for building an effective agent. This paper proposes a new dialogue representation and a sample-efficient methodology that can predict precise dialogue states in WOZ conversations. We extend the ThingTalk representation to capture all the information an agent needs to respond properly. Our training strategy is sample-efficient: we combine (1) few-shot data sparsely sampling the full dialogue space and (2) synthesized data covering a subset of the dialogue space, generated by a succinct state-based dialogue model. The completeness of the extended ThingTalk language is demonstrated with a fully operational agent, which is also used in training data synthesis. We demonstrate the effectiveness of our methodology on MultiWOZ 3.0, a reannotation of the MultiWOZ 2.1 dataset in ThingTalk. ThingTalk can represent 98% of the test turns, while the simulator can emulate 85% of the validation set. We train a contextual semantic parser using our strategy and obtain 79% turn-by-turn exact-match accuracy on the reannotated test set.
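    A minimal sketch (the upsampling factor is an assumption, not the paper's recipe) of the data-mixing strategy: the small few-shot set of real dialogues is upsampled and combined with the much larger synthesized set.

        import random

        def mix_training_data(few_shot, synthesized, real_upsample=5, seed=0):
            # Upsample the small set of real dialogues so it is not drowned out
            # by the much larger synthesized set.
            mixed = list(synthesized) + list(few_shot) * real_upsample
            random.Random(seed).shuffle(mixed)
            return mixed

        train_set = mix_training_data(["real dialogue"], [f"synthetic dialogue {i}" for i in range(10)])
        print(len(train_set))  # 15: 10 synthesized + 5 copies of the real dialogue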
  5. Full-body tracking in virtual reality improves presence, allows interaction via body postures, and facilitates better social expression among users. However, full-body tracking systems today require a complex setup fixed to the environment (e.g., multiple lighthouses/cameras) and a laborious calibration process, which goes against the desire to make VR systems more portable and integrated. We present HybridTrak, which provides accurate, real-time full-body tracking by augmenting inside-out upper-body VR tracking systems with a single external off-the-shelf RGB web camera. HybridTrak uses a full-neural solution to convert and transform users’ 2D full-body poses from the webcam into 3D poses, leveraging the inside-out upper-body tracking data. We showed that HybridTrak is more accurate than RGB- or depth-based tracking methods on the MPI-INF-3DHP dataset. We also tested HybridTrak in the popular VRChat app and showed that body postures presented by HybridTrak are more distinguishable and more natural than those of a solution using an RGBD camera.
  6. While Alexa can perform over 100,000 skills, its capability covers only a fraction of what is possible on the web. Individuals need and want to automate a long tail of web-based tasks which often involve visiting different websites and require programming concepts such as function composition, conditional, and iterative evaluation. This paper presents DIYA (Do-It-Yourself Assistant), a new system that empowers users to create personalized web-based virtual assistant skills that require the full generality of composable control constructs, without having to learn a formal programming language. With DIYA, the user demonstrates their task of interest in the browser and issues a few simple voice commands, such as naming the skills and adding conditions on the action. DIYA turns these multi-modal specifications into voice-invocable skills written in the ThingTalk 2.0 programming language we designed for this purpose. DIYA is a prototype that works in the Chrome browser. Our user studies show that 81% of the proposed routines can be expressed using DIYA. DIYA is easy to learn, and 80% of users surveyed find DIYA useful. 
  7. Grounding natural language instructions on the web to perform previously unseen tasks enables accessibility and automation. We introduce a task and dataset to train AI agents from open-domain, step-by-step instructions originally written for people. We build RUSS (Rapid Universal Support Service) to tackle this problem. RUSS consists of two models: First, a BERT-LSTM with pointers parses instructions to ThingTalk, a domain-specific language we design for grounding natural language on the web. Then, a grounding model retrieves the unique IDs of any webpage elements requested in ThingTalk. RUSS may interact with the user through a dialogue (e.g. ask for an address) or execute a web operation (e.g. click a button) inside the web runtime. To augment training, we synthesize natural language instructions mapped to ThingTalk. Our dataset consists of 80 different customer service problems from help websites, with a total of 741 step-by-step instructions and their corresponding actions. RUSS achieves 76.7% end-to-end accuracy predicting agent actions from single instructions. It outperforms state-of-the-art models that directly map instructions to actions without ThingTalk. Our user study shows that RUSS is preferred by actual users over web navigation. 
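    A minimal sketch (hypothetical callables, not the RUSS implementation) of the two-stage pipeline described above: a parser maps an instruction to a formal web operation, and a grounding step resolves the described target to a concrete element ID on the page.

        from typing import Callable, Dict

        def execute_instruction(instruction: str,
                                page_html: str,
                                parse_fn: Callable[[str], Dict],
                                ground_fn: Callable[[str, str], str]) -> Dict:
            # Stage 1: semantic parsing from natural language to a formal web operation.
            op = parse_fn(instruction)
            # Stage 2: grounding, i.e. resolving the described target to a page element ID.
            if "target" in op:
                op["element_id"] = ground_fn(op["target"], page_html)
            return op

        print(execute_instruction("Click the submit button", "<html>...</html>",
                                  lambda s: {"action": "click", "target": "submit button"},
                                  lambda desc, html: "btn-42"))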
  8. We propose AutoQA, a methodology and toolkit to generate semantic parsers that answer questions on databases, with no manual effort. Given a database schema and its data, AutoQA automatically generates a large set of high-quality questions for training that covers different database operations. It uses automatic paraphrasing combined with template-based parsing to find alternative expressions of an attribute in different parts of speech. It also uses a novel filtered auto-paraphraser to generate correct paraphrases of entire sentences. We apply AutoQA to the Schema2QA dataset and obtain an average logical form accuracy of 62.9% when tested on natural questions, which is only 6.4% lower than a model trained with expert natural language annotations and paraphrase data collected from crowdworkers. To demonstrate the generality of AutoQA, we also apply it to the Overnight dataset. AutoQA achieves 69.8% answer accuracy, 16.4% higher than the state-of-the-art zero-shot models and only 5.2% lower than the same model trained with human data.
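    A minimal sketch (hypothetical callables, not the AutoQA toolkit) of the filtering idea behind the auto-paraphraser: an automatically generated paraphrase is kept only if the parser maps it to the same logical form as the sentence it paraphrases.

        from typing import Callable, Iterable, List, Tuple

        def filter_paraphrases(candidates: Iterable[Tuple[str, str, str]],
                               parse_fn: Callable[[str], str]) -> List[Tuple[str, str]]:
            # candidates: (original sentence, its gold logical form, auto-generated paraphrase)
            kept = []
            for original, logical_form, paraphrase in candidates:
                # Keep a paraphrase only if the parser still produces the same logical form.
                if parse_fn(paraphrase) == logical_form:
                    kept.append((paraphrase, logical_form))
            return kept

        toy_parser = lambda s: "filter(has_wifi=true)" if "wifi" in s and "restaurant" in s else "unknown"
        print(filter_paraphrases(
            [("show restaurants with wifi", "filter(has_wifi=true)", "which restaurants have wifi"),
             ("show restaurants with wifi", "filter(has_wifi=true)", "show wifi")],
            toy_parser))  # only the first paraphrase survives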
  9. We propose Semantic Parser Localizer (SPL), a toolkit that leverages Neural Machine Translation (NMT) systems to localize a semantic parser for a new language. Our methodology is to (1) generate training data automatically in the target language by augmenting machine-translated datasets with local entities scraped from public websites, (2) add a few-shot boost of human-translated sentences and train a novel XLMR-LSTM semantic parser, and (3) test the model on natural utterances curated using human translators. We assess the effectiveness of our approach by extending the current capabilities of Schema2QA, a system for English Question Answering (QA) on the open web, to 10 new languages for the restaurants and hotels domains. Our model achieves an overall test accuracy ranging between 61% and 69% for the hotels domain and between 64% and 78% for the restaurants domain, which compares favorably to the 69% and 80% obtained for an English parser trained on gold English data and a few examples from the validation set. We show that our approach outperforms the previous state-of-the-art methodology by more than 30% for hotels and 40% for restaurants with localized ontologies for the subset of languages tested. Our methodology enables any software developer to add a new language capability to a QA system for a new domain, leveraging machine translation, in less than 24 hours. Our code is released open source.
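    A minimal sketch (hypothetical data, not the SPL toolkit) of step (1) above: after machine translation, source-language entity values are swapped for entities scraped from local websites so the training data matches the target locale.

        import random
        from typing import List

        def localize_example(translated_utterance: str,
                             source_entity: str,
                             local_entities: List[str],
                             seed: int = 0) -> str:
            # Swap the source-language entity for one scraped from a local website.
            local = random.Random(seed).choice(local_entities)
            return translated_utterance.replace(source_entity, local)

        print(localize_example("Busca restaurantes cerca de Blue Bottle Coffee",
                               "Blue Bottle Coffee",
                               ["Café de Tacuba", "El Cardenal"]))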
  10. Many computing tasks, such as comparison shopping, two-factor authentication, and checking movie reviews, require using multiple apps together. On large screens, "windows, icons, menus, pointer" (WIMP) graphical user interfaces (GUIs) support easy sharing of content and context between multiple apps. So, it is straightforward to see the content from one application and write something relevant in another application, such as looking at the map around a place and typing walking instructions into an email. However, although today's smartphones also use GUIs, they have small screens and limited windowing support, making it hard to switch contexts and exchange data between apps. We introduce DoThisHere, a multimodal interaction technique that streamlines cross-app tasks and reduces the burden these tasks impose on users. Users can use voice to refer to information or app features that are off-screen and touch to specify where the relevant information should be inserted or is displayed. With DoThisHere, users can access information from or carry information to other apps with less context switching. We conducted a survey to find out what cross-app tasks people are currently performing or wish to perform on their smartphones. Among the 125 tasks that we collected from 75 participants, we found that 59 of these tasks are not well supported currently. DoThisHere is helpful in completing 95% of these unsupported tasks. A user study, where users are shown the list of supported voice commands when performing a representative sample of such tasks, suggests that DoThisHere may reduce expert users' cognitive load; the Query action, in particular, can help users reduce task completion time.