A physical blocks world, despite its relative simplicity, requires (in fully interactive form) a rich set of functional capabilities, ranging from vision to natural language understanding. In this work we tackle spatial question answering in a holistic way, using a vision system, speech input and output mediated by an animated avatar, a dialogue system that robustly interprets spatial queries, and a constraint solver that derives answers based on 3-D spatial modeling. The contributions of this work include a semantic parser that maps spatial questions into logical forms consistent with a general approach to meaning representation, a dialogue manager based on a schema representation, and a constraint solver for spatial questions that provides answers in agreement with human perception. These and other components are integrated into a multi-modal human-computer interaction pipeline.
Registering historical context in a spoken dialogue system for spatial question answering in a physical blocks world
Task-oriented dialogue-based spatial reasoning systems need to maintain a history of the world and discourse states, both to convey that the dialogue agent is mentally present and engaged with the task and to be able to refer to earlier states, which may be crucial in collaborative planning (e.g., for diagnosing a past misstep). We approach the problem of spatial memory in a multi-modal spoken dialogue system capable of answering questions about interaction history in a physical blocks world setting. We employ a pipeline consisting of a vision system, speech I/O mediated by an animated avatar, a dialogue system that robustly interprets queries, and a constraint solver that derives answers based on 3D spatial modelling. The contributions of this work include a semantic parser competent in this domain and a symbolic dialogue context that allows free-form historical questions to be interpreted and answered using world and discourse history.
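As a rough illustration of what registering world history can involve, the sketch below stores timestamped snapshots of block positions and answers two simple historical queries. It is a minimal sketch under stated assumptions, not the system described in the abstract: the `WorldHistory` class, its method names, the block names `A` and `B`, and the coordinate tuples are all hypothetical, and snapshots are assumed to be recorded in increasing step order.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Position = Tuple[float, float, float]


@dataclass
class WorldHistory:
    """Hypothetical store of timestamped block positions (illustration only)."""

    snapshots: List[Tuple[int, Dict[str, Position]]] = field(default_factory=list)

    def record(self, step: int, blocks: Dict[str, Position]) -> None:
        # Copy the dict so later changes to the caller's state do not rewrite history.
        self.snapshots.append((step, dict(blocks)))

    def position_at(self, block: str, step: int) -> Optional[Position]:
        """Position of `block` in the latest snapshot taken at or before `step`."""
        result = None
        for t, blocks in self.snapshots:  # assumes increasing step order
            if t <= step and block in blocks:
                result = blocks[block]
        return result

    def last_moved(self, block: str) -> Optional[int]:
        """Step of the most recent snapshot in which `block` changed position."""
        moved = None
        previous = None
        for t, blocks in self.snapshots:
            current = blocks.get(block)
            if previous is not None and current is not None and current != previous:
                moved = t
            if current is not None:
                previous = current
        return moved


history = WorldHistory()
history.record(0, {"A": (0.0, 0.0, 0.0), "B": (1.0, 0.0, 0.0)})
history.record(1, {"A": (0.0, 0.0, 0.0), "B": (1.0, 0.0, 1.0)})
history.record(2, {"A": (2.0, 0.0, 0.0), "B": (1.0, 0.0, 1.0)})

# "Where was block A before the last move?" and "When was block B last moved?"
print(history.position_at("A", 1))
print(history.last_moved("B"))
```

A question such as "Where was block A before I moved it?" would then reduce to a lookup against an earlier snapshot, with the semantic parser and constraint solver mentioned in the abstract handling the language and spatial-relation parts that this sketch leaves out.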
- Award ID(s): 1940981
- PAR ID: 10299987
- Date Published:
- Journal Name: 23rd Int. Conf. on Text, Speech and Dialogue (TSD 2020)
- Page Range / eLocation ID: 487-494
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Planning-based narrative generation is effective at producing stories with a logically-sound flow of events, but it can be limiting due to the rigidity of its constraints and the high burden on the domain author to define story-world objects, initial states, and author and character goals. Giving the system the freedom to add objects and events to the story-world history arbitrarily can improve variety and reduce authorial burden, but risks leading to stories that seem jarringly contrived to the audience. I propose to use question-answering as the antidote to contrivance in a highly-generative interactive narrative system: By modeling the player's beliefs about the story world, inferring the implicit questions the player may be asking through their interactions, and answering those questions in a way consistent with the player's prior knowledge, a system could focus on creating cohesion in the ways that matter most to the player while accepting a degree of contrivance in the details that the player is likely to overlook.
- While there has been substantial progress in text comprehension through simple factoid question answering, more holistic comprehension of a discourse still presents a major challenge (Dunietz et al., 2020). Someone critically reflecting on a text as they read it will pose curiosity-driven, often open-ended questions, which reflect deep understanding of the content and require complex reasoning to answer (Ko et al., 2020; Westera et al., 2020). A key challenge in building and evaluating models for this type of discourse comprehension is the lack of annotated data, especially since collecting answers to such questions requires high cognitive load for annotators. This paper presents a novel paradigm that enables scalable data collection targeting the comprehension of news documents, viewing these questions through the lens of discourse. The resulting corpus, DCQA (Discourse Comprehension by Question Answering), captures both discourse and semantic links between sentences in the form of free-form, open-ended questions. On an evaluation set that we annotated on questions from Ko et al. (2020), we show that DCQA provides valuable supervision for answering open-ended questions. We additionally design pre-training methods utilizing existing question-answering resources, and use synthetic data to accommodate unanswerable questions.
- Howes, Christine; Dobnik, Simon; Breitholtz, Ellen; Chatzikyriakidis, Stergios (Ed.) As AI reaches wider adoption, designing systems that are explainable and interpretable becomes a critical necessity. In particular, when it comes to dialogue systems, their reasoning must be transparent and must comply with human intuitions in order for them to be integrated seamlessly into day-to-day collaborative human-machine activities. Here, we describe our ongoing work on a (general purpose) dialogue system equipped with a spatial specialist with explanatory capabilities. We applied this system to a particular task of characterizing spatial configurations of blocks in a simple physical Blocks World (BW) domain using natural locative expressions, as well as generating justifications for the proposed spatial descriptions by indicating the factors that the system used to arrive at a particular conclusion.
- This paper compares methods to select data for annotation in order to improve a classifier used in a question-answering dialogue system. With a classifier trained on 1,500 questions, adding 300 training questions on which the classifier is least confident results in consistently improved performance, whereas adding 300 arbitrarily selected training questions does not yield consistent improvement, and sometimes even degrades performance. The paper uses a new method for comparative evaluation of classifiers for dialogue, which scores each classifier based on the number of appropriate responses retrieved.
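The least-confidence selection mentioned in the last abstract above can be sketched as follows. This is a minimal illustration, assuming the classifier exposes a probability for each response class; the function name `least_confident`, the toy pool, and its class labels are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def least_confident(
    pool: Sequence[Tuple[str, Dict[str, float]]], k: int
) -> List[str]:
    """Return the k questions whose top predicted class probability is lowest."""
    # Confidence of a prediction = probability of its most likely class.
    scored = [(max(probs.values()), question) for question, probs in pool]
    # Sort ascending by confidence; the sort is stable, so ties keep pool order.
    scored.sort(key=lambda pair: pair[0])
    return [question for _, question in scored[:k]]


# Toy unlabeled pool: (question, predicted class probabilities).
pool = [
    ("Where is the library?", {"location": 0.95, "hours": 0.05}),
    ("Is it open?", {"location": 0.40, "hours": 0.60}),
    ("Who are you?", {"identity": 0.55, "location": 0.45}),
]

# Pick the 2 questions to send for annotation.
selected = least_confident(pool, 2)
print(selected)
```

In the setting the abstract describes, `k` would be 300 and the pool would hold unlabeled questions scored by the classifier trained on the initial 1,500; the arbitrary-selection baseline would instead draw `k` questions without looking at the scores.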

