

Title: A spoken dialogue system for spatial question answering in a physical blocks world
A physical blocks world, despite its relative simplicity, requires (in fully interactive form) a rich set of functional capabilities, ranging from vision to natural language understanding. In this work we tackle spatial question answering in a holistic way, using a vision system, speech input and output mediated by an animated avatar, a dialogue system that robustly interprets spatial queries, and a constraint solver that derives answers based on 3-D spatial modeling. The contributions of this work include a semantic parser that maps spatial questions into logical forms consistent with a general approach to meaning representation, a dialogue manager based on a schema representation, and a constraint solver for spatial questions that provides answers in agreement with human perception. These and other components are integrated into a multi-modal human-computer interaction pipeline.
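To make the pipeline's final stage concrete, here is a minimal, hypothetical sketch of checking a parsed spatial predicate against 3-D block positions. The predicate name, block coordinates, and contact thresholds are illustrative assumptions, not the paper's actual logical-form or solver representation.

```python
def on_top_of(a, b, blocks, eps=0.05):
    """True if unit-cube block a rests on block b (centers given as (x, y, z))."""
    ax, ay, az = blocks[a]
    bx, by, bz = blocks[b]
    # Vertical contact: a's bottom face is near b's top face (unit-height blocks).
    vertical = abs((az - 0.5) - (bz + 0.5)) < eps
    # Horizontal support: the centers overlap within half a block width.
    horizontal = abs(ax - bx) < 0.5 and abs(ay - by) < 0.5
    return vertical and horizontal

blocks = {"red": (0.0, 0.0, 1.5), "green": (0.0, 0.0, 0.5)}
# A question like "Is the red block on the green block?" might parse to:
query = ("on", "red", "green")
print(on_top_of(query[1], query[2], blocks))  # True for this configuration
```

A real solver would evaluate many such predicates jointly as constraints and grade them against human perceptual judgments, but the basic geometric test has this shape.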
Award ID(s):
1940981
NSF-PAR ID:
10182250
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of SigDial 2020, ISBN 978-1-952148-02-6
Page Range / eLocation ID:
128 - 131
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Task-oriented dialogue-based spatial reasoning systems need to maintain a history of the world/discourse states in order to convey that the dialogue agent is mentally present and engaged with the task, as well as to be able to refer to earlier states, which may be crucial in collaborative planning (e.g., for diagnosing a past misstep). We approach the problem of spatial memory in a multi-modal spoken dialogue system capable of answering questions about interaction history in a physical blocks world setting. We employ a pipeline consisting of a vision system, speech I/O mediated by an animated avatar, a dialogue system that robustly interprets queries, and a constraint solver that derives answers based on 3-D spatial modeling. The contributions of this work include a semantic parser competent in this domain and a symbolic dialogue context allowing for interpreting and answering free-form historical questions using world and discourse history.
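A hypothetical sketch of the kind of symbolic interaction history such a system might keep: each move appends a snapshot of block positions, so a historical question ("where was the red block before the move?") reduces to a lookup in a past state. The data layout and method names are assumptions for illustration, not the paper's actual context representation.

```python
class History:
    """Record successive world states so past configurations can be queried."""

    def __init__(self, initial):
        self.states = [dict(initial)]  # state 0 = initial configuration

    def move(self, block, new_pos):
        """Apply a move by snapshotting the updated world state."""
        state = dict(self.states[-1])
        state[block] = new_pos
        self.states.append(state)

    def position_at(self, block, step):
        """Position of `block` after `step` moves (step 0 = initial state)."""
        return self.states[step][block]

h = History({"red": (0, 0), "blue": (2, 0)})
h.move("red", (1, 1))
print(h.position_at("red", 0))  # (0, 0) -- before the move
print(h.position_at("red", 1))  # (1, 1) -- after the move
```

The discourse side (what was asked and answered when) could be logged in the same timestamped fashion alongside the world states.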
  2. This paper seeks to illustrate the first steps in a process of adapting an existing, valid, and reliable spatial ability instrument – the Mental Cutting Test (MCT) – to assess spatial ability among blind and low vision (BLV) populations. To adapt the instrument, the team is developing three-dimensional (3-D) models of existing MCT questions such that a BLV population may perceive the test tactilely with their hands. This paper focuses on the development of the Tactile MCT (TMCT) instrument and does not report on the use of or results from the new instrument. Future work will investigate the validity and reliability of the adapted instrument. Each TMCT question is created by modeling and 3-D printing the objects represented by two-dimensional pictorial drawings on the MCT. The 3-D models of 25 items of the MCT are created using a solid modeling process followed by an additive 3-D printing process. The correct answer to each MCT question is the section view defined by a plane-of-interest (POI) intersecting the figure in question. A thin plane extending from the figure identifies the POI of each problem. The possible answers were originally presented in multiple representations including 3-D printed extrusions on top of a thin plate, and two forms of tactile graphics. The 3-D printed answers are developed by a combination of acquiring accurate dimensions of the 3-D figure’s cross-section and scaling up the printed paper test. To improve this adaptation of the MCT instrument, the TMCT models and their respective multiple-choice answers will be inspected by a spatial cognition expert as well as several BLV individuals. Feedback from these individuals will provide insight into necessary revisions before the test is implemented. 
    How the brain derives 3D information from inherently ambiguous visual input remains the fundamental question of human vision. The past two decades of research have addressed this question as a problem of probabilistic inference, the dominant model being maximum-likelihood estimation (MLE). This model assumes that independent depth-cue modules derive noisy but statistically accurate estimates of 3D scene parameters that are combined through a weighted average. Cue weights are adjusted based on the system's representation of each module's output variability. Here I demonstrate that the MLE model fails to account for important psychophysical findings and, importantly, misinterprets the just noticeable difference, a hallmark measure of stimulus discriminability, to be an estimate of perceptual uncertainty. I propose a new theory, termed Intrinsic Constraint, which postulates that the visual system does not derive the most probable interpretation of the visual input, but rather, the most stable interpretation amid variations in viewing conditions. This goal is achieved with the Vector Sum model, which represents individual cue estimates as components of a multi-dimensional vector whose norm determines the combined output. This model accounts for the psychophysical findings cited in support of MLE, while predicting existing and new findings that contradict the MLE model. This article is part of a discussion meeting issue ‘New approaches to 3D vision’.
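The two combination rules described above can be contrasted numerically. A minimal sketch, with made-up cue values and variances: MLE combines cues by a reliability-weighted average (weights proportional to 1/σ²), while the Vector Sum model takes the norm of the vector of cue estimates.

```python
import math

def mle_combine(estimates, variances):
    """Reliability-weighted average: w_i proportional to 1/sigma_i^2."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    return sum(w * e for w, e in zip(weights, estimates)) / total

def vector_sum(estimates):
    """Combined output as the norm of the multi-dimensional cue vector."""
    return math.sqrt(sum(e * e for e in estimates))

depth_cues = [2.0, 3.0]      # e.g. stereo and motion estimates (arbitrary units)
cue_variances = [0.5, 1.0]   # assumed noise of each cue module

print(mle_combine(depth_cues, cue_variances))  # 2.333... (pulled toward the reliable cue)
print(vector_sum(depth_cues))                  # 3.605... (a norm, not an average)
```

Note the qualitative difference: the MLE output always lies between the cue estimates, whereas the vector-sum norm grows as cues are added, which is central to how the two models diverge empirically.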
    The goal of automatic resource bound analysis is to statically infer symbolic bounds on the resource consumption of the evaluation of a program. A longstanding challenge for automatic resource analysis is the inference of bounds that are functions of complex custom data structures. This article builds on type-based automatic amortized resource analysis (AARA) to address this challenge. AARA is based on the potential method of amortized analysis and reduces bound inference to standard type inference with additional linear constraint solving, even when deriving non-linear bounds. A key component of AARA is resource functions that generate the space of possible bounds for values of a given type while enjoying necessary closure properties. Existing work on AARA defined such functions for many data structures such as lists of lists, but the question of whether such functions exist for arbitrary data structures remained open. This work answers this question positively by uniformly constructing resource polynomials for algebraic data structures defined by regular recursive types. These functions are a generalization of all previously proposed polynomial resource functions and can be seen as a general notion of polynomials for values of a given recursive type. A resource type system for FPC, a core language with recursive types, demonstrates how resource polynomials can be integrated with AARA while preserving all benefits of past techniques. The article also proposes new techniques useful for stating the rules of this type system and proving it sound. First, multivariate potential annotations are stated in terms of free semimodules, substantially abstracting details of the presentation of annotations and the proofs of their properties. Second, a logical relation giving semantic meaning to resource types enables a proof of soundness by a single induction on typing derivations.
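An illustrative sketch (not AARA itself) of the potential method the analysis builds on: assign each input element a fixed amount of potential, and check that the metered cost of an operation never exceeds the potential available. The cost model (one unit per element processed) and the function are assumptions for the example.

```python
def duplicate_each(xs, cost_per_elem=1):
    """Duplicate every element, metering one unit of cost per input element."""
    out, cost = [], 0
    for x in xs:
        out.extend([x, x])
        cost += cost_per_elem
    return out, cost

xs = [1, 2, 3]
result, actual_cost = duplicate_each(xs)
# A linear annotation of 1 potential unit per element yields the static
# bound Phi(xs) = 1 * len(xs), which the actual cost meets exactly here.
static_bound = 1 * len(xs)
print(result, actual_cost, static_bound)  # [1, 1, 2, 2, 3, 3] 3 3
assert actual_cost <= static_bound
```

AARA's contribution is deriving such per-element annotations automatically during type inference, and the article generalizes the annotations from lists to arbitrary regular recursive types.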
    Knowledge representation and reasoning (KRR) is key to the vision of the intelligent Web. Unfortunately, wide deployment of KRR is hindered by the difficulty in specifying the requisite knowledge, which requires skills that most domain experts lack. A way around this problem could be to acquire knowledge automatically from documents. The difficulty is that KRR requires high-precision knowledge and is sensitive even to small amounts of errors. Although most automatic information extraction systems developed for general text understanding have achieved remarkable results, their accuracy is still woefully inadequate for logical reasoning. A promising alternative is to ask the domain experts to author knowledge in Controlled Natural Language (CNL). Nonetheless, the quality of knowledge construction even through CNL is still grossly inadequate, the main obstacle being the multiplicity of ways the same information can be described even in a controlled language. Our previous work addressed the problem of high-accuracy knowledge authoring for KRR from CNL documents by introducing the Knowledge Authoring Logic Machine (KALM). This paper develops the query aspect of KALM, with the aim of getting high-precision answers to CNL questions against previously authored knowledge while being tolerant to linguistic variations in the queries. To make queries more expressive and easier to formulate, we propose a hybrid CNL, i.e., a CNL with elements borrowed from formal query languages. We show that KALM achieves superior accuracy in semantic parsing of such queries.