Title: A spoken dialogue system for spatial question answering in a physical blocks world
A physical blocks world, despite its relative simplicity, requires (in fully interactive form) a rich set of functional capabilities, ranging from vision to natural language understanding. In this work we tackle spatial question answering in a holistic way, using a vision system, speech input and output mediated by an animated avatar, a dialogue system that robustly interprets spatial queries, and a constraint solver that derives answers based on 3-D spatial modeling. The contributions of this work include a semantic parser that maps spatial questions into logical forms consistent with a general approach to meaning representation, a dialogue manager based on a schema representation, and a constraint solver for spatial questions that provides answers in agreement with human perception. These and other components are integrated into a multi-modal human-computer interaction pipeline.
Journal Name:
Proceedings of SigDial 2020, ISBN 978-1-952148-02-6
Page Range or eLocation-ID:
128 - 131
Sponsoring Org:
National Science Foundation
More Like this
  1. Task-oriented dialogue-based spatial reasoning systems need to maintain a history of the world/discourse states in order to convey that the dialogue agent is mentally present and engaged with the task, as well as to be able to refer to earlier states, which may be crucial in collaborative planning (e.g., for diagnosing a past misstep). We approach the problem of spatial memory in a multi-modal spoken dialogue system capable of answering questions about interaction history in a physical blocks world setting. We employ a pipeline consisting of a vision system, speech I/O mediated by an animated avatar, a dialogue system that robustly interprets queries, and a constraint solver that derives answers based on 3D spatial modelling. The contributions of this work include a semantic parser competent in this domain and a symbolic dialogue context allowing for interpreting and answering free-form historical questions using world and discourse history.
  2. This paper seeks to illustrate the first steps in a process of adapting an existing, valid, and reliable spatial ability instrument – the Mental Cutting Test (MCT) – to assess spatial ability among blind and low vision (BLV) populations. To adapt the instrument, the team is developing three-dimensional (3-D) models of existing MCT questions such that a BLV population may perceive the test tactilely with their hands. This paper focuses on the development of the Tactile MCT (TMCT) instrument and does not report on the use of or results from the new instrument. Future work will investigate the validity and reliability of the adapted instrument. Each TMCT question is created by modeling and 3-D printing the objects represented by two-dimensional pictorial drawings on the MCT. The 3-D models of 25 items of the MCT are created using a solid modeling process followed by an additive 3-D printing process. The correct answer to each MCT question is the section view defined by a plane-of-interest (POI) intersecting the figure in question. A thin plane extending from the figure identifies the POI of each problem. The possible answers were originally presented in multiple representations including 3-D printed extrusions on top of a thin plate, and two forms of tactile graphics. The 3-D printed answers are developed by a combination of acquiring accurate dimensions of the 3-D figure’s cross-section and scaling up the printed paper test. To improve this adaptation of the MCT instrument, the TMCT models and their respective multiple-choice answers will be inspected by a spatial cognition expert as well as several BLV individuals. Feedback from these individuals will provide insight into necessary revisions before the test is implemented.
  3. Knowledge representation and reasoning (KRR) is key to the vision of the intelligent Web. Unfortunately, wide deployment of KRR is hindered by the difficulty in specifying the requisite knowledge, which requires skills that most domain experts lack. A way around this problem could be to acquire knowledge automatically from documents. The difficulty is that KRR requires high-precision knowledge and is sensitive even to small amounts of errors. Although most automatic information extraction systems developed for general text understanding have achieved remarkable results, their accuracy is still woefully inadequate for logical reasoning. A promising alternative is to ask the domain experts to author knowledge in Controlled Natural Language (CNL). Nonetheless, the quality of knowledge construction even through CNL is still grossly inadequate, the main obstacle being the multiplicity of ways the same information can be described even in a controlled language. Our previous work addressed the problem of high accuracy knowledge authoring for KRR from CNL documents by introducing the Knowledge Authoring Logic Machine (KALM). This paper develops the query aspect of KALM with the aim of getting high precision answers to CNL questions against previously authored knowledge while being tolerant to linguistic variations in the queries. To make queries more expressive and easier to formulate, we propose a hybrid CNL, i.e., a CNL with elements borrowed from formal query languages. We show that KALM achieves superior accuracy in semantic parsing of such queries.
  4. Our visual system is fundamentally retinotopic. When viewing a stable scene, each eye movement shifts object features and locations on the retina. Thus, sensory representations must be updated, or remapped, across saccades to align presaccadic and postsaccadic inputs. The earliest remapping studies focused on anticipatory, presaccadic shifts of neuronal spatial receptive fields. Over time, it has become clear that there are multiple forms of remapping and that different forms of remapping may be mediated by different neural mechanisms. This review attempts to organize the various forms of remapping into a functional taxonomy based on experimental data and ongoing debates about forward versus convergent remapping, presaccadic versus postsaccadic remapping, and spatial versus attentional remapping. We integrate findings from primate neurophysiological, human neuroimaging and behavioral, and computational modeling studies. We conclude by discussing persistent open questions related to remapping, with specific attention to binding of spatial and featural information during remapping and speculations about remapping's functional significance. Expected final online publication date for the Annual Review of Vision Science, Volume 7 is September 2021. Please see for revised estimates.
  5. Abstract

    Image texture, the relative spatial arrangement of intensity values in an image, encodes valuable information about the scene. As it stands, much of this potential information remains untapped. Understanding how to decipher textural details would afford another method of extracting knowledge of the physical world from images. In this work, we attempt to bridge the gap in research between quantitative texture analysis and the visual perception of textures. The impact of changes in image texture on human observers’ ability to perform signal detection and localization tasks in complex digital images is not understood. We examine this critical question by studying task-based human observer performance in detecting and localizing signals in tomographic breast images. We have also investigated how these changes impact the formation of second-order image texture. We used digital breast tomosynthesis (DBT), an FDA-approved tomographic X-ray breast imaging method, as the modality of choice to show our preliminary results. Our human observer studies involve localization ROC (LROC) studies for low-contrast mass detection in DBT. Simulated images are used as they offer the benefit of known ground truth. Our results prove that changes in system geometry or processing lead to changes in image texture magnitudes. We show that the variations in several well-known texture features estimated in digital images correlate with human observer detection–localization performance for signals embedded in them. This insight can allow efficient and practical techniques to identify the best imaging system design and algorithms or filtering tools by examining the changes in these texture features. This concept linking texture feature estimates and task-based image quality assessment can be extended to several other imaging modalities and applications as well. It can also offer feedback in system and algorithm designs with a goal to improve perceptual benefits.
The broader impact spans a wide array of areas, including imaging system design, image processing, data science, machine learning, computer vision, and perceptual and vision science. Our results also point to the caution that must be exercised in using these texture features as image-based radiomic features or as predictive markers for risk assessment, as they are sensitive to system or image processing changes.
