Language understanding is essential for a navigation agent to follow instructions. We observe two kinds of issues in the instructions that can make the navigation task challenging: (1) the mentioned landmarks are not recognizable by the navigation agent because the instructor and the modeled agent have different vision abilities, and (2) the mentioned landmarks apply to multiple targets and are therefore not distinctive enough to select the target among the candidate viewpoints. To deal with these issues, we design a translator module for the navigation agent that converts the original instructions into easy-to-follow sub-instruction representations at each step. The translator needs to focus on landmarks that are recognizable and distinctive given the agent's visual abilities and the observed visual environment. To achieve this goal, we create a new synthetic sub-instruction dataset and design specific tasks to train the translator and the navigation agent. We evaluate our approach on the Room2Room (R2R), Room4Room (R4R), and Room2Room-Last (R2R-Last) datasets and achieve state-of-the-art results on multiple benchmarks.
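To make the idea of a step-wise translator concrete, below is a minimal sketch of one way such a module could be wired up, assuming a cross-attention design in which instruction tokens attend to the agent's current candidate-view features. The module names, dimensions, and pooling choice are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: a per-step "translator" that re-encodes the original
# instruction into a sub-instruction representation conditioned on what the agent
# can currently see, so recognizable landmarks get emphasized.
import torch
import torch.nn as nn

class InstructionTranslator(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        # Cross-attention: instruction tokens (queries) attend to visual features
        # of the candidate viewpoints (keys/values).
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, instr_tokens, view_feats):
        # instr_tokens: (batch, n_words, d_model)  encoded instruction
        # view_feats:   (batch, n_views, d_model)  candidate-viewpoint features
        grounded, _ = self.cross_attn(instr_tokens, view_feats, view_feats)
        # Pool into a single sub-instruction representation for this step.
        return self.proj(grounded.mean(dim=1))  # (batch, d_model)

# Usage sketch with random features (20 instruction tokens, 36 candidate views).
translator = InstructionTranslator()
sub_instruction = translator(torch.randn(2, 20, 512), torch.randn(2, 36, 512))
```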
Using a picture (or a thousand words) for supporting spatial knowledge of a complex virtual environment
Abstract: External representations powerfully support and augment complex human behavior. When navigating, people often consult external representations to help them find the way to go, but do maps or verbal instructions improve spatial knowledge or support effective wayfinding? Here, we examine spatial knowledge with and without external representations in two studies where participants learn a complex virtual environment. In the first study, we asked participants to generate their own maps or verbal instructions, partway through learning. We found no evidence of improved spatial knowledge in a pointing task requiring participants to infer the direction between two targets, either on the same route or on different routes, and no differences between groups in accurately recreating a map of the target landmarks. However, as a methodological note, pointing was correlated with the accuracy of the maps that participants drew. In the second study, participants had access to an accurate map or set of verbal instructions that they could study while learning the layout of target landmarks. Again, we found no evidence of differentially improved spatial knowledge in the pointing task, although we did find that the map group could recreate a map of the target landmarks more accurately. However, overall improvement was high. There was evidence that the nature of improvement across all conditions was specific to initial navigation ability levels. Our findings add to a mixed literature on the role of external representations for navigation and suggest that more substantial intervention (more scaffolding, explicit training, enhanced visualization, perhaps with personalized sequencing) may be necessary to improve navigation ability.
- Award ID(s): 1956466
- PAR ID: 10434859
- Publisher / Repository: Springer Science + Business Media
- Date Published:
- Journal Name: Cognitive Research: Principles and Implications
- Volume: 8
- Issue: 1
- ISSN: 2365-7464
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Spatial perception of our hand is closely linked to our ability to move the hand accurately. We might therefore expect that reach planning would take into account any changes in perceived hand position; in other words, that perception and action relating to the hand should depend on a common sensorimotor map. However, there is evidence to suggest that changes in perceived hand position affect a body representation that functions separately from the body representation used to control movement. Here, we examined target-directed reaching before and after participants either did (Mismatch group) or did not (Veridical group) experience a cue conflict known to elicit recalibration in perceived hand position. For the reaching task, participants grasped a robotic manipulandum that positioned their unseen hand for each trial. Participants then briskly moved the handle straight ahead to a visual target, receiving no performance feedback. For the perceptual calibration task, participants estimated the locations of visual, proprioceptive, or combined cues about their unseen hand. The Mismatch group experienced a gradual 70-mm forward mismatch between visual and proprioceptive cues, resulting in forward proprioceptive recalibration. Participants made significantly shorter reaches after this manipulation, consistent with feeling their hand to be further forward than it was, but reaching performance returned to baseline levels after only 10 reaches. The Veridical group, after exposure to veridically aligned visual and proprioceptive cues about the hand, showed no change in reach distance. These results suggest that perceptual recalibration affects the same sensorimotor map that is used to plan target-directed reaches.
NEW & NOTEWORTHY If perceived hand position changes, we might assume this affects the sensorimotor map and, in turn, reaches made with that hand. However, there is evidence for separate body representations involved in perception versus action. After a cross-sensory conflict that results in proprioceptive recalibration in the forward direction, participants made shorter reaches as predicted, but only briefly. This suggests perceptual recalibration does affect the sensorimotor map used to plan reaches, but the interaction may be short-lived.
-
Background: Drivers gather most of the information they need to drive by looking at the world around them and at visual displays within the vehicle. Navigation systems automate the way drivers navigate. In using these systems, drivers offload both tactical (route following) and strategic (route planning) aspects of navigational tasks to the automated SatNav system, freeing up cognitive and attentional resources that can be used in other tasks (Burnett, 2009). Despite the potential benefits and opportunities that navigation systems provide, their use can also be problematic. For example, research suggests that drivers using SatNav do not develop as much environmental spatial knowledge as drivers using paper maps (Waters & Winter, 2011; Parush, Ahuvia, & Erev, 2007). With recent growth and advances of augmented reality (AR) head-up displays (HUDs), there are new opportunities to display navigation information directly within a driver's forward field of view, allowing them to gather the information needed to navigate without looking away from the road. While the technology is promising, the nuances of interface design and its impacts on drivers must be further understood before AR can be widely and safely incorporated into vehicles. Specifically, an impact that warrants investigation is the role of AR HUDs in spatial knowledge acquisition while driving. Acquiring high levels of spatial knowledge is crucial for navigation tasks because individuals with greater spatial knowledge acquisition are more capable of navigating based on their own internal knowledge (Bolton, Burnett, & Large, 2015). Moreover, the ability to develop an accurate and comprehensive cognitive map serves a social function: individuals can navigate for others, provide verbal directions, and sketch direction maps (Hill, 1987). Given these points, the relationship between spatial knowledge acquisition and novel technologies such as AR HUDs in driving is a relevant topic for investigation.
Objectives: This work explored whether providing conformal AR navigational cues improves spatial knowledge acquisition (as compared to traditional HUD visual cues) to assess the plausibility of, and justification for, investment in generating larger-FOV AR HUDs with potentially multiple focal planes.
Methods: This study employed a 2x2 between-subjects design in which twenty-four participants were counterbalanced by gender. We used a fixed-base, medium-fidelity driving simulator in which participants drove while navigating with one of two possible HUD interface designs: a world-relative arrow post sign and a screen-relative traditional arrow. During the 10-15 minute drive, participants drove the route and were encouraged to verbally share feedback as they proceeded. After the drive, participants completed a NASA-TLX questionnaire to record their perceived workload. We measured spatial knowledge at two levels: landmark and route knowledge. Landmark knowledge was assessed using an iconic recognition task, while route knowledge was assessed using a scene ordering task. After completion of the study, individuals signed a post-trial consent form and were compensated $10 for their time.
Results: NASA-TLX performance subscale ratings revealed that participants felt they performed better in the world-relative condition, but at a higher rate of perceived workload; however, overall workload ratings did not differ significantly between interface design conditions. Landmark knowledge results suggest that the mean number of remembered scenes in the two conditions is statistically similar, indicating that participants using both interface designs remembered the same proportion of on-route scenes. Deviance analyses show that only maneuver direction had an influence on landmark knowledge test performance. Route knowledge results suggest that the proportion of on-route scenes correctly sequenced by participants is similar under both conditions. Finally, participants performed worse on the route knowledge task than on the landmark knowledge task, independent of HUD interface design.
Conclusions: This driving simulator study evaluated the head-up provision of two types of AR navigation interface designs. The world-relative condition placed an artificial post sign at the corner of an approaching intersection containing a real landmark. The screen-relative condition displayed turn directions using a screen-fixed traditional arrow located directly ahead of the participant on the right or left side of the HUD. Overall, the results of this initial study provide evidence that screen-relative and world-relative AR head-up display interfaces have a similar impact on spatial knowledge acquisition and perceived workload while driving. These results contrast with a common perspective in the AR community that conformal, world-relative graphics are inherently more effective; this study instead suggests that simple, screen-fixed designs may indeed be effective in certain contexts.
-
Spatial navigation involves the use of various cues. This study examined how cue conflict influences navigation by contrasting landmarks and optic flow. Participants estimated spatial distances under different levels of cue conflict: minimal conflict, large conflict, and large conflict with explicit awareness of landmark instability. Whereas increased cue conflict alone had little behavioral impact, adding explicit awareness reduced reliance on landmarks and impaired the precision of spatial localization based on them. To understand the underlying mechanisms, we tested two cognitive models: a Bayesian causal inference (BCI) model and a non-Bayesian sensory disparity model. The BCI model provided a better fit to the data, revealing two independent mechanisms for reduced landmark reliance: increased sensory noise for unstable landmarks and lower weighting of unstable landmarks when landmarks and optic flow were judged to originate from different causes. Surprisingly, increased cue conflict did not decrease the prior belief in a common cause, even when explicit awareness of landmark instability was imposed. Additionally, cue weighting in the same-cause judgment was determined by bottom-up sensory reliability, while in the different-cause judgment, it correlated with participants’ subjective evaluation of cue quality, suggesting a top-down metacognitive influence. The BCI model further identified key factors contributing to suboptimal cue combination in minimal cue conflicts, including the prior belief in a common cause and prior knowledge of the target location. Together, these findings provide critical insights into how navigators resolve conflicting spatial cues and highlight the utility of the BCI model in dissecting cue interaction mechanisms in navigation.
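For readers unfamiliar with this model class, the sketch below illustrates the standard two-cue Bayesian causal inference computation (in the spirit of Kording et al., 2007) applied to a landmark cue and an optic-flow cue. The noise values, prior, and the choice of optic flow as the fallback readout under independent causes are illustrative assumptions, not the parameters fitted in the study.

```python
# Minimal sketch of two-cue Bayesian causal inference over a landmark cue and an
# optic-flow cue. All numbers and the readout rule are illustrative assumptions.
import numpy as np

def bci_estimate(x_land, x_flow, sd_land=1.0, sd_flow=2.0,
                 sd_prior=10.0, mu_prior=0.0, p_common=0.5):
    v_l, v_f, v_p = sd_land**2, sd_flow**2, sd_prior**2

    # Likelihood of the two measurements under a single common cause (C = 1).
    var_c1 = v_l * v_f + v_l * v_p + v_f * v_p
    like_c1 = (np.exp(-0.5 * ((x_land - x_flow) ** 2 * v_p
                              + (x_land - mu_prior) ** 2 * v_f
                              + (x_flow - mu_prior) ** 2 * v_l) / var_c1)
               / (2 * np.pi * np.sqrt(var_c1)))

    # Likelihood under independent causes (C = 2): product of the two marginals.
    like_c2 = (np.exp(-0.5 * (x_land - mu_prior) ** 2 / (v_l + v_p))
               / np.sqrt(2 * np.pi * (v_l + v_p))
               * np.exp(-0.5 * (x_flow - mu_prior) ** 2 / (v_f + v_p))
               / np.sqrt(2 * np.pi * (v_f + v_p)))

    # Posterior probability that landmark and optic flow share one cause.
    post_c1 = like_c1 * p_common / (like_c1 * p_common + like_c2 * (1 - p_common))

    # Reliability-weighted fusion if one cause; if causes are judged independent,
    # read out optic flow alone, so an "unstable" landmark is down-weighted.
    s_fused = ((x_land / v_l + x_flow / v_f + mu_prior / v_p)
               / (1 / v_l + 1 / v_f + 1 / v_p))
    s_flow = (x_flow / v_f + mu_prior / v_p) / (1 / v_f + 1 / v_p)

    # Model averaging over the two causal structures.
    return post_c1 * s_fused + (1 - post_c1) * s_flow

print(bci_estimate(x_land=3.0, x_flow=1.0))
```

In this formulation, larger cue conflict lowers the posterior probability of a common cause and shifts the estimate away from the fused value, which is the qualitative behavior the abstract attributes to reduced landmark reliance.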
-
This paper presents a novel approach to robot task learning from language-based instructions, which focuses on increasing the complexity of task representations that can be taught through verbal instruction. The major proposed contribution is the development of a framework for directly mapping a complex verbal instruction to an executable task representation, from a single training experience. The method can handle the following types of complexities: 1) instructions that use conjunctions to convey complex execution constraints (such as alternative paths of execution, sequential or non-ordering constraints, as well as hierarchical representations) and 2) instructions that use prepositions and multiple adjectives to specify action/object parameters relevant for the task. Specific algorithms have been developed for handling conjunctions, adjectives and prepositions as well as for translating the parsed instructions into parameterized executable task representations. The paper describes validation experiments with a PR2 humanoid robot learning new tasks from verbal instruction, as well as an additional range of utterances that can be parsed into executable controllers by the proposed system.
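As a toy illustration of the general idea of mapping a conjunction-bearing verbal instruction onto a structured, executable task representation, the sketch below builds a small tree with sequence, alternative, and unordered nodes. The mini-grammar, node types, and example sentence are invented for illustration and are not the paper's parsing algorithm.

```python
# Toy illustration: turn a verbal instruction with conjunctions into a nested
# task representation. "then" imposes ordering, "or" marks alternative paths,
# and "and" leaves sibling steps unordered.
from dataclasses import dataclass
from typing import List

@dataclass
class TaskNode:
    kind: str                      # "sequence", "alternative", "unordered", or "action"
    children: List["TaskNode"]
    text: str = ""

def parse_instruction(instruction: str) -> TaskNode:
    # Split on the lowest-precedence conjunction first, then recurse on the parts.
    for conj, kind in (("then", "sequence"), ("or", "alternative"), ("and", "unordered")):
        parts = [p.strip() for p in instruction.split(f" {conj} ") if p.strip()]
        if len(parts) > 1:
            return TaskNode(kind, [parse_instruction(p) for p in parts])
    return TaskNode("action", [], instruction)

tree = parse_instruction(
    "pick up the red cup and close the drawer then go to the kitchen or wait at the door")
print(tree)
```

In the framework described by the abstract, the resulting representation would also carry action/object parameters extracted from prepositions and adjectives; this sketch omits that step.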