skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Synthesizing scene-aware virtual reality teleport graphs
We present a novel approach for synthesizing scene-aware virtual reality teleport graphs, which facilitate navigation in indoor virtual environments by suggesting desirable teleport positions. Our approach analyzes panoramic views at candidate teleport positions by extracting scene perception graphs, which encode scene perception relationships between the observer and the surrounding objects, and predict how desirable the views at these positions are. We train a graph convolutional model to predict the scene perception scores of different teleport positions. Based on such predictions, we apply an optimization approach to sample a set of desirable teleport positions while considering other navigation properties such as coverage and connectivity to synthesize a teleport graph. Using teleport graphs, users can navigate virtual environments efficaciously. We demonstrate our approach for synthesizing teleport graphs for common indoor scenes. By conducting a user study, we validate the efficacy and desirability of navigating virtual environments via the synthesized teleport graphs. We also extend our approach to cope with different constraints, user preferences, and practical scenarios.  more » « less
Award ID(s):
1942531
PAR ID:
10321074
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
ACM Transactions on Graphics
Volume:
40
Issue:
6
ISSN:
0730-0301
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Each view of our environment captures only a subset of our immersive surroundings. Yet, our visual experience feels seamless. A puzzle for human neuroscience is to determine what cognitive mechanisms enable us to overcome our limited field of view and efficiently anticipate new views as we sample our visual surroundings. Here, we tested whether memory-based predictions of upcoming scene views facilitate efficient perceptual judgments across head turns. We tested this hypothesis using immersive, head-mounted virtual reality (VR). After learning a set of immersive real-world environments, participants (n = 101 across 4 experiments) were briefly primed with a single view from a studied environment and then turned left or right to make a perceptual judgment about an adjacent scene view. We found that participants’ perceptual judgments were faster when they were primed with images from the same (vs. neutral or different) environments. Importantly, priming required memory: it only occurred in learned (vs. novel) environments, where the link between adjacent scene views was known. Further, consistent with a role in supporting active vision, priming only occurred in the direction of planned head turns and only benefited judgments for scene views presented in their learned spatiotopic positions. Taken together, we propose that memory-based predictions facilitate rapid perception across large-scale visual actions, such as head and body movements, and may be critical for efficient behavior in complex immersive environments. 
    more » « less
  2. Visual scenes are often remembered as if they were observed from a different viewpoint. Some scenes are remembered as farther than they appeared, and others as closer. These memory distortions—also known as boundary extension and contraction—are strikingly consistent for a given scene, but their cause remains unknown. We tested whether these distortions can be explained by an inferential process that adjusts scene memories toward high-probability views, using viewing depth as a test case. We first carried out a large-scale analysis of depth maps of natural indoor scenes to quantify the statistical probability of views in depth. We then assessed human observers’ memory for these scenes at various depths and found that viewpoint judgments were consistently biased toward the modal depth, even when just a few seconds elapsed between viewing and reporting. Thus, scenes closer than the modal depth showed a boundary-extension bias (remembered as farther-away), and scenes farther than the modal depth showed a boundary-contraction bias (remembered as closer). By contrast, scenes at the modal depth did not elicit a consistent bias in either direction. This same pattern of results was observed in a follow-up experiment using tightly controlled stimuli from virtual environments. Together, these findings show that scene memories are biased toward statistically probable views, which may serve to increase the accuracy of noisy or incomplete scene representations. 
    more » « less
  3. Scene graphs provide a rich, structured representation of a scene by encoding the entities (objects) and their spatial relationships in a graphical format. This representation has proven useful in several tasks, such as question answering, captioning, and even object detection, to name a few. Current approaches take a generation-by-classification approach where the scene graph is generated through labeling of all possible edges between objects in a scene, which adds computational overhead to the approach. This work introduces a generative transformer-based approach to generating scene graphs beyond link prediction. Using two transformer-based components, we first sample a possible scene graph structure from detected objects and their visual features. We then perform predicate classification on the sampled edges to generate the final scene graph. This approach allows us to efficiently generate scene graphs from images with minimal inference overhead. Extensive experiments on the Visual Genome dataset demonstrate the efficiency of the proposed approach. Without bells and whistles, we obtain, on average, 20.7% mean recall (mR@100) across different settings for scene graph generation (SGG), outperforming state-of-the-art SGG approaches while offering competitive performance to unbiased SGG approaches. 
    more » « less
  4. Blind & visually impaired individuals often face challenges in wayfinding in unfamiliar environments. Thus, an accessible indoor positioning and navigation system that safely and accurately positions and guides such individuals would be welcome. In indoor positioning, both Bluetooth Low Energy (BLE) beacons and Google Tango have their individual strengths but also have weaknesses that can affect the overall usability of a system that solely relies on either component. We propose a hybrid positioning and navigation system that combines both BLE beacons and Google Tango in order to tap into their strengths while minimizing their individual weaknesses. In this paper, we will discuss the approach and implementation of a BLE- and Tango-based hybrid system. The results of pilot tests on the individual components and a human subject test on the full BLE and hybrid systems are also presented. In addition, we have explored the use of vibrotactile devices to provide additional information to a user about their surroundings. 
    more » « less
  5. Indoor navigation is necessary for users to explore large unfamiliar indoor environments such as airports, shopping malls, and hospital complex, which relies on the capability of continuously tracking a user's location. A typical indoor navigation system is built on top of a suitable Indoor Positioning System (IPS) and requires the user to periodically submit location queries to learn their whereabouts whereby to provide update-to-date navigation information. Received signal strength (RSS)-based IPSes are considered as one of the most classical IPSes, which locates a user by comparing the user's RSS measurement with the fingerprints collected at different locations in advance. Despite its significant advantages, existing RSS-IPSes suffer from two key challenges, the ambiguity of RSS fingerprints and device diversity, that may greatly reduce its positioning accuracy. In this paper, we introduce the design and evaluation of CITS, a novel RSS-based continuous indoor tracking system that can effectively cope with fingerprint ambiguity and device diversity via differential RSS fingerprint matching. Detailed experiment studies confirm the significant advantages of CITS over prior RSS-based solutions. 
    more » « less