

Title: InDepth: Real-time Depth Inpainting for Mobile Augmented Reality
Mobile Augmented Reality (AR) demands realistic rendering of virtual content that blends seamlessly into the physical environment. For this reason, AR headsets and recent smartphones are increasingly equipped with Time-of-Flight (ToF) cameras to acquire depth maps of a scene in real time. ToF cameras are cheap and fast; however, they suffer from several issues that affect the quality of depth data, ultimately hampering their use for mobile AR. Among them, scale errors of virtual objects, which appear much larger or smaller than they should, are particularly noticeable and unpleasant. This article addresses these challenges by proposing InDepth, a real-time depth inpainting system based on edge computing. InDepth employs a novel deep neural network (DNN) architecture to improve the accuracy of depth maps obtained from ToF cameras. The DNN fills holes and corrects artifacts in the depth maps with high accuracy and eight times lower inference time than the state of the art. An extensive performance evaluation in real settings shows that InDepth reduces the mean absolute error by a factor of four with respect to ARCore DepthLab. Finally, a user study reveals that InDepth is effective in rendering correctly scaled virtual objects, outperforming DepthLab.
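The InDepth network itself is not reproduced here. As a minimal illustration of the task it solves, the sketch below fills invalid (zero) ToF pixels with a classical nearest-valid-neighbor baseline and computes the mean-absolute-error metric quoted in the abstract. All function names are hypothetical; this baseline stands in for, rather than implements, the paper's DNN.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def fill_depth_holes(depth):
    """Fill invalid (zero) ToF pixels with the nearest valid depth value.

    A classical stand-in for learned inpainting: every hole pixel copies
    the depth of its nearest valid neighbor.
    """
    invalid = depth <= 0
    # Feature transform: for each pixel, the index of the nearest valid pixel.
    _, (iy, ix) = distance_transform_edt(invalid, return_indices=True)
    return depth[iy, ix]

def mean_absolute_error(pred, gt):
    """MAE over pixels where ground-truth depth is available."""
    valid = gt > 0
    return np.abs(pred[valid] - gt[valid]).mean()

# Toy 4x4 depth map (meters) with a 2x2 hole in the middle.
depth = np.array([[1.0, 1.0, 1.1, 1.1],
                  [1.0, 0.0, 0.0, 1.1],
                  [1.0, 0.0, 0.0, 1.2],
                  [1.0, 1.0, 1.2, 1.2]])
print(fill_depth_holes(depth))
```

A learned inpainting model would replace fill_depth_holes while keeping the same mask-and-fill interface, which is what lets the MAE comparison against DepthLab be computed on identical valid-pixel sets.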
Authors:
Award ID(s):
2046072 1908051 1903136
Publication Date:
NSF-PAR ID:
10320717
Journal Name:
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
Volume:
6
Issue:
1
ISSN:
2474-9567
Sponsoring Org:
National Science Foundation
More Like This
  1. Though virtual reality (VR) has matured considerably in recent years, the general public, especially the population of the blind and visually impaired (BVI), still cannot enjoy the benefits it provides. Current VR accessibility applications have been developed either on expensive head-mounted displays or with extra accessories and mechanisms, which are either not accessible or inconvenient for BVI individuals. In this paper, we present a mobile VR app that enables BVI users to access a virtual environment on an iPhone in order to build their skills of perceiving and recognizing the virtual environment and the virtual objects in it. The app uses the iPhone on a selfie stick to simulate a long cane in VR, and applies Augmented Reality (AR) techniques to track the iPhone’s real-time poses in an empty space of the real world, which are then synchronized to the long cane in the VR environment. Because it integrates VR and AR, we call it the Mixed Reality cane (MR Cane); it provides BVI users auditory and vibrotactile feedback whenever the virtual cane comes into contact with objects in VR (a toy sketch of this loop appears after this list). Thus, the MR Cane allows BVI individuals to interact with virtual objects and identify their approximate sizes and locations in the virtual environment. We performed preliminary user studies with blindfolded participants to investigate the effectiveness of the proposed mobile approach, and the results indicate that the MR Cane could effectively help BVI individuals understand interaction with virtual objects and explore 3D virtual environments. The MR Cane concept can be extended to new applications in navigation, training, and entertainment for BVI individuals without significant additional effort.
  2. Neural networks can represent and accurately reconstruct radiance fields for static 3D scenes (e.g., NeRF). Several works extend these to dynamic scenes captured with monocular video, with promising performance. However, the monocular setting is known to be an under-constrained problem, so these methods rely on data-driven priors for reconstructing dynamic content. We replace these priors with measurements from a time-of-flight (ToF) camera, and introduce a neural representation based on an image formation model for continuous-wave ToF cameras (a minimal sketch of this model appears after this list). Instead of working with processed depth maps, we model the raw ToF sensor measurements to improve reconstruction quality and avoid issues with low-reflectance regions, multi-path interference, and a sensor's limited unambiguous depth range. We show that this approach improves the robustness of dynamic scene reconstruction to erroneous calibration and large motions, and discuss the benefits and limitations of integrating RGB+ToF sensors that are now available on modern smartphones.
  3. Objective: Evaluate and model the advantage of situation awareness (SA) supported by an augmented reality (AR) display for the ground-based Joint Terminal Attack Controller (JTAC) in judging and describing the spatial relations between objects in a hostile zone. Background: The accurate world-referenced description of the relative locations of surface objects, when viewed from an oblique slant angle (aircraft, observation post), is hindered by (1) the compression of the visual scene, amplified at a lower slant angle (a toy illustration of this compression appears after this list), and (2) the need for mental rotation when the scene is viewed from a non-northerly orientation. Approach: Participants viewed a virtual reality (VR)-simulated four-object scene from either of two slant angles, at each of four compass orientations, either unaided or aided by an AR head-mounted display (AR-HMD) depicting the scene from a top-down (avoiding compression) and north-up (avoiding mental rotation) perspective. They described the geographical layout of the four objects within the display. Results: Compared with the control condition, the condition supported by the north-up SA display shortened description time, particularly at non-northerly orientations (9 s, a 30% benefit), and improved description accuracy, particularly for the more compressed scene (lower slant angle), as fit by a simple computational model. Conclusion: The SA display provides large, significant benefits to this critical phase of ground-air communications in managing an attack, as predicted by the task analysis of the JTAC. Application: The results impact the design of AR-HMDs to support combat ground-air communications and illustrate the magnitude by which basic cognitive principles “scale up” to realistically simulated real-world tasks such as search and rescue.
  4. For optical see-through augmented reality (AR), a new method for measuring the perceived three-dimensional location of virtual objects is presented, where participants verbally report a virtual object’s location relative to both a vertical and a horizontal grid. The method is tested with a small (1.95 × 1.95 × 1.95 cm) virtual object at distances of 50 to 80 cm, viewed through a Microsoft HoloLens first-generation AR display. Two experiments examine two different virtual object designs, whether turning in a circle between reported object locations disrupts HoloLens tracking, and whether accuracy errors, including a rightward bias and underestimated depth, might be due to systematic errors restricted to a particular display. Turning in a circle did not disrupt HoloLens tracking, and testing with a second display did not suggest systematic errors restricted to a particular display. Instead, the experiments are consistent with the hypothesis that, when looking downward at a horizontal plane, first-generation HoloLens displays exhibit a systematic rightward perceptual bias (a sketch of this bias and precision analysis appears after this list). Precision analysis suggests that the method could measure the perceived location of a virtual object to an accuracy of less than 1 mm.
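For item 1, the MR Cane's code is not available here; the following is a rough sketch, under assumed geometry, of the synchronization loop the abstract describes: the phone's AR-tracked pose drives a virtual long cane whose tip is tested against virtual objects, triggering feedback on contact. The class, method, and parameter names (MRCane, CANE_LENGTH, give_feedback) are hypothetical.

```python
import numpy as np

class MRCane:
    """Hypothetical sketch of the MR Cane loop: the phone's AR pose drives
    a virtual long cane, and contact with virtual objects triggers feedback."""

    CANE_LENGTH = 1.2  # meters; an assumed cane length

    def __init__(self, scene_objects):
        # Virtual objects approximated as (center, radius) spheres.
        self.scene_objects = scene_objects

    def update(self, phone_position, phone_forward):
        # The cane tip extends from the tracked phone pose along its axis.
        tip = phone_position + self.CANE_LENGTH * phone_forward
        for center, radius in self.scene_objects:
            if np.linalg.norm(tip - center) <= radius:
                self.give_feedback()
                break

    def give_feedback(self):
        # Placeholder for the auditory and vibrotactile cues the paper describes.
        print("contact: play sound, vibrate")

cane = MRCane([(np.array([0.0, 0.0, 1.0]), 0.2)])
cane.update(np.array([0.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0]))
```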
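For item 2, the image formation model for continuous-wave ToF is standard; below is a minimal single-path, noise-free sketch of the four-bucket model, showing how raw quadrature samples encode depth as a phase shift and why depth is only recoverable up to the unambiguous range c/(2f). The paper models these raw measurements inside a neural representation; this sketch only illustrates the underlying physics.

```python
import numpy as np

C = 3e8  # speed of light, m/s

def cw_tof_measurements(depth, amplitude, offset, freq_hz):
    """Simulate the four raw quadrature samples of a continuous-wave ToF pixel.

    Single-path model: the round trip to depth d shifts the modulation by
    phase phi = 4*pi*f*d/c, and the sensor correlates against four
    reference phases (0, 90, 180, 270 degrees).
    """
    phi = 4 * np.pi * freq_hz * depth / C
    thetas = np.array([0.0, np.pi / 2, np.pi, 3 * np.pi / 2])
    return amplitude * np.cos(phi + thetas) + offset

def depth_from_measurements(m, freq_hz):
    """Invert the model: recover depth from the four samples, valid only
    up to the unambiguous range c / (2 * f)."""
    phi = np.arctan2(m[3] - m[1], m[0] - m[2])
    return (phi % (2 * np.pi)) * C / (4 * np.pi * freq_hz)

m = cw_tof_measurements(depth=2.5, amplitude=1.0, offset=0.5, freq_hz=30e6)
print(depth_from_measurements(m, 30e6))  # ~2.5 m (range limit here: 5 m)
```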
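For item 3, the abstract does not give its computational model; as a toy illustration of why a lower slant angle compresses the scene, the sketch below computes the approximate foreshortening of a ground interval as sin(slant). The model actually fit in the paper may differ.

```python
import numpy as np

def ground_compression(slant_deg):
    """Approximate foreshortening of a depth interval on the ground plane
    viewed at a given slant angle above the horizontal: a unit interval
    along the line of sight projects to roughly sin(slant) of its size,
    so lower slant angles compress the scene more."""
    return np.sin(np.radians(slant_deg))

for slant in (15, 30, 45):
    print(f"slant {slant:2d} deg -> projected size {ground_compression(slant):.2f}")
```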
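For item 4, here is a minimal sketch of the kind of accuracy and precision analysis the study describes: a per-axis systematic bias (a positive mean x-error is a rightward bias; a negative mean depth error is underestimation) and the spread over repeated reports. The data and axis conventions are illustrative, not the paper's.

```python
import numpy as np

def location_errors(reported, true):
    """Per-axis error statistics for perceived virtual-object locations.

    reported, true: (N, 3) arrays of (x=right, y=up, z=depth) in cm.
    """
    err = np.asarray(reported) - np.asarray(true)
    bias = err.mean(axis=0)              # accuracy: systematic offset per axis
    precision = err.std(axis=0, ddof=1)  # spread of repeated judgments
    return bias, precision

# Three illustrative reports of one object placed at (50, 0, 60) cm.
reported = [[50.4, 0.1, 58.0], [50.6, -0.1, 59.0], [50.5, 0.0, 57.5]]
true = [[50.0, 0.0, 60.0]] * 3
print(location_errors(reported, true))  # small rightward bias, depth underestimated
```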