

Title: VRVideos: A flexible pipeline for Virtual Reality Video Creation
Recent advances in Neural Radiance Field (NeRF)-based methods have enabled high-fidelity novel view synthesis for video with dynamic elements. However, these methods often require expensive hardware, take days to process a second-long video, and do not scale well to longer videos. We present an end-to-end pipeline that creates dynamic 3D video from a monocular video and runs on consumer hardware in minutes per second of footage, not days. Our pipeline handles the estimation of camera parameters and depth maps, 3D reconstruction of dynamic foreground and static background elements, and rendering of the 3D video on a computer or VR headset. We use a state-of-the-art visual transformer model to estimate depth maps, which we use to scale COLMAP poses and enable RGB-D fusion with the estimated depth data. In our preliminary experiments, we rendered the output in a VR headset and visually compared the method against ground-truth datasets and state-of-the-art NeRF-based methods.
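The abstract does not spell out how the monocular depth predictions are reconciled with COLMAP's arbitrarily scaled poses. A minimal sketch of one common alignment strategy, fitting a single median-ratio scale between COLMAP's sparse triangulated depths and the predicted depth at the same pixels, is shown below; the function and variable names (align_scale, sfm_depths, pred_depth) are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

def align_scale(sfm_points_2d, sfm_depths, pred_depth):
    """Estimate one global scale factor mapping a predicted monocular depth
    map onto COLMAP's arbitrary-scale sparse depths for a single frame.

    sfm_points_2d : (N, 2) integer pixel coordinates of triangulated points
    sfm_depths    : (N,) depths of those points in the COLMAP coordinate frame
    pred_depth    : (H, W) dense depth map from the monocular estimator
    """
    u, v = sfm_points_2d[:, 0], sfm_points_2d[:, 1]
    pred_at_points = pred_depth[v, u]
    valid = (pred_at_points > 0) & (sfm_depths > 0)
    # The median of per-point ratios is robust to outlier correspondences.
    return np.median(sfm_depths[valid] / pred_at_points[valid])

# Usage: bring the two sources into one coordinate frame, e.g. by rescaling
# the dense depth map before RGB-D fusion.
# scale = align_scale(pts2d, d_sfm, depth)
# fused_input_depth = scale * depth
```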
Award ID(s):
2144822
PAR ID:
10406020
Author(s) / Creator(s):
Date Published:
Journal Name:
2022 IEEE International Conference on Artificial Intelligence and Virtual Reality (AIVR)
Page Range / eLocation ID:
199 to 202
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Neural Radiance Field (NeRF)-based rendering has attracted growing attention thanks to its state-of-the-art (SOTA) rendering quality and wide applications in Augmented and Virtual Reality (AR/VR). However, immersive real-time (>30 FPS) interactions enabled by NeRF-based rendering are still limited by the low achievable throughput on AR/VR devices. To this end, we first profile SOTA efficient NeRF algorithms on commercial devices and identify two primary causes of this inefficiency: (1) the uniform point sampling and (2) the dense accesses and computations of the required embeddings in NeRF. Furthermore, we propose RT-NeRF, which to the best of our knowledge is the first algorithm-hardware co-design acceleration of NeRF. Specifically, on the algorithm level, RT-NeRF integrates an efficient rendering pipeline that largely alleviates the inefficiency of the commonly adopted uniform point sampling method in NeRF by directly computing the geometry of pre-existing points. Additionally, RT-NeRF leverages a coarse-grained, view-dependent computing ordering scheme to eliminate the (unnecessary) processing of invisible points. On the hardware level, our proposed RT-NeRF accelerator (1) adopts a hybrid encoding scheme that adaptively switches between a bitmap- and a coordinate-based sparsity encoding format for NeRF's sparse embeddings, aiming to maximize the storage savings and thus reduce the required DRAM accesses while supporting efficient NeRF decoding; and (2) integrates both a high-density sparse search unit and a dual-purpose bi-directional adder & search tree to coordinate the two aforementioned encoding formats. Extensive experiments on eight datasets consistently validate the effectiveness of RT-NeRF, achieving a large throughput improvement (e.g., 9.7×∼3,201×) while maintaining the rendering quality as compared with SOTA efficient NeRF solutions.
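The hardware details of the hybrid encoding are accelerator-specific, but the underlying storage trade-off is easy to state: a bitmap costs one bit per cell regardless of occupancy, while a coordinate-style list costs a fixed number of bits per occupied cell, so the cheaper format depends only on the occupancy ratio. The sketch below illustrates that selection rule under an assumed index bit width; it is not RT-NeRF's actual encoder.

```python
def choose_encoding(num_cells, num_occupied, bits_per_coord=32):
    """Pick the cheaper sparsity encoding for one block of embeddings.

    Bitmap: 1 bit per cell.  Coordinate list: bits_per_coord bits per
    occupied cell.  The break-even occupancy is 1 / bits_per_coord.
    """
    bitmap_bits = num_cells                      # one bit per cell
    coord_bits = num_occupied * bits_per_coord   # one index per occupied cell
    if bitmap_bits <= coord_bits:
        return ("bitmap", bitmap_bits)
    return ("coord", coord_bits)

# Example: a 4096-cell block with 64 occupied cells -> coordinate list wins
# (64 * 32 = 2048 bits vs 4096 bits for the bitmap); denser blocks flip back.
print(choose_encoding(4096, 64))   # ('coord', 2048)
print(choose_encoding(4096, 512))  # ('bitmap', 4096)
```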
  2. Neural networks can represent and accurately reconstruct radiance fields for static 3D scenes (e.g., NeRF). Several works extend these to dynamic scenes captured with monocular video, with promising performance. However, the monocular setting is known to be an under-constrained problem, and so methods rely on data-driven priors for reconstructing dynamic content. We replace these priors with measurements from a time-of-flight (ToF) camera, and introduce a neural representation based on an image formation model for continuous-wave ToF cameras. Instead of working with processed depth maps, we model the raw ToF sensor measurements to improve reconstruction quality and avoid issues with low reflectance regions, multi-path interference, and a sensor's limited unambiguous depth range. We show that this approach improves robustness of dynamic scene reconstruction to erroneous calibration and large motions, and discuss the benefits and limitations of integrating RGB+ToF sensors now available on modern smartphones. 
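For readers unfamiliar with continuous-wave ToF sensing, the raw measurements referred to above are typically four phase-shifted correlation samples per pixel, from which phase and depth are recovered in closed form. The sketch below simulates and decodes such four-tap measurements under a simple noiseless model; it illustrates the standard CW-ToF image formation equations and the limited unambiguous range, not the paper's neural representation.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s

def simulate_taps(depth_m, freq_hz, amplitude=1.0, offset=0.5):
    """Simulate the four phase-shifted correlation ('tap') measurements of a
    continuous-wave ToF pixel for a given true depth (noiseless model)."""
    phase = 4 * np.pi * freq_hz * depth_m / C          # round-trip phase shift
    psis = np.array([0.0, np.pi / 2, np.pi, 3 * np.pi / 2])
    return offset + 0.5 * amplitude * np.cos(phase + psis)

def taps_to_depth(m, freq_hz):
    """Recover phase, then depth, from the four tap measurements."""
    phase = np.arctan2(m[3] - m[1], m[0] - m[2])       # in (-pi, pi]
    phase = np.mod(phase, 2 * np.pi)                   # wrap into [0, 2*pi)
    return C * phase / (4 * np.pi * freq_hz)

f = 20e6                                         # 20 MHz modulation frequency
print(C / (2 * f))                               # unambiguous range, ~7.49 m
print(taps_to_depth(simulate_taps(3.0, f), f))   # recovers ~3.0 m
print(taps_to_depth(simulate_taps(9.0, f), f))   # aliases to ~1.5 m (9 - 7.49)
```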
  3. Wearable near-eye displays for virtual and augmented reality (VR/AR) have seen enormous growth in recent years. While researchers are exploiting a plethora of techniques to create life-like three-dimensional (3D) objects, there is a lack of awareness of the role of human perception in guiding hardware development. An ultimate VR/AR headset must integrate the display, sensors, and processors in a compact enclosure that people can comfortably wear for a long time, while allowing a superior immersion experience and user-friendly human–computer interaction. Compared with other 3D displays, the holographic display has unique advantages in providing natural depth cues and correcting eye aberrations. Therefore, it holds great promise to be the enabling technology for next-generation VR/AR devices. In this review, we survey the recent progress in holographic near-eye displays from the human-centric perspective.

     
  4. 3D point cloud completion has been a long-standing challenge at scale, and the corresponding per-point supervised training strategies suffer from cumbersome annotations. 2D supervision has recently emerged as a promising alternative for 3D tasks, but specific approaches for 3D point cloud completion remain to be explored. To overcome these limitations, we propose an end-to-end method that directly lifts a single depth map to a completed point cloud. With one depth map as input, a multi-way novel depth view synthesis network (NDVNet) is designed to infer coarsely completed depth maps under various viewpoints. Meanwhile, a geometric depth perspective rendering module is introduced that utilizes the raw input depth map to generate a reprojected depth map for each view. The two depth maps generated in parallel for each view are then concatenated and refined by a depth completion network (DCNet). The final completed point cloud is fused from all refined depth views. Experimental results demonstrate the effectiveness of the proposed approach, producing high-quality, state-of-the-art results on the popular SUNCG benchmark.
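The geometric reprojection step described above, warping the raw input depth map into each synthesized viewpoint, can be illustrated with a standard pinhole-camera depth reprojection. The sketch below back-projects a depth map, applies a relative pose, and z-buffers the points into the target view; the names and the shared-intrinsics assumption are illustrative, not the NDVNet implementation.

```python
import numpy as np

def reproject_depth(depth, K, R, t):
    """Warp a source-view depth map into a target view (pinhole model).

    depth : (H, W) source-view depth, 0 marks missing pixels
    K     : (3, 3) camera intrinsics (assumed shared by both views)
    R, t  : rotation (3, 3) and translation (3,) from source to target camera
    """
    H, W = depth.shape
    v, u = np.mgrid[0:H, 0:W]
    z = depth.ravel()
    valid = z > 0
    # Back-project valid pixels to 3D points in the source camera frame.
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)])[:, valid]
    pts = np.linalg.inv(K) @ (pix * z[valid])
    # Transform into the target camera frame and keep points in front of it.
    pts = R @ pts + t[:, None]
    pts = pts[:, pts[2] > 1e-6]
    # Project into the target image plane.
    proj = K @ pts
    uu = np.round(proj[0] / proj[2]).astype(int)
    vv = np.round(proj[1] / proj[2]).astype(int)
    zz = pts[2]
    # Z-buffer: when several points land on one pixel, keep the nearest one.
    out = np.full((H, W), np.inf)
    inb = (uu >= 0) & (uu < W) & (vv >= 0) & (vv < H)
    for x, y, d in zip(uu[inb], vv[inb], zz[inb]):
        if d < out[y, x]:
            out[y, x] = d
    out[np.isinf(out)] = 0.0
    return out
```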