Title: PanoSynthVR: Toward Light-weight 360-Degree View Synthesis from a Single Panoramic Input
We investigate how real-time, 360-degree view synthesis can be achieved on current virtual reality hardware from a single panoramic image input. We introduce a light-weight method to automatically convert a single panoramic input into a multi-cylinder image representation that supports real-time, free-viewpoint view synthesis rendering for virtual reality. We apply an existing convolutional neural network trained on pinhole images to a cylindrical panorama with wrap padding to ensure agreement between the left and right edges. The network outputs a stack of semi-transparent panoramas at varying depths, which can be easily rendered and composited with over blending. Quantitative experiments and a user study show that the method produces convincing parallax and fewer artifacts than a textured mesh representation.
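To make the wrap-padding step concrete, here is a minimal PyTorch sketch of running a convolution over a cylindrical panorama with circular padding along the horizontal axis, so the left and right edges agree. The layer, tensor sizes, and weights are illustrative placeholders, not the authors' trained network.

```python
# Minimal sketch: a conv layer applied to a cylindrical panorama with
# wrap (circular) padding horizontally, so columns at the left and
# right seams see each other. Placeholder layer, not the paper's model.
import torch
import torch.nn.functional as F

def wrap_pad_conv(pano, conv, pad=1):
    """pano: (N, C, H, W) cylindrical panorama; conv: nn.Conv2d built
    with padding=0. Returns a feature map with the input's H and W."""
    # Circular padding on the width axis only (last dim).
    x = F.pad(pano, (pad, pad, 0, 0), mode="circular")
    # The cylinder's top and bottom are real boundaries: zero-pad them.
    x = F.pad(x, (0, 0, pad, pad), mode="constant", value=0.0)
    return conv(x)

conv = torch.nn.Conv2d(3, 8, kernel_size=3, padding=0)
pano = torch.rand(1, 3, 256, 1024)      # toy panorama batch
out = wrap_pad_conv(pano, conv)         # shape (1, 8, 256, 1024)
```

Wrapping only the horizontal axis matches the cylindrical topology: the panorama is periodic in azimuth but not in height.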
Award ID(s):
1924008 2144822
NSF-PAR ID:
10384130
Date Published:
Journal Name:
International Symposium on Mixed and Augmented Reality
ISSN:
2473-0726
Sponsoring Org:
National Science Foundation
More Like this
  1. We introduce a method to automatically convert a single panoramic input into a multi-cylinder image representation that supports real-time, free-viewpoint view synthesis for virtual reality. We apply an existing convolutional neural network trained on pinhole images to a cylindrical panorama with wrap padding to ensure agreement between the left and right edges. The network outputs a stack of semi-transparent panoramas at varying depths, which can be easily rendered and composited with over blending (a minimal compositing sketch appears after this listing). Initial experiments show that the method produces convincing parallax and cleaner object boundaries than a textured mesh representation.
  2. This work, situated at Rensselaer's Collaborative-Research Augmented Immersive Virtual Environment Laboratory (CRAIVELab), uses panoramic image datasets for spatial audio display. A system is developed for the room-centered immersive virtual reality facility to analyze panoramic images on a segment-by-segment basis, using pre-trained neural network models for semantic segmentation and object detection, generating audio objects with corresponding spatial locations (the azimuth-mapping step is sketched after this listing). These audio objects are then mapped to a series of synthetic and recorded audio datasets and populated within a spatial audio environment as virtual sound sources. The resulting audiovisual outcomes are then displayed using the facility's human-scale panoramic display, as well as the 128-channel loudspeaker array for wave field synthesis (WFS). Performance evaluation indicates effectiveness for real-time enhancements, with potential for large-scale expansion and rapid deployment in dynamic immersive virtual environments.
  3. Virtual Reality (VR), together with the network infrastructure, can provide an interactive and immersive experience for multiple users simultaneously and thus enables collaborative VR applications (e.g., a VR-based classroom). However, a satisfactory user experience requires not only high-resolution panoramic image rendering but also extremely low latency and seamless interaction. In addition, competition for limited network resources (e.g., multiple users sharing the total available bandwidth) poses a significant challenge to the collaborative user experience, particularly over wireless networks with time-varying capacity. While existing works have tackled some of these challenges, a principled design considering all those factors is still missing. In this paper, we formulate a combinatorial optimization problem to maximize the Quality of Experience (QoE), defined as a linear combination of the quality, the average VR content delivery delay, and the variance of the quality over a finite time horizon. In particular, we incorporate the influence of imperfect motion prediction when considering the quality of the perceived contents. However, the optimal solution to this problem cannot be implemented in real time since it relies on future decisions. We therefore decompose the optimization problem into a series of per-time-slot combinatorial problems and develop a low-complexity algorithm that achieves at least 1/2 of the optimal value (a toy greedy sketch of the per-slot step appears after this listing). Despite this conservative bound, trace-based simulation results reveal that our algorithm performs very close to the optimal offline solution. Furthermore, we implement our proposed algorithm in a practical system with commercial mobile devices and demonstrate its superior performance over state-of-the-art algorithms. We open-source our implementations at https://github.com/SNeC-Lab-PSU/ICDCS-CollaborativeVR.
  4. We introduce a convolutional neural network model for unsupervised learning of depth and ego-motion from cylindrical panoramic video. Panoramic depth estimation is an important technology for applications such as virtual reality, 3D modeling, and autonomous robotic navigation. In contrast to previous approaches for applying convolutional neural networks to panoramic imagery, we use the cylindrical panoramic projection, which allows traditional CNN layers such as convolutional filters and max pooling to be used without modification (a resampling sketch appears after this listing). Our evaluation on synthetic and real data shows that unsupervised learning of depth and ego-motion on cylindrical panoramic images can produce high-quality depth maps and that an increased field of view improves ego-motion estimation accuracy. We also introduce Headcam, a novel dataset of panoramic video collected from a helmet-mounted camera while biking in an urban setting.
  5. We present DELIGHT, or Deep Learning Identification of Galaxy Hosts of Transients, a new algorithm designed to automatically identify, in real time, the host galaxies of extragalactic transients. The proposed algorithm receives as input compact, multiresolution images centered at the position of a transient candidate and outputs two-dimensional offset vectors that connect the transient with the center of its predicted host. The multiresolution input consists of a set of images with the same number of pixels but progressively larger pixel sizes and fields of view (a construction sketch appears after this listing). A sample of 16,791 galaxies visually identified by the Automatic Learning for the Rapid Classification of Events broker team was used to train a convolutional neural network regression model. We show that this method is able to correctly identify both relatively large (10″ < r < 60″) and small (r ≤ 10″) apparent-size host galaxies using much less information (32 kB) than a large, single-resolution image (920 kB). The proposed method has fewer catastrophic errors in recovering the position, and is more complete with less contamination (<0.86%) in recovering the crossmatched redshift, than other state-of-the-art methods. The more efficient representation provided by multiresolution input images could allow for the identification of transient host galaxies in real time, if adopted in alert streams from the new generation of large-étendue telescopes such as the Vera C. Rubin Observatory.

     
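For item 1 above, the layered output is combined back to front with the standard over operator. A minimal NumPy sketch, with the RGBA layout and layer ordering as our assumptions:

```python
# Over blending for a stack of semi-transparent panorama layers,
# ordered from the farthest cylinder to the nearest (assumed layout).
import numpy as np

def composite_over(layers):
    """layers: list of (H, W, 4) float RGBA arrays in [0, 1]."""
    h, w, _ = layers[0].shape
    out = np.zeros((h, w, 3), dtype=np.float32)
    for layer in layers:
        rgb, alpha = layer[..., :3], layer[..., 3:4]
        # Standard over operator: nearer content covers what is behind.
        out = rgb * alpha + out * (1.0 - alpha)
    return out
```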
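For item 2, the step of turning a detection on a 360-degree panorama into a spatial audio source reduces to mapping a horizontal pixel position to an azimuth. A hedged sketch: the detection format and field names are illustrative, and the detections themselves are assumed to come from the pretrained models the abstract mentions.

```python
# Map detections on a 360-degree panorama to azimuths for virtual
# sound sources. The detection dicts are hypothetical stand-ins for
# the output of a pretrained segmentation/detection model.

def detection_to_azimuth(cx, image_width):
    """Horizontal pixel center -> azimuth in degrees, 0 = straight ahead."""
    return (cx / image_width) * 360.0 - 180.0

detections = [
    {"label": "car", "cx": 512.0},
    {"label": "person", "cx": 3072.0},
]
for d in detections:
    az = detection_to_azimuth(d["cx"], image_width=4096.0)
    print(f"{d['label']}: azimuth {az:+.1f} deg")  # hand off to the WFS renderer
```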
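For item 3, the abstract does not spell out the per-slot algorithm, so the following is only a generic greedy stand-in for the per-time-slot step: upgrade one user's quality level at a time by best marginal QoE gain per unit of bandwidth. All utilities, rates, and weights are invented for illustration.

```python
# Toy per-slot greedy: choose one quality level per user under a shared
# bandwidth budget. Utilities, rates, and weights are invented numbers;
# this is a generic sketch, not the paper's algorithm.
QUALITY = {0: 1.0, 1: 2.5, 2: 4.0}   # utility per level (assumed)
RATE    = {0: 2.0, 1: 5.0, 2: 9.0}   # Mbps per level (assumed)

def greedy_slot(num_users, budget_mbps, alpha=1.0, beta=0.1):
    levels = [0] * num_users
    used = sum(RATE[l] for l in levels)
    while True:
        best = None
        for u in range(num_users):
            if levels[u] + 1 not in QUALITY:
                continue
            d_rate = RATE[levels[u] + 1] - RATE[levels[u]]
            # Crude QoE gain: quality term minus a delay-like penalty.
            d_gain = alpha * (QUALITY[levels[u] + 1] - QUALITY[levels[u]]) - beta * d_rate
            if used + d_rate <= budget_mbps and (best is None or d_gain / d_rate > best[0]):
                best = (d_gain / d_rate, u, d_rate)
        if best is None:
            return levels
        _, u, d_rate = best
        levels[u] += 1
        used += d_rate

print(greedy_slot(num_users=4, budget_mbps=20.0))  # e.g. [1, 1, 1, 1]
```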
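For item 4, ordinary convolutions apply unchanged because a cylindrical projection keeps the horizontal axis uniform in azimuth. A NumPy sketch of resampling an equirectangular panorama onto a cylinder; the latitude limit and the row-interpolation scheme are our assumptions.

```python
# Resample an equirectangular panorama (rows uniform in latitude) onto
# a cylindrical projection (rows uniform in tan(latitude)). The crop
# limit and linear row interpolation are illustrative choices.
import numpy as np

def equirect_to_cylindrical(equi, out_h, max_lat_deg=60.0):
    """equi: (H, W, C) image covering +/-90 deg latitude; returns an
    (out_h, W, C) cylindrical panorama covering +/-max_lat_deg."""
    h, _, _ = equi.shape
    v_max = np.tan(np.radians(max_lat_deg))
    v = np.linspace(v_max, -v_max, out_h)       # cylinder height coord
    lat = np.arctan(v)                          # latitude per output row
    rows = (0.5 - lat / np.pi) * (h - 1)        # fractional source rows
    lo = np.floor(rows).astype(int)
    hi = np.clip(lo + 1, 0, h - 1)
    t = (rows - lo)[:, None, None]
    # Linear interpolation between neighboring source rows.
    return (1 - t) * equi[lo] + t * equi[hi]
```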
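For item 5, the multiresolution input can be built by taking crops centered on the transient that double in field of view while being averaged back to a common pixel count. A sketch under our own assumptions about sizes and downsampling; DELIGHT's actual preprocessing may differ.

```python
# Build a multiresolution stack: same pixel count per level, field of
# view doubling each level. Sizes and block-mean downsampling are our
# assumptions; assumes every crop stays inside the image bounds.
import numpy as np

def multiresolution_stack(image, cy, cx, size=30, levels=5):
    """Return `levels` arrays of shape (size, size); level k covers a
    (size * 2**k)-pixel field of view centered on (cy, cx)."""
    stack = []
    for k in range(levels):
        half = (size * 2 ** k) // 2
        crop = image[cy - half: cy + half, cx - half: cx + half]
        # Average 2**k x 2**k pixel blocks down to the common size.
        blocks = crop.reshape(size, 2 ** k, size, 2 ** k)
        stack.append(blocks.mean(axis=(1, 3)))
    return stack
```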