Title: PanoSynthVR: Toward Light-weight 360-Degree View Synthesis from a Single Panoramic Input
We investigate how real-time, 360-degree view synthesis can be achieved on current virtual reality hardware from a single panoramic image input. We introduce a light-weight method to automatically convert a single panoramic input into a multi-cylinder image representation that supports real-time, free-viewpoint view synthesis rendering for virtual reality. We apply an existing convolutional neural network trained on pinhole images to a cylindrical panorama with wrap padding to ensure agreement between the left and right edges. The network outputs a stack of semi-transparent panoramas at varying depths which can be easily rendered and composited with over blending. Quantitative experiments and a user study show that the method produces convincing parallax and fewer artifacts than a textured mesh representation.
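The two operations named in the abstract, wrap padding and over blending, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `wrap_pad_width` and `composite_over`, the array layouts, the far-to-near layer ordering, and the use of straight (non-premultiplied) alpha are all assumptions made for illustration.

```python
import numpy as np


def wrap_pad_width(pano, pad):
    """Pad an (H, W, C) panorama horizontally by wrapping around the seam.

    Hypothetical helper: columns from the right edge are copied to the left
    and vice versa, so a convolution sees a continuous 360-degree image.
    """
    return np.pad(pano, ((0, 0), (pad, pad), (0, 0)), mode="wrap")


def composite_over(layers):
    """Composite (D, H, W, 4) RGBA layers with the "over" operator.

    Assumes layers are ordered far-to-near and use straight alpha in [0, 1].
    """
    out = np.zeros(layers.shape[1:3] + (3,), dtype=np.float64)
    for layer in layers:
        rgb, alpha = layer[..., :3], layer[..., 3:4]
        # Nearer layer goes over whatever has been accumulated so far.
        out = rgb * alpha + out * (1.0 - alpha)
    return out
```

For example, an opaque red far layer under a half-transparent blue near layer composites to an even mix of the two, and padding a five-column panorama by two columns repeats its last two columns on the left and its first two on the right.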
Award ID(s):
1924008 2144822
PAR ID:
10384130
Journal Name:
International Symposium on Mixed and Augmented Reality
ISSN:
2473-0726
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We introduce a method to automatically convert a single panoramic input into a multi-cylinder image representation that supports real-time, free-viewpoint view synthesis for virtual reality. We apply an existing convolutional neural network trained on pinhole images to a cylindrical panorama with wrap padding to ensure agreement between the left and right edges. The network outputs a stack of semi-transparent panoramas at varying depths which can be easily rendered and composited with over blending. Initial experiments show that the method produces convincing parallax and cleaner object boundaries than a textured mesh representation. 
  2. This work, situated at Rensselaer's Collaborative-Research Augmented Immersive Virtual Environment Laboratory (CRAIVELab), uses panoramic image datasets for spatial audio display. A system is developed for the room-centered immersive virtual reality facility to analyze panoramic images on a segment-by-segment basis, using pre-trained neural network models for semantic segmentation and object detection, thereby generating audio objects with respective spatial locations. These audio objects are then mapped to a series of synthetic and recorded audio datasets and populated within a spatial audio environment as virtual sound sources. The resulting audiovisual outcomes are then displayed using the facility's human-scale panoramic display, as well as the 128-channel loudspeaker array for wave field synthesis (WFS). Performance evaluation indicates effectiveness for real-time enhancements, with potential for large-scale expansion and rapid deployment in dynamic immersive virtual environments.
  3. In recent years, novel view synthesis from a single image has seen significant progress thanks to rapid advances in 3D scene representation and image inpainting techniques. While current approaches are able to synthesize geometrically consistent novel views, they often do not handle view-dependent effects properly. Specifically, the highlights in their synthesized images usually appear to be glued to the surfaces, making the novel views unrealistic. To address this major problem, we make a key observation that the process of synthesizing novel views requires changing the shading of the pixels based on the novel camera, and moving them to appropriate locations. Therefore, we propose to split the view synthesis process into two independent tasks of pixel reshading and relocation. During the reshading process, we take the single image as the input and adjust its shading based on the novel camera. This reshaded image is then used as the input to an existing view synthesis method to relocate the pixels and produce the final novel view image. We propose to use a neural network to perform reshading and generate a large set of synthetic input-reshaded pairs to train our network. We demonstrate that our approach produces plausible novel view images with realistic moving highlights on a variety of real-world scenes.
  4. We introduce a convolutional neural network model for unsupervised learning of depth and ego-motion from cylindrical panoramic video. Panoramic depth estimation is an important technology for applications such as virtual reality, 3D modeling, and autonomous robotic navigation. In contrast to previous approaches for applying convolutional neural networks to panoramic imagery, we use the cylindrical panoramic projection, which allows for the use of traditional CNN layers such as convolutional filters and max pooling without modification. Our evaluation on synthetic and real data shows that unsupervised learning of depth and ego-motion on cylindrical panoramic images can produce high-quality depth maps and that an increased field-of-view improves ego-motion estimation accuracy. We also introduce Headcam, a novel dataset of panoramic video collected from a helmet-mounted camera while biking in an urban setting.
  5. Occlusion is a critical problem in autonomous driving systems. Solving this problem requires robust collaboration among autonomous vehicles traveling on the same roads. However, transferring the entirety of raw sensor data among autonomous vehicles is expensive and can cause a delay in communication. This paper proposes a method called Realtime Collaborative Vehicular Communication based on a Bird's-Eye-View (BEV) map. The BEV map holds the accurate depth information from the point cloud image while its 2D representation enables the method to use a novel and well-trained image-based backbone network. Most importantly, we encode the object detection results into the BEV representation to reduce the volume of data transmission and make real-time collaboration between autonomous vehicles possible. The output of this process, the BEV map, can also be used as direct input to most route planning modules. Numerical results show that this novel method can increase the accuracy of object detection by cross-verifying the results from multiple points of view. Thus, in the process, this new method also reduces the object detection challenges that stem from occlusion and partial occlusion. Additionally, unlike many existing methods, this new method significantly reduces the data needed for transfer between vehicles, achieving a speed of 21.92 Hz for both the object detection process and the data transmission process, which is sufficiently fast for a real-time system.