skip to main content


This content will become publicly available on October 6, 2024

Title: Toward Next-generation Volumetric Video Streaming with Neural-based Content Representations
Striking a balance between minimizing bandwidth consumption and maintaining high visual quality stands as the paramount objective in volumetric content delivery. However, achieving this ambitious target is a substantial challenge, especially for mobile devices with constrained computational resources, given the voluminous amount of 3D data to be streamed, strict latency requirements, and high computational load. Inspired by the advantages offered by neural radiance fields (NeRF), we propose, for the first time, to deliver volumetric videos by utilizing neural-based content representations. We delve deep into potential challenges and explore viable solutions for both video-on-demand (VOD) and live video streaming services, in terms of the end-to-end pipeline, real-time and high-quality streaming, rate adaptation, and viewport adaptation. Our preliminary results lend credence to the feasibility of our research proposition, offering a promising starting point for further investigation.  more » « less
Award ID(s):
2212296
NSF-PAR ID:
10467143
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
ACM
Date Published:
Page Range / eLocation ID:
199 to 207
Format(s):
Medium: X
Location:
Madrid Spain
Sponsoring Org:
National Science Foundation
More Like this
  1. While recent work explored streaming volumetric content on-demand, there is little effort on live volumetric video streaming that bears the potential of bringing more exciting applications than its on-demand counterpart. To fill this critical gap, in this paper, we propose MetaStream, which is, to the best of our knowledge, the first practical live volumetric content capture, creation, delivery, and rendering system for immersive applications such as virtual, augmented, and mixed reality. To address the key challenge of the stringent latency requirement for processing and streaming a huge amount of 3D data, MetaStream integrates several innovations into a holistic system, including dynamic camera calibration, edge-assisted object segmentation, cross-camera redundant point removal, and foveated volumetric content rendering. We implement a prototype of MetaStream using commodity devices and extensively evaluate its performance. Our results demonstrate that MetaStream achieves low-latency live volumetric video streaming at close to 30 frames per second on WiFi networks. Compared to state-of-the-art systems, MetaStream reduces end-to-end latency by up to 31.7% while improving visual quality by up to 12.5%. 
    more » « less
  2. This position paper explores the challenges and opportunities for high-quality immersive volumetric video streaming for multiple users over millimeter-wave (mmWave) WLANs. While most of the previous work has focused on single-user streaming, there is a growing need for multi-user immersive applications such as virtual collaboration, classroom education, teleconferencing, etc. While mmWave wireless links can provide multi-gigabit per second data rates, they suffer from blockages and high beamforming overhead. This paper investigates an environment-driven approach to address the challenges. It presents a comprehensive research agenda that includes developing a collaborative 3D scene reconstruction process, material identification, ray tracing, blockage mitigation, and cross-layer multi-user video rate adaptation. Our preliminary results show the feasibility and identify the limitations of existing solutions. Finally, we discuss the open challenges of implementing a practical system based on the proposed research agenda. 
    more » « less
  3. Super-resolution (SR) is a well-studied technique for reconstructing high-resolution (HR) images from low-resolution (LR) ones. SR holds great promise for video streaming since an LR video segment can be transmitted from the video server to the client that then reconstructs the HR version using SR, resulting in a significant reduction in network bandwidth. However, SR is seldom used in practice for real-time video streaming, because the computational overhead of frame reconstruction results in large latency and low frame rate. To reduce the computational overhead and make SR practical, we propose a deep-learning-based SR method called Fo veated Cas caded Video Super Resolution (focas). focas relies on the fact that human eyes only have high acuity in a tiny central foveal region of the retina. focas uses more neural network blocks in the foveal region to provide higher video quality, while using fewer blocks in the periphery as lower quality is sufficient. To optimize the computational resources and reduce reconstruction latency, focas formulates and solves a convex optimization problem to decide the number of neural network blocks to use in each region of the frame. Using extensive experiments, we show that focas reduces the latency by 50%-70% while maintaining comparable visual quality as traditional (non-foveated) SR. Further, focas provides a 12-16x reduction in the client-to-server network bandwidth in comparison with sending the full HR video segments. 
    more » « less
  4. After the emergence of video streaming services, more creative and diverse multimedia content has become available, and now the capability of streaming 360-degree videos will open a new era of multimedia experiences. However, streaming these videos requires larger bandwidth and less latency than what is found in conventional video streaming systems. Rate adaptation of tiled videos and view prediction techniques are used to solve this problem. In this paper, we introduce the Navigation Graph, which models viewing behaviors in the temporal (segments) and the spatial (tiles) domains to perform the rate adaptation of tiled media associated with the view prediction. The Navigation Graph allows clients to perform view prediction more easily by sharing the viewing model in the same way in which media description information is shared in DASH. It is also useful for encoding the trajectory information in the media description file, which could also allow for more efficient navigation of 360-degree videos. This paper provides information about the creation of the Navigation Graph and its uses. The performance evaluation shows that the Navigation Graph based view prediction and rate adaptation outperform other existing tiled media streaming solutions. Navigation Graph is not limited to 360-degree video streaming applications, but it can also be applied to other tiled media streaming systems, such as volumetric media streaming for augmented reality applications. 
    more » « less
  5. null (Ed.)
    We investigate a novel communications system that integrates scalable multi-layer 360-degree video tiling, viewport-adaptive rate-distortion optimal resource allocation, and VR-centric edge computing and caching, to enable future high-quality untethered VR streaming. Our system comprises a collection of 5G small cells that can pool their communication, computing, and storage resources to collectively deliver scalable 360-degree video content to mobile VR clients at much higher quality. Our major contributions are rigorous design of multi-layer 360-degree tiling and related models of statistical user navigation, and analysis and optimization of edge-based multi-user VR streaming that integrates viewport adaptation and server cooperation. We also explore the possibility of network coded data operation and its implications for the analysis, optimization, and system performance we pursue here. We demonstrate considerable gains in delivered immersion fidelity, featuring much higher 360-degree viewport peak signal to noise ratio (PSNR) and VR video frame rates and spatial resolutions. 
    more » « less