Title: Toward Next-generation Volumetric Video Streaming with Neural-based Content Representations
Striking a balance between minimizing bandwidth consumption and maintaining high visual quality stands as the paramount objective in volumetric content delivery. However, achieving this ambitious target is a substantial challenge, especially for mobile devices with constrained computational resources, given the voluminous amount of 3D data to be streamed, strict latency requirements, and high computational load. Inspired by the advantages offered by neural radiance fields (NeRF), we propose, for the first time, to deliver volumetric videos by utilizing neural-based content representations. We delve deep into potential challenges and explore viable solutions for both video-on-demand (VOD) and live video streaming services, in terms of the end-to-end pipeline, real-time and high-quality streaming, rate adaptation, and viewport adaptation. Our preliminary results lend credence to the feasibility of our research proposition, offering a promising starting point for further investigation.
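To make the rate-adaptation idea concrete, the sketch below shows one minimal way a client could pick among neural content representations of different sizes under a bandwidth budget, in the style of a bitrate ladder. This is an illustration only: the model names, sizes, and quality scores are hypothetical, not values from the paper.

```python
# Hypothetical rate-adaptation sketch for neural content representations.
# All model names, sizes, and quality scores below are illustrative.

def select_model(ladder, bandwidth_bps, segment_duration_s):
    """Pick the highest-quality model that downloads within one segment's time."""
    budget_bytes = bandwidth_bps / 8 * segment_duration_s
    feasible = [m for m in ladder if m["size_bytes"] <= budget_bytes]
    if not feasible:
        # Nothing fits: fall back to the smallest representation.
        return min(ladder, key=lambda m: m["size_bytes"])
    return max(feasible, key=lambda m: m["quality"])

ladder = [
    {"name": "nerf-small",  "size_bytes": 2_000_000,  "quality": 1},
    {"name": "nerf-medium", "size_bytes": 8_000_000,  "quality": 2},
    {"name": "nerf-large",  "size_bytes": 20_000_000, "quality": 3},
]
```

A real system would also need to account for per-model decoding/rendering cost on the client, which the paper identifies as a key constraint on mobile devices.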
Award ID(s): 2212296, 2235049
PAR ID: 10467143
Author(s) / Creator(s): ; ; ;
Publisher / Repository: ACM
Date Published:
ISBN: 9798400703393
Page Range / eLocation ID: 199 to 207
Format(s): Medium: X
Location: Madrid, Spain
Sponsoring Org: National Science Foundation
More Like This
  1. While recent work has explored streaming volumetric content on demand, little effort has been devoted to live volumetric video streaming, which has the potential to enable even more exciting applications than its on-demand counterpart. To fill this critical gap, in this paper we propose MetaStream, to the best of our knowledge the first practical live volumetric content capture, creation, delivery, and rendering system for immersive applications such as virtual, augmented, and mixed reality. To address the key challenge of meeting stringent latency requirements while processing and streaming a huge amount of 3D data, MetaStream integrates several innovations into a holistic system, including dynamic camera calibration, edge-assisted object segmentation, cross-camera redundant point removal, and foveated volumetric content rendering. We implement a prototype of MetaStream using commodity devices and extensively evaluate its performance. Our results demonstrate that MetaStream achieves low-latency live volumetric video streaming at close to 30 frames per second over WiFi networks. Compared to state-of-the-art systems, MetaStream reduces end-to-end latency by up to 31.7% while improving visual quality by up to 12.5%.
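One of the MetaStream components above, cross-camera redundant point removal, can be illustrated with a simple voxel-grid deduplication: points from overlapping cameras that fall into the same voxel are sent only once. This is a minimal sketch under that assumption; the paper's actual algorithm may differ.

```python
# Illustrative sketch of cross-camera redundant point removal via a voxel
# grid. The voxel-hashing approach is an assumption for illustration, not
# necessarily MetaStream's exact method.

def dedup_points(point_clouds, voxel_size=0.01):
    """Merge per-camera point clouds, keeping one point per occupied voxel."""
    seen = set()
    merged = []
    for cloud in point_clouds:  # one cloud per camera, in world coordinates
        for (x, y, z) in cloud:
            key = (int(x // voxel_size),
                   int(y // voxel_size),
                   int(z // voxel_size))
            if key not in seen:  # first camera to cover this voxel wins
                seen.add(key)
                merged.append((x, y, z))
    return merged
```

Because the redundancy is removed before transmission, each voxel's contents cross the network once regardless of how many cameras observe it.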
  2. Accessing high-quality video content can be challenging due to insufficient and unstable network bandwidth. Recent advances in neural enhancement have shown promising results in improving the quality of degraded videos through deep learning. Neural-Enhanced Streaming (NES) incorporates this new approach into video streaming, allowing users to download low-quality video segments and then enhance them to obtain high-quality content without disrupting playback. We introduce BONES, an NES control algorithm that jointly manages network and computational resources to maximize the user's quality of experience (QoE). BONES formulates NES as a Lyapunov optimization problem and solves it online with near-optimal performance, making it the first NES algorithm to provide a theoretical performance guarantee. Comprehensive experimental results indicate that BONES increases QoE by 5% to 20% over state-of-the-art algorithms with minimal overhead. Our code is available at https://github.com/UMass-LIDS/bones.
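The Lyapunov-optimization idea behind BONES can be sketched as a drift-plus-penalty rule: at each step, pick the action minimizing queue drift plus V times a penalty (here, negative quality). This toy version is far simpler than the BONES formulation; the cost terms and action fields are hypothetical.

```python
# Toy drift-plus-penalty decision in the spirit of Lyapunov optimization.
# The actual BONES formulation is more involved; these variable names and
# cost terms are illustrative assumptions.

def choose_action(actions, buffer_level, V=1.0):
    """Pick the action minimizing drift + V * penalty.

    drift   ~ buffer_level * download_time  (queue-growth pressure)
    penalty ~ -quality                      (we want to maximize quality)
    """
    def cost(a):
        drift = buffer_level * a["download_time"]
        penalty = -a["quality"]
        return drift + V * penalty
    return min(actions, key=cost)
```

When the backlog is low the quality term dominates and a slower, higher-quality action wins; as backlog grows, the drift term pushes the controller toward faster downloads. The parameter V tunes this quality/stability trade-off.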
  3. Emerging multimedia applications often use a wireless LAN (Wi-Fi) infrastructure to stream content. These Wi-Fi deployments vary vastly in terms of their system configurations. In this paper, we take a step toward characterizing the Quality of Experience (QoE) of volumetric video streaming over an enterprise-grade Wi-Fi network to: (i) understand the impact of Wi-Fi control parameters on user QoE, (ii) analyze the relation between Quality of Service (QoS) metrics of Wi-Fi networks and application QoE, and (iii) compare the QoE of volumetric video streaming to traditional 2D video applications. We find that Wi-Fi configuration parameters such as channel width, radio interface, access category, and priority queues are important for optimizing Wi-Fi networks for streaming immersive videos. 
  4. This position paper explores the challenges and opportunities of high-quality immersive volumetric video streaming for multiple users over millimeter-wave (mmWave) WLANs. While most previous work has focused on single-user streaming, there is a growing need for multi-user immersive applications such as virtual collaboration, classroom education, and teleconferencing. While mmWave wireless links can provide multi-gigabit-per-second data rates, they suffer from blockages and high beamforming overhead. This paper investigates an environment-driven approach to address these challenges. It presents a comprehensive research agenda that includes developing a collaborative 3D scene reconstruction process, material identification, ray tracing, blockage mitigation, and cross-layer multi-user video rate adaptation. Our preliminary results demonstrate the feasibility of this approach and identify the limitations of existing solutions. Finally, we discuss the open challenges of implementing a practical system based on the proposed research agenda.
  5. Super-resolution (SR) is a well-studied technique for reconstructing high-resolution (HR) images from low-resolution (LR) ones. SR holds great promise for video streaming since an LR video segment can be transmitted from the video server to the client, which then reconstructs the HR version using SR, resulting in a significant reduction in network bandwidth. However, SR is seldom used in practice for real-time video streaming, because the computational overhead of frame reconstruction results in large latency and low frame rate. To reduce the computational overhead and make SR practical, we propose a deep-learning-based SR method called Foveated Cascaded Video Super Resolution (focas). focas relies on the fact that human eyes only have high acuity in a tiny central foveal region of the retina. focas uses more neural network blocks in the foveal region to provide higher video quality, while using fewer blocks in the periphery, where lower quality is sufficient. To optimize the computational resources and reduce reconstruction latency, focas formulates and solves a convex optimization problem to decide the number of neural network blocks to use in each region of the frame. Using extensive experiments, we show that focas reduces latency by 50%-70% while maintaining visual quality comparable to traditional (non-foveated) SR. Further, focas provides a 12-16x reduction in client-to-server network bandwidth compared with sending the full HR video segments.
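The foveated block-allocation idea above can be illustrated with a simple proportional scheme: regions closer to the gaze point receive more SR network blocks from a fixed budget. This heuristic stands in for the paper's convex program; the acuity weighting, budget, and per-region cap below are all illustrative assumptions.

```python
# Sketch of foveated block allocation in the spirit of focas: regions near
# the gaze point get more SR network blocks. The acuity falloff and the
# proportional split are assumptions, not the paper's convex optimization.
import math

def allocate_blocks(regions, gaze, total_blocks, max_per_region=8):
    """Split a block budget across regions in proportion to visual acuity."""
    weights = []
    for (cx, cy) in regions:  # region centers in normalized [0,1] coordinates
        ecc = math.hypot(cx - gaze[0], cy - gaze[1])
        # Acuity falls off with eccentricity from the gaze point.
        weights.append(1.0 / (1.0 + 10.0 * ecc))
    total_w = sum(weights)
    return [min(max_per_region, max(1, round(total_blocks * w / total_w)))
            for w in weights]
```

A central region coincident with the gaze point ends up with most of the budget, while a peripheral region is reconstructed with the minimum one block, mirroring the quality/latency trade-off focas exploits.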