skip to main content

Attention:

The NSF Public Access Repository (PAR) system and access will be unavailable from 11:00 PM ET on Friday, December 13 until 2:00 AM ET on Saturday, December 14 due to maintenance. We apologize for the inconvenience.


Title: FOCAS: Practical Video Super Resolution using Foveated Rendering
Super-resolution (SR) is a well-studied technique for reconstructing high-resolution (HR) images from low-resolution (LR) ones. SR holds great promise for video streaming since an LR video segment can be transmitted from the video server to the client that then reconstructs the HR version using SR, resulting in a significant reduction in network bandwidth. However, SR is seldom used in practice for real-time video streaming, because the computational overhead of frame reconstruction results in large latency and low frame rate. To reduce the computational overhead and make SR practical, we propose a deep-learning-based SR method called Fo veated Cas caded Video Super Resolution (focas). focas relies on the fact that human eyes only have high acuity in a tiny central foveal region of the retina. focas uses more neural network blocks in the foveal region to provide higher video quality, while using fewer blocks in the periphery as lower quality is sufficient. To optimize the computational resources and reduce reconstruction latency, focas formulates and solves a convex optimization problem to decide the number of neural network blocks to use in each region of the frame. Using extensive experiments, we show that focas reduces the latency by 50%-70% while maintaining comparable visual quality as traditional (non-foveated) SR. Further, focas provides a 12-16x reduction in the client-to-server network bandwidth in comparison with sending the full HR video segments.  more » « less
Award ID(s):
1763617 1901137
PAR ID:
10346193
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of the 29th ACM International Conference on Multimedia (MM'21)
Page Range / eLocation ID:
5454 to 5462
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In recent years, streamed 360° videos have gained popularity within Virtual Reality (VR) and Augmented Reality (AR) applications. However, they are of much higher resolutions than 2D videos, causing greater bandwidth consumption when streamed. This increased bandwidth utilization puts tremendous strain on the network capacity of the cloud providers streaming these videos. In this paper, we introduce L3BOU, a novel, three-tier distributed software framework that reduces cloud-edge bandwidth in the backhaul network and lowers average end-to-end latency for 360° video streaming applications. The L3BOU framework achieves low bandwidth and low latency by leveraging edge-based, optimized upscaling techniques. L3BOU accomplishes this by utilizing down-scaled MPEG-DASH-encoded 360° video data, known as Ultra Low Resolution (ULR) data, that the L3BOU edge applies distributed super-resolution (SR) techniques on, providing a high quality video to the client. L3BOU is able to reduce the cloud-edge backhaul bandwidth by up to a factor of 24, and the optimized super-resolution multi-processing of ULR data provides a 10-fold latency decrease in super resolution upscaling at the edge. 
    more » « less
  2. Virtual reality systems today cannot yet stream immersive, retina-quality virtual reality video over a network. One of the greatest challenges to this goal is the sheer data rates required to transmit retina-quality video frames at high resolutions and frame rates. Recent work has leveraged the decay of visual acuity in human perception in novel gaze-contingent video compression techniques. In this paper, we show that reducing the motion-to-photon latency of a system itself is a key method for improving the compression ratio of gaze-contingent compression. Our key finding is that a client and streaming server system with sub-15ms latency can achieve 5x better compression than traditional techniques while also using simpler software algorithms than previous work. 
    more » « less
  3. Streaming of live 360-degree video allows users to follow a live event from any view point and has already been deployed on some commercial platforms. However, the current systems can only stream the video at relatively low-quality because the entire 360-degree video is delivered to the users under limited bandwidth. In this paper, we propose to use the idea of "flocking" to improve the performance of both prediction of field of view (FoV) and caching on the edge servers for live 360-degree video streaming. By assigning variable playback latencies to all the users in a streaming session, a "streaming flock" is formed and led by low latency users in the front of the flock. We propose a collaborative FoV prediction scheme where the actual FoV information of users in the front of the flock are utilized to predict of users behind them. We further propose a network condition aware flocking strategy to reduce the video freeze and increase the chance for collaborative FoV prediction on all users. Flocking also facilitates caching as video tiles downloaded by the front users can be cached by an edge server to serve the users at the back of the flock, thereby reducing the traffic in the core network. We propose a latency-FoV based caching strategy and investigate the potential gain of applying transcoding on the edge server. We conduct experiments using real-world user FoV traces and WiGig network bandwidth traces to evaluate the gains of the proposed strategies over benchmarks. Our experimental results demonstrate that the proposed streaming system can roughly double the effective video rate, which is the video rate inside a user's actual FoV, compared to the prediction only based on the user's own past FoV trajectory, while reducing video freeze. Furthermore, edge caching can reduce the traffic in the core network by about 80%, which can be increased to 90% with transcoding on edge server. 
    more » « less
  4. null (Ed.)
    We propose CoRE, a 360° video streaming approach that reduces bandwidth requirements compared to transferring the entire 360° video. CoRE uses non-linear sampling in both the spatial and temporal domains to achieve robustness to view direction prediction error and to transient wireless network bandwidth fluctuation. Each CoRE frame samples the environment in all directions, with full resolution over the predicted field of view and gradually decreasing resolution at the periphery, so that missing pixels are avoided, irrespective of the view prediction error magnitude. A CoRE video chunk has a main part at full frame rate, and an extension part at a gradually decreasing frame rate, which avoids stalls while waiting for a delayed transfer. We evaluate a prototype implementation of CoRE through trace-based experiments and a user study, and find that, compared to tiling with low-resolution padding, CoRE reduces data transfer amounts, stalls, and H.264 decoding overhead, increases frame rates, and eliminates missing pixels. 
    more » « less
  5. null (Ed.)
    Future view prediction for a 360-degree video streaming system is important to save the network bandwidth and improve the Quality of Experience (QoE). Historical view data of a single viewer and multiple viewers have been used for future view prediction. Video semantic information is also useful to predict the viewer's future behavior. However, extracting video semantic information requires powerful computing hardware and large memory space to perform deep learning-based video analysis. It is not a desirable condition for most of client devices, such as small mobile devices or Head Mounted Display (HMD). Therefore, we develop an approach where video semantic analysis is executed on the media server, and the analysis results are shared with clients via the Semantic Flow Descriptor (SFD) and View-Object State Machine (VOSM). SFD and VOSM become new descriptive additions of the Media Presentation Description (MPD) and Spatial Relation Description (SRD) to support 360-degree video streaming. Using the semantic-based approach, we design the Semantic-Aware View Prediction System (SEAWARE) to improve the overall view prediction performance. The evaluation results of 360-degree videos and real HMD view traces show that the SEAWARE system improves the view prediction performance and streams high-quality video with limited network bandwidth. 
    more » « less