
This content will become publicly available on October 1, 2023

Title: Vues: Practical Mobile Volumetric Video Streaming Through Multiview Transcoding
The emerging volumetric videos offer a fully immersive, six degrees of freedom (6DoF) viewing experience, at the cost of extremely high bandwidth demand. In this paper, we design, implement, and evaluate Vues, an edge-assisted transcoding system that delivers high-quality volumetric videos with low bandwidth requirement, low decoding overhead, and high quality of experience (QoE) on mobile devices. Through an IRB-approved user study, we build a first-of-its-kind QoE model to quantify the impact of various factors introduced by transcoding volumetric content into 2D videos. Motivated by the key observations from this user study, Vues employs a novel multiview approach with the overarching goal of boosting QoE. The Vues edge server adaptively transcodes a volumetric video frame into multiple 2D views with the help of a few lightweight machine learning models and strategically balances the extra bandwidth consumption of additional views and the improved QoE, indicated by our QoE model. The client selects the view that optimizes the QoE among the delivered candidates for display. Comprehensive evaluations using a prototype implementation indicate that Vues dramatically outperforms existing approaches. On average, it improves the QoE by 35% (up to 85%), compared to single-view transcoding schemes, and reduces the bandwidth consumption by 95%, compared to the state-of-the-art that directly streams volumetric videos.
Award ID(s):
2106090 1901103 2106771 2128489
Publication Date:
Journal Name:
ACM MobiCom 2022
Sponsoring Org:
National Science Foundation
More Like this
  1. Spatiotemporal variation in cellular bandwidth availability is well-known and could affect a mobile user's quality of experience (QoE), especially while using bandwidth-intensive streaming applications such as movies, podcasts, and music videos during a commute. If such variations are made available to a streaming service in advance, it could plan ahead to avoid sub-optimal performance while the user travels through regions of low bandwidth availability. The intuition is that such future knowledge could be used to buffer additional content in regions of higher bandwidth availability to tide over the deficits in regions of low bandwidth availability. Foresight is a service designed to provide this future knowledge for client apps running on a mobile device. It comprises three components: (a) a crowd-sourced bandwidth estimate reporting facility, (b) an on-cloud bandwidth service that records the spatiotemporal variations in bandwidth and serves queries for bandwidth availability from mobile users, and (c) an on-device bandwidth manager that caters to the bandwidth requirements from client apps by providing them with bandwidth allocation schedules. Foresight is implemented in the Android framework. As a proof of concept for using this service, we have modified an open-source video player, ExoPlayer, to use the results of Foresight in its video buffer management. Our performance evaluation shows Foresight's scalability. We also showcase the opportunity that Foresight offers to ExoPlayer to enhance video quality of experience (QoE) despite spatiotemporal bandwidth variations for metrics such as overall higher bitrate of playback, reduction in the number of bitrate switches, and reduction in the number of stalls during video playback.
  2. Future view prediction for a 360-degree video streaming system is important to save network bandwidth and improve the Quality of Experience (QoE). Historical view data of a single viewer and multiple viewers have been used for future view prediction. Video semantic information is also useful to predict the viewer's future behavior. However, extracting video semantic information requires powerful computing hardware and large memory space to perform deep learning-based video analysis. These requirements are impractical for most client devices, such as small mobile devices or Head Mounted Displays (HMDs). Therefore, we develop an approach where video semantic analysis is executed on the media server, and the analysis results are shared with clients via the Semantic Flow Descriptor (SFD) and View-Object State Machine (VOSM). SFD and VOSM become new descriptive additions of the Media Presentation Description (MPD) and Spatial Relation Description (SRD) to support 360-degree video streaming. Using the semantic-based approach, we design the Semantic-Aware View Prediction System (SEAWARE) to improve the overall view prediction performance. The evaluation results of 360-degree videos and real HMD view traces show that the SEAWARE system improves the view prediction performance and streams high-quality video with limited network bandwidth.
  3. In recent years, streamed 360° videos have gained popularity within Virtual Reality (VR) and Augmented Reality (AR) applications. However, they are of much higher resolutions than 2D videos, causing greater bandwidth consumption when streamed. This increased bandwidth utilization puts tremendous strain on the network capacity of the cloud providers streaming these videos. In this paper, we introduce L3BOU, a novel, three-tier distributed software framework that reduces cloud-edge bandwidth in the backhaul network and lowers average end-to-end latency for 360° video streaming applications. The L3BOU framework achieves low bandwidth and low latency by leveraging edge-based, optimized upscaling techniques. L3BOU accomplishes this by utilizing down-scaled MPEG-DASH-encoded 360° video data, known as Ultra Low Resolution (ULR) data, to which the L3BOU edge applies distributed super-resolution (SR) techniques, providing a high-quality video to the client. L3BOU is able to reduce the cloud-edge backhaul bandwidth by up to a factor of 24, and the optimized super-resolution multi-processing of ULR data provides a 10-fold latency decrease in super-resolution upscaling at the edge.
  4. Short videos have recently emerged as a popular form of short-duration User Generated Content (UGC) within modern social media. Short video content is generally less than a minute long and predominantly produced in vertical orientation on smartphones. While still fundamentally being streaming, short video delivery is distinctly characterized by the deployment of a mechanism that pre-loads ahead of user request. Background pre-loading aims to eliminate start-up time, which is now prioritized higher in Quality of Experience (QoE) objectives, given that the application design facilitates instant ‘swiping’ to the next video in a recommended sequence. In this work, we provide a comprehensive comparison of four popular short video services. In particular, we explore content characteristics and evaluate the video quality across resolutions for each service. We next characterize the pre-loading policy adopted by each service. Last, we conduct an experimental study to investigate data consumption and evaluate achieved QoE under different network scenarios and application configurations.
  5. Virtual Reality (VR), together with the network infrastructure, can provide an interactive and immersive experience for multiple users simultaneously and thus enables collaborative VR applications (e.g., VR-based classroom). However, a satisfactory user experience requires not only high-resolution panoramic image rendering but also extremely low latency and seamless user experience. Besides, the competition for limited network resources (e.g., multiple users share the total limited bandwidth) poses a significant challenge to collaborative user experience, in particular under the wireless network with time-varying capacities. While existing works have tackled some of these challenges, a principled design considering all those factors is still missing. In this paper, we formulate a combinatorial optimization problem to maximize the Quality of Experience (QoE), defined as the linear combination of the quality, the average VR content delivery delay, and variance of the quality over a finite time horizon. In particular, we incorporate the influence of imperfect motion prediction when considering the quality of the perceived contents. However, the optimal solution to this problem cannot be implemented in real-time since it relies on future decisions. Then, we decompose the optimization problem into a series of combinatorial optimization problems, one per time slot, and develop a low-complexity algorithm that can achieve at least 1/2 of the optimal value. Despite this, the trace-based simulation results reveal that our algorithm performs very close to the optimal offline solution. Furthermore, we implement our proposed algorithm in a practical system with commercial mobile devices and demonstrate its superior performance over state-of-the-art algorithms. We open-source our implementations on