Super-resolution (SR) is a well-studied technique for reconstructing high-resolution (HR) images from low-resolution (LR) ones. SR holds great promise for video streaming since an LR video segment can be transmitted from the video server to the client, which then reconstructs the HR version using SR, resulting in a significant reduction in network bandwidth. However, SR is seldom used in practice for real-time video streaming, because the computational overhead of frame reconstruction results in large latency and low frame rate. To reduce the computational overhead and make SR practical, we propose a deep-learning-based SR method called Foveated Cascaded Video Super Resolution (FOCAS). FOCAS relies on the fact that human eyes only have high acuity in a tiny central foveal region of the retina. FOCAS uses more neural network blocks in the foveal region to provide higher video quality, while using fewer blocks in the periphery, where lower quality is sufficient. To optimize the computational resources and reduce reconstruction latency, FOCAS formulates and solves a convex optimization problem to decide the number of neural network blocks to use in each region of the frame. Using extensive experiments, we show that FOCAS reduces the latency by 50%-70% while maintaining visual quality comparable to traditional (non-foveated) SR. Further, FOCAS provides a 12-16x reduction in server-to-client network bandwidth in comparison with sending the full HR video segments.
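The per-region resource allocation can be made concrete with a small sketch. The Python below is only an illustration of the idea, not FOCAS's implementation: it replaces the paper's convex formulation with a brute-force search, and the region split, acuity weights, and latency/quality models are hypothetical placeholders.

```python
from itertools import product

# Hypothetical per-region model (not FOCAS's actual numbers):
# more network blocks -> higher quality but higher latency.
REGIONS = ["fovea", "mid", "periphery"]
AREA_FRACTION = {"fovea": 0.05, "mid": 0.25, "periphery": 0.70}
ACUITY_WEIGHT = {"fovea": 1.0, "mid": 0.4, "periphery": 0.1}  # visual importance
BLOCK_CHOICES = range(1, 17)       # candidate number of blocks per region
LATENCY_PER_BLOCK_MS = 1.2         # assumed cost of one block on the full frame


def latency(blocks):
    """Total reconstruction latency: block cost scaled by each region's area."""
    return sum(LATENCY_PER_BLOCK_MS * b * AREA_FRACTION[r]
               for r, b in zip(REGIONS, blocks))


def perceived_quality(blocks):
    """Toy quality model: diminishing returns in block count, weighted by acuity."""
    return sum(ACUITY_WEIGHT[r] * (1.0 - 0.5 ** b)
               for r, b in zip(REGIONS, blocks))


def allocate(min_quality):
    """Brute-force stand-in for the paper's convex optimization:
    pick the block counts that minimize latency while meeting the quality floor."""
    best = None
    for blocks in product(BLOCK_CHOICES, repeat=len(REGIONS)):
        if perceived_quality(blocks) < min_quality:
            continue
        candidate = (latency(blocks), blocks)
        if best is None or candidate < best:
            best = candidate
    return best


if __name__ == "__main__":
    total_ms, blocks = allocate(min_quality=1.3)
    print(dict(zip(REGIONS, blocks)), f"{total_ms:.2f} ms")
```

With these placeholder weights the search naturally spends most blocks on the small foveal region and very few on the large periphery, which is the effect the abstract describes.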
Accelerating Video Segment Access via Quality-Aware Multi-Source Selection
Video data can be slow to process due to the size of video streams and the computational complexity needed to decode, transform, and encode them. These challenges are particularly significant in interactive applications, such as quickly generating compilation videos from a user search. We look at optimizing access to source video segments in multimedia systems where multiple separately encoded copies of video sources are available, such as proxy/optimized media in conventional non-linear video editors or VOD streams in content distribution networks. Rather than selecting a single source to use (e.g., "use the lowest-bitrate 720p source"), we specify a minimum visual quality (e.g., "use any frames with VMAF ≥ 85"). This quality constraint and the needed segment bounds are used to find the lowest-latency operations to decode a segment from multiple available sources with diverse bitrates, resolutions, and codecs. This uses higher-quality/slower-to-decode sources if the encoding is better aligned for the specific segment bounds, which can provide faster access than using just one lower-quality source. We provide a general solution to this Quality-Aware Multi-Source Selection problem with optimal computational complexity. We create a dataset using adaptive-bitrate streaming Video on Demand sources from YouTube's CDN. We evaluate our algorithm on simple segment decoding as well as embedded into a larger editing system, a declarative video editor. Our evaluation shows up to 23% lower-latency access, depending on segment length, at identical visual quality levels.
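To illustrate the selection step, the following Python sketch makes several simplifying assumptions: each source is summarized by a single VMAF score, a per-frame decode cost, and its keyframe positions, and the whole segment is served from one source (the paper's formulation is more general and can combine sources). The class fields and numbers are illustrative, not the paper's data model or dataset.

```python
from dataclasses import dataclass
from bisect import bisect_right


@dataclass
class Source:
    name: str
    vmaf: float                    # assumed per-source quality score
    decode_cost_per_frame: float   # assumed relative decode latency per frame
    keyframes: list                # keyframe frame indices, sorted, starting at 0


def decode_cost(src, start, end):
    """Frames that must be decoded to produce [start, end):
    decoding has to begin at the last keyframe at or before `start`."""
    k = src.keyframes[bisect_right(src.keyframes, start) - 1]
    return (end - k) * src.decode_cost_per_frame


def pick_source(sources, start, end, min_vmaf):
    """Among sources meeting the quality floor, choose the one whose
    keyframe alignment gives the cheapest decode for this segment."""
    eligible = [s for s in sources if s.vmaf >= min_vmaf]
    return min(eligible, key=lambda s: decode_cost(s, start, end))


# Illustrative sources: the 1080p copy is slower per frame, but a keyframe
# right at the segment start can still make it the fastest choice.
sources = [
    Source("720p_low",   vmaf=86.0, decode_cost_per_frame=1.0, keyframes=[0, 120, 240]),
    Source("1080p_high", vmaf=95.0, decode_cost_per_frame=1.8, keyframes=[0, 60, 150, 210]),
]
print(pick_source(sources, start=150, end=180, min_vmaf=85).name)  # -> 1080p_high
```

In this toy example the slower-to-decode 1080p source wins because its keyframe falls exactly at the segment start, which is the alignment effect the abstract describes.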
- Award ID(s): 2118240
- PAR ID: 10611551
- Publisher / Repository: ACM
- Date Published:
- ISBN: 9798400714672
- Page Range / eLocation ID: 181 to 189
- Subject(s) / Keyword(s): multimedia databases; declarative video editing; imageomics
- Format(s): Medium: X
- Location: Stellenbosch, South Africa
- Sponsoring Org: National Science Foundation
More Like this
- Real-time interactive video streaming applications like cloud-based video games, AR, and VR require high-quality video streams and extremely low end-to-end interaction delays. These requirements make QoE extremely sensitive to packet losses. Due to the inter-dependency between compressed frames, packet losses stall the video decode pipeline until the lost packets are retransmitted (resulting in stutters and higher delays) or the decoder state is reset using IDR-frames (lower video quality for a given bandwidth). Prism is a hybrid predictive-reactive packet loss recovery scheme that uses a split-stream video coding technique to meet the needs of ultra-low-latency video streaming applications. Prism's approach enables aggressive loss prediction, rapid loss recovery, and high video quality post-recovery, with zero overhead during normal operation, avoiding the pitfalls of existing approaches. Our evaluation on real video game footage shows that Prism reduces the penalty of using I-frames for recovery by 81%, while achieving 30% lower delay than pure retransmission-based recovery.
- We study adaptive video streaming for multiple users in wireless access edge networks with unreliable channels. The key challenge is to jointly optimize the video bitrate adaptation and resource allocation such that the users' cumulative quality of experience is maximized. This problem is a finite-horizon restless multi-armed multi-action bandit problem and is provably hard to solve. To overcome this challenge, we propose a computationally appealing index policy entitled Quality Index Policy, which is well-defined without the Whittle indexability condition and is provably asymptotically optimal without the global attractor condition. These two conditions are widely needed in the design of most existing index policies and are difficult to establish in general. Since the wireless access edge network environment is highly dynamic, with system parameters unknown and time-varying, we further develop an index-aware reinforcement learning (RL) algorithm dubbed QA-UCB. We show that QA-UCB achieves sub-linear regret with low complexity since it fully exploits the structure of the Quality Index Policy for making decisions. Extensive simulations using real-world traces demonstrate significant gains of the proposed policies over conventional approaches. We note that the proposed framework for designing the index policy and the index-aware RL algorithm is of independent interest and could be useful for other large-scale multi-user problems.
- As video traffic continues to dominate the Internet, interest in near-second low-latency streaming has increased. Existing low-latency streaming platforms rely on tens of seconds of video in the buffer to offer a seamless experience. Striving for near-second latency requires the receiver to make quick decisions regarding the download bitrate and the playback speed. To cope with these challenges, we design a new adaptive bitrate (ABR) scheme, Stallion, for STAndard Low-LAtency vIdeo cONtrol. Stallion uses a sliding window to measure the mean and standard deviation of both the bandwidth and the latency (a sketch of this sliding-window rule appears after this list). We evaluate Stallion and compare it to the standard DASH DYNAMIC algorithm over a variety of networking conditions. Stallion shows a 1.8x increase in bitrate and a 4.3x reduction in the number of stalls.
- Low latency is a critical user quality-of-experience (QoE) metric for live video streaming, and it poses significant challenges for streaming over the Internet. In this paper, we explore the design space of low-latency live video streaming by developing dynamic models and optimal control strategies. We further develop practical live video streaming algorithms within the Model Predictive Control (MPC) framework, namely MPC-Live, to maximize user QoE by adapting the video bitrate while maintaining low end-to-end video latency in dynamic network environments. Through extensive experiments driven by real network traces, we demonstrate that our live video streaming algorithms can improve performance dramatically within a latency range of two to five seconds.
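To ground the Stallion item above, here is a minimal sliding-window ABR sketch in Python. It is an assumption-laden illustration, not Stallion's published algorithm: the window length, safety factor, bitrate ladder, and playback-speed rule are made-up values chosen only to show how the windowed mean and standard deviation of bandwidth and latency can drive the two decisions.

```python
from collections import deque
from statistics import mean, stdev

# Hypothetical bitrate ladder (kbps); not from the paper.
BITRATE_LADDER_KBPS = [300, 750, 1200, 2400, 4800]


class SlidingWindowABR:
    def __init__(self, window=8, safety=1.0):
        self.bw = deque(maxlen=window)    # recent throughput samples (kbps)
        self.lat = deque(maxlen=window)   # recent latency samples (s)
        self.safety = safety              # how many std-devs to back off

    def observe(self, throughput_kbps, latency_s):
        self.bw.append(throughput_kbps)
        self.lat.append(latency_s)

    def next_bitrate(self):
        """Discount the mean throughput by its variability, then pick the
        highest ladder rung that fits within that budget."""
        if len(self.bw) < 2:
            return BITRATE_LADDER_KBPS[0]
        budget = mean(self.bw) - self.safety * stdev(self.bw)
        feasible = [b for b in BITRATE_LADDER_KBPS if b <= budget]
        return feasible[-1] if feasible else BITRATE_LADDER_KBPS[0]

    def playback_speed(self, target_latency_s=1.0):
        """Speed up slightly when measured latency drifts above the target."""
        if len(self.lat) < 2:
            return 1.0
        drift = mean(self.lat) - target_latency_s
        return 1.05 if drift > self.safety * stdev(self.lat) else 1.0


abr = SlidingWindowABR()
for bw, lat in [(2800, 0.9), (3100, 1.0), (2500, 1.4), (2900, 1.2)]:
    abr.observe(bw, lat)
print(abr.next_bitrate(), abr.playback_speed())
```

Backing the throughput estimate off by its standard deviation trades a little bitrate for fewer stalls when bandwidth is bursty, which is the trade-off a near-second buffer cannot absorb on its own.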
