Title: ST-360: Spatial–Temporal Filtering-Based Low-Latency 360-Degree Video Analytics Framework
Recent advances in computer vision algorithms and video streaming technologies have facilitated the development of edge-server-based video analytics systems, enabling them to process sophisticated real-world tasks, such as traffic surveillance and workspace monitoring. Meanwhile, due to their omnidirectional recording capability, 360-degree cameras have been proposed to replace traditional cameras in video analytics systems to offer enhanced situational awareness. Yet, we found that providing an efficient 360-degree video analytics framework is a non-trivial task. Due to the higher resolution and geometric distortion of 360-degree videos, existing video analytics pipelines fail to meet the performance requirements for end-to-end latency and query accuracy. To address these challenges, we introduce ST-360, a framework specifically designed for 360-degree video analytics. The framework features a spatial-temporal filtering algorithm that reduces both data transmission and computational workloads. Evaluation of the ST-360 framework on a unique dataset of 360-degree first-responder videos shows that it yields accurate query results with a 50% reduction in end-to-end latency compared to state-of-the-art methods.
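The abstract describes the spatial-temporal filter only at a high level; the sketch below illustrates one plausible reading, assuming a frame-differencing temporal filter followed by per-tile spatial filtering on the equirectangular frame. The tile grid and both thresholds are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

TILE_ROWS, TILE_COLS = 4, 8   # tile grid over the equirectangular frame
TEMPORAL_THRESH = 2.0         # mean abs-diff below this -> skip whole frame
SPATIAL_THRESH = 4.0          # per-tile mean abs-diff below this -> drop tile

def filter_frame(prev, curr):
    """Return the list of (row, col, tile) regions worth sending upstream,
    or an empty list when the temporal filter skips the frame entirely."""
    diff = np.abs(curr.astype(np.float32) - prev.astype(np.float32))
    if diff.mean() < TEMPORAL_THRESH:     # temporal filter: near-static frame
        return []
    h, w = curr.shape[:2]
    th, tw = h // TILE_ROWS, w // TILE_COLS
    kept = []
    for r in range(TILE_ROWS):
        for c in range(TILE_COLS):
            block = diff[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
            if block.mean() >= SPATIAL_THRESH:        # spatial filter
                kept.append((r, c, curr[r * th:(r + 1) * th,
                                        c * tw:(c + 1) * tw]))
    return kept
```

Dropping static frames cuts computation at the server, while dropping unchanged tiles cuts the data transmitted from the camera-side edge, which is how a filter of this shape can reduce both workloads at once.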
Award ID(s):
2140620
PAR ID:
10563356
Publisher / Repository:
ACM
Date Published:
Journal Name:
ACM Transactions on Multimedia Computing, Communications, and Applications
ISSN:
1551-6857
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In recent years, streamed 360° videos have gained popularity within Virtual Reality (VR) and Augmented Reality (AR) applications. However, they have much higher resolutions than 2D videos, consuming far more bandwidth when streamed. This increased bandwidth utilization puts tremendous strain on the network capacity of the cloud providers streaming these videos. In this paper, we introduce L3BOU, a novel, three-tier distributed software framework that reduces cloud-edge bandwidth in the backhaul network and lowers average end-to-end latency for 360° video streaming applications. The L3BOU framework achieves low bandwidth and low latency by leveraging edge-based, optimized upscaling techniques. L3BOU accomplishes this by utilizing down-scaled MPEG-DASH-encoded 360° video data, known as Ultra Low Resolution (ULR) data, to which the L3BOU edge applies distributed super-resolution (SR) techniques, providing high-quality video to the client. L3BOU reduces the cloud-edge backhaul bandwidth by up to a factor of 24, and the optimized super-resolution multi-processing of ULR data provides a 10-fold latency decrease in super-resolution upscaling at the edge.
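As a rough illustration of the cloud-to-edge split described above, the sketch below downsamples a frame at the cloud and upscales it at the edge. Bicubic interpolation stands in for L3BOU's learned super-resolution models, and the 4x scale factor is an assumed value.

```python
import cv2
import numpy as np

SCALE = 4  # assumed downscale factor applied at the cloud

def cloud_encode(frame):
    """Cloud side: downsample a 360° frame to ULR before backhaul transfer."""
    h, w = frame.shape[:2]
    return cv2.resize(frame, (w // SCALE, h // SCALE),
                      interpolation=cv2.INTER_AREA)

def edge_upscale(ulr):
    """Edge side: upscale ULR back to full resolution. Bicubic interpolation
    is a placeholder for the paper's learned super-resolution models."""
    h, w = ulr.shape[:2]
    return cv2.resize(ulr, (w * SCALE, h * SCALE),
                      interpolation=cv2.INTER_CUBIC)
```

Shipping a SCALE-times-smaller frame in each dimension cuts backhaul pixels by SCALE squared, which is where a bandwidth reduction on the order of the reported factor of 24 can come from once codec overheads are accounted for.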
  2. Recent advances in virtual reality cameras have contributed to a phenomenal growth of 360° videos. Estimating regions likely to attract user attention is critical for efficiently streaming and rendering 360° videos. In this article, we present a simple, novel, GPU-driven pipeline for saliency computation and virtual cinematography in 360° videos using spherical harmonics (SH). We efficiently compute the 360° video saliency through the spectral residual of the SH coefficients between multiple bands at over 60 FPS for 4K-resolution videos. Further, our interactive computation of spherical saliency can be used for saliency-guided virtual cinematography in 360° videos.
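A minimal, CPU-bound sketch of the band-wise SH residual idea follows, assuming a grayscale equirectangular frame, plain quadrature projection, and arbitrary band limits; the paper's actual pipeline runs on the GPU at over 60 FPS and may differ in how bands are compared.

```python
import numpy as np
from scipy.special import sph_harm

def sh_band_residual(frame, l_lo=4, l_hi=12):
    """frame: (H, W) grayscale equirectangular image.
    Returns a saliency map from the energy between two SH bands."""
    H, W = frame.shape
    theta = np.linspace(0.0, 2.0 * np.pi, W, endpoint=False)   # azimuth
    phi = (np.arange(H) + 0.5) * np.pi / H                     # polar angle
    T, P = np.meshgrid(theta, phi)                             # (H, W) grids
    dA = np.sin(P) * (2.0 * np.pi / W) * (np.pi / H)           # area element
    f = frame.astype(np.float64)
    recon_lo = np.zeros((H, W))
    recon_hi = np.zeros((H, W))
    for l in range(l_hi + 1):
        for m in range(-l, l + 1):
            Y = sph_harm(m, l, T, P)            # SH basis sampled on the grid
            c = np.sum(f * np.conj(Y) * dA)     # projection coefficient
            term = np.real(c * Y)
            recon_hi += term
            if l <= l_lo:
                recon_lo += term
    return np.abs(recon_hi - recon_lo)          # band-pass energy as saliency
```

The residual between a low-band and a higher-band reconstruction keeps mid-frequency structure while discarding the smooth background, which is the spectral-residual intuition carried over to the sphere.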
  3. Real-time video analytics typically require video frames to be processed by a query to identify objects or activities of interest while adhering to an end-to-end frame processing latency constraint. This imposes a continuous and heavy load on backend compute and network infrastructure. Video data has inherent redundancy and does not always contain an object of interest for a given query. We leverage this property of video streams to propose a lightweight Load Shedder that can be deployed on edge servers or on inexpensive edge devices co-located with cameras. The proposed Load Shedder uses pixel-level color-based features to calculate a utility score for each ingress video frame and a minimum utility threshold to select interesting frames to send for query processing. Dropping unnecessary frames enables the video analytics query in the backend to meet the end-to-end latency constraint with fewer compute and network resources. To guarantee a bounded end-to-end latency at runtime, we introduce a control loop that monitors the backend load and dynamically adjusts the utility threshold. Performance evaluations show that the proposed Load Shedder selects a large portion of frames containing each object of interest while meeting the end-to-end frame processing latency constraint. Furthermore, it does not impose a significant latency overhead when running on edge devices with modest compute resources. 
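The sketch below illustrates the shedding loop under stated assumptions: a histogram-intersection utility score over 8x8x8 RGB bins stands in for the paper's pixel-level color features, and a simple proportional controller adjusts the threshold from the backend queue length.

```python
import numpy as np

class LoadShedder:
    """Drops frames whose color-based utility falls below a dynamic threshold."""

    def __init__(self, ref_hist, init_thresh=0.3):
        # ref_hist: (8, 8, 8) color histogram summarizing the query's
        # objects of interest (an assumed reference profile).
        self.ref_hist = ref_hist / ref_hist.sum()
        self.thresh = init_thresh

    def utility(self, frame):
        """Histogram-intersection similarity between the frame's RGB colors
        and the reference profile (a cheap pixel-level feature)."""
        hist, _ = np.histogramdd(frame.reshape(-1, 3).astype(np.float64),
                                 bins=(8, 8, 8), range=[(0, 256)] * 3)
        hist /= hist.sum()
        return float(np.minimum(hist, self.ref_hist).sum())

    def admit(self, frame):
        """Forward the frame to the backend query only if it clears the bar."""
        return self.utility(frame) >= self.thresh

    def control(self, queue_len, target=10, gain=0.01):
        """Feedback loop: raise the threshold when the backend queue grows,
        lower it when there is slack, keeping end-to-end latency bounded."""
        self.thresh = float(np.clip(self.thresh + gain * (queue_len - target),
                                    0.0, 1.0))
```

Calling control() periodically with the observed backend queue length closes the loop: under load the threshold rises and more frames are shed, trading recall for a bounded end-to-end latency.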
  4. 360-degree video is an emerging form of media that encodes information about all directions surrounding a camera, offering an immersive experience to the users. Unlike traditional 2D videos, visual information in 360-degree videos can be naturally represented as pixels on a sphere. Inspired by state-of-the-art deep-learning-based 2D image super-resolution models and spherical CNNs, in this paper, we design a novel spherical super-resolution (SSR) approach for 360-degree videos. To support viewport-adaptive and bandwidth-efficient transmission/streaming of 360-degree video data and save computation, we propose the Focused Icosahedral Mesh to represent a small area on the sphere. We further construct matrices that rotate spherical content from anywhere on the sphere to the focused mesh area, allowing us to use the focused mesh to represent any area on the sphere. Motivated by the PixelShuffle operation for 2D super-resolution, we also propose a novel VertexShuffle operation on the mesh and an improved version, VertexShuffle_V2. We compare our SSR approach with state-of-the-art 2D super-resolution models and show that SSR has the potential to achieve significant benefits when applied to spherical signals.
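VertexShuffle generalizes the 2D PixelShuffle operation from images to mesh vertices; for intuition, here is a minimal NumPy PixelShuffle. The mesh-specific index bookkeeping of VertexShuffle itself is not reproduced here.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) array into (C, H*r, W*r), trading channel
    depth for spatial resolution (the core of sub-pixel super-resolution)."""
    cr2, h, w = x.shape
    c = cr2 // (r * r)
    x = x.reshape(c, r, r, h, w)      # split channels into r x r sub-blocks
    x = x.transpose(0, 3, 1, 4, 2)    # interleave: (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)
```

The appeal of this construction is that upsampling becomes a pure reindexing of features the network has already computed, which is cheap on both images and, per the paper, on mesh vertices.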
  5. Predicting where users will look inside head-mounted displays (HMDs) and fetching only the relevant content is an effective approach for streaming bulky 360 videos over bandwidth-constrained networks. Despite previous efforts, anticipating users’ fast and sudden head movements is still difficult because there is a lack of clear understanding of the unique visual attention in 360 videos that dictates the users’ head movement in HMDs. This in turn reduces the effectiveness of streaming systems and degrades the users’ Quality of Experience. To address this issue, we propose to extract salient cues unique in the 360 video content to capture the attentive behavior of HMD users. Empowered by the newly discovered saliency features, we devise a head-movement prediction algorithm to accurately predict users’ head orientations in the near future. A 360 video streaming framework that takes full advantage of the head movement predictor is proposed to enhance the quality of delivered 360 videos. Practical trace-driven results show that the proposed saliency-based 360 video streaming system reduces the stall duration by 65% and the stall count by 46%, while saving 31% more bandwidth than state-of-the-art approaches. 
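As a hedged illustration of saliency-aided head prediction, the sketch below blends constant-velocity extrapolation of recent head orientations with a pull toward the strongest salient point; the blend weight and the single-point saliency summary are assumptions, not the paper's predictor.

```python
import numpy as np

def predict_orientation(history, salient, horizon, alpha=0.7):
    """history: (T, 2) array of recent [yaw, pitch] samples (radians),
    salient: [yaw, pitch] of the strongest salient region,
    horizon: prediction distance in sampling steps.
    Note: yaw wrap-around at +/-pi is ignored in this sketch."""
    velocity = history[-1] - history[-2]              # last observed motion
    extrapolated = history[-1] + velocity * horizon   # constant-velocity term
    return alpha * extrapolated + (1.0 - alpha) * np.asarray(salient)
```

A streaming system can then fetch high-quality tiles around the predicted orientation and low-quality tiles elsewhere, which is the mechanism behind the reported stall and bandwidth savings.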