

This content will become publicly available on June 16, 2026

Title: ACT360: An Efficient 360-Degree Action Detection and Summarization Framework for Mission-Critical Training and Debriefing
Effective training and debriefing are critical in high-stakes, mission-critical environments such as firefighting, where precision and error minimization are paramount. Traditional post-training analysis relies on manual review of 2D video, a process that is time-consuming and lacks comprehensive situational awareness. To address these limitations, we introduce ACT360, a novel system that leverages 360-degree video and machine learning for automated action detection and efficient debriefing. ACT360 incorporates 360YOWO, a customized You Only Watch Once (YOWO) model enhanced with a spatial attention mechanism and equirectangular-aware convolution (EAC) to handle the unique distortions of panoramic video data. To enable deployment in resource-constrained environments, we apply quantization and model pruning, reducing the model size by 74% while maintaining robust accuracy (a mAP drop of only 1.5%, from 0.865 to 0.850) and improving inference speed. We validate our approach on a new, publicly available dataset of 55 labeled 360-degree videos covering seven key firefighting actions, recorded across various real-world practice sessions and environmental conditions. Furthermore, we integrate the pipeline with 360AIE (Action Insight Explorer), a web-based interface that provides automatic action detection, retrieval, and textual summarization of key events using large language models (LLMs), significantly improving post-incident analysis efficiency. ACT360 serves as a generalized framework for mission-critical debriefing, incorporating techniques such as EAC, spatial attention, summarization, and model optimization. These innovations apply to any training environment requiring lightweight action detection and structured post-exercise analysis.
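As a rough illustration of the compression step described above, the sketch below combines magnitude pruning and dynamic int8 quantization using stock PyTorch utilities. The backbone is a stand-in (360YOWO itself is not public), and the layer choices and 0.74 sparsity target are illustrative assumptions rather than the paper's exact recipe.

```python
# Minimal sketch of pruning + quantization in the spirit of ACT360's
# reported 74% size reduction. The model is a placeholder, not 360YOWO.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(                 # stand-in for a YOWO-style backbone
    nn.Conv3d(3, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool3d(1),
    nn.Flatten(),
    nn.Linear(64, 7),                  # seven firefighting action classes
)

# L1 unstructured pruning: zero out the smallest-magnitude weights.
for module in model.modules():
    if isinstance(module, (nn.Conv3d, nn.Linear)):
        prune.l1_unstructured(module, name="weight", amount=0.74)
        prune.remove(module, "weight")  # make the sparsity permanent

# Dynamic int8 quantization of the linear layers for CPU inference.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

with torch.no_grad():
    clip = torch.randn(1, 3, 16, 224, 224)   # (batch, C, T, H, W)
    print(quantized(clip).shape)              # -> torch.Size([1, 7])
```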
Award ID(s):
2140645 2106592 1900875
PAR ID:
10635552
Author(s) / Creator(s):
Publisher / Repository:
IEEE
Date Published:
ISBN:
979-8-3315-8646-1
Page Range / eLocation ID:
34 to 41
Subject(s) / Keyword(s):
Action Detection; Distortion; Optimal Model; Attention Mechanism; Model Size; Spatial Attention; Manual Review; Training Environment; Spatial Attention Mechanism; 2D Video; 360-degree Video; Latitude; Emergency Response; Convolutional Layers; Nighttime Exercise; Training; Object Detection; Convolution Operation; Action Recognition; 2D Model; Relevant Regions; Inference Time; Mission-critical Applications; Video Understanding; Google Cloud Platform; Temporal Model; Manual Selection; Accuracy Of Feature Extraction
Format(s):
Medium: X
Location:
Cork, Ireland
Sponsoring Org:
National Science Foundation
More Like this
  1. Paaßen, Benjamin; Demmans Epp, Carrie (Eds.)
    Multimodal Learning Analytics (MMLA) has emerged as a powerful approach within the computer-supported collaborative learning community, offering nuanced insights into learning processes through diverse data sources. Despite its potential, the prevalent reliance on traditional instruments such as tripod-mounted digital cameras for video capture often results in suboptimal data quality for captured facial expressions, which are crucial for understanding collaborative dynamics. This study introduces an innovative approach to overcome this limitation by employing 360-degree camera technology to capture students' facial features while they collaborate in small working groups. A comparative analysis of 1.5 hours of video data from both traditional tripod-mounted digital cameras and 360-degree cameras evaluated the efficacy of these methods in capturing Facial Action Units (AUs) and facial keypoints. Analysis with OpenFace revealed that the 360-degree camera captured high-quality facial features in 33.17% of frames, significantly outperforming the traditional method's 8.34%, thereby enhancing reliability in facial feature detection. The findings suggest a pathway for future research to integrate 360-degree camera technology in MMLA. Future research directions involve refining this technology further to improve the detection of affective states in collaborative learning environments, thereby offering a richer understanding of the learning process.
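A sketch of how such a frame-quality percentage could be computed from OpenFace's per-frame CSV output follows. The use of the "success" and "confidence" columns, the 0.8 confidence threshold, and the file name are assumptions about the study's criteria, not details taken from the paper.

```python
# Hypothetical computation of the share of "high-quality" frames from an
# OpenFace FeatureExtraction CSV, akin to the 33.17% vs. 8.34% comparison.
import pandas as pd

def high_quality_fraction(csv_path: str, min_conf: float = 0.8) -> float:
    df = pd.read_csv(csv_path)
    df.columns = df.columns.str.strip()   # OpenFace pads column names
    ok = (df["success"] == 1) & (df["confidence"] >= min_conf)
    return 100.0 * ok.mean()

# 'group1_360cam.csv' is a made-up file name for illustration.
print(f"{high_quality_fraction('group1_360cam.csv'):.2f}% usable frames")
```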
  2.
    We investigate a novel communications system that integrates scalable multi-layer 360-degree video tiling, viewport-adaptive rate-distortion optimal resource allocation, and VR-centric edge computing and caching, to enable future high-quality untethered VR streaming. Our system comprises a collection of 5G small cells that can pool their communication, computing, and storage resources to collectively deliver scalable 360-degree video content to mobile VR clients at much higher quality. Our major contributions are the rigorous design of multi-layer 360-degree tiling and related models of statistical user navigation, and the analysis and optimization of edge-based multi-user VR streaming that integrates viewport adaptation and server cooperation. We also explore the possibility of network coded data operation and its implications for the analysis, optimization, and system performance we pursue here. We demonstrate considerable gains in delivered immersion fidelity, featuring much higher 360-degree viewport peak signal-to-noise ratio (PSNR) and VR video frame rates and spatial resolutions.
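To make the viewport-adaptive rate allocation idea concrete, here is a simplified greedy sketch: each tile has candidate (bitrate, quality) layers, and bandwidth is repeatedly spent on the upgrade with the best viewport-weighted quality gain per bit. The tile utilities and view probabilities are made-up inputs; the paper's actual optimization (multi-layer tiling, server cooperation, network coding) is far more elaborate.

```python
# Greedy viewport-weighted rate allocation over tiles (illustrative only).
from heapq import heappush, heappop

def allocate(tiles, budget):
    """tiles: {tile_id: (view_prob, [(rate, quality), ...])}, rates ascending."""
    chosen = {t: 0 for t in tiles}            # start every tile at layer 0
    spent = sum(layers[0][0] for _, layers in tiles.values())
    heap = []
    for t, (p, layers) in tiles.items():
        if len(layers) > 1:
            dr = layers[1][0] - layers[0][0]
            dq = p * (layers[1][1] - layers[0][1])
            heappush(heap, (-dq / dr, t))      # max-heap on gain per bit
    while heap:
        _, t = heappop(heap)
        p, layers = tiles[t]
        i = chosen[t]
        dr = layers[i + 1][0] - layers[i][0]
        if spent + dr > budget:
            continue                           # upgrade does not fit
        spent += dr
        chosen[t] = i + 1
        if chosen[t] + 1 < len(layers):        # queue the next upgrade
            dr2 = layers[i + 2][0] - layers[i + 1][0]
            dq2 = p * (layers[i + 2][1] - layers[i + 1][1])
            heappush(heap, (-dq2 / dr2, t))
    return chosen

tiles = {"front": (0.7, [(1, 30), (3, 38), (6, 42)]),
         "back":  (0.1, [(1, 30), (3, 38), (6, 42)])}
print(allocate(tiles, budget=8))   # -> {'front': 2, 'back': 0}
```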
  3. Recent advances in computer vision algorithms and video streaming technologies have facilitated the development of edge-server-based video analytics systems, enabling them to process sophisticated real-world tasks, such as traffic surveillance and workspace monitoring. Meanwhile, due to their omnidirectional recording capability, 360-degree cameras have been proposed to replace traditional cameras in video analytics systems to offer enhanced situational awareness. Yet, we found that providing an efficient 360-degree video analytics framework is a non-trivial task. Due to the higher resolution and geometric distortion in 360-degree videos, existing video analytics pipelines fail to meet the performance requirements for end-to-end latency and query accuracy. To address these challenges, we introduce the innovative ST-360 framework specifically designed for 360-degree video analytics. This framework features a spatial-temporal filtering algorithm that optimizes both data transmission and computational workloads. Evaluation of the ST-360 framework on a unique dataset of 360-degree first-responder videos reveals that it yields accurate query results with a 50% reduction in end-to-end latency compared to state-of-the-art methods.
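The kind of spatial-temporal filtering ST-360 describes can be reduced to a very simple form: split each equirectangular frame into tiles and forward to the edge server only tiles whose pixel content has changed enough since the last upload. The grid size and threshold below are illustrative assumptions; the paper's filter is more refined.

```python
# Toy spatial-temporal tile filter over equirectangular frames.
import numpy as np

def changed_tiles(prev, curr, grid=(4, 8), thresh=12.0):
    """Yield (row, col, tile) for tiles whose mean abs change exceeds thresh."""
    h, w = curr.shape[:2]
    th, tw = h // grid[0], w // grid[1]
    for r in range(grid[0]):
        for c in range(grid[1]):
            a = prev[r*th:(r+1)*th, c*tw:(c+1)*tw]
            b = curr[r*th:(r+1)*th, c*tw:(c+1)*tw]
            if np.abs(b.astype(np.float32) - a.astype(np.float32)).mean() > thresh:
                yield r, c, b

prev = np.zeros((1920, 3840, 3), dtype=np.uint8)      # equirectangular frame
curr = prev.copy()
curr[0:480, 0:480] = 255                               # motion in one corner
print([(r, c) for r, c, _ in changed_tiles(prev, curr)])   # -> [(0, 0)]
```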
  4.
    Future view prediction for a 360-degree video streaming system is important for saving network bandwidth and improving the Quality of Experience (QoE). Historical view data of a single viewer and of multiple viewers have been used for future view prediction. Video semantic information is also useful for predicting the viewer's future behavior. However, extracting video semantic information requires powerful computing hardware and large memory space to perform deep learning-based video analysis. This is not desirable for most client devices, such as small mobile devices or Head-Mounted Displays (HMDs). Therefore, we develop an approach where video semantic analysis is executed on the media server, and the analysis results are shared with clients via the Semantic Flow Descriptor (SFD) and View-Object State Machine (VOSM). SFD and VOSM become new descriptive additions to the Media Presentation Description (MPD) and Spatial Relation Description (SRD) to support 360-degree video streaming. Using the semantic-based approach, we design the Semantic-Aware View Prediction System (SEAWARE) to improve the overall view prediction performance. The evaluation results of 360-degree videos and real HMD view traces show that the SEAWARE system improves the view prediction performance and streams high-quality video with limited network bandwidth.
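A minimal sketch of the idea, with heavy simplifying assumptions: the client extrapolates its own head motion and blends it with a semantic cue the server advertises for the upcoming segment. The linear extrapolation, the fixed blend weight, and the notion of a single "salient object yaw" per segment are all assumptions for illustration, not SEAWARE's actual mechanism.

```python
# Toy blend of client-side motion extrapolation with a server-supplied
# semantic cue, loosely in the spirit of SEAWARE's SFD/VOSM design.
def predict_yaw(history, semantic_yaw, alpha=0.6):
    """history: recent head yaws in degrees; semantic_yaw: salient-object
    direction advertised by the server for the next segment (hypothetical)."""
    motion = history[-1] + (history[-1] - history[-2])   # linear extrapolation
    blended = alpha * motion + (1 - alpha) * semantic_yaw
    return blended % 360

print(predict_yaw(history=[80.0, 90.0], semantic_yaw=140.0))   # -> 116.0
```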
  5. Video summarization aims to simplify large-scale video browsing by generating concise, short summaries that differ from but well represent the original video. Due to the scarcity of video annotations, recent progress in video summarization concentrates on unsupervised methods, among which GAN-based methods are most prevalent. This type of method includes a summarizer and a discriminator. The summarized video from the summarizer is taken as the final output only if the video reconstructed from this summary cannot be discriminated from the original one by the discriminator. The primary problems of these GAN-based methods are twofold. First, the summarized video produced in this way is a subset of the original video with low redundancy that contains high-priority events/entities; this summarization criterion is not sufficient. Second, the training of the GAN framework is not stable. This paper proposes a novel Entity–relationship Aware video summarization method (ERA) to address the above problems. To be more specific, we introduce an Adversarial Spatio-Temporal network to construct the relationships among entities, which we argue should also be given high priority in the summarization. The GAN training problem is solved by introducing the Wasserstein GAN and two newly proposed video-patch/score-sum losses. In addition, the score-sum loss can also relieve the model's sensitivity to varying video lengths, an inherent problem for most current video analysis tasks. Our method substantially lifts performance on the target benchmark datasets and exceeds the current state of the art. We hope our straightforward yet effective approach will shed some light on future research in unsupervised video summarization. The code is available online.
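A score-sum-style loss of the kind described above can be sketched as follows: penalize the deviation of the mean frame-importance score from a target summary ratio, so that the penalty does not grow with video length. The 0.15 ratio and the squared penalty are assumptions, not the paper's exact formulation.

```python
# Illustrative length-insensitive "score-sum" loss in PyTorch.
import torch

def score_sum_loss(scores: torch.Tensor, target_ratio: float = 0.15):
    """scores: (T,) frame-importance values in [0, 1] from the summarizer.
    Using the mean rather than the raw sum keeps the loss comparable
    across videos of very different lengths."""
    return (scores.mean() - target_ratio) ** 2

short_clip = torch.sigmoid(torch.randn(120))    # 120-frame video
long_clip = torch.sigmoid(torch.randn(3000))    # 3000-frame video
print(score_sum_loss(short_clip).item(), score_sum_loss(long_clip).item())
```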