AROD: Adaptive Real-Time Object Detection Based on Pixel Motion Speed
Real-time object detection is essential for AI-based intelligent traffic management. However, the growing complexity of deep learning models for object detection causes increased latency and resource requirements. To tackle this challenge, we introduce a new approach, named AROD (Adaptive Real-Time Object Detection), that infers the pixel motion speed in continuous traffic video frames and skips redundant frames when the pixel velocity is low. AROD thereby aims to significantly enhance efficiency and scalability while sustaining the accuracy of object detection. Our evaluation using real-world traffic videos reveals that our method for pixel velocity inference via lightweight deep learning reduces the RMSE (Root Mean Square Error) by up to two orders of magnitude compared to state-of-the-art approaches. AROD improves the frame processing rate of YOLOv5, SSD, and EfficientDet by approximately 32–61%, 110–174%, and 120–213%, respectively. AROD considerably enhances scalability by supporting real-time object detection for up to three concurrent traffic video streams on a commodity machine. Moreover, AROD demonstrates its generalizability by sustaining competitive object detection accuracy on a separate traffic video that was fully withheld during training.
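In outline, AROD's gating logic can be pictured as follows. This is a minimal sketch of the frame-skipping idea only, not the authors' implementation: the `speed_model` (a stand-in for the paper's lightweight pixel-velocity network), the `SPEED_THRESHOLD` value, and the detector interface are all assumed placeholders.

```python
# Minimal sketch of AROD-style frame skipping. The speed model,
# SPEED_THRESHOLD, and the detector callable are assumed placeholders.
import cv2
import torch

SPEED_THRESHOLD = 2.0  # assumed pixels-per-frame cutoff; not from the paper

def to_tensor(frame):
    """HxWx3 uint8 BGR frame -> 1x3xHxW float tensor."""
    return torch.from_numpy(frame).permute(2, 0, 1).float().unsqueeze(0)

def process_stream(video_path, speed_model, detector):
    """Run the full detector only when inferred pixel motion speed is high."""
    cap = cv2.VideoCapture(video_path)
    prev, last_detections = None, []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if prev is not None:
            # Lightweight regressor infers mean pixel motion speed from a
            # pair of consecutive frames (stacked along the channel axis).
            pair = torch.cat([to_tensor(prev), to_tensor(frame)], dim=1)
            if speed_model(pair).item() < SPEED_THRESHOLD:
                # Redundant frame: skip inference, keep the last detections.
                yield frame, last_detections
                prev = frame
                continue
        last_detections = detector(frame)  # full object detection
        yield frame, last_detections
        prev = frame
    cap.release()
```

Reusing the last detections on skipped frames is one plausible way to sustain accuracy while raising the effective frame processing rate; the abstract does not specify how skipped frames are handled.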
- PAR ID:
- 10547536
- Publisher / Repository:
- IEEE Vehicular Technology Conference
- Date Published:
- ISSN:
- 1556-6072
- Format(s):
- Medium: X
- Location:
- Washington, DC, USA
- Sponsoring Org:
- National Science Foundation
More Like this
-
Deep learning models have significantly improved object detection, which is essential for visual sensing. However, their increasing complexity results in higher latency and resource consumption, making real-time object detection challenging. To address this challenge, we propose a new lightweight filtering method, called L-filter, that predicts empty video frames containing no object of interest (e.g., vehicles) with high accuracy via hybrid time series analysis. L-filter drops frames deemed empty and conducts object detection for nonempty frames only, significantly enhancing the frame processing rate and scalability of real-time object detection. Our evaluation demonstrates that L-filter improves the frame processing rate by 31–47% for a single traffic video stream compared to three standalone state-of-the-art object detection models without L-filter. Additionally, L-filter significantly enhances scalability: working alongside the fastest of the three detection models, it can process up to six concurrent video streams on one commodity GPU, supporting over 57 fps per stream.
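A gate of this kind is straightforward to picture in code. The sketch below is only illustrative: it replaces the paper's hybrid time series analysis with simple exponential smoothing over recent occupancy observations, and the 0.5 decision threshold and `probe_every` re-check interval are assumptions.

```python
# Illustrative L-filter-style gate. The paper's hybrid time series analysis
# is replaced here by exponential smoothing; thresholds are assumptions.
class EmptyFramePredictor:
    """Toy stand-in: smoothed estimate of the chance a frame is nonempty."""
    def __init__(self, alpha=0.3):
        self.alpha = alpha
        self.estimate = 1.0  # start by assuming frames are nonempty

    def update(self, was_nonempty):
        self.estimate = (self.alpha * float(was_nonempty)
                         + (1 - self.alpha) * self.estimate)

    def likely_nonempty(self, threshold=0.5):
        return self.estimate >= threshold  # assumed decision threshold

def filtered_detection(frames, detector, predictor, probe_every=10):
    """Run the detector only on frames predicted nonempty; periodically
    probe skipped stretches so the gate recovers when traffic returns."""
    skipped = 0
    for frame in frames:
        if predictor.likely_nonempty() or skipped >= probe_every:
            detections = detector(frame)
            predictor.update(bool(detections))
            skipped = 0
            yield frame, detections
        else:
            skipped += 1  # frame deemed empty: dropped before detection
```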
-
Traffic intersections are prime locations for deploying infrastructure sensors and edge computing nodes to realize the vision of a smart city. It is expected that the needs of a smart city with regard to camera-monitored traffic and pedestrian systems can be met by state-of-the-art artificial intelligence (AI) based object detectors and trackers. A critical component in designing an effective real-time object detection/tracking pipeline is understanding how object density (i.e., the number of objects in a scene), image resolution, and frame rate influence the performance metrics. This study explores the accuracy and speed metrics with the goal of supporting pipelines that meet the precision and latency needs of a real-time environment. We examine the impact of varying image resolution, frame rate, and object density on object detection performance metrics. Experiments on the COSMOS testbed dataset show that increasing the frame width from 416 to 832 pixels and cropping the images to a square resolution increase average precision for all object classes. Decreasing the frame rate from 15 fps to 5 fps preserves more than 90% of the highest F1 score achieved for all object classes. The results inform the choice of video preprocessing stages and modifications to established AI-based object detection/tracking methods, and suggest optimal hyperparameter values.
Index Terms: Object Detection, Smart City, Video Resolution, Deep Learning Models.
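The preprocessing knobs the study varies translate directly into a few lines of OpenCV. The sketch below uses the reported settings (832-pixel width, square crop, 15 fps decimated to 5 fps); the center-crop placement and uniform frame-dropping details are illustrative assumptions rather than the study's exact pipeline.

```python
# Sketch of the preprocessing choices the study evaluates: resize width,
# center-crop to square, and decimate the frame rate (e.g., 15 -> 5 fps).
import cv2

def preprocess_stream(video_path, target_width=832, target_fps=5):
    cap = cv2.VideoCapture(video_path)
    src_fps = cap.get(cv2.CAP_PROP_FPS) or 15.0  # fall back if unreported
    step = max(1, round(src_fps / target_fps))   # keep every step-th frame
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            h, w = frame.shape[:2]
            side = min(h, w)                      # center-crop to a square
            y0, x0 = (h - side) // 2, (w - side) // 2
            square = frame[y0:y0 + side, x0:x0 + side]
            yield cv2.resize(square, (target_width, target_width))
        index += 1
    cap.release()
```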
-
Deep learning models have significantly improved object detection, which is essential for traffic monitoring. However, these models' increasing complexity results in higher latency and resource consumption, making real-time object detection challenging. To address this issue, we propose a lightweight deep learning model called Empty Road Detection (ERD). ERD efficiently identifies and removes empty traffic images that contain no object of interest, such as vehicles, via binary classification. By serving as a preprocessing unit, ERD filters out nonessential data, reducing computational complexity and latency. ERD is highly compatible and can work seamlessly with any third-party object detection model. In our evaluation, ERD improves the frame processing rate of EfficientDet, SSD, and YOLOv5 by approximately 44%, 40%, and 10%, respectively, for a real-world traffic monitoring video.
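Because ERD sits as a binary classifier in front of an arbitrary detector, its role can be sketched as a small CNN gate. The tiny network and 0.5 decision threshold below are assumptions for illustration, not the paper's ERD architecture.

```python
# Minimal sketch of an ERD-style preprocessing gate: a small binary
# classifier flags empty frames so the third-party detector runs only on
# nonempty ones. Architecture and threshold are illustrative assumptions.
import torch
import torch.nn as nn

class TinyEmptyRoadNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)  # logit: does the frame contain objects?

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

@torch.no_grad()
def gated_detect(frame_tensor, erd, detector, threshold=0.5):
    """Skip detection when ERD classifies the frame as empty."""
    p_nonempty = torch.sigmoid(erd(frame_tensor.unsqueeze(0))).item()
    if p_nonempty < threshold:  # assumed decision threshold
        return []               # empty road: no detection needed
    return detector(frame_tensor)
```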
-
It is a common practice to think of a video as a sequence of images (frames), and to re-use deep neural network models that are trained only on images for similar analytics tasks on videos. In this paper, we show that this “leap of faith”, that deep learning models which work well on images will also work well on videos, is actually flawed. We show that even when a video camera is viewing a scene that is not changing in any human-perceptible way, and we control for external factors like video compression and environment (lighting), the accuracy of video analytics applications fluctuates noticeably. These fluctuations occur because successive frames produced by the video camera may look similar visually but are perceived quite differently by the video analytics applications. We observed that the root cause for these fluctuations is the dynamic camera parameter changes that a video camera automatically makes in order to capture and produce a visually pleasing video. The camera inadvertently acts as an “unintentional adversary” because these slight changes in the image pixel values in consecutive frames, as we show, have a noticeably adverse impact on the accuracy of insights from video analytics tasks that re-use image-trained deep learning models. To address this inadvertent adversarial effect from the camera, we explore the use of transfer learning techniques to improve learning in video analytics tasks through the transfer of knowledge from learning on image analytics tasks. Our experiments with a number of different cameras and a variety of video analytics tasks show that the inadvertent adversarial effect from the camera can be noticeably offset by quickly re-training the deep learning models using transfer learning. In particular, we show that our newly trained YOLOv5 model reduces fluctuation in object detection across frames, which leads to better tracking of objects (∼40% fewer mistakes in tracking). Our paper also provides new directions and techniques to mitigate the camera's adversarial effect on deep learning models used for video analytics applications.
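The mitigation explored here, quickly re-training an image-trained detector on frames from the target camera, follows the standard transfer-learning recipe of freezing the pretrained backbone and fine-tuning the detection heads. The sketch below illustrates that recipe with torchvision's Faster R-CNN for brevity rather than the paper's YOLOv5; the data loader, epoch count, and learning rate are assumptions.

```python
# Sketch of camera-specific transfer learning: freeze the pretrained
# backbone and fine-tune the remaining detection layers on frames from the
# target camera. Faster R-CNN stands in for the paper's YOLOv5.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

def fine_tune_on_camera(loader, num_epochs=3, lr=1e-4, device="cuda"):
    model = fasterrcnn_resnet50_fpn(weights="DEFAULT").to(device)
    for p in model.backbone.parameters():
        p.requires_grad = False  # transfer: keep the learned image features
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(trainable, lr=lr, momentum=0.9)
    model.train()
    for _ in range(num_epochs):
        for images, targets in loader:  # labeled frames from target camera
            images = [img.to(device) for img in images]
            targets = [{k: v.to(device) for k, v in t.items()}
                       for t in targets]
            loss_dict = model(images, targets)  # detection losses (dict)
            loss = sum(loss_dict.values())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```

Freezing the backbone keeps the re-training fast, which matters here since the point is to offset camera-induced drift with a quick update rather than a full retrain.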