<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Real-Time Physical Threat Detection on Edge Data Using Online Learning</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>03/14/2023</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10447286</idno>
					<idno type="doi">10.1109/MCE.2023.3256641</idno>
					<title level='j'>IEEE Consumer Electronics Magazine</title>
<idno type="issn">2162-2248</idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Utsab Khakurel</author><author>Danda B. Rawat</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Sensor-powered devices offer safe global connections, cloud scalability and flexibility, and new business value driven by data. The constraints that have historically obstructed major innovations in technology can be addressed by advancements in Artificial Intelligence (AI) and Machine Learning (ML), cloud and quantum computing, and the ubiquitous availability of data. Edge AI (Edge Artificial Intelligence) refers to the deployment of AI applications on an edge device near the data source rather than in a cloud computing environment. Although edge data has been utilized to make real-time inferences through predictive models, real-time machine learning has not yet been fully adopted. Real-time machine learning uses live data to learn on the go, which enables faster and more accurate real-time predictions and eliminates the need to store data, thereby mitigating privacy issues. In this article, we present the practical prospect of developing a physical threat detection system using real-time edge data from security cameras/sensors to improve the accuracy, efficiency, reliability, security, and privacy of the real-time inference model.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>A large share of the connected devices in use in 2021 were Internet of Things (IoT) devices. According to <ref type="bibr">[1]</ref>, there will be 30 billion (or 75%) active IoT devices by 2025, an average of 4 devices per person. IoT and sensor devices generate data at an exponential rate, and this data can be used to predict trends. Edge computing is the architecture in which data is processed and stored close to its source. Real-time processing at the edge is more efficient and accurate, and it is necessary because the majority of the data is created there. The three components of the edge computing architecture are the cloud layer, the boundary layer, and the terminal layer <ref type="bibr">[2]</ref>.</p><p>Edge AI deploys AI applications on edge devices near the data source, enabling AI in resource-constrained environments with real-time insights, reduced latency and cost, and increased privacy <ref type="bibr">[10]</ref>. YOLO (You Only Look Once), a state-of-the-art object detection system, provides accurate, real-time object detection at up to 45 fps, making it useful for Edge AI applications in autonomous vehicles and surveillance systems <ref type="bibr">[4]</ref>. Edge AI can process real-time data from IoT sensors to detect potential security threats: YOLO object detection algorithms can identify physical threats on the edge and take immediate action. Few physical security systems on the market can perform real-time detection using visual, night-vision, and thermal video frames <ref type="bibr">[9]</ref>. Current real-time AI detection systems rely on batch-trained models, which are costly and time-consuming to update, and deploying the detection model in the cloud introduces data-transfer latency. Deploying AI at the edge with real-time learning can improve accuracy, efficiency, security, and privacy in security systems while addressing dynamic and evolving threats.</p><p>In this article, we propose a real-time physical threat detection framework that combines edge-based object detection AI with online learning to improve accuracy, efficiency, reliability, security, and privacy. Edge deployment yields a fast and efficient system with minimal resource requirements. Online learning allows the model to learn from new data in real-time, with fast and affordable learning steps <ref type="bibr">[3]</ref>. The proposed edge-based model ensures privacy protection by utilizing data immediately and then discarding it, and it is capable of offline operation, making it dependable and trustworthy. Additionally, the model's ability to operate on both visible and thermal images further enhances its reliability. Our solution enhances physical security systems through the combination of edge computing and incremental online learning.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>BACKGROUND AND RELATED WORK</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Overview of Object Detection (OD)</head><p>Object Detection (OD) is a computer vision technique for detecting, locating, and classifying objects in images. The two main types of OD algorithms are two-shot detection and single-shot detection. YOLO <ref type="bibr">[4]</ref> takes an image and splits it into an S x S grid, where each grid cell carries parameters defining an object as [P_c, b_x, b_y, b_w, b_h, c]: P_c denotes the confidence score for the object in the box, (b_x, b_y) represents the center of the box relative to the grid cell, (b_w, b_h) represents the width and height relative to the whole image, and c denotes the presence of each class in the cell. YOLO is fast, accurate, and a strong option for real-time object detection, utilizing anchor boxes and non-maximal suppression to improve performance <ref type="bibr">[4]</ref>.</p></div>
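The grid-cell parameterization described above can be illustrated in code. The following is a minimal sketch (function and argument names are ours, not from [4]) of how a single cell prediction [P_c, b_x, b_y, b_w, b_h, c...] is decoded into absolute pixel coordinates:

```python
def decode_cell(pred, row, col, S, img_w, img_h):
    """Decode one YOLO grid-cell prediction into absolute pixel coordinates.

    pred = [P_c, b_x, b_y, b_w, b_h, c1, ..., ck]:
    (b_x, b_y) is the box center relative to the grid cell,
    (b_w, b_h) is the width/height relative to the whole image.
    """
    p_c, b_x, b_y, b_w, b_h = pred[:5]
    class_scores = pred[5:]
    # Center: cell offset plus in-cell fraction, scaled to the full image.
    cx = (col + b_x) / S * img_w
    cy = (row + b_y) / S * img_h
    # Width/height are already fractions of the whole image.
    w, h = b_w * img_w, b_h * img_h
    # Corner coordinates, as used for drawing and non-maximal suppression.
    box = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
    best_class = max(range(len(class_scores)), key=lambda i: class_scores[i])
    return p_c, box, best_class
```

A full decoder would apply this per anchor box and then filter overlapping boxes with non-maximal suppression, as the text notes.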
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Overview of Online Learning (OL)</head><p>ML has flourished with the availability of data, storage, and processors, but data generation now outpaces the rate at which data is used. Online learning trains models in real-time by incrementally learning from a continuous data stream, eliminating the need for data storage <ref type="bibr">[3]</ref>. However, faulty data could negatively impact performance, so proper data governance is required.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Related Work and Objectives</head><p>In airport security, deep learning approaches have been utilized to detect potential threats in X-ray and Thermal Infrared (TIR) images <ref type="bibr">[5]</ref>. The effectiveness of different versions of YOLO models is evaluated using the Teledyne Forward-Looking Infrared (FLIR) Thermal Dataset in <ref type="bibr">[6]</ref>. Another study utilized YOLOv3 and demonstrated its performance using a combination of pre-training on ImageNet and a custom gun dataset <ref type="bibr">[7]</ref>. Edge YOLO is a lightweight version of the YOLOv4 object detection algorithm that has been optimized for edge computing <ref type="bibr">[11]</ref>. The algorithm features a trimmed-down backbone using CSPNet, enhanced feature fusion, and a connection to a cloud-based GPU workstation for model training. Edge YOLO outperforms other popular algorithms, such as YOLOv3 and MobileNetv3 SSD, in terms of both speed and accuracy on the edge.</p><p>Although prior research has combined object detection and edge computing to make real-time inferences on image and video frames, none of the approaches focuses on utilizing real-time data to keep the model up-to-date. Real-time training of threat detection models on edge devices is cost-effective, power-efficient, and privacy-preserving, and it allows continuous adaptation to changing data, improving the model's performance. Online learning is well-suited to edge devices because it requires minimal storage, is maintainable, and is fast <ref type="bibr">[3]</ref>. The proposed solution utilizes a low-power edge processor to process visual and thermal video frames in real-time. The cloud environment annotates the input data with labels and updates the online feature store, which in turn updates the weights of the local model copy (the online model) in real-time. 
The edge model is updated by pulling weight parameters from the online model through set triggers, improving the physical security system's speed, accuracy, and reliability. This approach combines edge object detection and online learning methods to create a scalable, secure, and fast physical threat detection model that trains and predicts on real-time data. The features of the proposed approach are as follows:</p><p>&#8226; The proposed approach uses a lightweight version of the object detection algorithm on the edge to make real-time inferences, contributing to faster inference time and offline availability of the service. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>REQUIREMENTS FOR PHYSICAL THREAT DETECTION MODEL</head><p>Selecting the proper dataset, model, and evaluation metrics is of key importance for any AI system to succeed. This section elaborates on these major components required to create the online physical threat detection system on the edge.</p><p>The Teledyne FLIR ADAS Dataset offers labeled thermal and visible images to train object detection systems using CNNs that recognize threats in both image types. The dataset includes images in various weather conditions to improve the detection system's adaptability. Common benchmark datasets such as ImageNet and COCO are often insufficient due to the limited availability of annotated data with diverse object labels. Few labeled datasets identify objects as threats, so custom images and videos with threat objects and proper label annotations are required to design the proposed online physical threat detection system. Models pre-trained on large datasets are frequently used and then retrained on custom datasets.</p><p>Although Faster R-CNN and RetinaNet are more accurate than YOLO, YOLO's real-time detection capability outshines other detection algorithms. In <ref type="bibr">[6]</ref>, YOLOv3-SPP was identified as having high mAP and precision, but it is not ideal for edge deployment due to its low speed and high storage requirements. YOLOv5-s is recommended for edge deployment with a compact size of 14 MB, a fast speed of 41 FPS, and high mAP and precision of 0.803 and 0.638 respectively, outperforming YOLOv3-SPP. GhostNet <ref type="bibr">[12]</ref> and coordinate attention (CA) <ref type="bibr">[13]</ref> methods can make YOLOv5-s even more lightweight while remaining equally efficient. Using both techniques, <ref type="bibr">[14]</ref> reduced bounding-box loss, classification loss, and object confidence loss on self-constructed data. 
<ref type="bibr">[11]</ref> demonstrated better performance of the YOLOv4 version with a trimmed-down CSP-darknet53 and a modified backbone neck with Spatial Pyramid Pooling (SPP) and Feature Pyramid Network (FPN) for cost-effective small object detection.</p><p>The modified GhostNet and CA backbone of YOLOv5-s can be improved with the same lightweight neck structure used in YOLOv4 to enhance our physical threat detection model performance. YOLOv5 uses batch normalization, leaky ReLU activation, Stochastic Gradient Descent optimization, binary cross-entropy for classification loss, and mean squared error for coordinate regression. Mean Average Precision (mAP), Intersection over Union (IoU), Precision, Recall, and Frames Per Second (FPS) are used to evaluate the proposed system.</p></div>
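Among the evaluation metrics just listed, Intersection over Union (IoU) underlies both localization error and the mAP thresholds. A minimal reference implementation (ours, for illustration) for axis-aligned boxes in (x1, y1, x2, y2) form:

```python
def iou(box_a, box_b):
    """Intersection over Union for axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (zero area when the boxes do not overlap).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / (area_a + area_b - inter)
```

A detection counts as a true positive at, e.g., the mAP@0.5 operating point when its IoU with a ground-truth box of the same class is at least 0.5.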
<div xmlns="http://www.tei-c.org/ns/1.0"><head>DESIGN OF THE PROPOSED FRAMEWORK</head><p>The proposed real-time physical threat detection framework with online learning is presented in Figure <ref type="figure">1</ref>. It is composed of two parts: learning and implementation phases, which are explained in detail in this section.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Learning Phase</head><p>The learning phase consists of preprocessing the dataset, using the data to train the YOLO model, and testing the model on the validation dataset.</p><p>To improve the algorithm's ability to predict images, data augmentation is necessary for both the training and validation datasets. This includes random crops, rotations, flips, and saturation and exposure shifts. Manual labeling is required for the augmented images and for newly captured images containing threat labels. A minimum of 1,500 images per class and 10,000 labeled objects per class are recommended, and adding background images with no objects can help reduce false positives.</p><p>To make the object detection model operational, the proposed YOLO model is fed with annotated thermal and visual images. Pre-trained weights are suggested for small or medium datasets. The YOLOv5 model's default settings obtain good results on large, well-labeled datasets, with 300 epochs recommended. Overfitting can occur if the model is too complex; regularization techniques such as L1 (Lasso) and L2 (Ridge) regularization can be used in such cases, and cross-validation can also help avoid overfitting. The batch size should be as large as the hardware allows, since smaller batches generate poor batch-normalization statistics. YOLOv5 has approximately 30 hyperparameters for training. A Genetic Algorithm (GA) is provided to optimize these hyperparameters, producing optimal values by repeatedly mutating parent hyperparameters. For best results, 300 generation cycles are recommended, and the hyperparameters from the best-performing cycle are chosen for the model.</p><p>The performance of the YOLO model can be evaluated on the validation dataset by estimating the model's localization and classification errors, using the evaluation metrics discussed previously. 
The resulting Average Precision (AP) of each class, along with the mAP@0.5:0.95 score, provides valuable insights into the model's performance, thereby helping to identify any overfitting or underperformance.</p></div>
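The mutate-and-keep-the-best loop of the hyperparameter GA described above can be sketched as follows. This is a deliberately simplified, mutation-only toy (function names are ours; the real YOLOv5 evolution runs a full training cycle per candidate and mutates ~30 hyperparameters over 300 generations):

```python
import random

def evolve_hyperparams(fitness, base, generations=30, mutate_scale=0.2, seed=0):
    """Toy sketch of GA hyperparameter evolution: each generation mutates
    the best-so-far hyperparameters and keeps the mutant if it scores
    higher. `fitness` stands in for a full training run returning e.g. mAP."""
    rng = random.Random(seed)
    best, best_fit = dict(base), fitness(base)
    for _ in range(generations):
        # Multiplicative Gaussian mutation of every parent hyperparameter.
        child = {k: v * (1 + rng.gauss(0, mutate_scale)) for k, v in best.items()}
        child_fit = fitness(child)
        if child_fit > best_fit:
            best, best_fit = child, child_fit
    return best, best_fit
```

In practice each fitness evaluation is a (shortened) training run, which is why evolution is expensive and is done once, offline, before deployment.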
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Implementation Phase</head><p>The implementation phase comprises the edge application and the cloud application. The layers are discussed in detail below.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Edge Application</head><p>Edge computing devices such as Systems on Chip (SoC), Field Programmable Gate Arrays (FPGA), Application Specific Integrated Circuits (ASIC), Central Processing Units (CPU), and Graphics Processing Units (GPU) are available for deployment. The SoC is particularly notable for its energy efficiency, small size, and high throughput, and migration tools can convert models from deep learning frameworks into TensorRT for accelerated inference. Among the NVIDIA Jetson series, the Jetson TX2 NX and Jetson Nano are the most cost-effective.</p><p>Given adequate CUDA cores and storage capacity, Kafka is an appropriate option for the edge detection model, which can operate without connecting to the cloud. Kafka provides real-time edge processing, low latency, and cost-efficient data processing in a reliable, scalable, and fault-tolerant way. Kafka clusters can be configured as a single node or as a cluster with multiple brokers, depending on the need for availability and the resource constraints of the edge hardware. Data retention in Kafka is configurable, and cluster linking enables connections between small Kafka clusters at the edge and bigger Kafka clusters in the cloud using the Kafka protocol.</p><p>OpenCV converts the video stream into frames and resizes them if necessary. Producers populate frames into the Kafka topic, while consumers collect the data to feed into the batch-trained and validated YOLO model initially deployed on the edge. The threat detection model makes real-time inferences, and if an object meets the confidence threshold for a threat, the GUI interface is notified and the sound system triggers an alarm. The input image should match the trained dataset for optimal results. The local model updates itself by pulling new weights from the real-time model deployed in the cloud, resulting in accurate predictions.</p></div>
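The producer/consumer hand-off above hinges on how a captured frame is packaged for the Kafka topic. The following stdlib-only sketch shows one plausible message format (field names and the JSON-with-base64 encoding are our assumptions, not specified in the article); the frame bytes would come from OpenCV's JPEG encoding on the producer side:

```python
import base64
import json
import time

def frame_to_message(frame_bytes, camera_id, width, height):
    """Producer side: wrap a captured frame (e.g. the JPEG bytes that
    cv2.imencode would return) with metadata and base64-encode it so it
    can travel through a Kafka topic as a JSON value."""
    return json.dumps({
        "camera_id": camera_id,
        "ts": time.time(),          # capture timestamp
        "width": width,
        "height": height,
        "jpeg_b64": base64.b64encode(frame_bytes).decode("ascii"),
    })

def message_to_frame(message):
    """Consumer side: recover the frame bytes to feed the YOLO model."""
    doc = json.loads(message)
    return base64.b64decode(doc["jpeg_b64"]), doc["camera_id"]
```

With a Kafka client library, `frame_to_message(...)` would be the value passed to the producer's send call, and `message_to_frame(...)` would be applied to each record the consumer polls before running inference.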
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Cloud Application</head><p>The cloud application utilizes a Kafka cluster, active learning, an online feature store, and a copy of the initial YOLO model deployed at the edge. AWS offers suitable infrastructure to host clusters, create data labeling pipelines, and store online models. The replicated data streams from the edge are held in larger Kafka clusters in the cloud. The data in the Kafka cluster is partitioned into train and test topics, and the data streams from these topics are sent to the data labeling pipeline. A streaming labeling job is created, in which human workers label the data in real-time. Streaming jobs work in a sliding-window manner, and any jobs that pile up are stored in a queuing service. The labeled jobs are output through a stream channel.</p><p>Labeled data is utilized to build a feature pipeline that captures real-time features and stores them in an online feature store. The pipeline ensures thorough processing, validation, and transformation of the data into a usable format for inference or training. An in-memory database is used for the feature store to attain high throughput and low latency. The features are updated continuously in a streaming fashion to keep up with real-time data and also provide context.</p><p>The feature store employs automated stateful training to continuously train the YOLO model with real-time data and updates the online model in the model store accordingly. At the edge, the local model pulls updated model parameters based on user-defined triggers. The feature store can track the model's lineage, but evaluation of the online model using test stream data is not yet possible. 
Incorporating explore-exploit strategies from bandit algorithms into the feature store can be a data-efficient alternative to traditional A/B testing.</p><p>Our proposed approach theoretically ensures fast and accurate real-time inference at the edge by leveraging privacy-aware stateful learning. The edge device can run a lightweight threat detection model independently, providing efficiency, accuracy, security, and reliability without needing to connect to the cloud.</p></div>
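The trigger-based weight pull that links the two applications can be sketched as follows (class and callable names are illustrative; `fetch` stands in for whatever transport retrieves the online model's parameters from the cloud model store):

```python
class EdgeModelUpdater:
    """Sketch of the user-defined-trigger weight pull described in the
    text: the edge keeps a local copy of the online model's weights and
    refreshes it only when the trigger fires -- here, a newer version
    number in the cloud model store."""

    def __init__(self, fetch):
        self.fetch = fetch      # callable returning (version, weights)
        self.version = -1
        self.weights = None

    def maybe_update(self):
        remote_version, weights = self.fetch()
        if remote_version > self.version:       # trigger condition
            self.version, self.weights = remote_version, weights
            return True                         # local model refreshed
        return False                            # keep serving current weights
```

Because the check is pull-based, the edge keeps serving its current weights whenever the cloud is unreachable, matching the offline-operation property claimed above.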
<div xmlns="http://www.tei-c.org/ns/1.0"><head>DISCUSSION</head><p>This article proposes an online learning-enabled real-time edge threat detection model for fast, accurate, reliable, and privacy-aware detection. However, further experimentation and validation are needed to fully realize the potential of the proposed framework. In this section, we provide insights into expected outcomes.</p><p>The study in <ref type="bibr">[6]</ref> evaluates multiple versions of YOLO for multi-object detection and finds YOLOv5s to have the best overall performance in terms of mAP, precision, speed, and storage when tested on the FLIR ADAS thermal dataset, with values of 0.803, 0.638, 41 FPS, and 14 MB respectively. <ref type="bibr">[14]</ref> supports the improved performance of the YOLOv5s Ghost CA model for facial expression detection, with a boost in mAP@0.5 from 98.4 to 98.8 while maintaining mAP@0.5:0.95. This results in reduced weights (15.4 MB to 8 MB), parameters (7.02 M to 3.70 M), and computation cost (15.8 to 8.1 GFLOPs), with improved inference time (115 FPS to 123 FPS). EdgeYOLO <ref type="bibr">[11]</ref> improved FPS from 4.9 on YOLOv4 to 26.6 on the COCO2017 dataset, running on a Jetson Xavier. On the KITTI dataset, EdgeYOLO improved FPS from 5.2 on YOLOv4 to 40.6, again on a Jetson Xavier. The Jetson Nano, with its lower memory and computation power, showed lower FPS than the Jetson Xavier but still demonstrated improved speed from YOLOv4 to EdgeYOLO. Our framework utilizes the backbone of the YOLOv5-s Ghost CA model <ref type="bibr">[14]</ref> with the modified neck from EdgeYOLO <ref type="bibr">[11]</ref>, combined with online learning. The proposed model, combining lightweight YOLOv5s with real-time stateful learning, has the potential to optimize weights, parameters, and computation costs while maintaining a high mAP score. Moreover, it can improve latency and reduce energy consumption, based on the promising results of these approaches.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>CONCLUSION</head><p>This article proposes a real-time online physical threat detection model using edge AI computing and online learning. The experimental nature of real-time machine learning, manual data generation, and multiple parameters pose a challenge in achieving optimal results. Repeated experimentation using optimized models and advanced edge hardware could soon realize this vision, enhancing speed, accuracy, reliability, and privacy in computing technology.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>May/June 2022</p></note>
		</body>
		</text>
</TEI>
