<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Performance analysis of capsule networks for detecting GPS spoofing attacks on unmanned aerial vehicles</title></titleStmt>
			<publicationStmt>
				<publisher>Springer Nature</publisher>
				<date>02/01/2025</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10566476</idno>
					<idno type="doi">10.1007/s10207-024-00978-x</idno>
					<title level='j'>International Journal of Information Security</title>
<idno>1615-5262</idno>
<biblScope unit="volume">24</biblScope>
<biblScope unit="issue">1</biblScope>					

					<author>Tala TalaeiKhoei</author><author>Khair AlShamaileh</author><author>Vijaya Kumar Devabhaktuni</author><author>Naima Kaabouch</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Unmanned aerial vehicles (UAVs) are prone to several cyber-attacks, including global positioning system (GPS) spoofing. The use of machine learning and deep learning is becoming increasingly common for UAV GPS spoofing attack detection; however, these approaches have some limitations, such as high rates of false alarm and misdetection. We propose using capsule networks to detect and classify UAV-focused GPS spoofing attacks. This paper compares simple capsule networks, efficient capsule networks, dual attention capsule networks, and a convolutional neural network in terms of accuracy, probability of detection, probability of misdetection, probability of false alarm, prediction time, training time per sample, and memory size. The results indicate that the efficient capsule network outperforms the other models, as demonstrated by an accuracy of 99.1%, a probability of detection of 99.9%, a probability of misdetection of 0.1%, a probability of false alarm of 0.37%, a prediction time of 0.5 seconds, a training time per sample of 0.2 seconds, and a memory size of 123 mebibytes for binary classification.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Unmanned aerial vehicles (UAVs) are military and civilian aircraft designed to conduct many operations, including inspections, surveillance, reconnaissance, shooting, agricultural inspections, and rescue operations. UAVs use global positioning systems (GPS) for positioning and navigation. Civilian GPS signals are not encrypted, which creates several security risks. GPS systems are particularly prone to multiple cyber-attacks and threats, including spoofing, mimicking, and jamming. Attackers send fake GPS signals containing incorrect times, positions, and navigation information, result-</p><p>The third category utilizes machine learning and deep learning models to detect UAV-focused GPS spoofing. Existing studies used these models to detect GPS spoofing attacks; however, machine learning models suffer from several limitations, such as overfitting, high rates of false alarms, and misdetection <ref type="bibr">[8]</ref>. Some studies have used deep learning models, such as long short-term memory, residual neural networks, and artificial neural networks, as a result. Convolutional neural networks (CNNs) are commonly used deep learning models that have been investigated. CNN models can learn information through a supervised approach by using convolution operations, pooling layers, and SoftMax functions. These models can automatically extract numerous invariant and discriminative features for one- and two-dimensional data. The key characteristic of these models is their ability to replicate the same knowledge at all points in an input dataset's spatial dimension; therefore, the features at one spatial location using replicas of feature detectors are available at other locations. 
The CNN models have local shared connectivity that is connected to layers of spatial reduction, such as max-pooling, and local translation-invariant features, which result in routing low-level features between layers using max-pooling.</p><p>CNN algorithms have several shortcomings. For example, these algorithms are significantly slower than other DL models due to the max-pooling layers' performance. These algorithms also have a longer training process and require large datasets for processing, training, testing, and validation. A CNN-based algorithm, capsule network (CapsNet), was recently proposed to address these issues. This model consists of a group of capsules with each neuron's output representing a different feature property. The capsules process the given features at their inputs and encapsulate the results into a vector of highly informative outputs. A capsule is a replacement for artificial neurons. One difference is that artificial neurons handle scalars while the capsules handle vectors. The length of an activity vector represents the probability that an entity exists, and its orientation represents the instantiation parameters. Active capsules make predictions, through transformation matrices, for the instantiation parameters of higher-level capsules. A higher-level capsule becomes active when several of these predictions agree <ref type="bibr">[9,</ref><ref type="bibr">10]</ref>.</p><p>We investigated the performance of three new CapsNet algorithms: conventional CapsNet, efficient-CapsNet, and dual attention-CapsNet (DA-CapsNet). We then compared these results with those of a conventional CNN model for detecting UAV-focused GPS spoofing attacks. The evaluation was performed in terms of accuracy, probability of detection, probability of misdetection, probability of false alarm, prediction time, training time per sample, and memory size. 
To summarize, the main contributions of this study are as follows:</p><p>&#8226; Introducing three CNN-based algorithms from the capsule family that address the limitations of CNN models, &#8226; Developing conventional CapsNet, efficient-CapsNet, and dual attention-CapsNet to detect and classify GPS spoofing attacks on UAVs, &#8226; Evaluating these models in terms of accuracy, probability of detection, probability of misdetection, probability of false alarm, prediction time, training time per sample, and memory size, &#8226; Providing a comprehensive comparison between these models and other proposed models in the literature with respect to the data used.</p><p>This paper is organized as follows: Sect. 2 presents the related work, and Sect. 3 discusses the corresponding dataset, data pre-processing techniques, training process, and evaluation metrics. The results of the study are presented and analyzed in Sect. 4. The conclusion is outlined in Sect. 5.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related work</head><p>Numerous techniques have been proposed to classify, detect, and mitigate GPS spoofing attacks. These techniques are divided into three categories: (1) hardware-based, (2) signal processing-based, and (3) machine learning (ML) and deep learning (DL)-based techniques <ref type="bibr">[4]</ref>. The first category includes techniques based on spatial and geometrical UAV characteristics, which impact onboard sensors such as compasses, barometers, and inertial measurement units. To be effective, these techniques require additional hardware, such as high-quality compact sensors, as well as continuous sensor calibration. These costly sensors are not acceptable solutions for GPS spoofing detection in small UAVs <ref type="bibr">[5]</ref><ref type="bibr">[6]</ref><ref type="bibr">[7]</ref>. The second category uses vision-based methods requiring intensive signal processing, which can degrade real-time system performance and add communication overhead. These methods are not effective if an attacker introduces a time delay or drift into the spoofing signals and if an attacker does not know the receiver's actual location.</p><p>The third category utilizes machine learning and deep learning models to detect UAV-focused GPS spoofing. Existing studies used these models to detect GPS spoofing attacks; however, machine learning models suffer from several limitations, such as overfitting, high rates of false alarms, and misdetection <ref type="bibr">[8]</ref>. Some studies have used deep learning models, such as long short-term memory, residual neural networks, and artificial neural networks, as a result. Convolutional neural networks (CNNs) have also been investigated. These models can learn information through a supervised approach by using convolution operations, pooling layers, and SoftMax functions. 
They can automatically extract numerous invariant and discriminative features for one- and two-dimensional data. The key characteristic of these models is their ability to replicate the same knowledge at all points in an input dataset's spatial dimension; therefore, the features at one spatial location using replicas of feature detectors are available at other locations. These models have local shared connectivity that is connected to layers of spatial reduction, such as max-pooling and local translation-invariant features, which result in routing low-level features between layers using max-pooling.</p><p>Table <ref type="table">1</ref> provides a short list of techniques that have been proposed to detect and classify UAV-focused GPS spoofing attacks. These studies are divided according to these categories. For instance, the authors of <ref type="bibr">[11]</ref> used a hardware-based technique to detect GPS spoofing using an inertial measurement unit (IMU). This technique depends heavily on acceleration error, which is calculated by comparing the GPS data to the IMU data. The authors of <ref type="bibr">[12]</ref> introduced a vision-based approach for detecting GPS spoofing attacks using monocular and IMU sensors. The UAV's velocity was computed using the onboard sensors. This velocity was compared to the velocity calculated using the Lucas-Kanade approach. The UAV was considered spoofed if the root mean square errors from these two approaches differed. Another vision-based approach was proposed in <ref type="bibr">[13]</ref>. A UAV trajectory was determined using visual odometry since fake GPS signals do not alter images. The GPS flight trajectory was compared to the UAV's trajectory to detect fake signals. This approach was effective for long-distance UAV flight scenarios when the direction was changed by more than 3&#176;.</p><p>Many studies have used ML and DL models to detect and classify UAV-focused GPS spoofing attacks. 
For instance, the authors of <ref type="bibr">[2]</ref> proposed using artificial neural networks. Authentic GPS signals and simulated attacks were collected to build the training and testing dataset. Another study introduced a genetic algorithm with an extreme boosting model <ref type="bibr">[14]</ref>. The model was pre-trained offboard using the flight logs to decrease power consumption and hardware resources during long-term operations, and the genetic algorithm was then used to optimize the developed model. The authors of <ref type="bibr">[15]</ref> compared the performance of several tree-based ML models for detecting UAV-focused GPS spoofing, including gradient boosting, light gradient boosting, and extreme gradient boosting. A dataset with 13 features that included spoofed and non-spoofed signals was used for training. The authors of <ref type="bibr">[16,</ref><ref type="bibr">17]</ref> developed some common machine learning models, such as support vector machine, to detect UAV-focused GPS spoofing attacks.</p><p>The authors of <ref type="bibr">[18]</ref> compared various conventional instance-based ML models to detect and classify UAV-focused GPS spoofing attacks, including linear support vector machine, numerical support vector machine, C-support vector machine, K nearest neighbor, and radius neighbor. The authors used a correlation-based technique, Spearman Correlation Coefficient, to reduce dataset dimensionality, training complexity, and time. The authors of <ref type="bibr">[19]</ref> evaluated three different ensemble models: stacking, bagging, and boosting. The results indicated that these models outperformed other traditional machine learning models, such as support vector machine, when detecting UAV-focused GPS spoofing attacks. Other papers have proposed using DL-based techniques to detect UAV-focused GPS spoofing attacks. 
For instance, the authors of <ref type="bibr">[20]</ref> proposed an approach, long short-term memory, that can effectively detect GPS spoofing attacks within five seconds. The authors of <ref type="bibr">[21]</ref> proposed a residual neural network to detect GPS spoofed signals using images. The historical satellite images were compared to real-time camera images, and the detection was performed using a threshold of the similarity between satellite imagery and aerial photography.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">GPS spoofing detection</head><p>The proposed detection architecture consists of several phases (Fig. <ref type="figure">1</ref>), including data acquisition, pre-processing, training, detection, and classification. Real-time experiments and simulations were performed during data acquisition to collect authentic signals and GPS spoofing attacks <ref type="bibr">[15]</ref>. The corresponding dataset was then pre-processed by applying several techniques, including data transformation, class balancing, and data encoding. The models were trained with the dataset, and their performance was validated with the testing dataset.</p></div>
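<div xmlns="http://www.tei-c.org/ns/1.0"><p>The shuffled 60%/20%/20% partition into training, validation, and testing subsets that is reported in the results section can be sketched as follows. This is a minimal illustration in plain Python rather than the code used in this study; the function name <code>shuffle_split</code>, its parameters, and the fixed random seed are assumptions introduced here for the example.</p>
<eg><![CDATA[
```python
import random


def shuffle_split(samples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle samples and split them into train/validation/test lists.

    The 60/20/20 defaults mirror the proportions reported in the paper;
    the seed is a hypothetical choice made only for reproducibility.
    """
    items = list(samples)
    random.Random(seed).shuffle(items)  # shuffle a copy, leave the input untouched

    n_train = round(len(items) * train_frac)
    n_val = round(len(items) * val_frac)

    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # whatever remains is the test subset
    return train, val, test
```
]]></eg>
<p>For 100 samples, this sketch returns subsets of 60, 20, and 20 samples with no sample shared between them.</p></div>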
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Dataset</head><p>The dataset used in this work was created during a previous work <ref type="bibr">[18]</ref>. This dataset consists of authentic and spoofed signals. The corresponding data consists of three types of GPS spoofing attacks: simplistic, intermediate, and sophisticated. Thirteen features were identified and extracted from the raw signals. In simplistic attacks, the spoofer generates fake GPS signals, unsynchronized with normal signals. In this attack, the attacker does not have any knowledge about the receiver's position, resulting in a high Doppler shift. Therefore, a huge deviation can occur in the pseudo-range measurement. In addition, the attacker transmits the spoofed signals at high power levels, leading to higher carrier-to-noise levels.</p><p>In intermediate spoofing, the attacker can control the UAV by precisely handling the GPS-generated signals. Here, the attacker knows the target position, resulting in code phase alignment between the real and spoofed transmissions. In contrast to simplistic spoofing, the Doppler shift and pseudo-range resulting from the intermediate spoofing are usually within normal ranges. In addition, other features, such as time of the week, carrier phase shift, and correlator amplitude,  </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">DL models</head><p>CNN models have become increasingly popular in many fields, including security. The basic CNN architecture is illustrated in Fig. <ref type="figure">2a</ref>. This architecture typically consists of several convolutional layers, pooling layers, and fully connected layers. A primary advantage of these networks is the low number of parameters compared to other types of neural networks, such as ANNs. Another important characteristic of CNNs is that they can extract increasingly abstract information as their input data passes into deeper layers. The CNN models provide several benefits; however, they have some significant limitations, such as slow performance and long training time <ref type="bibr">[21]</ref><ref type="bibr">[22]</ref><ref type="bibr">[23]</ref>.</p><p>A new CNN-based model, CapsNet, was proposed to solve these limitations. This architecture is a novel type of neural network that uses a vector-in, vector-out scheme to transmit information. The typical information unit in a capsule network is a vectorized capsule that consists of multiple scalars, unlike traditional neural networks that are built from scalar neurons. Each capsule represents different features and their characteristics. The length (modulus) of a capsule also carries meaning, such as the probability that a feature exists. CapsNet has three important layers: convolutional layer, primary capsule layer, and digit capsule layer (Fig. <ref type="figure">2b</ref>). This model also includes an attention function, primarily used to find the data with the most important and relevant information, helping the network focus on specific parts of the data rather than the entire dataset. The attention function can help distinguish the best features corresponding to the target variables. CapsNet can also transfer the features using vectorized methods. 
These methods may lead to computational overhead, allowing the CapsNet architecture to work effectively with sufficient features, which highly depends on the model's first few neural layers <ref type="bibr">[24,</ref><ref type="bibr">25]</ref>.</p><p>Little attention has been focused on CapsNet's efficiency and its ability to present knowledge transformations, despite the benefits of CapsNet models over CNN models. The existing solutions for classification problems using CapsNet consist of many parameters, which automatically hide the generalization ability of the capsules. Efficient-CapsNet has been proposed to reduce the number of CapsNet parameters and improve capsule efficiency. The Efficient-CapsNet architecture is divided into three sections. Sections 1 and 2 are the necessary parts of the capsule layers that interact with the input space (Fig. <ref type="figure">2c</ref>). The self-attention function permits the capsules to communicate with other capsules, known as self-capsules, and discover the capsules that need more attention. The output is the aggregation of these communication and attention scores. The capsule's final layer does not provide the probability of a particular class; instead, it returns the probabilities extracted from its individual sections <ref type="bibr">[26]</ref>. Another CapsNet model, DA-CapsNet (Fig. <ref type="figure">2d</ref>), was proposed to investigate the importance of the attention function in the performance of the CapsNet family. DA-CapsNet is significantly different from the other members of the capsule family, such as CapsNet and Efficient-CapsNet. DA-CapsNet has two layers of attention mechanisms. These two layers are known as Convolutional Attention in ReLU Convolution to PrimaryCaps and Caps Attention in PrimaryCaps to DigitCaps. 
The main reason for adding two layers of attention in DA-CapsNet is to improve the extraction of essential information in the capsules, decrease the transforming of the non-essential information, improve the contribution of the important information into capsules, and improve the hierarchy of the capsules <ref type="bibr">[27]</ref>.</p></div>
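<div xmlns="http://www.tei-c.org/ns/1.0"><p>To illustrate how the length of a capsule's output vector can act as a probability, the sketch below applies the squashing nonlinearity from the original capsule network formulation, which rescales a vector to a length between 0 and 1 while preserving its orientation. Whether each of the three capsule models evaluated in this study uses exactly this function is an assumption; the names <code>squash</code> and <code>capsule_probability</code> are hypothetical and introduced only for this example.</p>
<eg><![CDATA[
```python
import math


def squash(s, eps=1e-9):
    """Squash a capsule's raw vector s so its length lies in [0, 1).

    v = (|s|^2 / (1 + |s|^2)) * (s / |s|); eps guards against division
    by zero for an all-zero input.
    """
    norm_sq = sum(x * x for x in s)
    norm = math.sqrt(norm_sq)
    scale = norm_sq / (1.0 + norm_sq) / (norm + eps)
    return [scale * x for x in s]


def capsule_probability(v):
    """Length of a squashed capsule vector, read as an existence probability."""
    return math.sqrt(sum(x * x for x in v))
```
]]></eg>
<p>Short input vectors are shrunk toward zero length and long vectors approach, but never reach, unit length; the direction of the vector, which encodes the instantiation parameters, is unchanged.</p></div>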
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Evaluation metrics</head><p>To evaluate and compare the performance of the proposed model to existing models, the following metrics are used:</p><p>&#8226; Accuracy (ACC): represents the ratio of correct predictions to all predictions, and is given by: <formula n="1" notation="TeX">\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}</formula></p><p>&#8226; Probability of Detection (<formula notation="TeX">p_{d}</formula>): denotes the ratio of correctly classified spoofed signals to the total number of spoofed signals, and is given by: <formula n="2" notation="TeX">p_{d} = \frac{TP}{TP + FN}</formula></p><p>&#8226; Probability of Misdetection (<formula notation="TeX">p_{md}</formula>): denotes the ratio of spoofed signals incorrectly classified as authentic signals to the total number of spoofed signals, and is given by: <formula n="3" notation="TeX">p_{md} = \frac{FN}{TP + FN}</formula></p><p>&#8226; Probability of False Alarm (<formula notation="TeX">p_{fa}</formula>): denotes the ratio of authentic signals incorrectly classified as spoofed signals to the total number of authentic signals, and is given by: <formula n="4" notation="TeX">p_{fa} = \frac{FP}{FP + TN}</formula></p><p>where TP denotes the true positives, TN the true negatives, FP the false positives, and FN the false negatives. In addition, the efficiency of the models is evaluated in terms of prediction time (PT), training time per sample (TPS), and memory size (M). These metrics are defined as follows:</p><p>&#8226; PT: time that a model uses to predict malicious signals.</p><p>&#8226; TPS: time required to process a sample during model training. &#8226; M: size of the memory that a model uses during the whole deep learning process.</p><p>In this study, the confusion matrix is used to visualize the results of binary and multi-classification scenarios. Figure <ref type="figure">3</ref> shows the structures of such matrices for these scenarios. The values in these matrices are used to calculate ACC, <formula notation="TeX">p_{d}</formula>, <formula notation="TeX">p_{md}</formula>, and <formula notation="TeX">p_{fa}</formula>, based on (1)-(<ref type="formula">4</ref>). As one can observe, Fig. <ref type="figure">3a</ref> has labels of positive and negative, and the result of the predicted class is indicated as TP, FP, TN, and FN. However, in Fig. 
<ref type="figure">3b</ref>, the confusion matrix in the multiclassification scenario consists of N different classes. In this matrix, the characterization of TP, FP, TN, and FN samples is not directly applicable. In this case, it is practical to perform the analysis one class at a time, based on the provided labels <ref type="bibr">[28]</ref>.</p></div>
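<div xmlns="http://www.tei-c.org/ns/1.0"><p>The metric definitions above can be sketched in code as follows, assuming that spoofed signals form the positive class. For the multi-class case, one common option (an assumption here, not necessarily the procedure followed in this study) is to treat each class in turn as positive and all remaining classes as negative. The function names and the convention that <code>matrix[i][j]</code> counts samples of true class i predicted as class j are illustrative choices.</p>
<eg><![CDATA[
```python
def binary_metrics(tp, tn, fp, fn):
    """Compute ACC, p_d, p_md, and p_fa from confusion-matrix counts."""
    return {
        "acc": (tp + tn) / (tp + tn + fp + fn),
        "p_d": tp / (tp + fn),    # detected spoofed / all spoofed
        "p_md": fn / (tp + fn),   # missed spoofed / all spoofed
        "p_fa": fp / (fp + tn),   # false alarms / all authentic
    }


def one_vs_rest_counts(matrix, k):
    """Reduce an N x N confusion matrix to (tp, tn, fp, fn) for class k.

    matrix[i][j] is the number of samples of true class i predicted as j.
    """
    n = len(matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k][j] for j in range(n) if j != k)
    fp = sum(matrix[i][k] for i in range(n) if i != k)
    tn = sum(
        matrix[i][j]
        for i in range(n)
        for j in range(n)
        if i != k and j != k
    )
    return tp, tn, fp, fn
```
]]></eg>
<p>With the purely illustrative counts TP = 90, TN = 80, FP = 20, and FN = 10, <code>binary_metrics</code> gives an accuracy of 0.85, a probability of detection of 0.9, a probability of misdetection of 0.1, and a probability of false alarm of 0.2. By construction, the probabilities of detection and misdetection sum to one.</p></div>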
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Results</head><p>Four neural network models were implemented: CapsNet, Efficient-CapsNet, DA-CapsNet, and CNN. These models were trained and tested using 10-fold cross validation for 50 epochs per fold with a batch size of 50, using the adaptive moment estimation (ADAM) optimizer with a learning rate of 0.01. The results were obtained using &#946;&#8321; = &#946;&#8322; = 0.96. Here, &#946;&#8321; is the decay rate of the first-moment estimate (the running average of the gradient), while &#946;&#8322; is the decay rate of the second-moment estimate (the running average of the squared gradient). A decay of 1 &#215; 10&#8315;&#8309; is used during training and testing. The dataset is shuffled and split into 60% for training and 20% for validation. The remaining 20% is used for testing. Training is performed on an Intel Xeon E5-1620 v4 CPU @ 3.50 GHz with 16 GB of memory, TensorFlow 2.0, and Python 3.8.</p><p>Fig. <ref type="figure">3</ref> Structure of confusion matrix for the binary and multi-classifications <ref type="bibr">[28]</ref></p><p>Figure <ref type="figure">4</ref> presents the confusion matrices of the selected models for binary classification (Fig. <ref type="figure">4A</ref>) and multiclassification (Fig. <ref type="figure">4B</ref>). For example, the CapsNet confusion matrix in binary classification, as shown in Fig. <ref type="figure">4aA</ref>, indicates that 359 out of 1600 spoofed samples are correctly classified, while only 18 out of 1601 samples are misclassified as authentic signals. The Efficient-CapsNet confusion matrix, as illustrated in Fig. <ref type="figure">4aB</ref>, shows that 713 out of 1600 spoofed samples are classified correctly and only 1 sample out of 1601 is misclassified as an authentic signal for binary classification. Moreover, the Efficient-CapsNet confusion matrix in multiclassification, as presented in Fig. 
<ref type="figure">4bB</ref>, shows that 669 out of 1600 simplistic samples, 268 out of 1600 intermediate samples, 690 out of 1600 sophisticated samples, and 262 out of 1601 authentic samples are correctly classified. Also, only 1 intermediate sample out of 1600 is misclassified as a sophisticated sample. The contents of these matrices were used to calculate the previously mentioned evaluation metrics, and the results are provided in Figs. <ref type="figure">5</ref>, <ref type="figure">6</ref>, 7, 8 and 9 and Tables <ref type="table">4</ref>, <ref type="table">5</ref>, and 6.</p><p>Figure <ref type="figure">5</ref> illustrates the accuracy of the proposed models. The Efficient-CapsNet model outperformed the other models with an accuracy of 98.29% for binary classification. The CNN model had the worst accuracy among all models, with an accuracy of 87.8% for the binary classes. The other models, DA-CapsNet and CapsNet, yielded acceptable results in terms of accuracy. In multiclassification, sophisticated attacks were detected with the highest accuracy among all attacks, 99.57%, using the Efficient-CapsNet model; the simplistic and intermediate attacks were detected with lower accuracy than the sophisticated attacks using Efficient-CapsNet. CNN detected and classified these attacks with the lowest accuracy.</p><p>Figure <ref type="figure">6</ref> represents the probability of detection for the four models. The probability of detection of Efficient-CapsNet was higher than that of the other models for binary classification. This model had a probability of detection of 99.9% for binary classification; however, the CNN model had the lowest probability of detection among all models, 85.58%, for binary classification. 
The sophisticated attacks were detected during multiclassification with a probability of detection of 99.85% using the Efficient-CapsNet model. Simplistic and intermediate attacks were detected with a lower probability of detection than that of the sophisticated attacks using Efficient-CapsNet. The other DL models, DA-CapsNet and CapsNet, yielded satisfactory results in terms of probability of detection. The CNN model detected the attacks with the lowest probability of detection.</p><p>Figure <ref type="figure">7</ref> presents the results of the highlighted models in terms of probability of misdetection. The Efficient-CapsNet model yielded the lowest probability of misdetection, 0.1%, among all models for binary classification, followed by DA-CapsNet, CapsNet, and CNN. All GPS spoofing attacks were detected during multiclassification with a low probability of misdetection using the Efficient-CapsNet model, while the same attacks were detected with the worst probability of misdetection using CNN. This figure also indicates that Efficient-CapsNet detects sophisticated attacks with a lower probability of misdetection than the other attacks. The intermediate attacks could be detected and classified with a slightly higher probability of misdetection using Efficient-CapsNet compared to the same attacks using DA-CapsNet and CapsNet. DA-CapsNet and CapsNet yielded a satisfactory probability of misdetection. The CNN model yielded the highest probability of misdetection for detecting and classifying GPS spoofing attacks.</p><p>Figure <ref type="figure">8</ref> presents the results of the selected models in terms of probability of false alarm. The Efficient-CapsNet model had the lowest, and therefore best, probability of false alarm among all models for binary classification. The CNN model yielded the highest, and therefore worst, probability of false alarm for binary classification. 
Efficient-CapsNet could detect any type of GPS spoofing for multiclassification, with a low probability of false alarm, ranging from 0.37% to 4.23%, while the other models had a considerably higher probability of false alarm. The intermediate attacks could be detected and classified using Efficient-CapsNet with a probability of false alarm of 0.05%, the lowest probability of false alarm compared to the other GPS spoofing attacks.</p><p>Table <ref type="table">4</ref> lists the results of the other metrics: prediction time, training time per sample, and memory size. The Efficient-CapsNet model yielded the lowest prediction time, training time per sample, and memory size, while CNN performed the worst. The GPS spoofing attacks could be detected using Efficient-CapsNet, yielding the best results compared to the other models. The Efficient-CapsNet model outperformed the other models in terms of the highlighted metrics due to its several benefits, such as a limited number of parameters, self-attention routing algorithm, and highly efficient pooling layers, resulting in better detection and classification of UAV-focused GPS spoofing attacks.</p><p>Fig. <ref type="figure">4</ref> Confusion matrices of the binary and four-class classification of the selected models</p><p>It is critical to select efficient models for detecting and classifying UAV-focused GPS spoofing attacks due to the size, weight, and power constraints; therefore, we selected time metrics, prediction time and training time per sample, as functions of the training epoch. These metrics can present model effectiveness with respect to the amount of time a learning algorithm requires to complete the entire training benchmark. Understanding the training time per sample and prediction time in any deep learning model is pivotal for optimizing model performance and resource utilization. 
Knowing the time it takes to train the model on each sample enables the identification and rectification of bottlenecks in the training process, resulting in faster convergence. This information aids in proper resource allocation, ensuring that computational power is utilized efficiently, particularly in large-scale or distributed training scenarios, such as UAV-based networks. Prediction time is equally crucial, especially in security applications. Faster prediction times lead to more responsive applications, contributing to a user-friendly interface. Additionally, awareness of time requirements helps in selecting appropriate hardware and managing associated costs. It also plays a role in model selection, allowing practitioners to balance model accuracy with computational efficiency. In essence, these time metrics serve as essential parameters for making informed decisions and enhancing the overall efficiency of DL-based workflows.</p><p>Tables <ref type="table">5</ref> and <ref type="table">6</ref> present the results of the existing studies for detecting and classifying UAV-focused GPS spoofing attacks and compare the results to our proposed models. Table <ref type="table">5</ref> lists the results of the existing studies with respect to different datasets. The proposed models in this study were evaluated based on several evaluation metrics for binary and multiclassifications, whereas existing techniques were evaluated based on a limited number of metrics and binary classifications. For example, the authors of <ref type="bibr">[2]</ref> used artificial neural networks to detect and classify GPS spoofing attacks, using a dataset with 5 features and a limited number of samples. Their model provides good performance with an accuracy of 98.3%, a probability of detection of 99.2%, a probability of misdetection of 0.8%, and a probability of false alarm of 2.6%. The authors of <ref type="bibr">[16]</ref> used an SVM-based technique to detect GPS spoofing attacks. 
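Their approach provided a high accuracy of 98.77% for binary classification, using a dataset with 11 features. Although the results of these existing studies are satisfactory, the proposed models in this study provided higher performance, particularly the Efficient-CapsNet model, which outperformed the other techniques from the literature.</p><p>Fig. 6 Test probability of detection for the proposed models</p><p>Fig. 7 Test probability of misdetection for the proposed models</p>
<table><head>Table 4 Test results of the proposed models in terms of prediction time, training time per sample, and memory size</head>
<row role="label"><cell>Model</cell><cell>Attack category</cell><cell>Prediction time (seconds)</cell><cell>Training time per sample (seconds)</cell><cell>Memory size (mebibytes)</cell></row>
<row><cell>CapsNet</cell><cell>Simplistic</cell><cell>0.88</cell><cell>0.49</cell><cell>187</cell></row>
<row><cell>CapsNet</cell><cell>Intermediate</cell><cell>0.82</cell><cell>0.34</cell><cell>172</cell></row>
<row><cell>CapsNet</cell><cell>Sophisticated</cell><cell>0.79</cell><cell>0.3</cell><cell>166</cell></row>
<row><cell>CapsNet</cell><cell>Binary</cell><cell>0.76</cell><cell>0.4</cell><cell>170</cell></row>
<row><cell>Efficient-CapsNet</cell><cell>Simplistic</cell><cell>0.54</cell><cell>0.32</cell><cell>127</cell></row>
<row><cell>Efficient-CapsNet</cell><cell>Intermediate</cell><cell>0.53</cell><cell>0.31</cell><cell>133</cell></row>
<row><cell>Efficient-CapsNet</cell><cell>Sophisticated</cell><cell>0.47</cell><cell>0.19</cell><cell>121</cell></row>
<row><cell>Efficient-CapsNet</cell><cell>Binary</cell><cell>0.5</cell><cell>0.2</cell><cell>123</cell></row>
<row><cell>DA-CapsNet</cell><cell>Simplistic</cell><cell>0.77</cell><cell>0.45</cell><cell>187</cell></row>
<row><cell>DA-CapsNet</cell><cell>Intermediate</cell><cell>0.67</cell><cell>0.33</cell><cell>149</cell></row>
<row><cell>DA-CapsNet</cell><cell>Sophisticated</cell><cell>0.69</cell><cell>0.21</cell><cell>129</cell></row>
<row><cell>DA-CapsNet</cell><cell>Binary</cell><cell>0.54</cell><cell>0.28</cell><cell>150</cell></row>
<row><cell>CNN</cell><cell>Simplistic</cell><cell>1.78</cell><cell>1.23</cell><cell>445</cell></row>
<row><cell>CNN</cell><cell>Intermediate</cell><cell>1.76</cell><cell>0.99</cell><cell>490</cell></row>
<row><cell>CNN</cell><cell>Sophisticated</cell><cell>1.12</cell><cell>0.78</cell><cell>421</cell></row>
<row><cell>CNN</cell><cell>Binary</cell><cell>1.85</cell><cell>0.86</cell><cell>273.9</cell></row>
</table>
<p>Table <ref type="table">6</ref> illustrates the results of the existing studies using a similar dataset and compares the results of this study to studies from the literature. A limited number of studies have detected and classified UAV-focused GPS spoofing attacks using the same dataset we used in this study. To the best of our knowledge, no existing works provide results for detecting and classifying different types of UAV-focused GPS spoofing attacks. Existing studies in the literature used a limited number of evaluation metrics, resulting in inaccurate evaluations; therefore, the DL models proposed in this study exhibited higher performance when detecting and classifying attacks on UAVs compared to other studies in the literature. 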
Efficient-CapsNet yielded the best results for both binary and multi-class classification. The CNN model had the lowest performance for detecting UAV-focused GPS spoofing attacks. The other models, CapsNet and DA-CapsNet, had moderately lower accuracy and probability of detection, and higher probability of misdetection, probability of false alarm, prediction time, training time per sample, and memory size, compared to Efficient-CapsNet. Sophisticated attacks could be classified and detected with higher-performance results than other GPS spoofing attacks. Efficient-CapsNet could detect these attacks with high accuracy and probability of detection, and with low probability of misdetection, probability of false alarm, prediction time, training time per sample, and memory size. Efficient-CapsNet could also yield lower prediction times and training time per sample than the other proposed DL models with respect to the epoch. The results can be summarized as follows:</p><p>&#8226; The capsule family outperforms other existing techniques for binary and multi-class problems, &#8226; Efficient-CapsNet yielded the best results for binary and multi-class problems in terms of the selected metrics, &#8226; CNN had the worst results of the four models for binary and multi-class problems.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusion</head><p>Many approaches have been proposed in the literature to detect and classify GPS spoofing attacks; however, these techniques have several limitations, including high false alarm and misdetection rates. We evaluated and compared the performance of new deep learning model types, Capsule Networks (Simple Capsule Network, Efficient Capsule Network, and Dual Attention Capsule Network), to Convolutional Neural Networks. The evaluation was performed in terms of accuracy, probability of detection, probability of misdetection, probability of false alarm, prediction time, training time per sample, memory size, and learning curves. 
The results indicate that the Efficient-Capsule Network yielded better results than the Capsule Network, Dual Attention Capsule Network, and Convolutional Neural Network with an accuracy of 99.1%, a probability of detection of 99.9%, a probability of misdetection of 0.1%, a probability of false alarm of 0.37%, a prediction time of 0.5 seconds, and a training time per sample of 0.2 seconds. The Convolutional Neural Network model yielded the worst results with an accuracy of 87.84%, a probability of detection of 85.57%, a probability of misdetection of 14.42%, a probability of false alarm of 7.28%, a prediction time of 1.85 seconds, a training time per sample of 0.86 seconds, and a memory size of 573.9 mebibytes.</p></div></body>
		</text>
</TEI>
