<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Intrusion Response System for In-Vehicle Networks: Uncertainty-Aware Deep Reinforcement Learning-based Approach</title></titleStmt>
			<publicationStmt>
				<publisher>IEEE</publisher>
				<date when="2024-10-28">October 28, 2024</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10640651</idno>
					<idno type="doi">10.1109/MILCOM61039.2024.10773966</idno>
					
					<author>Han Jun Yoon</author><author>David Soon</author><author>Terrence J Moore</author><author>Seunghyun Yoon</author><author>Hyuk Lim</author><author>Dongseong Kim</author><author>Frederica F Nelson</author><author>Jin-Hee Cho</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Not Available]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head><p>The increasing integration of electronic control units (ECUs) in modern vehicles, managed through Controller Area Network (CAN) bus systems, has significantly enhanced vehicle functionality but has also introduced critical cybersecurity vulnerabilities <ref type="bibr">[1]</ref>. Despite advancements in intrusion detection systems (IDSs) for in-vehicle networks, there remains a notable gap in the research and development of effective intrusion response systems (IRSs). This gap leaves vehicles vulnerable to sophisticated cyber threats, compromising safety and reliability. Military vehicles and unmanned systems, like their civilian counterparts, utilize CAN buses, which are susceptible to cyber-attacks such as Denial of Service (DoS), spoofing, and fuzzing <ref type="bibr">[2]</ref>. A deep reinforcement learning (DRL)-based IRS offers an effective method for protecting these networks, playing a crucial role in ensuring the success and security of military missions in the face of cyber threats. This work addresses this critical need by developing an IRS that leverages DRL to autonomously select the optimal defense against a detected attack. By integrating DRL, our system responds to cyber threats dynamically and efficiently, ensuring robust protection and maintaining the integrity of vehicle operations. This approach not only fills the existing void in in-vehicle security research but also sets a new standard for resilience against automotive cyber threats.</p><p>This work makes the following key contributions: First, our work pioneers a DRL-based Intrusion Response System (IRS) that effectively responds to multiple attack types using uncertainty-aware DRL with entropy regularization <ref type="bibr">[3]</ref> to enhance vehicle security and mitigate cyber threats. 
Second, we introduce a sub-action space for the IRS, featuring discrete defensive actions tailored to each detected intrusion type, improving the efficiency of defense strategy selection. Finally, through extensive experiments, we demonstrate the superior efficacy of our DRL-based IRS over baseline defenses (random or none), reducing the attack success ratio (ASR) by up to 60% and improving the mission success ratio (MSR) by up to 70%.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. RELATED WORK</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Cybersecurity in In-Vehicle Networks</head><p>Han et al. <ref type="bibr">[4]</ref> introduced ID-Anonymization for CAN (IA-CAN), a novel protocol mitigating Denial-of-Service (DoS) attacks and securing in-vehicle and external communication. Wu et al. <ref type="bibr">[5]</ref> found that the CANoe framework and Genuino UNO boards, combined with machine learning (ML) algorithms, can accurately detect irregular activities within in-vehicle network systems. Kim and Shrestha <ref type="bibr">[6]</ref> proposed cybersecurity layers including network access control, real-time anomaly detection, and encryption protocols. El-Rewini et al. <ref type="bibr">[7]</ref> suggested System of Systems (SoS) strategies using cryptographic methods and authentication protocols to safeguard essential components.</p><p>One of the key tools for in-vehicle network cybersecurity is an IDS. Song et al. <ref type="bibr">[8]</ref> developed a lightweight IDS that improved significantly over previous rate-based methods. Seo et al. <ref type="bibr">[9]</ref> created GIDS (GAN-based IDS) using Generative Adversarial Networks (GANs) to detect both known and unknown attacks. Kang and Kang <ref type="bibr">[10]</ref> developed a deep neural network (DNN)-based IDS enhancing detection capabilities while maintaining efficient real-time response. Despite various cybersecurity measures for in-vehicle networks, there is a notable lack of research on IRSs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Intrusion Response Systems (IRSs)</head><p>Cheng et al. <ref type="bibr">[11]</ref> used a zero-sum stochastic game and a Bayesian attack graph to simulate network intrusions, incorporating various levels of Theory of Mind in attacker and defender strategies. Ullah et al. <ref type="bibr">[12]</ref> designed an IRS for targeted attacks that balances security and operational efficiency by considering attack likelihood and functional dependencies, thereby extending attack durations to deter threats. Nespoli et al. <ref type="bibr">[13]</ref> introduced an immune-inspired IRS employing a Genetic Algorithm to optimize countermeasures, similar to antibodies, to efficiently reduce security risks. DRL has been crucial for developing model-free IRSs. Hughes et al. <ref type="bibr">[14]</ref> applied deep Q-network learning to create an automated IRS with 21 actions, achieving impressive results with optimized hyperparameters. Iannucci et al. <ref type="bibr">[15]</ref> developed a model-free IRS using Q-Learning and DQN, with system designs based on node characteristics and a reward function to minimize costs and response times. However, these IRS technologies <ref type="bibr">[11]</ref><ref type="bibr">[12]</ref><ref type="bibr">[13]</ref><ref type="bibr">[14]</ref><ref type="bibr">[15]</ref> have not been tailored for the cybersecurity of autonomous vehicles.</p><p>For in-vehicle network IRSs, Hamad et al. <ref type="bibr">[16]</ref> explored the overall structure of an IRS in in-vehicle networks, noting that their approach did not directly activate response systems following IDS alerts. Kwon et al. <ref type="bibr">[17]</ref> designed a solution to mitigate network intrusions in vehicles by reconfiguring the ECU and neutralizing malicious packets. However, IRS research for in-vehicle security remains scarce. We address this gap by proposing a DRL-based IRS for in-vehicle cybersecurity.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. PROBLEM STATEMENT</head><p>We consider an in-vehicle network consisting of a CAN bus designed to assist the ECU in communicating with the outside world. Given a detected intrusion (i.e., a specific attack vector), the system must respond properly to counteract the intrusion. The attacker's aim is to inject random or malicious messages to disrupt the system's mission execution, where the mission is to reach a particular destination within a certain time constraint. Fig. <ref type="figure">1</ref> describes the network model considered in this work.</p><p>The given in-vehicle network aims to maximize the effectiveness of the IRS by minimizing the attack success ratio (ASR) while completing the trip within the given deadline under the adversarial attacks considered in this work (see our Attack Model in Section IV-C). Formally, we aim to:</p><p>where AS𝑡(𝑑𝑡) refers to attack success (or failure), returning 1 or 0, respectively, when 𝑑𝑡 is the chosen defense response at round 𝑡, 𝑇𝑐 is the time taken to complete the mission, 𝑇 is the mission deadline, 𝑑𝑐 is the defense cost, and 𝑁𝐴 is the total number of attacks performed during the period 𝑇𝑐. Section IV-D describes the set of defense responses. If 𝑇𝑐 &gt; 𝑇, the mission is considered failed, and we set the total mission time 𝑇𝑐 = 𝑇.</p></div>
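The quantities in this formulation can be sketched in a few lines of Python. This is a minimal, hypothetical sketch (the helper names and example values are my own, not from the paper): the per-attack outcomes AS𝑡(𝑑𝑡) are supplied as 0/1 flags, and the deadline check implements the 𝑇𝑐 &gt; 𝑇 failure condition.

```python
# Hypothetical sketch of the Section III objective quantities.
# Outcome flags and the example numbers below are illustrative only.

def attack_success_ratio(outcomes):
    """ASR: sum of AS_t(d_t) flags over the N_A attacks launched."""
    if not outcomes:
        return 0.0
    return sum(outcomes) / len(outcomes)

def mission_succeeded(t_completed, deadline):
    """Mission fails when the trip time T_c exceeds the deadline T."""
    return t_completed <= deadline

# Example: 2 of 5 attacks succeed and the trip beats the deadline.
asr = attack_success_ratio([1, 0, 0, 1, 0])           # ASR = 0.4
ok = mission_succeeded(t_completed=95, deadline=100)  # mission success
```

The IRS then seeks the defense sequence that drives `asr` down while keeping `ok` true at acceptable defense cost.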
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. SYSTEM MODEL</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Network Model</head><p>We consider an in-vehicle network where the CAN protocol facilitates communication between ECUs and the external environment. Each ECU manages specific car functions, such as the drive gear and the Revolutions-Per-Minute (RPM) gauge. In-vehicle networks lack robust authentication and authorization because their original design prioritized performance, cost, and real-time communication over security, making them vulnerable to cyber threats like message spoofing and denial of service <ref type="bibr">[18]</ref>. Resource constraints, such as limited computational power and memory, further complicate implementing cryptographic solutions <ref type="bibr">[19]</ref>. Therefore, protocols like CAN have vulnerabilities that attackers can exploit. When the IDS detects an intrusion, the system aims to implement the most appropriate response to counter the attacker and protect the in-vehicle network, ensuring the vehicle can reach its destination safely and on time.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Node Model</head><p>This work considers the following types of nodes:</p><p>&#8226; The CAN bus network is a message-based protocol enabling reliable, priority-driven communication among vehicle ECUs. &#8226; ECUs are compact devices managing specific vehicle functions. Our study focuses on the engine control, transmission control, and ABS modules. &#8226; The IDS oversees CAN messages and identifies attacks. We utilize a machine learning-based IDS to notify the system of the specific type of attack. &#8226; The Telematic Unit (TU) is a communication device for two-way data exchange between a vehicle and the external environment via wireless modules. It enhances vehicle functions and comfort, such as navigation, and supports safe driving. &#8226; The Head Unit (HU) provides a unified hardware interface for the system, including screens, buttons, and system controls for various integrated information and entertainment functions.</p><table><head>TABLE I: ATTACKER STRATEGIES AND ATTACK IMPACT</head><row role="label"><cell>AS</cell><cell>Attack strategy</cell><cell>Attack impact</cell></row><row><cell>𝐴𝑆1</cell><cell>Denial of Service</cell><cell>The ABS control ECU is flooded with a large number of messages, leading to sudden braking or brake malfunction.</cell></row><row><cell>𝐴𝑆2</cell><cell>Spoofing</cell><cell>Changes the RPM randomly or switches the drive gear for a given time-step, depending on which spoofed message was injected (i.e., spoofing messages relate to either the RPM or the drive gear).</cell></row><row><cell>𝐴𝑆3</cell><cell>Fuzzing</cell><cell>A successful fuzzing injection stops the vehicle.</cell></row></table></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Attack Model</head><p>The attackers inject attack messages through external connections such as the telematic unit, head unit, and OBD-II port. We summarize attacker strategies and their impact in Table <ref type="table">I</ref>. This work considers the following attack behaviors:</p><p>&#8226; Denial of Service (DoS) (𝐴𝑆1): This attack aims to consume CAN bus bandwidth by sending a massive number of messages, allowing an ECU node to dominate the CAN bus resources. We model the DoS attack by injecting a large volume of messages with the CAN ID set to 0x000 into the vehicle network. CAN ID 0x000 has the highest priority on the CAN bus, allowing it to dominate communication and block lower-priority messages. This overloads the network, preventing the ABS ECU from receiving timely messages and potentially causing braking control failure. &#8226; Spoofing (𝐴𝑆2): CAN messages are injected to control specific functions. Spoofing messages target the engine control ECU with RPM-related CAN IDs and the transmission control ECU with gear-related CAN IDs. Successful spoofing can raise or lower the RPM and switch the drive gear to neutral or park. &#8226; Fuzzing (𝐴𝑆3): This attack sends random CAN IDs and data, which can lead to a sudden vehicle stop if critical ECUs such as the engine, brakes, or transmission are targeted.</p><p>These attacks can disrupt the vehicle's mission of reaching its destination within a limited timeline by causing malfunctions related to the RPM, drive gear, and braking system. To assess the impact of attack severity, we use the probability 𝑃𝐴, which models the frequency of attacks launched by an attacker.</p></div>
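The DoS mechanism above relies on CAN's ID-based arbitration: the frame with the numerically lowest identifier always wins the bus. The following self-contained Python sketch illustrates that effect with a priority queue; the non-attack CAN ID (0x220) and frame labels are illustrative assumptions, not values from the paper.

```python
import heapq

# Minimal sketch of CAN arbitration: lower CAN ID = higher priority.
# Flooding with ID 0x000 therefore starves every other ECU's frames.

def arbitrate(pending, slots):
    """Transmit up to `slots` frames; the lowest CAN ID wins each round."""
    heapq.heapify(pending)  # min-heap keyed on (can_id, payload)
    return [heapq.heappop(pending) for _ in range(min(slots, len(pending)))]

# Five attacker frames with ID 0x000 compete with one ABS frame
# (hypothetical ID 0x220) for five transmission slots.
frames = [(0x000, f"flood{i}") for i in range(5)] + [(0x220, "abs_brake_cmd")]
sent = arbitrate(frames, slots=5)
# Every transmitted frame is the attacker's; the ABS command is delayed.
```

Under sustained flooding, the ABS frame never wins arbitration, which is exactly the braking-control failure described for 𝐴𝑆1.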
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D. Defense Model</head><p>We consider the following defense strategies to counteract the attacks in the Attack Model (Section IV-C):</p><p>&#8226; Rate Limiting (𝐷𝑆1) limits the rate of messages transmitted or received by ECUs to prevent damage from flooding attacks.</p><p>&#8226; Software Update (𝐷𝑆2) releases new software versions for ECUs to patch known vulnerabilities or update software. The TU manages external communication for updates, while the CAN bus distributes them internally to the ECUs. &#8226; Access Control List (ACL) (𝐷𝑆3) specifies which entities (e.g., ECUs, sensors, devices) are granted or denied access to network resources based on identity, role, or other attributes. &#8226; Network Filtering (𝐷𝑆4) limits traffic from suspicious sources by controlling access to network resources. &#8226; Input Validation (𝐷𝑆5) implements robust input validation and error-handling mechanisms to manage malformed or unexpected inputs, reducing the impact of fuzzing attacks. Defense cost is assigned as low, medium, or high based on the presumed complexity and resource demands of each strategy. Table II summarizes defender strategies, success conditions, and associated implementation costs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V. UNCERTAINTY-AWARE DRL-BASED IRS</head><p>We consider the defense strategies listed in Table <ref type="table">III</ref>. Upon detecting an attack, we identify a subset of effective defenses. For efficiency, we introduce the concept of a subgame from game theory <ref type="bibr">[20]</ref> to narrow down the defense strategies for each detected attack. For example, if 𝐴𝑆1 is detected, the defender considers 𝐷𝑆1, 𝐷𝑆3, 𝐷𝑆4, and 𝐷𝑆6 rather than all six strategies. This approach reduces the cost of calibrating the probability distribution over the defense strategies. 𝐴𝑆4 (no attack) maps to the full action space, enabling the system to consider all defense options without the limitations imposed by a detected threat. This unrestricted setting allows for comprehensive evaluation and deployment of defense strategies.</p></div>
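The sub-action mapping described above (and listed in Table III) can be sketched as a simple lookup. The string labels below are my own mnemonic names for the paper's 𝐴𝑆/𝐷𝑆 identifiers, added only for readability.

```python
# Sketch of the Table III sub-action spaces: each detected intrusion
# type restricts the DRL agent to a subset of the six defenses.
# Label strings are hypothetical mnemonics for AS_i / DS_j.

SUB_ACTIONS = {
    "AS1_dos":       ["DS1_rate_limit", "DS3_acl", "DS4_net_filter", "DS6_none"],
    "AS2_spoofing":  ["DS3_acl", "DS4_net_filter", "DS6_none"],
    "AS3_fuzzing":   ["DS2_sw_update", "DS5_input_validation", "DS6_none"],
    # AS4 (no attack) exposes the full action space.
    "AS4_no_attack": ["DS1_rate_limit", "DS2_sw_update", "DS3_acl",
                      "DS4_net_filter", "DS5_input_validation", "DS6_none"],
}

def action_space(detected_attack):
    """Return the defense actions available for this intrusion type."""
    return SUB_ACTIONS[detected_attack]
```

Restricting the choice set this way is what shrinks the exploration burden: the agent calibrates a distribution over at most four actions per detected attack instead of six.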
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Entropy Regularization</head><p>To incorporate uncertainty into DRL, we use entropy regularization <ref type="bibr">[3]</ref>. This technique encourages more exploratory policies by adding a penalty based on the entropy of the policy distribution. Entropy measures the unpredictability of the agent's actions in a given state. The entropy of the action probability distribution is calculated as:</p><formula>𝐻(𝜋𝜃(𝑎1, 𝑎2, . . . , 𝑎𝑛|𝑠𝑡)) = &#8722;&#8721;𝑖 𝜋𝜃(𝑎𝑖|𝑠𝑡) log 𝜋𝜃(𝑎𝑖|𝑠𝑡)</formula><p>where 𝐻(𝜋𝜃(𝑎1, 𝑎2, . . . , 𝑎𝑛|𝑠𝑡)) is the entropy at state 𝑠𝑡, 𝑎𝑖 represents an action, and 𝜋𝜃(𝑎𝑖|𝑠𝑡) is the probability of taking action 𝑎𝑖 given state 𝑠𝑡 under the policy parameterized by 𝜃. This entropy term is added to the objective function of a policy-based DRL algorithm (e.g., PPO) with a coefficient 𝛽 to control the regularization strength.</p><p>Without entropy regularization, a DRL agent may quickly converge to a deterministic policy, limiting its exploration. By encouraging a more exploratory and stochastic policy, entropy regularization introduces uncertainty into the learning process. This penalizes overly deterministic policies and promotes thorough exploration of the state-action space.</p></div>
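The entropy term and its role in the objective can be demonstrated numerically. This is a minimal stdlib sketch (the `beta` default is illustrative, not the paper's tuned value): a uniform policy over the defense actions has maximal entropy, while a deterministic policy has zero entropy and earns no bonus.

```python
import math

# Sketch of entropy regularization: the policy entropy H(pi_theta(.|s_t))
# is added to the RL objective with coefficient beta.

def entropy(probs):
    """Shannon entropy -sum_i p_i * log(p_i) of an action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def regularized_objective(expected_return, probs, beta=0.01):
    """Objective = expected return + beta * H; beta sets the strength."""
    return expected_return + beta * entropy(probs)

uniform = [0.25] * 4            # maximally exploratory over 4 defenses
greedy = [1.0, 0.0, 0.0, 0.0]   # deterministic policy: zero entropy
```

With the same expected return, the uniform policy scores higher under the regularized objective, which is precisely how the penalty discourages premature convergence to a deterministic policy.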
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. DRL-based Response Selection</head><p>We formulate the proposed optimization problem for the defender to select the best defense response, d𝑡, maximizing its net effectiveness using DRL. The problem is based on a Markov Decision Process (MDP) with the following components:</p><p>&#8226; States: The set of states is defined as S = {s1, s2, . . . , s𝑡, . . . , s𝑇}, where s𝑡 represents the state at time 𝑡. At time 𝑡, s𝑡 is determined by 𝐴𝑇𝑡, the detected intrusion type at time 𝑡.</p><table><head>TABLE II: DEFENDER STRATEGIES, CONDITIONS FOR SUCCESSFUL DEFENSE, AND DEFENSE COST (LOW = 1, MEDIUM = 2, HIGH = 3)</head><row role="label"><cell>DS</cell><cell>Defense strategy</cell><cell>Successful defense condition</cell><cell>Defense cost</cell></row><row><cell>𝐷𝑆1</cell><cell>Rate Limiting</cell><cell>Applied on a targeted ECU node with a success probability of 𝑃𝑟𝑙</cell><cell>Low</cell></row><row><cell>𝐷𝑆2</cell><cell>Software Update</cell><cell>Applied on a targeted ECU node with a success probability of 𝑃𝑠𝑢</cell><cell>Low</cell></row><row><cell>𝐷𝑆3</cell><cell>Access Control List</cell><cell>𝑃𝑎𝑐𝑙% success probability of blacklisting the attacker node</cell><cell>Medium</cell></row><row><cell>𝐷𝑆4</cell><cell>Network Filtering</cell><cell>𝑃𝑛𝑓% success probability of blacklisting the attacker node</cell><cell>High</cell></row><row><cell>𝐷𝑆5</cell><cell>Input Validation</cell><cell>Applied on a targeted ECU node with a success probability of 𝑃𝑖𝑣</cell><cell>High</cell></row><row><cell>𝐷𝑆6</cell><cell>No Defense</cell><cell>N/A</cell><cell>N/A</cell></row></table><table><head>TABLE III: CONSIDERED DEFENSE ACTION SPACE UNDER EACH INTRUSION</head><row role="label"><cell>AS</cell><cell>Action space</cell></row><row><cell>𝐴𝑆1</cell><cell>𝐷𝑆1, 𝐷𝑆3, 𝐷𝑆4, 𝐷𝑆6</cell></row><row><cell>𝐴𝑆2</cell><cell>𝐷𝑆3, 𝐷𝑆4, 𝐷𝑆6</cell></row><row><cell>𝐴𝑆3</cell><cell>𝐷𝑆2, 𝐷𝑆5, 𝐷𝑆6</cell></row><row><cell>𝐴𝑆4</cell><cell>𝐷𝑆1, 𝐷𝑆2, 𝐷𝑆3, 𝐷𝑆4, 𝐷𝑆5, 𝐷𝑆6</cell></row></table><p>&#8226; Actions: The set of actions is defined as A = {a1, . . . , a𝑡, . . . , a𝑇}, where a𝑡 represents the actions available at time 𝑡. At time 𝑡, a𝑡 is the chosen defense response d*𝑡, the best defense response to counteract the attack and minimize the ASR.</p><p>&#8226; Rewards: The reward function at time step 𝑡, denoted as R(s𝑡, a𝑡, s𝑡+1), for an action taken is calculated with two key objectives in mind: (1) to maximize the reduction in the ASR between two consecutive time points, 𝑡 and 𝑡 + 1, which assesses the effectiveness of the selected defense action against an attack; and (2) to maximize defense efficiency by minimizing the cost associated with the defense action. The reward at time 𝑡 thus combines two terms: the first captures the difference in the number of successful attacks between two consecutive states, and the second captures the incurred defense cost.</p><p>The first term, related to attack success (AS), is negative when the number of successful attacks increases from 𝑡 to 𝑡 + 1, and zero otherwise. The second term, associated with defense costs, decreases as defense costs rise and increases when they fall. We employ weights, 𝛼 and 𝛽, to balance these objectives and prevent one from overshadowing the other, with 𝛼 + 𝛽 = 1. Here, AS𝑖(𝑑𝑖) returns 1 when an attack launched at time 𝑖 succeeds despite the defense action 𝑑𝑖 taken at time 𝑖. &#8226; Transition Probabilities: The transition probability 𝑇(s𝑡, a𝑡, s𝑡+1) represents the likelihood of moving from state s𝑡 to state s𝑡+1 via action a𝑡. 
&#8226; Reward Accumulation: The accumulated reward is given by:</p><formula>𝐺 = &#8721;𝑡 𝛾^𝑡 R(s𝑡, a𝑡, s𝑡+1)</formula><p>where 𝛾 is the discount factor in the range [0, 1], with lower values favoring immediate rewards.</p><p>The policy function, 𝜋 : 𝑠 &#8594; 𝑎, maps states to a probability distribution over actions. Given an MDP episode of length 𝑇𝑐, the sequence of states, actions, and rewards forms the policy's trajectory. The goal of RL is to identify the optimal policy that maximizes the expected reward.</p></div>
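The two-term reward and the discounted accumulation can be sketched as follows. The exact functional form of the reward is my reconstruction from the description above (first term negative when successful attacks grow, second term larger when defense cost is low); only the weights 0.67/0.33, the cost scale 1-3, and 𝛾 = 0.9 come from the paper's parameter table.

```python
# Hypothetical sketch of the MDP reward and discounted return.
# alpha/beta = 0.67/0.33, costs in {1,2,3}, gamma = 0.9 per Table V;
# the exact reward formula is an assumed reconstruction.

def reward(succ_t, succ_t1, cost, alpha=0.67, beta=0.33, max_cost=3):
    """Negative when successful attacks grow from t to t+1; the cost
    term shrinks as the chosen defense gets more expensive."""
    return -alpha * max(0, succ_t1 - succ_t) + beta * (max_cost - cost) / max_cost

def discounted_return(rewards, gamma=0.9):
    """Accumulated reward: sum over t of gamma**t * R_t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

g = discounted_return([1.0, 1.0, 1.0], gamma=0.9)  # 1 + 0.9 + 0.81 = 2.71
```

A cheap defense that stops new attack successes yields the highest per-step reward, which is exactly the trade-off the weights 𝛼 and 𝛽 balance.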
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VI. EXPERIMENTAL SETUP</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Datasets.</head><p>We use open-source car-hacking datasets <ref type="bibr">[21]</ref>, which include DoS attacks, fuzzy attacks, drive gear spoofing, and RPM gauge spoofing. These datasets were created by logging CAN traffic from a real vehicle via the OBD-II port during message injection attacks. We use these datasets to train our IDS to detect intrusion types.</p><p>ML-based IDS Setup. We use an existing ML classifier to build predictive models for IDS implementation. Using open-source car-hacking datasets <ref type="bibr">[21]</ref>, we develop a Random Forest-based IDS with nearly 97% accuracy.</p><p>Metrics. Our experiments use the following metrics:</p><p>&#8226; Attack Success Ratio (ASR) measures the ratio of successful attacks to the total number of attacks launched. &#8226; Attack Success Impact (ASI) measures the change in throttle, brake, and gear values due to successful attacks. &#8226; Mission Success Ratio (MSR) refers to the ratio of successful missions to the total number of missions attempted. &#8226; Defense Cost (DC) indicates the total defense cost incurred during mission execution. &#8226; Route Completion (RC) refers to the percentage of the route distance completed by the vehicle. &#8226; Infraction Score (IS) sums all infractions as a geometric series, with each rule violation or unsafe behavior contributing less to the total score. &#8226; Driving Score (DS) is the product of route completion and the infraction penalty.</p><p>Comparing schemes. To select a defense strategy, we use the following algorithms for extensive experimental validation:</p><p>&#8226; Proximal Policy Optimization (PPO) <ref type="bibr">[22]</ref> is an RL algorithm that enhances training stability and reliability by using a clipped objective function to prevent large policy updates, ensuring controlled and effective learning. 
&#8226; Deep Q Learning (DQN) <ref type="bibr">[23]</ref> utilizes neural networks parameterized by 𝜃 to represent the action-value function, assuming the agent fully observes the environment.</p><p>&#8226; Sub-action-based PPO (S-PPO) uses PPO with a sub-action space, where a subset of the full action space is employed based on the detected attack type. &#8226; Sub-action-based DQN (S-DQN) applies DQN with a sub-action space upon detecting an attack. &#8226; Random selects a defense strategy at random from all available strategies. &#8226; No Defense is a baseline approach with no defense strategy to counter the detected attacks. Table <ref type="table">V</ref> summarizes the design parameters and their meanings.</p></div>
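For the DQN variants, the exploration schedule implied by the parameter table (𝜖 = 1.0, 𝜖min = 0.1, 𝜖decay = 0.9) can be sketched as standard epsilon-greedy selection. The Q-values and the per-episode decay placement are illustrative assumptions; the paper does not spell out where in training the decay is applied.

```python
import random

# Sketch of epsilon-greedy exploration with the Table V schedule:
# epsilon starts at 1.0, decays multiplicatively by 0.9, floored at 0.1.

def decay_epsilon(eps, eps_min=0.1, decay=0.9):
    """Multiplicative decay, never dropping below eps_min."""
    return max(eps_min, eps * decay)

def select_action(q_values, eps, rng=random.random):
    """Explore with probability eps; otherwise pick the greedy defense."""
    if rng() < eps:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

eps = 1.0
for _ in range(50):        # e.g., one decay step per training episode
    eps = decay_epsilon(eps)
# After enough episodes, eps sits at the 0.1 floor: 10% exploration.
```

Keeping a 0.1 floor preserves some exploration throughout training, complementing the entropy regularization used on the policy-gradient side.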
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VII. NUMERICAL RESULTS &amp; ANALYSES</head><p>Analysis of Security and Mission Performance: Fig. <ref type="figure">2</ref> shows experimental results on how attack severity (&#119875; &#119860; ) affects key metrics: attack success ratio (ASR), mission success ratio (MSR), defense cost (DC), and attack success impact (ASI). Fig. <ref type="figure">2</ref>(a) demonstrates that S-PPO and S-DQN consistently outperform PPO and DQN, maintaining stable ASR despite &#119875; &#119860; variations, due to the DRL agent's dynamic learning and strategy adjustment. In Fig. <ref type="figure">2</ref>(b), S-PPO and S-DQN use a sub-action space to effectively enhance defense actions, unlike the full action space, which limits efficient exploration and leads to poorer defense decisions. Fig. <ref type="figure">2(c)</ref> shows optimal defense resource allocation by the agent, resulting in stable DC. Fig. <ref type="figure">2</ref>(d) confirms that sub-action schemes consistently surpass conventional methods.</p><p>Analysis of Performance of Autonomous Vehicle: Fig. <ref type="figure">3</ref> explores the impact of varying attack severities (&#119875; &#119860; , probability of launching an attack) on different schemes in terms of route completion (RC), Infraction Score (IS), and driving score (DS). As observed in Fig. <ref type="figure">3</ref>(a), the strategies S-PPO and S-DQN demonstrate superior performance in route completion (RC) compared to PPO and DQN. This advantage is attributed to the reduced complexity of the action space, which enables our proposed DRL-based schemes to more effectively complete mission routes than other baseline approaches.</p><p>Further analyses in Figs. <ref type="figure">3(b</ref>) and 3(c) reveal that S-PPO and S-DQN also excel in reducing the Infraction Score (IS) and enhancing the driving score (DS), outperforming PPO and DQN. 
The performance of these models is followed by Random and then No Defense, showcasing the effectiveness of the sophisticated control strategies employed by S-PPO and S-DQN in maintaining safe and efficient driving behaviors under varying attack conditions.</p><p>Empirical Training Time Analysis: Table <ref type="table">IV</ref> demonstrates that using a sub-action space significantly reduces training time compared to a full action space for both PPO and DQN. The sub-action space enhances efficiency and speeds up convergence by reducing computational complexity, thus providing timely results. We observe that DQN requires more training time than PPO. This extended duration is attributed to DQN's use of a large replay buffer, which increases the time spent sampling and utilizing experiences for training. While the larger buffer improves learning stability, it slows training. The analysis was performed on a system with a 1.4 GHz Quad-Core Intel Core i5 CPU, 8 GB of RAM, and an Intel Iris Plus Graphics GPU with 1536 MB of memory.</p><table><head>TABLE IV: TRAINING TIME IN SECONDS</head><row role="label"><cell>DRL-based IRS scheme</cell><cell>Training time (s)</cell></row><row><cell>PPO</cell><cell>1197</cell></row><row><cell>S-PPO</cell><cell>778</cell></row><row><cell>DQN</cell><cell>23172</cell></row><row><cell>S-DQN</cell><cell>22624</cell></row></table><table><head>TABLE V: DESIGN PARAMETERS, THEIR MEANING, AND DEFAULT VALUES</head><row role="label"><cell>Par.</cell><cell>Meaning</cell><cell>Value</cell></row><row><cell>𝛼, 𝛽</cell><cell>Weights for reward function</cell><cell>0.67/0.33</cell></row><row><cell>𝛾</cell><cell>Discount rate</cell><cell>0.9</cell></row><row><cell>𝑁</cell><cell>PPO/DQN network size</cell><cell>256</cell></row><row><cell>𝑙actor</cell><cell>Learning rate for actor network</cell><cell>0.00005</cell></row><row><cell>𝑙critic</cell><cell>Learning rate for critic network</cell><cell>0.0005</cell></row><row><cell>𝑙DQN</cell><cell>DQN learning rate</cell><cell>0.0001</cell></row><row><cell>𝜖</cell><cell>Exploration rate</cell><cell>1.0</cell></row><row><cell>𝜖min</cell><cell>Minimum exploration rate</cell><cell>0.1</cell></row><row><cell>𝜖decay</cell><cell>Epsilon decay</cell><cell>0.9</cell></row></table></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VIII. CONCLUSION &amp; FUTURE WORK</head><p>Modern vehicles use the CAN bus system for communication between ECUs, but it lacks security features, making it vulnerable to attacks like DoS, spoofing, and fuzzing. These attacks can disrupt ECU functionality, leading to serious issues such as vehicle malfunctions or crashes. In-vehicle networks typically lack authentication and authorization, as they were originally designed with the assumption that all devices were trustworthy, prioritizing functionality and efficiency over security.</p><p>To address these vulnerabilities, we proposed a DRL-based IRS that responds optimally to detected attacks, minimizing the attack's success and ensuring mission completion within deadlines. The proposed DRL-based approaches outperform baseline methods, reducing the ASR by up to 60%, improving MSR by up to 70%, and optimizing other security and performance metrics of the autonomous vehicle. Leveraging a sub-action space-based design introduced efficiency for tailored defense strategies. Additionally, employing uncertainty-aware DRL with entropy regularization has improved solution quality by fostering greater diversity in solutions.</p></div>
		</body>
		</text>
</TEI>
