<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Trust-Aware Motion Planning for Human-Robot Collaboration under Distribution Temporal Logic Specifications</title></titleStmt>
			<publicationStmt>
				<publisher>IEEE</publisher>
				<date>05/13/2024</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10510528</idno>
					<idno type="doi"></idno>
					<title level='j'>Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA)</title>
<idno></idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Pian Yu</author><author>Shuyang Dong</author><author>Shili Sheng</author><author>Lu Feng</author><author>Marta Kwiatkowska</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Recent work has considered trust-aware decision making for human-robot collaboration (HRC) with a focus on model learning. In this paper, we are interested in enabling the HRC system to complete complex tasks specified using temporal logic formulas that involve human trust. Since accurately observing human trust in robots is challenging, we adopt the widely used partially observable Markov decision process (POMDP) framework for modelling the interactions between humans and robots. To specify the desired behaviour, we propose to use syntactically co-safe linear distribution temporal logic (scLDTL), a logic that is defined over predicates of states as well as belief states of partially observable systems. The incorporation of belief predicates in scLDTL enhances its expressiveness while simultaneously introducing added complexity. This also presents a new challenge as the belief predicates must be evaluated over the continuous (infinite) belief space. To address this challenge, we present an algorithm for solving the optimal policy synthesis problem. First, we enhance the belief MDP (derived by reformulating the POMDP) with a probabilistic labelling function. Then a product belief MDP is constructed between the probabilistically labelled belief MDP and the automaton translation of the scLDTL formula. Finally, we show that the optimal policy can be obtained by leveraging existing point-based value iteration algorithms with essential modifications. Human subject experiments with 21 participants on a driving simulator demonstrate the effectiveness of the proposed approach.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head><p>Autonomous robots are rapidly becoming an essential component of our society; examples include home assistive robots <ref type="bibr">[1]</ref> and automated vehicles (AVs) <ref type="bibr">[2]</ref>. Despite significant advances made in automation in recent years, attaining full autonomy, which would enable robots to successfully deal with complicated and unpredictable events or situations, remains stubbornly out of reach. For instance, the AV industry has had to reset expectations as it shifts its focus from level 5 to level 4 autonomy <ref type="bibr">[3]</ref>. In many applications where robots work with or alongside humans, it is customary to have robots that are operated or supervised by a human <ref type="bibr">[4]</ref>. Collaborative human-robot partnerships often hinge on the foundation of trust. Therefore, recognizing human trust and incorporating it into the decision-making process is essential for achieving the full potential of human-robot interactive systems.<note place="foot" n="*"><p>This work was supported in part by the ERC ADG FUN2MODEL (Grant agreement ID: 834115), NSF grant CCF-1942836, and AFOSR grant FA9550-21-1-0164.</p></note><note place="foot" n="1"><p>Pian Yu and Marta Kwiatkowska are with the Department of Computer Science, University of Oxford, Oxford, United Kingdom {pian.yu, marta.kwiatkowska}@cs.ox.ac.uk</p></note></p><p>The subject of trust has been actively studied in multiple contexts such as psychology <ref type="bibr">[5]</ref> and automation <ref type="bibr">[6]</ref>. It is a multifaceted concept that can be influenced by a large number of factors. In the context of human-robot collaboration (HRC), studies have shown that the level of human trust in robots evolves over their interaction, affected by factors such as the automation's reliability, predictability, and transparency <ref type="bibr">[7]</ref>. 
While earlier work has focused on studying the measurement <ref type="bibr">[8]</ref>, modelling <ref type="bibr">[9]</ref>, and calibration <ref type="bibr">[10]</ref> of human trust in robots, recent work has gravitated towards devising strategies that enable robots to proactively infer and influence the human collaborator's trust <ref type="bibr">[11]</ref>, <ref type="bibr">[12]</ref>.</p><p>Various methods exist for modelling the interaction between humans and robots. Among these, game-theoretic approaches <ref type="bibr">[13]</ref>, <ref type="bibr">[14]</ref> and the partially observable Markov decision process (POMDP) framework <ref type="bibr">[15]</ref>, <ref type="bibr">[16]</ref> have been extensively explored. Since trust is not fully observable, in this work we adopt the POMDP framework, where human trust can be modelled as a hidden variable. While the POMDP formulation allows the robot to act according to its beliefs about the human collaborator's trust based on observations, finding solutions to POMDPs of a realistic size is computationally challenging and existing work often relies on approximation algorithms <ref type="bibr">[17]</ref>-<ref type="bibr">[19]</ref>. Due to the inherent complexity of solving a POMDP, prior work devoted to trust-based decision making for HRC often focused on relatively simple specifications (e.g., accumulated reward maximisation) <ref type="bibr">[20]</ref>, <ref type="bibr">[21]</ref>. Moreover, in all these studies, trust was treated as an implicit factor that impacts the performance of collaboration. None of these works has considered trust as part of the specification, where explicit requirements can be imposed. Real-world case studies have shown that an inappropriate level of trust may result in the misuse or disuse of automation <ref type="bibr">[6]</ref>. 
Therefore, in practical scenarios, it might be advantageous to stipulate conditions such as "the trust level must not fall below a certain threshold" and "the trust level at a particular juncture must surpass a certain threshold".</p><p>Recently, there has been a growing interest in using Linear Temporal Logic (LTL) <ref type="bibr">[22]</ref> or its finite variant, syntactically co-safe LTL (scLTL) <ref type="bibr">[23]</ref>, <ref type="bibr">[24]</ref>, for specifying complex behaviours of partially observable systems. In <ref type="bibr">[25]</ref>, it was demonstrated that, for Gaussian linear time-invariant (LTI) POMDPs, a finite-state abstraction can be constructed. This abstraction allows for policy synthesis, which can then be refined to the original Gaussian LTI POMDPs. However, we note that this approach is not applicable to general POMDPs. In <ref type="bibr">[26]</ref>, <ref type="bibr">[27]</ref>, policy synthesis for POMDPs under LTL specifications was investigated. It is worth noting that, in these studies, LTL was employed to define specifications over the state space of the POMDPs, rather than over beliefs.</p><p>In this work, we investigate trust-aware motion planning for HRC with complex temporal logic specifications applied to both the state of the robot and the trust (belief) of the human. The trust-based human-robot interaction is modelled by a trust POMDP, and syntactically co-safe linear distribution temporal logic (scLDTL) <ref type="bibr">[28]</ref> is utilised to specify the desired behaviour of the system. In <ref type="bibr">[28]</ref>, scLDTL was introduced as an extension of scLTL to leverage the richness of information contained within belief states of partially observable systems. It was shown that scLDTL is capable of expressing properties involving uncertainty and likelihood that cannot be described by existing logics. 
Nevertheless, the increased complexity introduced by the inclusion of belief predicates in scLDTL, which must be evaluated over the continuous (infinite) belief space, renders verification and synthesis from scLDTL a more demanding task. In <ref type="bibr">[28]</ref>, a feasibility checking algorithm was proposed for POMDPs with scLDTL specifications. However, to the best of our knowledge, the more challenging synthesis problem remains unresolved. Our contributions are summarised as follows. (i) We demonstrate the suitability of scLDTL for specifying the desired behaviour of trust-aware HRC systems that involve requirements in the robot workspace as well as the trust (belief) space. (ii) We propose an efficient algorithm to solve the scLDTL optimal policy synthesis problem for trust POMDPs, which overcomes the aforementioned complexity of scLDTL specifications. (iii) We design and conduct human subject experiments with 21 participants on a driving simulator to evaluate the proposed approach, with encouraging results.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. MOTIVATING EXAMPLE</head><p>We describe a route planning problem for AVs. A human is driving an AV in a town, whose map is shown in Fig. <ref type="figure">1(a)</ref>. Within the town, we consider 3 types of typical incidents that may occur on the road: (1) a pedestrian crossing the road, (2) an obstacle (e.g., a broken bicycle) ahead in the lane, and (3) an oncoming truck in the neighbouring lane. For simplicity, here we assume that there is at most one incident at a time for each road segment.</p><p>Fig. <ref type="figure">1</ref>(b) shows a schematic view of the AV travelling from one location to another. Imagine the AV is approaching an incident on a road segment while in autopilot mode. For safety considerations, the driver might choose to take over control of the AV and switch to manual driving. The level of trust that the driver has in the AV's ability to handle various types of incidents can influence their takeover decision; a driver with lower trust is more inclined to take over. Furthermore, the driver's level of trust changes over time and depends on the takeover decision and the vehicle's ability to handle an incident. In our previous work <ref type="bibr">[21]</ref>, human subject experiments have shown that, by proactively inferring human trust and taking it into account during decision making, the AV can achieve higher cumulative rewards.</p><p>The research focus of <ref type="bibr">[21]</ref> was on optimal route planning (e.g., navigating from one location to another) for AVs. In contrast, in this work we are interested in trust-aware HRC in a broader context. Our goal is to develop a trust-aware motion planning approach for HRC systems, which is capable of completing complex tasks specified in temporal logic that involve requirements on human trust levels.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. PRELIMINARIES</head><p>Before formulating our problem, we provide preliminary background on POMDPs and scLDTL.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Partially observable Markov decision processes</head><p>This section introduces POMDPs, which are well suited to the modelling of HRC systems under investigation. The human internal states (e.g., trust), which are not fully observable to robots, can be modelled as hidden states in POMDPs. In order to accurately represent the interactions between humans and robots, modifications to the conventional definition of a POMDP <ref type="bibr">[29]</ref> are incorporated.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Definition 1 (POMDPs).</head><p>A POMDP is defined as a tuple M = (S, A, O, Z, T), where S, A, and O are finite sets of states, actions, and observations, respectively, and</p><p>• Z : S × A × O → [0, 1] is the probabilistic observation function, which gives the probability of observing o after taking action a in state s, i.e., Z(s, a, o) = p(o | s, a); • T : S × A × O × S → [0, 1] is the probabilistic transition function, which gives the probability that the state has value s′ after taking action a and receiving observation o in state s, i.e., T(s, a, o, s′) = p(s′ | s, a, o). Firstly, we note that, in Definition 1, the probability of receiving observation o ∈ O is determined by the previous state s (instead of the resulting state s′) and the action a that was just taken. Secondly, the transition function T depends on the observation. For the purpose of this work, a reward function is not needed and has been omitted.</p><p>Since a POMDP state is partially observable, we rely on the concept of a belief state<ref type="foot">foot_1</ref> . Let B be the belief space of S. A POMDP policy π : B → A maps a belief state b ∈ B, which is a probability distribution over S, to a prescribed action a ∈ A. Given a policy π, the control of the agent's actions is performed online. First, the agent takes an action a = π(b) according to the given policy π and the current belief b. Second, after taking action a and receiving an observation o, the agent updates its belief:</p><p>where</p><p>The process then repeats. An interesting property to note about the POMDP described in Definition 1 is that the belief update (<ref type="formula">1</ref>) is linear<ref type="foot">foot_2</ref> . An execution ρ of a POMDP is a possibly infinite alternating sequence of belief states, actions, and observations, i.e., ρ = b_0 a_0 o_0 b_1 a_1 o_1 ….</p></div>
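<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the online belief-tracking loop concrete, the following is a minimal sketch of one update step under the conventions of Definition 1, where the transition probability is conditioned on the observation. The exact form of the update used in this work is not reproduced in this text, so both variants below are assumptions: with no observation likelihood, the belief is simply pushed through p(s′ | s, a, o), which is a linear map; with an observation likelihood p(o | s, a), the prior is first reweighted and renormalised in the usual Bayesian manner. The function name, the dictionary layout, and the two-state trust example are hypothetical.</p>

```python
def belief_update(belief, trans_ao, obs_lik=None):
    """Update a belief after one (action, observation) pair.

    belief   : dict state -> probability
    trans_ao : dict state -> dict next_state -> p(next_state | state, a, o)
    obs_lik  : optional dict state -> p(o | state, a). If given, the prior is
               reweighted by the observation likelihood and renormalised
               (assumed Bayesian variant); if omitted, the update is linear.
    """
    if obs_lik is None:
        weights = dict(belief)
    else:
        weights = {s: belief[s] * obs_lik[s] for s in belief}
        total = sum(weights.values())
        if total == 0.0:
            raise ValueError("observation has zero probability under this belief")
        weights = {s: w / total for s, w in weights.items()}

    new_belief = {}
    for s, w in weights.items():
        for s_next, p in trans_ao[s].items():
            new_belief[s_next] = new_belief.get(s_next, 0.0) + w * p
    return new_belief


# Hypothetical two-level trust model for a fixed action and observation.
B0 = {"low": 0.5, "high": 0.5}
T_AO = {
    "low": {"low": 0.6, "high": 0.4},
    "high": {"low": 0.1, "high": 0.9},
}
B1_LINEAR = belief_update(B0, T_AO)
B1_BAYES = belief_update(B0, T_AO, obs_lik={"low": 0.2, "high": 0.8})
```

<p>In both variants the result is again a probability distribution over the states, so the step can be repeated along an execution.</p></div>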
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Syntactically Co-Safe Linear Distribution Temporal Logic</head><p>This section introduces scLDTL for concisely specifying the desired behaviour of the HRC systems. It will become clear later that scLDTL is capable of expressing requirements in both the robot workspace and the trust (belief) space.</p><p>scLDTL consists of two types of predicates: (i) state predicates ν that are evaluated over the states and (ii) belief predicates µ, which are obtained after evaluation of a predicate function g_µ : B → ℝ on the belief space B as:</p><p>An scLDTL formula is defined inductively according to the following syntax <ref type="bibr">[28]</ref>:</p><p>where ν is a state predicate, µ is a belief predicate, ¬ (negation), ∧ (conjunction), and ∨ (disjunction) are logic connectives, and U (until), ◯ (next) and ◇ (eventually) are temporal operators. We omit the scLDTL semantics due to the page limit and refer the reader to <ref type="bibr">[28]</ref>.</p><p>Let AP be a set of state predicates and BP be a set of belief predicates. The satisfaction of an scLDTL formula φ over AP ∪ BP can be captured through a deterministic finite automaton (DFA) A = (Q, q_0, 2^{AP∪BP}, δ, Acc), where Q is a finite set of states, q_0 ∈ Q is the initial state, δ : Q × 2^{AP∪BP} → Q is the transition function, and Acc ⊆ Q is the set of accepting states. A finite run q = q_0 q_1 … q_k of A is called accepting if q_k ∈ Acc. Next we define the notion of probabilistic satisfaction with respect to an execution ρ of a POMDP.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Definition 2.</head><p>[scLDTL satisfaction with respect to a POMDP execution] Given an execution ρ = b_0 a_0 o_0 b_1 a_1 o_1 … of a POMDP M, the probability that the execution ρ satisfies the scLDTL formula φ is given by</p><p>For simplicity, it is denoted in shorthand as Pr_M(φ | ρ).</p></div>
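<div xmlns="http://www.tei-c.org/ns/1.0"><p>The DFA-based acceptance check described above can be illustrated with a small sketch. It assumes a hypothetical formula "eventually (at_goal and HIGHTRUST)" over one state predicate and one belief predicate, and builds the corresponding two-state DFA by hand; the predicate names, state names, and helper functions are illustrative and are not produced by the translation tool used in this work. A word is a finite sequence of subsets of AP ∪ BP, and it is accepted if the induced run ends in an accepting state.</p>

```python
from itertools import combinations

# Hypothetical predicates: one state predicate and one belief predicate.
PREDICATES = ("at_goal", "HIGHTRUST")


def all_labels(predicates):
    """Enumerate the alphabet 2^(AP u BP) as frozensets."""
    return [
        frozenset(subset)
        for size in range(len(predicates) + 1)
        for subset in combinations(predicates, size)
    ]


def build_delta():
    """Transition function of a DFA for 'eventually (at_goal and HIGHTRUST)'.

    The accepting state q_acc is absorbing, as expected for a co-safe formula.
    """
    delta = {}
    for label in all_labels(PREDICATES):
        satisfied = "at_goal" in label and "HIGHTRUST" in label
        delta[("q0", label)] = "q_acc" if satisfied else "q0"
        delta[("q_acc", label)] = "q_acc"
    return delta


def accepts(delta, q0, accepting, word):
    """Run the DFA on a finite word of label sets; accept if the run ends in Acc."""
    q = q0
    for label in word:
        q = delta[(q, frozenset(label))]
    return q in accepting


DELTA = build_delta()
GOOD = [{"at_goal"}, {"at_goal", "HIGHTRUST"}, set()]
BAD = [{"at_goal"}, {"HIGHTRUST"}]
```

<p>Here the word GOOD is accepted because both predicates hold simultaneously at its second position, whereas BAD is rejected because they never hold at the same position.</p></div>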
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. POMDPS FOR HRC</head><p>In this work, we consider a human and a robot working collaboratively in the workspace X. The human (H) adopts a supervisory role and the robot (R) is charged with performing tasks. The human can intervene in the task execution due to, for instance, low trust.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. HRC modelling</head><p>Within the workspace X, one can identify a set of incidents I, i.e., a set of events that can affect human trust and/or the takeover decision. The likelihood of observing an incident id ∈ I is determined by the current state and the action of the robot. Denote by Θ the state space of human trust in the robot, which is not fully observable by the robot.</p><p>The human-robot interaction can be modelled as a POMDP M as per Definition 1, where the state space S of M is factored into the observable state space X and the non-observable state space Θ, i.e., S = X × Θ. Accordingly, the probabilistic transition function T is factored into the world state and the human trust probabilistic transition functions T_X and T_Θ, respectively. It has been shown in <ref type="bibr">[20]</ref> that human trust affects human behaviour (e.g., the takeover decision) and that human trust is affected by factors such as the robot performance (i.e., success or failure in handling an incident). Therefore, the state evolution of the trust POMDP M is determined not only by the robot action a_r, but also by posterior observations, including the incident id encountered during execution, the human takeover decision a_h, and the robot performance e_r. Formally, the trust POMDP is specified as a tuple M = (X, Θ, A_r, O, Z, T_X, T_Θ), where • A_r is the finite action space of the robot;</p><p>• O = I × A_h × E_r is the observation set, where - I is the set of incidents; - A_h = {tk, st} is the action space of the human, where tk and st stand for "takeover" and "standstill", respectively; similarly to <ref type="bibr">[20]</ref>, we assume that the human first observes the robot's action a_r and then decides his or her own action a_h; - E_r = {succ, fail} represents the performance of the robot, where succ and fail stand for "success" and "failure", respectively. 
Example (continued). The town map shown in Fig. <ref type="figure">1</ref>(a) has 12 road intersections {A, …, L}. Depending on the driving direction, each intersection can be factored into 3 different states (for instance, intersection A contains states EA, BA, and FA). We use Muir's questionnaire <ref type="bibr">[30]</ref> with a 7-point Likert scale as a human trust metric (i.e., trust ranges from 1 to 7). Therefore, one has that X = {EA, BA, FA, …, EL, KL, HL}, Θ = {1, …, 7}, and I = {'pedestrian', 'obstacle', 'truck'}. The robot actions are route choices and one can define an indicator function I for incidents. For instance, I(EA, AB, 'pedestrian') = 1.</p></div>
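<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of the factored structure above, the following sketch enumerates the state and observation spaces of the running example. Only the six world states named explicitly in the text are included, since the full road map is given in a figure; the variable names and the uniform initial trust belief are assumptions made for the sake of the example.</p>

```python
from itertools import product

# World states named in the text (the full set X has 12 * 3 = 36 elements).
X_SAMPLE = ["EA", "BA", "FA", "EL", "KL", "HL"]
THETA = list(range(1, 8))                      # 7-point Likert trust scale
INCIDENTS = ["pedestrian", "obstacle", "truck"]
A_H = ["tk", "st"]                             # takeover / standstill
E_R = ["succ", "fail"]                         # robot performance

# Factored state space S = X x Theta and observation set O = I x A_h x E_r.
STATES = list(product(X_SAMPLE, THETA))
OBSERVATIONS = list(product(INCIDENTS, A_H, E_R))


def uniform_trust_belief(x):
    """Belief with known world state x and uniform uncertainty over trust."""
    return {(x, theta): 1.0 / len(THETA) for theta in THETA}


B_INIT = uniform_trust_belief("EA")
```

<p>With these sets, each observation is a triple such as ("pedestrian", "st", "succ"). For the complete map, the state space would contain 36 × 7 = 252 elements, while the observation set has 3 × 2 × 2 = 12 elements.</p></div>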
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Problem formulation</head><p>In this work, we consider that the HRC system is required to complete complex tasks in the robot workspace X. Moreover, there are requirements on the human trust Θ and/or the trust belief b_Θ. We formulate the tasks in the workspace as well as the requirements on human trust and trust belief in the form of an scLDTL formula φ.</p><p>Example (continued). Consider now that the AV is required to satisfy the following requirements: (i) it visits the target intersections G, J and L (in this order) from the initial location EA (see Fig. <ref type="figure">1(a)</ref>), (ii) the human trust level is never too low (lower than 2), and (iii) when the vehicle reaches the final intersection L, the likelihood that the human trust level is high (higher than or equal to 6) is no less than 0.5. In scLDTL, this specification can be written as</p><p>where there are 3 state predicates ν_1 = {BG, FG, IG}, ν_2 = {IJ, CJ, KJ}, ν_3 = {KL, EL, HL} and 2 belief predicates LOWTRUST, HIGHTRUST. The predicate functions are given by g_LOWTRUST = 1 − A_1 b_Θ, g_HIGHTRUST = 0.5 − A_2 b_Θ, where A_1 = [1, 0, 0, 0, 0, 0, 0] encodes trust lower than 2 and A_2 = [0, 0, 0, 0, 0, 1, 1] encodes trust higher than or equal to 6.</p><p>Given the initial belief state b_0 and a policy π, denote by ρ_π(b_0) the set of all possible executions generated by π. We consider the optimal policy synthesis problem for HRC under scLDTL specifications, i.e., find an optimal policy π such that the probability of the set of all executions that satisfy an scLDTL formula φ under π is maximised. Mathematically, this problem can be formulated as follows.</p><p>Problem 1. 
Given the trust POMDP M and the scLDTL specification φ, find a policy π ∈ Π such that</p><p>where Π is the set of all policies for the trust POMDP M and Pr_M(φ | ρ) is given in Definition 2.</p><p>Remark 1. A point-based value iteration (PBVI) algorithm has been proposed for POMDPs under LTL specifications.</p><p>In <ref type="bibr">[27]</ref>, the atomic propositions of an LTL formula are evaluated on the state space of the POMDP, which is finite. Therefore, the construction of the product POMDP and the computation of maximal end components are similar to those for finite MDPs, for which the existing graph-based methods <ref type="bibr">[22]</ref> can be utilised. In this work we consider scLDTL specifications, in which the belief predicates are evaluated over the belief space B of the trust POMDP M, which is infinite. Therefore, the approach proposed in <ref type="bibr">[27]</ref> is not applicable here.</p></div>
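<div xmlns="http://www.tei-c.org/ns/1.0"><p>To illustrate why belief predicates must be evaluated on beliefs rather than on states, the following sketch computes the probability mass that a trust belief assigns to the low-trust and high-trust levels, using the indicator vectors A_1 and A_2 from the example, and checks requirement (iii) as stated in words (the likelihood of trust being at least 6 is no less than 0.5). The function names and the two sample beliefs are hypothetical, and the sketch deliberately avoids fixing a sign convention for the predicate functions.</p>

```python
A1 = [1, 0, 0, 0, 0, 0, 0]  # indicator of trust lower than 2
A2 = [0, 0, 0, 0, 0, 1, 1]  # indicator of trust higher than or equal to 6


def mass(indicator, trust_belief):
    """Probability that the belief assigns to the trust levels selected by indicator."""
    return sum(a * p for a, p in zip(indicator, trust_belief))


def high_trust_likely(trust_belief, threshold=0.5):
    """Requirement (iii): the likelihood of trust >= 6 is no less than the threshold."""
    return mass(A2, trust_belief) >= threshold


# Two hypothetical trust beliefs over the levels 1..7.
B_CONFIDENT = [0.0, 0.1, 0.1, 0.1, 0.1, 0.3, 0.3]
B_DOUBTFUL = [0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]

CONFIDENT_OK = high_trust_likely(B_CONFIDENT)
DOUBTFUL_OK = high_trust_likely(B_DOUBTFUL)
LOW_MASS_DOUBTFUL = mass(A1, B_DOUBTFUL)
```

<p>Since the outcome depends on the whole distribution and not on a single trust value, such a requirement cannot be expressed as a label on the finite state space, which is the source of the difficulty noted in Remark 1.</p></div>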
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V. PROPOSED APPROACH</head><p>This section presents our approach to solving the optimal policy synthesis problem (Problem 1), which falls outside the purview of existing policy synthesis algorithms designed for POMDPs, e.g., <ref type="bibr">[25]</ref>-<ref type="bibr">[27]</ref>. It is divided into two parts: (1) the construction of the product POMDP with the DFA of the scLDTL formula φ and (2) an algorithm to approximately compute a policy that maximises the probability of satisfying the given scLDTL formula φ.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Product POMDP</head><p>To begin with, we define the deterministic state and belief predicate labelling functions L_s and L_b as</p><p>which contains the set of state predicates that can be true at state (x, θ);</p><p>which contains the set of belief predicates that can be true at belief state (x, b_Θ). Then the corresponding values are given by p_{L_s} :</p><p>Now we propose to reformulate the trust POMDP M (equivalently) as a belief MDP and further expand it by including probabilistic labels, which yields a probabilistically labelled belief MDP M = (B, A_r, O, Ẑ, T_B, L, p_L), where B is the belief space, A_r, O are given in M, and </p><p>The product belief MDP M^× is constructed between the probabilistically labelled belief MDP M and the DFA A = (Q, q_0, 2^{AP∪BP}, δ, Acc) of the scLDTL formula φ. Definition 3 (Product belief MDP). Denote by</p><p>), where</p><p>where Ẑ, T_B, and p_L are defined in M.</p><p>Let Π^× be the set of all policies for the product belief MDP M^×. The set of accepting states of M^× is given by Acc^×. We have the following result. Theorem 1. Given the trust POMDP M and the scLDTL formula φ, the maximal probability of satisfying φ is:</p><p>Theorem 1 shows that, with the set of accepting states Acc^×, the original optimal policy synthesis problem reduces to a reachability problem.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Optimal policy synthesis</head><p>PBVI algorithms have been widely used for solving POMDP synthesis problems <ref type="bibr">[17]</ref>-<ref type="bibr">[19]</ref>. They often offer convergence guarantees specified as upper and lower bounds on the value function. However, these PBVI algorithms are not directly applicable to our problem (Problem 1). This is because solving a reachability problem for POMDPs necessitates the presence of a clearly defined reward function, which assigns value 1 to states in the goal set and 0 otherwise. 
However, in our case, capturing the satisfaction of an scLDTL specification through a state-based reward function is not feasible due to the presence of belief predicates.</p><p>In the following, we show that, with essential modifications, the existing PBVI algorithms can be leveraged for solving Problem 1 with the set of accepting states Acc^×. Given a state s of the product belief MDP M^×, we first define a value function V : S^× → ℝ_{≥0} as</p><p>which represents the maximal probability of reaching Acc^× from initial state s. Then one can get that V(s) = 1, ∀s ∈ Acc^×. For s ∉ Acc^×, we further define the dynamic programming operator T as</p><p>Before running a PBVI algorithm, first we initialize the upper and lower bounds of the value function V as follows:</p><p>Then a precision parameter τ is provided that controls the tightness of the convergence (for example, by controlling the depth of the tree in SARSOP <ref type="bibr">[19]</ref>), which yields |V̄(s_0) − V̲(s_0)| ≤ τ, where s_0 = (b_0, q_0) is the initial state of the product belief MDP M^×.</p><p>Denote by Pr^max_M(φ) := max_{π∈Π} {Pr^π_M(φ)} the maximal probability of satisfying the scLDTL formula φ. We have the following result.</p><p>Theorem 2. Let V̄(s_0) and V̲(s_0) be the upper and lower bounds of V(s_0) obtained using PBVI with the initialization function <ref type="bibr">(4)</ref>. One has that</p><p>Finally, the optimal policy π* for state s ∈ S^× can be derived using the value function.</p></div>
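<div xmlns="http://www.tei-c.org/ns/1.0"><p>The reduction to reachability can be illustrated on a small finite MDP. The sketch below is not a PBVI algorithm and does not operate on beliefs: it assumes a hypothetical four-state MDP standing in for a finite set of product states, fixes the value of accepting states to 1, and iterates a maximal-reachability backup from below until the largest change falls under a tolerance. The bound initialisation and the dynamic programming operator used in this work are not reproduced in this text, so the backup, the state names, and the transition probabilities are assumptions.</p>

```python
def max_reach_probability(states, actions, trans, accepting, tol=1e-9, max_iter=10000):
    """Value iteration from below for the maximal probability of reaching `accepting`.

    trans[(s, a)] is a dict next_state -> probability.
    Returns (values, greedy_policy); the values are lower bounds that
    increase monotonically towards the maximal reachability probability.
    """
    values = {s: (1.0 if s in accepting else 0.0) for s in states}

    def backup(s, a):
        return sum(p * values[t] for t, p in trans[(s, a)].items())

    for _ in range(max_iter):
        change = 0.0
        for s in states:
            if s in accepting:
                continue  # accepting states keep value 1
            best = max(backup(s, a) for a in actions)
            change = max(change, abs(best - values[s]))
            values[s] = best
        if tol >= change:
            break

    policy = {
        s: max(actions, key=lambda a: backup(s, a))
        for s in states
        if s not in accepting
    }
    return values, policy


# Hypothetical example: "acc" is accepting, "sink" can never reach it.
STATES = ["s0", "s1", "acc", "sink"]
ACTIONS = ["safe", "risky"]
TRANS = {
    ("s0", "safe"): {"s1": 1.0},
    ("s0", "risky"): {"acc": 0.5, "sink": 0.5},
    ("s1", "safe"): {"acc": 0.8, "sink": 0.2},
    ("s1", "risky"): {"acc": 0.5, "sink": 0.5},
    ("sink", "safe"): {"sink": 1.0},
    ("sink", "risky"): {"sink": 1.0},
    ("acc", "safe"): {"acc": 1.0},
    ("acc", "risky"): {"acc": 1.0},
}
VALUES, POLICY = max_reach_probability(STATES, ACTIONS, TRANS, {"acc"})
```

<p>In this example the greedy policy chooses the action "safe" in both s0 and s1. A point-based method would replace the finite state set by sampled belief points and maintain both an upper and a lower bound, stopping once their gap at the initial state is within the precision parameter.</p></div>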
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VI. IMPLEMENTATION</head><p>We have implemented the proposed approach to obtain the optimal policy for each scLDTL specification under consideration for the motivating example. To construct the trust POMDP, we utilise the trust dynamics model and the human takeover decision model, which were derived through an online user study involving 100 anonymous participants on the Amazon Mechanical Turk (AMT) platform <ref type="bibr">[21]</ref>. Two scLDTL specifications φ_1 and φ_2 are then considered. The DFA for each scLDTL specification φ_i, i ∈ {1, 2}, is derived using <ref type="bibr">[31]</ref>. Then the corresponding product belief MDP M^×_i is constructed with the DFA of φ_i. Finally, the upper and lower bounds of the value function V_i are computed using the POMDP toolkit "pomdp_py" <ref type="bibr">[32]</ref> (with the essential modifications described in Section V.B). The precision parameter is set as τ = 0.01. All simulations are carried out on a MacBook Pro (2.6 GHz 6-Core Intel Core i7 and 16 GB of RAM).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VII. DRIVING SIMULATOR STUDY</head><p>We evaluate the effectiveness of the obtained policies via a driving simulator study.<ref type="foot">3</ref> Study design. The study was conducted in a fixed-base driving simulator from SimXperience, consisting of a 55-inch display, a racing car seat, a Logitech G29 steering wheel, and sport pedals, see Fig. <ref type="figure">3</ref>. In our study, four buttons on the steering wheel were programmed to let the drivers increase/decrease trust, switch driving mode between manual and autopilot, and switch gear between drive and reverse.</p><p>The experiments were run on a machine with a 3.5 GHz CPU, an NVIDIA GeForce RTX 3080 Ti, 62 GB of memory, and the Ubuntu 20.04.6 LTS operating system. The virtual driving environment was created using CARLA 0.9.13. An autopilot controller was programmed for driving tasks such as lane keeping, taking turns at intersections, and handling incidents.</p><p>We recruited 21 undergraduate students from the university community to participate. All participants had a valid driver's license and normal or corrected-to-normal vision. Each participant was compensated with a $10 gift card. We adopted a within-subject study design: each participant took 4 unique routes, i.e., trust-aware and trust-free routes for both scLDTL specifications. The start and destination are either from A to L (φ_1) or from H to E (φ_2). The route is either the trust-aware route (obtained using the computed optimal policy) or the trust-free route (which is the shortest-distance route from the start to the destination). However, if the vehicle has not reached the destination after travelling 20 intersections, it reschedules a shortest-distance route. The order of trials was randomized and counterbalanced. Study Procedure. Upon arrival, a participant was instructed to read and sign a consent form approved by the Institutional Review Board. 
We conducted a five-minute training session to familiarize the participant with the driving simulator setup.</p><p>The vehicle started driving in autopilot mode. When the vehicle approached an incident, the participant could decide whether to take over the vehicle to handle the incident on the road. If the participant chose not to take over, the vehicle remained in autopilot mode to handle the incident. At any point during the experiment, the participant had the option to assume control of the vehicle and switch to manual driving. If the participant chose to take over, he or she was required to switch back to autopilot mode before arriving at the next intersection so that the vehicle could choose the next direction to go. We asked the participants to periodically record their trust in the AV using the buttons … φ_2 respectively, which validates the effectiveness of the proposed approach. We further compare the trust-aware and trust-free routes for all drivers and both trials. The fraction of runs satisfying the scLDTL specification is 0.85 (trust-aware) vs 0.775 (trust-free). The average human trust is 4.6018 (trust-aware) vs 4.3933 (trust-free). One can see that the trust-aware policy outperforms the trust-free one on both measures. A video demonstration of the human experiment can be found at: <ref type="url">https://www.youtube.com/watch?v=pY0PkxYbQXo</ref>.</p><p>We summarise three key observations gained from the experiments. First, observing the AV successfully handle the same incidents multiple times does not necessarily guarantee an increase in human trust. Second, if a human's trust remains consistently low for an extended period, it becomes challenging for them to re-establish trust in the AV. Third, having the capability to effectively address an incident beforehand can contribute to boosting human trust. 
Based on the feedback received after the experiment, participants indicated that, had they noticed the car braking earlier in situations involving pedestrians, they might have considered the AV more trustworthy.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VIII. CONCLUSIONS</head><p>In this work, we presented a trust-aware motion planning approach for HRC. We demonstrated the suitability of scLDTL for describing the desired behaviours of HRC systems and proposed an algorithm for solving the optimal policy synthesis problem. Human subject experiments were conducted on a driving simulator, validating the effectiveness of the proposed approach and providing valuable new insights. Additionally, we observed variations in trust dynamics among individuals, which will be further investigated in future research.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0"><p>Shuyang Dong, Shili Sheng, and Lu Feng are with the School of Engineering, University of Virginia, United States {sd3mn, ss7dr, lu.feng}@virginia.edu</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_1"><p>A belief state is a probability distribution over all possible states in the POMDP. It represents the agent's subjective probability distribution of being in each state given its past observations and actions.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_2"><p>The belief update of a conventional POMDP is often represented using the Bayes' filter.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_3"><p>This study was approved by the University of Virginia Institutional Review Boards under IRB-SBS protocol #6045.</p></note>
		</body>
		</text>
</TEI>
