<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Towards Transparent Robotic Planning via Contrastive Explanations</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>10/24/2020</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10216655</idno>
					<idno type="doi">10.1109/IROS45743.2020.9341773</idno>
					<title level='j'>2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)</title>
<idno></idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Shenghui Chen</author><author>Kayla Boggess</author><author>Lu Feng</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Providing explanations of chosen robotic actions can help to increase the transparency of robotic planning and improve users' trust. Social sciences suggest that the best explanations are contrastive, explaining not just why one action is taken, but why one action is taken instead of another. We formalize the notion of contrastive explanations for robotic planning policies based on Markov decision processes, drawing on insights from the social sciences. We present methods for the automated generation of contrastive explanations with three key factors: selectiveness, constrictiveness and responsibility. The results of a user study with 100 participants on the Amazon Mechanical Turk platform show that our generated contrastive explanations can help to increase users' understanding and trust of robotic planning policies, while reducing users' cognitive burden.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head><p>In recent years, a significant amount of work in Explainable AI <ref type="bibr">[1]</ref> has aimed to increase the transparency of AI decision-making systems and improve users' trust. With traditional "black-box" approaches, lay users have little understanding of how a decision is made or why an action occurs. This often leads to misunderstanding and mistrust of the system, which can in turn cause problems through system misuse. The vast majority of work in Explainable AI has focused on building simplified interpretable models as approximations of complex decision-making functions <ref type="bibr">[2]</ref>. However, few works consider social science theories of explanation. For example, Miller suggests that humans prefer contrastive explanations, that is, explanations that revolve around counterfactuals <ref type="bibr">[3]</ref>. Specifically, humans tend to ask not why an event P happens, but why an event P happens instead of some event Q (where Q can be a single event or the collection of all possible events, as long as they contrast with the event P). Understanding this contrast of events is more important to the human user than statements of probabilities or lists of total causes.</p><p>In this paper, we draw insights from the social sciences and formalize the notion of "contrastive explanations" in the context of robotic planning based on Markov decision processes (MDPs), a popular modeling formalism for representing abstract robotic mission plans <ref type="bibr">[4]</ref>. Our goal is to explain action choices in a planned robotic route, which can be computed as an optimal MDP policy using reinforcement learning <ref type="bibr">[5]</ref> or formal methods <ref type="bibr">[6]</ref>. 
More specifically, we focus on three key factors of contrastive explanations: selectiveness (e.g., choosing the most relevant events) <ref type="bibr">[3]</ref>, constrictiveness (e.g., counting how many possible future actions an action leaves available) <ref type="bibr">[3]</ref>, and responsibility (e.g., rating how important an action is in causing an event) <ref type="bibr">[7]</ref>. Different combinations of these factors let an explanation control how specific its information is and how much support it provides for events and actions.</p><p>Motivating Example. Consider the route planning for a robot navigating in a grid map as shown in Figure <ref type="figure">1</ref>. There are three possible routes from the start (S) to the destination (D), highlighted in different colors. The robot may take different routes, depending on the trade-offs between different objectives (e.g., minimizing the total route distance to the destination, minimizing the risk of colliding with pedestrians or cyclists). A naive way to explain a route is to generate a sentence for each action the robot takes at every state using a structured language template (e.g., "We move east at grid 10."), and then concatenate these sentences following the sequence of states in the route. However, it would be tedious, if not infeasible, to explain the robotic action in every state along the route, especially for large MDP models with hundreds of thousands of states. Therefore, we select a handful of critical states and only explain actions in those states. In addition to explaining what action is taken in a state, we also explain why the action is taken by comparing it to alternative actions in terms of constrictiveness (e.g., "We move east at grid 10 because it leads to the most flexible future route.") and responsibility (e.g., "We move east at grid 10 because it leads to the shortest route.").</p><p>Contributions. 
We summarize the major contributions of this paper as follows:</p><p>1) A formalization of contrastive explanations for MDPs based on three key factors (selectiveness, constrictiveness, and responsibility). 2) A prototype implementation to automatically generate contrastive explanations of MDP policies. 3) A user study with 100 participants to investigate user understanding, trust, and preference of contrastive explanations.</p><p>Related Work. When applied to AI-based systems, finding counterfactuals is often treated as a search or optimization problem <ref type="bibr">[2]</ref>. However, a counterfactual must be relevant to the system context, or it will not produce an explanation that the user can understand. Additionally, counterfactuals can be isolated through the use of modeling by providing concise descriptions of system behavior <ref type="bibr">[8]</ref>-<ref type="bibr">[11]</ref>. Furthermore, explanation creation and policy transparency can be based on finding critical states, i.e., the most important states, when the explanation must be shortened <ref type="bibr">[12]</ref>.</p><p>The explanations provided by an AI-based decision-making system must deal with the significant trade-off between what the system is trying to accomplish and what the users need in order to fully understand the decisions made <ref type="bibr">[13]</ref>. Balancing this trade-off increases system interpretability and user accessibility <ref type="bibr">[14]</ref>. So, when creating an explanation, all possible explanatory factors and support must be chosen carefully to maximize explainee understanding and minimize explainee burden.</p><p>The generation of explanations for robotic planning through structured language templates has been explored in work such as <ref type="bibr">[15]</ref>, <ref type="bibr">[16]</ref>. 
However, none of the previous works have produced contrastive explanations using selectiveness through the identification of critical states, responsibility, and constrictiveness, even though social science points to these factors as valid ways to increase explanation effectiveness.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. PRELIMINARIES</head><p>In this section, we provide the necessary background on Markov decision processes (MDPs), which are widely used as a modeling formalism in robotic planning <ref type="bibr">[4]</ref>. Formally, an MDP model is a tuple <formula notation="tex">M = (S, s_0, A, \delta, r)</formula>, where <formula notation="tex">S</formula> is a finite set of states, <formula notation="tex">s_0 \in S</formula> is an initial state, <formula notation="tex">A</formula> is a set of actions, <formula notation="tex">\delta : S \times A \times S \to [0, 1]</formula> is a transition relation mapping each state-action pair to a probability distribution over <formula notation="tex">S</formula>, and <formula notation="tex">r : S \times A \times S \to \mathbb{R}</formula> is a reward function. At each MDP state <formula notation="tex">s</formula>, first an action <formula notation="tex">a \in A</formula> is chosen nondeterministically based on an MDP policy <formula notation="tex">\sigma : S \to A</formula>, then a successor state <formula notation="tex">s'</formula> is chosen with probability <formula notation="tex">\delta(s, a, s')</formula>.</p><p>Given an MDP model for robotic planning, there are many different methods for computing an optimal policy. For example, various reinforcement learning techniques <ref type="bibr">[5]</ref> can compute an optimal MDP policy with the goal of maximizing the cumulative reward. In recent years, there has also been increasing interest in applying formal methods to synthesize robotic plans subject to a rich set of MDP properties (e.g., probabilistic reachability, safety properties, liveness properties) expressed in temporal logic specifications <ref type="bibr">[6]</ref>. Our approach is generally applicable for explaining any MDP policy, and is orthogonal to whether the policy is computed by reinforcement learning or formal methods.</p><p>Example 1: We build an MDP model based on the grid map shown in Figure <ref type="figure">1</ref>. The state space <formula notation="tex">S</formula> is defined by the grids. There are 25 states in total. The initial state is grid 5, which is labeled S. There are four actions in <formula notation="tex">A</formula>: move north, move east, move south, and move west. 
We assume that, due to sensor uncertainty, the robot performs an intended action correctly with probability 0.9 and gets stuck in the same grid with probability 0.1. Example transitions are <formula notation="tex">\delta(g_5, \text{south}, g_{10}) = 0.9</formula> and <formula notation="tex">\delta(g_5, \text{south}, g_5) = 0.1</formula>. We define a reward function <formula notation="tex">r</formula> that counts the total distance (e.g., number of grids) traveled by the robot to reach the destination. For example, <formula notation="tex">r(g_5, \text{south}, g_{10}) = 1</formula>.</p></div>
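<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration, the tuple from Example 1 could be encoded as follows. This is a minimal sketch under stated assumptions, not the implementation used in this paper: the class name MDP, its field names, the string labels for grids, and the choice to number grids from 0 are all made up for the example, and only the two transitions and the single reward value stated in Example 1 are filled in.</p>
<p>
```python
from dataclasses import dataclass, field


@dataclass
class MDP:
    """Hypothetical container for the tuple (S, s0, A, delta, r)."""
    states: set
    initial: str
    actions: set
    # delta[(s, a)] maps each successor state to its transition probability
    delta: dict = field(default_factory=dict)
    # reward[(s, a, s_next)] is the reward collected on that transition
    reward: dict = field(default_factory=dict)

    def successors(self, s, a):
        """Return the successor distribution of (s, a), or {} if undefined."""
        return self.delta.get((s, a), {})

    def is_distribution(self, s, a, tol=1e-9):
        """Check that the probabilities of (s, a) sum to one."""
        return tol >= abs(sum(self.successors(s, a).values()) - 1.0)


# Example 1: 25 grids (numbering from 0 is an assumption), start at grid 5.
mdp = MDP(
    states={f"g{i}" for i in range(25)},
    initial="g5",
    actions={"north", "east", "south", "west"},
)
# Moving south from grid 5 succeeds with 0.9 and gets stuck with 0.1.
mdp.delta[("g5", "south")] = {"g10": 0.9, "g5": 0.1}
# Each successful move adds one grid of distance.
mdp.reward[("g5", "south", "g10")] = 1

print(mdp.successors("g5", "south"))
print(mdp.is_distribution("g5", "south"))
```
</p>
<p>Keying the transition relation by state-action pair keeps each probability distribution in one place, which makes the per-pair computations in the following section straightforward to express.</p></div>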
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. CONTRASTIVE EXPLANATIONS</head><p>We formalize the three key factors of contrastive explanations: selectiveness, constrictiveness and responsibility, and present methods to compute them.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Selectiveness</head><p>It would be tedious or even infeasible to explain a robot's action at every single state along the planned route, especially for large MDP models that may contain hundreds of thousands of states. Indeed, according to <ref type="bibr">[2]</ref>, explanations should be selective to reduce long causal chains to a cognitively manageable size for humans. To this end, we define the notion of critical states in an MDP model and only explain actions in those states. Intuitively, a critical state is one where the choice of action greatly affects the MDP policies and their performance. For example, in grid 14 of Figure <ref type="figure">1</ref>, moving north is more likely to reach the destination, while moving west would reach a dead end. Given a pair of state <formula notation="tex">s</formula> and action <formula notation="tex">a</formula> in an MDP model, we define the impact of this state-action pair as <formula notation="tex">\omega(s, a) = \sum_{s' \in S} \delta(s, a, s') \cdot \rho_{s'}</formula>, where <formula notation="tex">\delta(s, a, s')</formula> is the transition probability and <formula notation="tex">\rho_{s'}</formula> is the MDP property value (e.g., maximum cumulative reward) at a successor state <formula notation="tex">s'</formula>. Then, we can obtain a pair of values for each state <formula notation="tex">s</formula> to measure the best and worst impact of the different enabled actions <formula notation="tex">A(s)</formula>: <formula notation="tex">\lambda^{\max}_s = \max_{a \in A(s)} \omega(s, a)</formula> and <formula notation="tex">\lambda^{\min}_s = \min_{a \in A(s)} \omega(s, a)</formula>. Formally, we define the set of critical states of an MDP model with the state space <formula notation="tex">S</formula> as</p><formula notation="tex">\{ s \in S \mid \lambda^{\max}_s - \lambda^{\min}_s &gt; \alpha \}</formula><p>where <formula notation="tex">\alpha</formula> is a user-defined threshold. The higher the value of <formula notation="tex">\alpha</formula>, the fewer critical states are returned.</p><p>Example 2: Following the MDP model defined in Example 1 and considering a threshold <formula notation="tex">\alpha = 0</formula> for the total distance of reaching the destination, we can compute the set of critical states as <formula notation="tex">\{g_5, g_7, g_{10}, g_{12}, g_{14}\}</formula>. We use grid 10 as an example to show the computation procedure. There are two enabled actions in grid 10: move east or move south. 
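Assume that <formula notation="tex">\rho_{g_{10}} = 6.666</formula>, <formula notation="tex">\rho_{g_{11}} = 5.555</formula>, and <formula notation="tex">\rho_{g_{15}} = 9.999</formula>, which represent the total expected distance to the destination when starting from grid 10, grid 11, and grid 15, respectively. We have <formula notation="tex">\omega(g_{10}, \text{east}) = 0.9 \times 5.555 + 0.1 \times 6.666 = 5.667</formula>, <formula notation="tex">\omega(g_{10}, \text{south}) = 0.9 \times 9.999 + 0.1 \times 6.666 = 9.667</formula>, and <formula notation="tex">\lambda^{\max}_{g_{10}} - \lambda^{\min}_{g_{10}} = 9.667 - 5.667 = 4 &gt; 0</formula>. Thus, grid 10 is a critical state.</p></div>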
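<div xmlns="http://www.tei-c.org/ns/1.0"><p>The computation in Example 2 can be sketched in a few lines. This is an illustrative sketch rather than the prototype described in this paper: the function names and dictionary layout are assumptions, and the strict comparison against the threshold is inferred from Example 2, where a gap of 4 with a threshold of 0 marks grid 10 as critical. Because the property values are given to three decimals, the computed impacts agree with the figures in Example 2 only up to rounding.</p>
<p>
```python
def impact(delta, rho, s, a):
    """omega(s, a): expected property value over the successors of (s, a)."""
    return sum(p * rho[s_next] for s_next, p in delta[(s, a)].items())


def impact_range(delta, rho, enabled, s):
    """Return (lambda_max, lambda_min) over the actions enabled in s."""
    values = [impact(delta, rho, s, a) for a in enabled[s]]
    return max(values), min(values)


def critical_states(delta, rho, enabled, alpha):
    """States whose best and worst impacts differ by more than alpha."""
    critical = set()
    for s in enabled:
        lam_max, lam_min = impact_range(delta, rho, enabled, s)
        if lam_max - lam_min > alpha:
            critical.add(s)
    return critical


# Values from Example 2 (only grid 10 is modeled here).
rho = {"g10": 6.666, "g11": 5.555, "g15": 9.999}
delta = {
    ("g10", "east"): {"g11": 0.9, "g10": 0.1},
    ("g10", "south"): {"g15": 0.9, "g10": 0.1},
}
enabled = {"g10": ["east", "south"]}

omega_east = impact(delta, rho, "g10", "east")
omega_south = impact(delta, rho, "g10", "south")
print(round(omega_east, 3), round(omega_south, 3))
print(critical_states(delta, rho, enabled, alpha=0))
```
</p>
<p>Raising alpha above the gap of roughly 4 would remove grid 10 from the returned set, which matches the remark that higher thresholds yield fewer critical states.</p></div>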
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Constrictiveness</head><p>In social sciences, a decision is said to be more "constrictive" if choosing it causes fewer possible future decisions. Constrictive actions are thus less mutable, which increases their importance in causing an event's outcome [?]. Over time, actions tend to become more constrictive as a goal is reached. In this paper, we interpret constrictiveness as a measurement of how much an action affects flexibility, in terms of the number of critical decision points left in the future route. Intuitively, more decision points give the robot more flexibility to reroute; an action that preserves them is hence considered less constrictive and is preferred as time passes. Given a state <formula notation="tex">s</formula> in an MDP model, we can construct an expectimax-like search tree <ref type="bibr">[17]</ref> by taking the state <formula notation="tex">s</formula> as the root node and expanding it with edges labeled with an action <formula notation="tex">a</formula> that lead to child nodes <formula notation="tex">s'</formula> whenever the transition probability <formula notation="tex">\delta(s, a, s') &gt; 0</formula>, until target destination states are reached. We define the constrictiveness value of choosing an action <formula notation="tex">a</formula> in an MDP state <formula notation="tex">s</formula> as the number of critical decision points left in possible future routes, obtained by traversing the search tree <formula notation="tex">T(s, a)</formula>. Formally,</p><p>Example 3: Figure <ref type="figure">2</ref> shows two example search trees <formula notation="tex">T(g_{10}, \text{south})</formula> and <formula notation="tex">T(g_{10}, \text{east})</formula>. There is only one critical state <formula notation="tex">g_{14}</formula> in the tree <formula notation="tex">T(g_{10}, \text{south})</formula>, with two enabled actions; thus <formula notation="tex">\epsilon(g_{10}, \text{south}) = 2</formula>. For the tree <formula notation="tex">T(g_{10}, \text{east})</formula>, there are four future critical state-action pairs highlighted in red, that is, <formula notation="tex">\epsilon(g_{10}, \text{east}) = 4</formula>. This suggests that moving east is more flexible, with more critical decision points in future routes (i.e., less constrictive), than moving south, and thus is preferred.</p></div>
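<div xmlns="http://www.tei-c.org/ns/1.0"><p>One way to read the counting in Example 3 is sketched below. This is an assumption-laden illustration, not the definition used in this paper: the counting rule (one count per enabled action at each critical state reachable after the chosen action) is inferred from the wording of Example 3, the function name is made up, and the successor map is hypothetical, chosen only so that the counts match Example 3 rather than read off Figure 1 or Figure 2.</p>
<p>
```python
def constrictiveness(succ, critical, s, a):
    """Count enabled actions at critical states reachable after (s, a)."""
    count = 0
    visited = {s}  # the root itself is not a future decision point
    frontier = list(succ[s][a])
    while frontier:
        state = frontier.pop()
        if state in visited:
            continue  # skips self-loops such as getting stuck in place
        visited.add(state)
        actions = succ.get(state, {})
        if state in critical:
            count += len(actions)
        for targets in actions.values():
            frontier.extend(targets)
    return count


# Hypothetical successor map: succ[state][action] lists the states that
# are reachable with positive probability. Not taken from the figures.
succ = {
    "g10": {"east": ["g11", "g10"], "south": ["g15", "g10"]},
    "g11": {"east": ["g12"]},
    "g12": {"north": ["g7"], "east": ["g13"]},
    "g13": {},  # dead end
    "g7": {"east": ["g8"], "north": ["g2"]},
    "g8": {"north": ["g3"]},
    "g2": {"east": ["g3"]},
    "g3": {"east": ["g4"]},
    "g15": {"east": ["g16"]},
    "g16": {"east": ["g19"]},
    "g19": {"north": ["g14"]},
    "g14": {"north": ["g9"], "west": ["g13"]},
    "g9": {"north": ["g4"]},
    "g4": {},  # destination
}
critical = {"g5", "g7", "g10", "g12", "g14"}

print(constrictiveness(succ, critical, "g10", "south"))
print(constrictiveness(succ, critical, "g10", "east"))
```
</p>
<p>The visited set is a design choice of this sketch: it keeps the traversal finite when an action can leave the robot in the same grid, and it ensures that a critical state reachable along several routes is counted once.</p></div>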
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Responsibility</head><p>In social sciences, an action is said to be more "responsible" if removing it from the currently chosen path changes the outcome more <ref type="bibr">[?]</ref>. Humans tend to be more interested in actions that hold a higher responsibility, as it measures how much influence an action has over the final outcome <ref type="bibr">[2]</ref>. In this paper, we interpret responsibility as the measurement of an action's relative impact on the MDP property value compared with other actions enabled in the same state. Formally, we define the responsibility value of an action <formula notation="tex">a</formula> in an MDP state <formula notation="tex">s</formula> as</p><formula notation="tex">\zeta(s, a) = \omega(s, a) - \lambda^{\min}_s</formula><p>Example 4: Following the previous Example 2, we know that <formula notation="tex">\omega(g_{10}, \text{south}) = 9.667</formula>, <formula notation="tex">\omega(g_{10}, \text{east}) = 5.667</formula>, and <formula notation="tex">\lambda^{\min}_{g_{10}} = 5.667</formula>. We can compute the responsibility values <formula notation="tex">\zeta(g_{10}, \text{south}) = 9.667 - 5.667 = 4</formula> and <formula notation="tex">\zeta(g_{10}, \text{east}) = 5.667 - 5.667 = 0</formula>. Thus, moving south is more responsible for the total distance than moving east. Since we prefer a shorter route, moving east is preferable at grid 10.</p></div>
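<div xmlns="http://www.tei-c.org/ns/1.0"><p>The arithmetic of Example 4 can be checked with a short sketch. The function name and the nested-dictionary layout are assumptions made for illustration, and the rule used here (impact of the action minus the smallest impact among the actions enabled in the same state) is the reading suggested by the numbers in Example 4.</p>
<p>
```python
def responsibility(omega, s, a):
    """Impact of action a in state s relative to the lowest impact in s."""
    lam_min = min(omega[s].values())
    return omega[s][a] - lam_min


def least_responsible_action(omega, s):
    """Action with the smallest responsibility value in state s."""
    return min(omega[s], key=lambda a: responsibility(omega, s, a))


# Impact values for grid 10 as reported in Example 2.
omega = {"g10": {"south": 9.667, "east": 5.667}}

zeta_south = responsibility(omega, "g10", "south")
zeta_east = responsibility(omega, "g10", "east")
print(round(zeta_south, 3), round(zeta_east, 3))
print(least_responsible_action(omega, "g10"))
```
</p>
<p>When the property value is a distance to be minimized, as in the running example, the action with zero responsibility is the one that yields the shortest expected route, which is why moving east is selected at grid 10.</p></div>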
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. USER STUDY DESIGN</head><p>Experiment Domain. We designed a user study to evaluate the effectiveness of selectiveness, constrictiveness, and responsibility in contrastive explanations. For this study, we recruited 100 individuals using Amazon Mechanical Turk, with a categorical age distribution of 3 (0-17); 12 (18-24); 57 (25-34); 20 (35-49); 0 (50-64); and 2 (65+). We asked them to evaluate different types of explanations. Users were presented with the same three 10-by-10 grid maps, each containing an optimal route from a start state to a finish state. All chosen routes were approximately equivalent and worked to minimize the distance from start to finish. The possible positions and transitions, other than the chosen route, which was highlighted in blue, were displayed in grey to indicate to the user that all positions and transitions on the map were equivalent in value. Each route was presented with 7 different explanations about the robotic actions taken within it. An example of each of these explanations can be seen in Table <ref type="table">I</ref>.</p><p>Independent Variables. Each explanation was evaluated by the user on the level to which the user understood the information presented by the explanation and the level to which the user trusted that the information was correct. Users were also asked to choose the explanation that they preferred out of several different groupings of explanations. Our independent variables included explanation type and explanation factors.</p><p>Dependent Measures. The main subjective dependent variables were user understanding, user trust, and user preference. User understanding was measured using a 5-point Likert scale, with a value of 1 indicating that the user did not understand the explanation at all and 5 indicating that the user fully understood the explanation. 
User trust was also measured on a 5-point Likert scale, with a value of 1 indicating that the user did not trust that the information in the explanation was correct and 5 indicating that they trusted that the explanation was fully correct. We also measured the time spent accessing the explanation as an objective dependent variable. We started the timer as soon as the user accessed the page and stopped it when all questions about the route and explanation had been answered.</p><p>Hypotheses. We have the following three hypotheses for this user study.</p><p>H1. We hypothesize that the use of selectiveness, responsibility, and constrictiveness in contrastive explanations will increase user understanding of information.</p><p>H2. We hypothesize that the use of selectiveness, responsibility, and constrictiveness in contrastive explanations will increase user trust in explanation correctness.</p><p>H3. We hypothesize that users will prefer contrastive explanations using selectiveness, responsibility, and constrictiveness over other types of naive explanations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V. RESULTS</head><p>In the following, we discuss the results of our user study with respect to the three hypotheses.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Regarding H1 about user understanding</head><p>We begin by analyzing user understanding, shown in Figure <ref type="figure">3</ref>. A one-way ANOVA test (&#945; = 0.05), F(6,2093) = 50.39, p &#8804; 0.00001, indicates that the differences between the data shown are statistically significant. As expected, presenting a user with no explanation allows for little understanding of the presented information. However, the introduction of a responsibility or constrictive justification in the explanation increases user understanding of why actions are taken. Thus, users find it easier to understand an explanation if the justification for an action is presented alongside the action, rather than the action alone.</p>
<figure><head>Fig. <ref type="figure">3</ref>: Results of average user understanding of system reasoning when presented with each explanation type. Users understood system actions better when presented with responsibility and constrictive based explanations compared to their naive counterparts.</head></figure>
<p>When dealing with the selectiveness of an explanation, things are not as straightforward. This survey found that user understanding decreased as the number of states explained was reduced to only the most critical states. Thus, a naive explanation is more effective in creating an overall understanding of the map than a selective one. However, we can also define user understanding in terms of cognitive burden, or the amount of time or energy the user must expend on processing the explanation. This factor is especially important in applications that are time-sensitive, such as autonomous vehicles, where the user has a short time to process the explanation and make a critical decision. Figure <ref type="figure">4</ref> shows the average time that users spent answering the survey questions regarding each explanation. A one-way ANOVA test (&#945; = 0.05), F(6,2093) = 2.9967, p = 0.0064, indicates that the difference between the data for average time spent is statistically significant. The use of a selective explanation decreases the amount of time that the user needs to understand the information given, compared with its naive counterpart. The high standard deviation of the naive explanation shows that this is especially true for some users. So, selective explanations may impart less information, but they also decrease the amount of time needed to process that information. This may not be an important factor in a small example such as a 10-by-10 route map, but as the number of states grows, the importance of explanation selectiveness may increase as well, especially in models containing millions of possible states.</p>
<figure type="table"><head>TABLE I: Example of different explanations presented to users based on the grid map in Figure <ref type="figure">1</ref></head><table>
<row><cell>Naive Explanation</cell><cell>We move east at grid 10.</cell></row>
<row><cell>Responsibility Explanation</cell><cell>We move east at grid 10 because it leads to the shortest route.</cell></row>
<row><cell>Constrictive Explanation</cell><cell>We move east at grid 10 because it leads to the most flexible future route.</cell></row>
<row><cell>Naive Explanation (Entire Path)</cell><cell>First, we move south at grid 5. Next, we move east at grid 10. Then, we move east at grid 11. Next, we move north at grid 12. Then, we move east at grid 7. Next, we move north at grid 8. Finally, we move east at grid 3.</cell></row>
<row><cell>Selective Explanation</cell><cell>First, we move south at critical grid 5. Then, we move east at critical grid 10. Next, we move north at critical grid 12. Finally, we move east at critical grid 7. All other decisions result in equivalent routes.</cell></row>
<row><cell>Contrastive Explanation (All Factors)</cell><cell>First, we move south instead of other directions at critical grid 5 because it leads to the shortest and most flexible future route. Then, we move east at critical grid 10 instead of another direction because it leads to the shortest and most flexible future route. Next, we move north at critical grid 12 instead of other directions because it leads to the shortest route. Finally, we move east at critical grid 7 instead of other directions because it leads to the shortest route. All other decisions result in equivalent routes.</cell></row>
</table></figure>
<p>A contrastive explanation combining all three factors does not increase user understanding as we hypothesized in H1. This may be due in part to the selective factor that we discussed above. However, it did greatly decrease the amount of time needed for users to process the information compared with the naive explanation. Thus, contrastive explanations could be effective in increasing user understanding and decreasing cognitive burden in users.</p><p>In summary, the use of responsibility and constrictiveness increases understanding, while selectiveness decreases user understanding. Overall, contrastive explanations increase understanding and decrease cognitive burden.</p></div>
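<div xmlns="http://www.tei-c.org/ns/1.0"><p>For readers who want to see how a one-way ANOVA statistic of the kind reported above is formed, a minimal pure-Python sketch follows. It is not the analysis script used in this study: the function name is an assumption and the ratings below are synthetic values invented for illustration. The reported degrees of freedom, F(6,2093), are consistent with 7 groups and 2100 ratings in total, which would correspond to 100 participants times 3 maps times 7 explanations, although that breakdown is an inference rather than something stated in the text.</p>
<p>
```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means)
    )
    # Variation of the observations around their own group mean
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, means) for x in g
    )
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Synthetic Likert-style ratings for three made-up explanation types.
ratings = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_between, df_within = one_way_anova(ratings)
print(f_stat, df_between, df_within)

# Degrees of freedom for 7 groups of 300 ratings each (assumed breakdown).
study_df = (7 - 1, 7 * 300 - 7)
print(study_df)
```
</p>
<p>A large F value means the variation between explanation types is large relative to the variation among ratings of the same type; the p-value is then read from the F distribution with the two degrees of freedom, a step this sketch leaves out.</p></div>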
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Regarding H2 about user trust</head><p>Users not only need to understand the explanations presented, but they also need to trust that these explanations are correct. This can be achieved by providing relevant and necessary justification. Figure <ref type="figure">5</ref> shows the average user trust in explanation correctness for each explanation type. By a one-way ANOVA test (&#945; = 0.05), F(6,2093) = 211.60, p &#8804; 0.00001, the difference between explanation trust averages is statistically significant.</p><p>Providing no explanation gives the user little trust in the system. However, using an explanation that provides support through action responsibility significantly increases user trust over a naive explanation. Yet, a constrictive explanation provides a significantly smaller increase in user trust than its responsibility counterpart, offering nearly the same amount of trust as the naive explanation. This may be due to the less direct connection of the constrictiveness justification, compared to responsibility, to reaching the destination state. It seems that selectiveness gives little help in increasing user trust as well. Even when told that only the states presented are important to reaching the established goal, users trust explanations that present more information about route actions. This may be because the naive explanation appears to have more information and thus more support, even though this is not the case.</p><p>Additionally, putting all three factors together into a larger contrastive explanation seems to decrease the effectiveness of the explanation in gaining user trust. This is most likely due to the integration of the selectiveness factor and the decrease in the number of states explained. 
However, the use of the responsibility and constrictive justifications does help to establish more trust in a larger contrastive explanation, bringing its overall trust up on average compared to a naive explanation.</p><p>The use of responsibility increases user trust, while the use of selectiveness decreases it. Constrictiveness has no effect. Overall, contrastive explanations increase user trust using responsibility justification.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Regarding H3 about user preference</head><p>Users not only need to understand and trust explanations, but they also must "like" them, as explanations are social entities subject to human preference. An explanation that a user prefers may be more effective, as it may meet their needs for justification and length better than other explanations. We found that users prefer responsibility explanations (46%) and constrictive explanations (48%) more often than their naive explanation counterparts (38%, 32% respectively). This is most likely because people prefer explanations with some type of contrastive justification over the mere presentation of actions. However, when dealing with selective explanations (26.3%), users prefer a naive explanation (57.7%), which presents each state and its action, over one that presents only the actions performed at critical states. Thus, users prefer to see more information, even if that information is not necessarily critical. Furthermore, we found that users prefer explanations using only responsibility (20.7%), constrictive (27%), or selective (23.7%) factors almost as often as contrastive explanations (28.7%) combining all three factors. This may be due in part to user aversion to explanations utilizing selectiveness, but it might also point to a large range of user preferences that needs to be addressed in the creation of personalized explanations.</p><p>To summarize, users prefer responsibility and constrictiveness explanations over naive explanations, but do not prefer selective explanations. Users prefer contrastive explanations at the same rate as single-factor explanations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VI. CONCLUSION AND FUTURE WORK</head><p>In this paper, we present methods to compute contrastive explanations with three key factors (selectiveness, constrictiveness and responsibility) for robotic planning based on Markov decision processes, drawing on insights from the social sciences. A user study with 100 participants on the Amazon Mechanical Turk platform shows that our generated contrastive explanations can improve user understanding and trust of autonomy, while reducing cognitive burden. In the future, we plan to further investigate methods of adapting explanations to an individual user's preferences and updating explanations in real time based on user feedback.</p></div>
		</body>
		</text>
</TEI>
