<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Explain it as simple as possible, but no simpler – Explanation via model simplification for addressing inferential gap</title></titleStmt>
			<publicationStmt>
				<publisher>Elsevier AIJ</publisher>
				<date>03/01/2025</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10630721</idno>
					<idno type="doi">10.1016/j.artint.2024.104279</idno>
					<title level='j'>Artificial Intelligence</title>
<idno>0004-3702</idno>
<biblScope unit="volume">340</biblScope>
<biblScope unit="issue">C</biblScope>					

					<author>Sarath Sreedharan</author><author>Siddharth Srivastava</author><author>Subbarao Kambhampati</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[One of the core challenges of explaining decisions made by modern AI systems is the need to address the potential gap in the inferential capabilities of the system generating the decision and the user trying to make sense of it. This inferential capability gap becomes even more critical when it comes to explaining sequential decisions. While there have been some isolated e!orts at developing explanation methods suited for complex decision-making settings, most of these current e!orts are limited in scope. In this paper, we introduce a general framework for generating explanations in the presence of inferential capability gaps. A framework that is grounded in the generation of simplified representations of the agent model through the application of a sequence of model simplifying transformations. This framework not only allows us to develop an extremely general explanation generation algorithm, but we see that many of the existing works in this direction could be seen as specific instantiations of our more general method. While the ideas presented in this paper are general enough to be applied to any decision-making framework, we will focus on instantiating the framework in the context of stochastic planning problems. As a part of this instantiation, we will also provide an exhaustive characterization of explanatory queries and an analysis of various classes of applicable transformations. We will evaluate →]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>While AI as a field has made rapid progress in recent years, we are still far from realizing the truly transformational potential of AI systems in our society.</p><p>Chief among the obstacles to achieving this goal is the inability of our current AI systems to work e!ectively with people from all walks of life. Meeting this challenge requires us to rethink our current paradigms for agent design. We need to look beyond just measuring the e!ectiveness of agents in terms of their capability to achieve their assigned goals, and consider their ability to communicate and explain their decisions to their end-users intuitively. Additionally, they should allow their users to critique their decisions and, as required, be able to incorporate user suggestions. While the creation of explainable agents capable of supporting such interactions has been receiving significant attention lately <ref type="bibr">[1,</ref><ref type="bibr">2]</ref>, we are still at the beginning stages of creating systematic approaches to explaining AI agent decisions and facilitating such interactive dialogues.</p><p>One of the core challenges of explaining any automated decisions, particularly in sequential decision-making problems, is the complexity of the underlying reasoning problems. Many modern planners and reinforcement learning (RL) agents can generate solutions to problems that most everyday users would find hard to comprehend completely, let alone solve. In other words, the problem of generating explanations in these scenarios are made more challenging by the presence of an inferential capability gap between the user and the decision-making system. Thankfully for most explanatory queries that a user may raise, we would hardly ever need to expose them to the full complexity of the original problem.</p><p>Instead, we can often get by using just a simplified representation that su"ces to answer their specific question. In this paper, we will operationalize this intuition to generate e!ective explanations by introducing an explanation generation framework that generates simplified representations of the original model that is su"cient to respond to a given user query. To ground this framework, we will look at the use of this explanatory framework in the context of a very general version of sequential decision-making problems, namely, goal-directed stochastic planning problems.  Example 1. In this example the user of the robot starts by asking the robot to bake a cake, to which it replies it can't bake a cake. Now, the user demands an explanation for its failure to prepare a cake. The robot may have come to the conclusion that it cannot prepare a cake from the fact that its stochastic task and motion planner failed to produce a valid plan. Dumping its search tree or the underlying model to the human would hardly su"ce as a satisfactory explanation.</p><p>Suppose the robot was to analyze the planning problem; it could find out that the problem remains unsolvable even if it were to ignore the stochasticity of the domain (related to various slippages the robot may incur in its operation) and all the motion constraints. The robot isn't even able to create cake batter, which is required for the final goal, because it doesn't have a whisk. The robot could ignore all low-level details and just surface a simplified model to the user containing information about the mixing action, which includes the fact that the action requires access to a whisk. Now, given this model, the robot can walk the human through a hypothetical trace where it tries to make the cake better and show that it would fail at the mixing step due to the missing whisk. Now, this hypothetical trace provides a simple example demonstrating the robot's inability to achieve its goals. We will refer to such secondary explanatory information as explanatory witness.</p><p>Throughout this paper, we will see how our framework can help generate this exact explanation. As for the rest of the paper, we will start the technical discussion by providing an overview of the framework (Section 3.1). We will provide a small summary for each framework component with pointers to relevant sections that provide a detailed discussion of each component. Section 3.2 will provide formal definitions of some of the central concepts we will be using in the paper, including inferential gap, explanatory criteria, and model simplifying transformations, and lay out the basic algorithm to compute explanatory models.</p><p>With the basic framework in place, we will delve deeper into the grounding of the framework in the context of stochastic planning problems and go over the characterization of each component of the framework in this setting. This would involve providing an exhaustive characterization of contrastive explanatory queries possible in stochastic planning settings (Section 4) and introducing a set of a model transformation (Section 5). Each model transformation class presents a conceptually consistent formalization of a large class of specific transformations that could be applied to a given planning problem to simplify the problem. For each transformation class, we will also point to some previous works in the wider explainable AI (XAI) literature that have applied similar techniques, thereby presenting a previous instance of a limited application of our more general framework. For each transformation class, we introduce one specific technique that only takes polynomial time to perform and analyze the properties of the specific technique for the class of stochastic planning problems we are interested in. Section 6 will introduce a more constrained version of the algorithm introduced in Section 3 specifically designed for the transformation classes introduced in Section 5 called the stratified search algorithm. Section 7 presents the various experiments we performed to evaluate the explanation generation method discussed here. In particular, Section 7.1 presents a user study on the e!ectiveness of our overall framework and measures the e!ectiveness of the individual transformations. In Section 7.2, we evaluate the e!ectiveness of the generated explanation using computational proxies for the complexity.</p><p>In section 8, we will further see how most current works aimed at addressing the inferential capability gap can be seen as a specific instance of our more general framework. In particular, we will discuss how each existing work can be seen as using some limited set of transformations or focuses primarily on providing an explanatory witness. All the frameworks and analyses will be grounded in the context of MDPs with the P assumption (i.e., Positive action cost assumption) <ref type="bibr">[3]</ref>, which generalizes over many commonly studied stochastic planning formalisms.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Background</head><p>The decision-making model we will use throughout the paper is the undiscounted MDP with absorbing goal states that tries to generate optimal solutions under total expected cost criteria. Such models can be described by a tuple of the form M = &#8594;S, A, P, C, I, G&#8593;, where S is the state space, A the set of possible actions, P : S &#8595; A &#8595; S &#8596; [0, 1] the transition probabilities, C captures the cost of executing a given action in the state and it resulting in a new state, I &#8599; S the initial state and G &#8600; S, is the set of absorbing goal states. General MDP definitions also allow us to limit what actions are applicable in which states, but we will skip explicitly capturing that to simplify the notations. We will specifically limit ourselves to cases where all state-action-state tuples satisfy the condition C(s, a, s &#8593; ) &#8771; 0. MDPs that satisfy this condition are usually referred to as satisfying the P -assumption or the positive action cost assumption (cf. <ref type="bibr">[3]</ref>). While most of the results established in the paper also apply to other settings, say when all costs are guaranteed to be less than or equal to zero, we will mostly focus on the former as it is better suited to goal-directed problems.</p><p>We wanted to preserve goals in the problem formulation as there is plenty of evidence from psychological studies that show that people tend to think in terms of goals when performing planning <ref type="bibr">[4]</ref>. MDPs with P -assumptions are still an extremely general formulation and cover several formalisms studied frequently in AI literature, including non-negative cost infinite horizon discounted MDPs <ref type="bibr">[3]</ref>, non-negative cost SSPs <ref type="bibr">[3]</ref>, MAXPROB MDPs <ref type="bibr">[5]</ref>, NEG Mdps <ref type="bibr">[6]</ref>, non-negative cost GSSP's <ref type="bibr">[5]</ref>, SSPUDE's <ref type="bibr">[7]</ref> etc.</p><p>We will use the functions J : S &#8596; R and Q : S &#8595; A &#8596; R to capture the expected total cost and Q function and use J &#8594; and Q &#8594; to represent the optimal cost and Q function respectively. A policy is said to be optimal if J &#969; (s) = J &#8594; (s), where J &#969; is the cost function obtained by following the given policy. In the most general case, a policy takes the form &#969; = &#8594;&#181; 1 , &#181; 2 , ...&#8593;, where each &#181; i is a mapping from state to action for a timestep i. A policy is said to be stationary if the mapping does not depend on the time step. Under the P assumption, value iteration converges if the cost function is initialized with a bounded cost function less than the optimal cost function (say zero cost function), provided there exists a fixed point to the function. In addition to the optimal cost, a factor that we will be considering throughout the paper is the probability that the execution of policy from a given state would lead it to a goal state P &#969; (s) = g&#8595;G P (g|s, &#969;) and we will P &#8594; (s) to denote the highest possible probability of achieving the goal under any policy for the given model (which we will refer to as the MAXPROB policy) and use P &#969; (s) to capture probability under a given policy. In most cases, we will focus on the cost and probability of achieving the goal from the initial state and states reachable from the initial state. E!ectively, in this scenario, planning for MDPs becomes a multi-objective optimization problem, where the objectives become the cost of the solution and the reachability of goals. This may come across as surprising to readers who are most familiar with the more restricted classes of MDPs, where any potential goals are usually compiled into the cost function. Thus, one may anticipate that all goal reachability queries will be resolved through cost di!erences. More general variants of MDPs like iSSPUDE <ref type="bibr">[7]</ref> use the probability of getting to the goal as a separate optimization criterion for choosing the policy. So, instead of choosing a plan that directly optimizes the cost, they use the probability of getting to the goal as the primary criterion and the cost as the secondary one. This also means that in every case, there exists a strict ordering between the objectives, and we don't need to rely on additional considerations like establishing Pareto optimality or considering the two objectives simultaneously.</p><p>A given MDP could be represented in multiple ways, a particularly popular way one could represent such models is to describe them using problem description languages like PPDDL <ref type="bibr">[8]</ref>. Mathematically a model described in PPDDL is given by a tuple of the form</p><p>&#8226; F D is the set of propositions used to define the state space. Each state in the model will be a specific instantiation of these propositions. That is, each state s i can be uniquely represented by the set of facts true in that state, i.e., s i &#8600; F . These propositions are also sometimes referred to as state fluents and state factors in the literature (cf. <ref type="bibr">[9]</ref>).</p><p>&#8226; A D is the set of actions available in the model. Each action a i &#8599; A D is further defined as following a i = &#8594;prec i , E i &#8593; -prec i &#8600; F denotes the preconditions of the action. An action is only applicable in states where its preconditions are true, i.e., action</p><p>-E i is the set of mutually exclusive e!ects that the execution of the action can cause. Each individual e!ect e j i &#8599; E i , if further represented by the tuple &#8594;add j i , del j i , p j i , c j i &#8593;, where * add j i &#8600; F , called the add e!ects, is the set of facts that will be turned true by that e!ect in the resultant state. * del j i &#8600; F , called the delete e!ects, is the set of facts that will be turned false by that e!ect in the resultant state. * p j i , of the probability of occurrence of that particular e!ect (the distribution over the individual e!ects is expected to form a well formed probability distribution). * c j i is the cost associated with the transition</p><p>&#8226; I D initial state, we expect the system to start in that state.</p><p>&#8226; G D &#8600; F D is the goal specification, where any state s i such that G D &#8600; s i is considered to be an absorbing goal state (or destination state).</p><p>Each valid description of the above form is expected to translate into an MDP of the form M = &#8594;S, A, I, P, C, G&#8593;, where &#8226; S is the set of states of the model and corresponds to the state space defined by F (i.e |S| = 2 |F | ). For each i &#8599; S, we will use s i to denote the symbolic state (described by the set of propositions that are true in the state).</p><p>&#8226; A is the action/control space of the underlying MDP and is isomorphic to the set A and will use a mdp aj to represent the corresponding action for the symbolic action a j , moreover A(i) = {a mdp aj |prec j &#8600; s i } (i.e. the actions available at a state i).</p><p>&#8226; I is underlying atomic state corresponding to I D</p><p>&#8226; P is transition probability and is defined as follows</p><p>is 0. Note that given the assumption the e!ects are mutually exclusive, there exists at most one e!ect that can cause this transition, provided there are no redundant e!ects (i.e., ones that leave the state unchanged because of adding existing facts or trying to remove missing facts).</p><p>&#8226; G is the set of destination or goal nodes that correspond to the ones specified by G, i.e., for i &#8599; G, the corresponding symbolic state s i , will satisfy the condition G D &#8600; s i .</p><p>&#8226; C : S &#8595; A &#8595; S is the cost function. As with the transition probability, the cost function is given by c j i .</p><p>For a given description M D , we will use the notation M to refer to the MDP induced by it, especially if we don't explicitly mention the underlying MDP. Two facts that might be worth keeping in mind is (1) for any MDP with a finite set of actions and states can be captured by a description and (2) for a fixed fluent and action set, the model description for a given MDP is unique.</p><p>Such factored model descriptions are usually easier for people to understand and create, not just due to conciseness, but also because it is based on folk psychology principles regarding actions and as such are intuitive descriptions <ref type="bibr">[10]</ref>. We assume that models are provided in such descriptions, though one could always start with an atomic or inscrutable representation of the model and derive such models through di!erent learning methods <ref type="bibr">[11,</ref><ref type="bibr">12]</ref>.</p><p>3. Our Approach -The Model Simplification Framework for Contrastive Explanation Generation</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Approach Overview</head><p>In this section, we provide an abstract overview of the framework and by extension for the paper as a whole. We will touch on each component of the framework, discuss how the component comes into play in generating the explanation discussed in Section 1 and provide relevant pointers to parts of the paper that will cover the component in greater details.</p><p>Figure <ref type="figure">2</ref> presents an overview of our explanation generation method. with annotations on the paper sections where we will discuss the specific components.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.1.">Explanatory Queries and Explanatory Criteria</head><p>The explanatory process in our case is initiated when the user raises an explanatory query. This paper will exclusively focus on contrastive queries <ref type="bibr">[13]</ref>,</p><p>where the user is interested in why the system chose the current decision as opposed to an alternate decision they may have expected. The question discussed in Example 1, i.e., "Why can't you bake the cake?", is in fact a contrastive query presented in an implicit form. One can equivalently represent the query as " Why is the system claiming that no solution is possible as opposed to following some policy &#969; &#8593; ?"</p><p>For each query, we will identify an explanatory criterion. An explanatory criterion corresponds to a bound on the possible value that a valid solution can take along one of the optimization objectives (i.e. cost or probability of reaching the goal), but that would need to be violated for the alternate plan to be preferred over the current decision. For example, in the case of the query discussed in Section 1, an explanatory criterion would be the fact that there exist no policy for the model, in which the probability of reaching the goal from initial state is bigger than zero. This explanatory criterion will be surfaced to the user along with the rest of the explanatory information. We will provide a more formal characterization of explanatory criteria in Section 3.2 and Section 4 will include an exhaustive characterization of all types of explanatory criteria possible in a stochastic planning setting.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.2.">Inferential Capability Gap</head><p>Our central explanatory challenge is the fact that there may exist an asymmetry between the human and the system with regards to their computational capabilities (referred to as the Inferential Capability Gap). In Example 1, human would find it much harder to use the information regarding the complete task and motion planning problem to verify whether in fact there exists any policy which can get to the goal with non-zero probability. We will operationalize such computational e!ort required by the human by using a proxy function Comp H .</p><p>We will use this Comp H function to identify explanatory information that places minimal computational demands on the user. We will formalize Comp H in Section 3.2 and delve further into how we can operationalize it in Section 3.3.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.3.">Simplifying Model Transformations</head><p>The argument we will make throughout this paper is that even if the complexity of establishing an explanatory criterion (as measured under Comp H ) in the original model is high, one can use simplified representations of the original model where one could establish that the same criterion is satisfied with lower e!ort. In Example 1, rather than asking the user to reason about the unsolvability of the full stochastic task and motion planning problem, the system surfaces a much simpler problem, i.e., it presents its inability to create batter even while ignoring all the motion level constraints and any non-determinism in the task.</p><p>We can create such simpler representations, by taking the original model and applying a sequence of model simplification transformations. To ensure that presenting the user with a simplified model doesn't mislead the user about the underlying task, we will additionally enforce that the transformations will not violate any explanatory criteria that were previously true. A requirement we will capture under a concept referred to as the soundness of the transformation. The central computational challenge for generating such explanations is to identify the sequence of transformation that will generate the simplest representation that still conserves the explanatory criteria, which previously held. We will formalize model transformations in general in Section 3.2 and discuss specific transformation classes in Section 5. We discuss methods for generating the transformation sequences in Section 3.5 and look at a more specialized algorithm in Section 6.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.4.">Explanatory Witness</head><p>Even after providing the model, the user may not be able to reason about the explanatory criteria on their own and as such may require additional assistance.</p><p>We will refer to such information as Explanatory Witness. These specifically correspond to additional information extracted from the simplified model that demonstrates why the criteria are satisfied by the given model. We will provide a detailed discussion and a classification of the explanatory witness classes in Section 3.4 along with the discussion of how we will generate them for the problems we consider. Also in the related work (Section 8) we will also provide a discussion of the various explanatory witnesses that has been considered previously in the literature.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.5.">Final Explanatory Message</head><p>As part of the final explanation the system would surface the following information, (a) an explanatory criterion (though depending on the context this may be implicit), (b) a simplified model where the criterion would be satisfied, and (c) an explanatory witness that demonstrates the fact that the criterion is satisfied.</p><p>In example 1, surfacing the explanatory criterion involves mentioning that there are no policies with non-zero goal reaching probabilities. The model is constructed by skipping motion level constraints, the probability information and considering a more myopic objective (i.e., making batter). Finally, the explanatory witness here consists of the sample trace.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Framework Formalization</head><p>Our basic explanatory setting consists of a sound and optimal model-based decision-making system that uses model M R &#8594; (which may correspond to a description M D &#8594; ), to come up with its decisions and it needs to respond to possible explanatory queries a user of the system may raise either about its current decision or about alternative decisions the user may have expected.</p><p>The central focus in this paper is to provide contrastive explanations <ref type="bibr">[13]</ref>,</p><p>where contrastive explanations are said to be explanations that take the form of responses to questions of the form "Why P and not Q?", where 'P' is referred to as the fact being explained and 'Q' is referred to as the foil the fact is being contrasted against. There is a lot of evidence from social science literature that contrastive explanations underpin most of our everyday explanatory dialogues and, as such, have been receiving a lot of attention in recent years <ref type="bibr">[13,</ref><ref type="bibr">14,</ref><ref type="bibr">15]</ref>.</p><p>In our scenario, we are always interested in answering questions of the form.</p><p>" Why policy &#969; and not any policy in the set # &#8593; " That is the explanation always comes down to establishing the choice of one policy &#969; (possibly the current one being proposed by the system), over the set of alternate policies (possibly expected by the user). More formally, Definition 1. An explanatory query is represented by a tuple of the form</p><p>where &#969;, referred to as the fact policy, corresponds to the policy proposed by the system and # &#8593; , referred to as the foil set, corresponds to the alternate policies expected by the user.</p><p>To support explaining unsolvable planning problems (i.e., there exists no policy with non-zero probability of reaching the goal), we will use the notation &#8658; to denote the fact policy for an unsolvable planning problem.</p><p>In cases where the system is a sound, complete and optimal decision-maker, the system can respond to an explanatory query by establishing the preference of one fact policy over the foil set (or at the very least establishing they are equivalent). In the the MDP classes studied in this paper, this always takes the form of establishing that the fact policy either has a higher probability of achieving the goal or lower cost as compared to the policies in the foil set.</p><p>In this paper, we will argue that one could achieve a deeper insight into the explanatory process by separating the form of the question from the underlying model constraint that needs to be established as part of resolving the user query.</p><p>In essence, one could convert every contrastive explanatory query into a problem of establishing that a planning model can only generate solutions that satisfy a particular threshold with respect to one of its optimization objectives.</p><p>However before defining an explanatory criterion, we will introduce the notion of a policy satisfying an optimization threshold. Specifically, an optimization threshold specifies a threshold (i.e., an upper bound or a lower bound) over one of the optimization metrics (i.e., cost or goal-reaching probability). In the case of cost, we will limit ourselves to a lower bound and for reachability we will only consider upper bounds. A policy is said to satisfy the threshold for a given model, if the threshold satisfied by the valuation of the policy in the initial state (or more generally a state of interest) in the dimension of the optimization metric specified in the threshold. More formally, we will denote this as, Definition 2. An optimization threshold is denoted by a tuple &#949; = &#8594;O, X &#8593;, where O is an optimization objective (cost function J or goal-reaching probability P ) and X is an upper bound or a lower bound over O. A policy &#969; is said to</p><p>An explanatory criterion is essentially an optimization threshold that is satisfied by the fact policy but not by the foil policies.</p><p>Definition 3. For an explanatory query Q = &#8594;&#969;, # &#8593; &#8593; and an optimization threshold &#977; = &#8594;O, X &#8593;, a model M is said to satisfy &#977; with respect to Q, if under model M, &#969; satisfies &#977; but none of the foil policies in # &#8593; does (denoted as</p><p>Definition 4. For an explanatory query Q = &#8594;&#969;, # &#8593; &#8593;, an optimization threshold &#977; = &#8594;O, X &#8593;, is said to be an explanatory criterion for a model M, if &#969; is preferred over # &#8593; when M satisfies the criterion &#977; with respect to Q (i.e.,</p><p>The above definition lays out explanatory criteria as applicable in cases where the foil set may be explicitly enumerated. In many cases, the foil set may only be implicitly specified. As we previously saw, the foil set in Example 1, corresponds to set of all policies. In this case, the explanatory criterion must be satisfied by all possible solutions for the given model. To capture this case, we will treat the satisfaction of explanatory criteria as a model-level property, i.e., we will simply use the notation M |= &#977; or M &#8657; |= &#977; by dropping the query subscript. Definition 5. For an optimization threshold &#977; = &#8594;O, X &#8593;, a model M is said to satisfy &#977;, if for model M, every policy &#969; satisfies &#977; (denoted as M |= &#977;).</p><p>Note that this is a more general formulation, as we can always capture explicit foil and the fact policy cases by creating new models that only allow the policies in question.</p><p>With respect to the Example 1, one explanatory criterion could be &#977; = &#8594;P, = 0&#8593;, i.e., if the criterion &#977; is satisfied by the model, then the model is only allowed to support policies whose probability of achieving the goal is equal to 0. Section 4 presents an exhaustive characterization of all explanatory criteria possible in a stochastic planning setting and how they connect to every possible contrastive query.</p><p>While explanatory criteria help establish why the foils may not be preferred, we still have to discuss how the user may realize that the explanatory criterion is satisfied for the given problem. Towards this end, the paper builds on the basic fact that humans are agents capable of reasoning about sequential decisionmaking problems. Thus, as long as the model is in a human-readable form 1 , the system could communicate the model information and expect the human to try to reason about whether the relevant explanatory criteria satisfy on their own. However, such a naive approach would overlook the fact that the original decision-making problem may be too complex for the user.</p><p>To ensure that our explanation method does not place undue cognitive demands on the user, we will try to explicitly take into account how easy it is for a user to verify whether an explanatory criterion is satisfied in a given model. To capture the inferential burden placed on the human or any agent by an explanation, we will introduce the function Comp(&#8226;, &#8226;) to denote the computational capability of the agent. This function will capture the time taken by the agent to establish whether an explanatory property is satisfied for a given model. Definition 6. For a given set of models M and a set of explanatory properties K, the computational capability function Comp : M &#8595; K &#8596; R &#8656; {&#8659;} function returns the time taken by a reasoning agent to establish whether an explanatory criterion &#977; &#8599; K is satisfied in a model M &#8599; M. The function returns &#8659; if the agent is incapable of establishing whether the criterion is satisfied.</p><p>We will use Comp H to denote the computational capability function for the human and Comp sys for the one associated with the AI agent. There e!ectively exists an inferential capability gap between the two if the computational capability function returns di!erent values for the same model and explanatory criterion. </p><p>We assume our models to be already in a human-readable form, and even if they are not one could always convert them to such forms using methods like those discussed in <ref type="bibr">[11]</ref> In theory, we could have Comp sys (M, &#977;) &gt; Comp H (M, &#977;), i.e., cases where for the same model the human has an easier time establishing some criterion, but in this paper we are mostly interested in cases where we have</p><p>Now the central argument we will make throughout this paper is that even if the complexity of establishing an explanatory property in the original model is high (measured in terms of Comp H (M, &#977;)), one can use simplified representations of the original model where one could establish that the criterion is satisfied with lower e!ort. We can now give the user information about this simplified model and even use this model as a basis for generating other explanatory information.</p><p>In particular, we will look at the generation of explanatory witnesses that will help the user better understand why the criterion is satisfied in the current model.</p><p>Towards formalizing this intuition of explanation generation, we will first start by introducing the notion of model transformation that will become the basis of our remaining framework. In the paper, we will refer to a model transformation function as one that takes a valid MDP model and generates a new MDP specification, i.e., Definition 8. A model transformation is a function of the form F &#8226; : M i &#8660; &#8596; M j where M i was the original MDP and M j is the newly generated MDP, where both M i and M j are two MDPs that satisfy P-assumptions. Note that the above definition is an extremely weak notion and not one that in any way helps us build model representations that could help resolve users' queries. To establish that, we need to introduce two new notions about the transformations, namely, soundness of transformation and whether the transformations are building simpler representations.</p><p>The soundness of transformation relates to whether any explanatory criterion that is satisfied in the new model has a corresponding criterion that was satisfied in the original model, or more formally: Definition 9. A model transformation F i is said to be a sound transformation for a model M with respect to an explanatory criterion &#977; i , if whenever &#977; i is satisfied by F i (M), it is also satisfied by M.</p><p>Note that we are not requiring the transformations to always preserve the explanatory criterion, i.e., &#977; i satisfied by M if and only if &#977; i satisfied by F i (M). As we will soon see most of the e!ective transformations we will look at results in optimistic approximations, where we can't always guarantee that the transformations will preserve the criterion, but we can be sure that the transformations are sound as defined above,i.e., there couldn't exist a criterion</p><p>Each transformation we described in the introduction, which we will go into further details in Section 5, are examples of sound transformation for the explanatory criterion &#8594;P, = 0&#8593; Now we will assert that transformation results in a simpler representation or equivalently is a simplifying transformation if it is easier to establish the criterion in question in the resultant model, i.e., Definition 10. A transformation F i is said to be a simplifying transformation for a model M and criterion &#977;,</p><p>additionally F(M) is referred to as a simpler representation of M.</p><p>One can now see that what we hope to build as explanations are models that are built by repeated application of simplifying but sound transformation.</p><p>However, there is a big question we have yet to answer, namely, how does one stop the method from generating extremely simple models that are, nonetheless, completely disconnected from the original problem at hand? E!ectively this would turn the output of the methods into lies rather than satisfying explanations (comparable to discussions provided by papers like <ref type="bibr">[16]</ref>), as it would give the user no further insights into the original planning problem. One could try to avoid this by placing restrictions on the types of models generated by the transformation functions. For example, requirements like the need to share action/state-space or the need to preserve some transition function. However, most of these methods are too limiting and insu"cient when we consider cases where there might exist a vocabulary mismatch between the user and the decision-maker <ref type="bibr">[11]</ref>. In such cases, one might need to translate the current system representation into forms that are easier for the user to follow and it may be quite hard to establish any form of equivalence between model components of the learned models and the original model. Instead, in this paper, we will define the validity of the explanations in terms of their impact on human expectations about the system.</p><p>In particular, we will require that the transformed model doesn't satisfy any explanatory criteria not true in the original model (i.e., the model on which the transformation function was applied). E!ectively this would make sure that the transformation does not prevent the user from asking any question they might have about the system's capabilities. This will e!ectively ensure that the user doesn't place unwarranted trust in the system due to the explanation, a requirement for generating e!ective posthoc explanations <ref type="bibr">[11]</ref>. We will refer to such as universally sound transformations for a given model: All the transformations we will discuss in Section 5 are examples of universally sound transformations. Additionally, we will require that every transformed model presented to the human be explicitly noted to be a simplification of the true model, while noting the exact relationship between the true model and transformed model when possible. Usually when we look at transformations that are generated through syntactic transformations of human readable model descriptions (as in the case of transformations discussed in Section 5), such relations are much easier to note.</p><p>It is worth keeping in mind that our objective here is to identify a potential model where the current solution can be contrasted against the foil. One of the goals of the explanatory process would be to help induce a user mental model that is equivalent to this model. However, there could be a component of the explanatory message which also contains explicit information contrasting the two, we will look at such information in Section 3.4.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Modeling User's Computational Capabilities</head><p>The previous section only provides a cursory discussion of the function capturing the computational burden placed on the human to establish a specific criterion, i.e., the function Comp H (&#8226;, &#8226;). An exact characterization of Comp H (&#8226;, &#8226;) would be hard, since it would require accurately capturing the inferential capabilities of the human. However, one could still use a number of simpler representations of Comp H (&#8226;, &#8226;) to calculate useful representations, some of the possible choices here include</p><p>&#8226; Using computational model with psychological fidelity -This could correspond methods like, the ones that leverage computational implementation of psychological models like prospect theory <ref type="bibr">[17]</ref>, use of decision-making algorithms that make use of limited memory <ref type="bibr">[18]</ref>, use of various models of bounded-rationality including ones from behavioral game theory like the finite nested rational model <ref type="bibr">[19]</ref>, etc. However, works in these models are still in preliminary stages and we are unaware of any existing techniques that could directly be applied to our problem framework.</p><p>&#8226; Directly learning Comp H -Another possibility might be to directly learn the Comp H function from data collected from human subjects. While we are unaware of any method that can currently learn the function of the form we require, some preliminary work in this direction include <ref type="bibr">[20]</ref>.</p><p>&#8226; Using Computational Proxies -While we may not have access to Comp H , we could directly measure the hardness of establishing the explanatory property using exact and sound methods and measuring the time. While this isn't expected to be equal to the actual inferential burden faced by the user, we could use this as an approximation of the exact value. In addition to exact time taken by an automated reasoner, one could also use other measures like the size of description, length of the most likely policy trace, etc.</p><p>&#8226; Using Human-Subject Studies to Establish E!ectiveness of Individual Transformations -Another method might be to not directly learn Comp H , but instead identify any preference use might have on various types of transformation that may be applicable for simplifying a given model. Then one could leverage the preference between the transformations to identify solutions that may be preferred by the user.</p><p>In this paper, we will mostly focus on the latter two strategies, wherein we will look at generating simplified model that are simpler with respect to an automated reasoner. We also introduce an ordering between the various transformations to be applied (where the more preferred transformation are applied first). This ordering between the transformation are determined through a user subject study.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Explanatory Witness</head><p>Even after providing the model, the user may not be able to reason about the explanatory property on their own and, as such, may require additional assistance. We will refer to such information as Explanatory Witness. XAI literature includes many examples of such information, and in fact, many works focus on identifying such information while making implicit assumptions about the model information with which the user is supposed to make sense of this information. In general, from the current literature, we can identify three categories of such information; 1. Proof of explanatory properties: A proof of the property being explained, which in our case can be provided in the minimal model. Though unfortunately, it is very hard to create exact interpretable proofs for most query properties. In general, most of these proofs would be incomplete or abstract in the sense of skipping some steps. An example of such abstract explanatory witnesses would be the use of Q-values to contrast the current choice against alternative (as in the case of <ref type="bibr">[21]</ref>). Here the explanatory witness doesn't provide information as to why the current action in a state or the alternative has a specific Q-value.</p><p>2. Existential Information: This corresponds to presenting instances of potential solutions (plan/policies) or parts of solutions (a specific execution trace from a given policy) that acts as a demonstration of the property being explained. Once such an example is selected, the example could then be contrasted with the original solution and the information presented to the user.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Counterfactual Information:</head><p>In the final category, the user is provided with examples of counterfactual solutions and even planning models where the property being explained isn't satisfied. The assumption is that these examples would help humans get a better sense of when the property might not be satisfied. While generating explanations for user studies we will use a sampled policy execution trace as the corresponding Explanatory Witness (a type of existential information). For example, when trying to explain why the cost of a policy may be above some threshold, then one could find a specific execution trace, i.e., a sequence of states, actions and resulting state that can be sampled from the current policy and show that the cost of that trace is above the current threshold. This is a particularly appealing for criteria involving cost. For criteria related to goal reachability, possible explanatory witness include the use of qualitative occupancy frequency (similar to <ref type="bibr">[22]</ref>) and for unsolvability one could present the infeasibility of some example paths to the goal (as in the case of <ref type="bibr">[23]</ref>).</p><p>Section 8 includes a discussion of existing types of explanatory witnesses studied in the literature.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5.">Generating Explanations</head><p>With these basic definitions in place, we can describe the actual approach to generating the explanations.</p><p>The first component, we will need to consider is a way to identify an explanatory criteria. It should be clear that for any given explanatory query, there might be multiple possible explanatory criteria, especially ones that use di!erent optimization metrics. However, given the fact that there exists a strict ordering between the optimization metrics, we will generally prefer explanatory criteria related to the criterion with the higher preference. We will denote the fact that the explanatory criteria &#977; i is preferred over a property &#977; j , by using the notation &#977; i &#8598; &#977; i . In our setting, a reachability related criteria is preferred over those related to cost. This means that one could test whether we can use goal reaching probability of the fact policy to derive an explanatory criteria, if not switch to a cost based one (again the threshold is set by the cost function associated with the fact policy).</p><p>Once the most preferred explanatory criterion is identified, our next challenge is to determine the most simplified model where the criterion can still be established.</p><p>Definition 12. Given a model M R &#8594; , a set of universally sound transformation functions F = {F 1 , ..., F k }, the problem of generating a minimal explanatory model for a given explanatory criterion &#977;, corresponds to the problem of finding a sequence of transformation T min which results in a simplified representation M that is sound for &#977; and there exists no other sequence of model transformations that will result in a simpler yet sound representation, i.e,</p><p>The new model T min (M R &#8594; ) will form the basis of the explanation.</p><p>The basic algorithm for generating such minimal explanations is given in Algorithm 1, which correspond to an exhaustive search over all possible transformations and then returning the transformation sequence, explanatory criterion pair that meets the requirements provided in Definition 12.</p><p>Note that the above algorithm is an extremely computationally expensive one.</p><p>The search space is exponential over the number of transformations possible and each search node evaluation in our case consists of solving a planning problem.</p><p>Algorithm 1 Basic Search</p><p>1: procedure search 2:</p><p>Output: The transformed model M</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>4:</head><p>Procedure:</p><p>5:</p><p>Curr Minimal E!ort &#8733; &#8659; 7:</p><p>for Each subsequence T &#8593; of F do 8:</p><p>if Curr Minimal E!ort &gt; Comp E!ort then 10:</p><p>However, as we will see in Section 6 by making commitments on the types of transformation and model classes, we can make use of a significantly more e"cient search algorithm.</p><p>As for generating the explanatory witness, once we have identified the minimal explanatory model, one can use any of the methods outlined in Section 3.4 to generate an appropriate explanatory witness.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Queries and Explanations</head><p>Figure <ref type="figure">3</ref> presents the hierarchy of questions possible in this problem setting organized into two categories. Now for each type, we further list two possible subcategories. The first category corresponds to queries exclusively about the current best solution provided by the system and the second corresponds to queries that contrast the current policy with an alternative the human had in their mind. In the former, the user would want to understand why the solution is worse o! than what they were expecting (say in terms of goal reachability or cost), and in the latter, the user would want to know how the current policy compares against the alternative they had in mind. In the end, both categories can be mapped into a question that can be resolved by comparing a policy against a threshold. In the case of the former category, the threshold directly comes from the user's question and the system's policy is compared against it (for example, the kind of questions that might fall into this category would include instances like Why does the policy cost more than X? or Why is the likelihood of reaching the goal less than Y?), while in the latter the alternative posed by the user is compared against the cost or goal reachability of the current policy. In the end, the query hierarchy concretizes into three specific query types.</p><p>Each query type is characterized by a specific explanatory criteria, 1. &#977; &#949;1 is satisfied when there exists no policy that has a cost less than a certain threshold for the initial state under the model 2. &#977; &#949;2 is satisfied when no policy achieves the goal from the initial state with non-zero probability 3. &#977; &#949;3 is satisfied when there exists no policy whose probability of achieving the goal from the initial state is above a certain threshold. These criterion classes correspond to a large set of specific explanatory criteria and the exact threshold is determined by the specific query. For a specific model M and a threshold X, we will denote the fact that an instance of the criterion class &#977; is satisfied for the threshold as M |= &#977;(X) (in the case of &#977; &#949;2 X is limited to 0).</p><p>In the categorization, we have split the queries related to goal reachability into two distinct categories, even though, one could specify it as a comparison to a threshold. The reason why we chose not to do that, is two-fold;</p><p>1. Not only are queries about unsolvability a natural and commonly studied explanatory query type <ref type="bibr">[24]</ref>, they are also a lot more accessible and more likely to be used by non-experts who may not be aware of or comfortable with the exact probability associated with the planning problem.</p><p>2. One can apply many more explanatory techniques to answer questions related to solvability, which is not necessarily applicable to questions that use direct comparison to probability threshold. For example, as we will see, the use of determinization or even the use of models with purely qualitative non-determinism is possible to answer queries related to solvability but doesn't apply to queries that explicitly use probabilistic thresholds. Any possible contrastive query, which can be answered by a decision-making system, would correspond to one of these explanatory criteria. Following the conventions generally used in MDP solution strategies, we will establish a strict ordering between the criteria, namely, &#977; &#949;1 &#8598; &#977; &#949;2 &#8598; &#977; &#949;3 .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Model Transformation Classes</head><p>In this section, we will look at some specific transformation classes. The classes were selected based on the fact that some restricted forms of these transformations were already studied in the wider XAI literature. In particular, Section 5.1 presents projective state abstraction (F F \! ); Section 5.2 presents determinization (F " ); Section 5.3 covers problem decomposition (F D ) and Section 5.4 local approximation (F L ).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">State Abstractions</head><p>The first transformation we could make use of is state abstraction that can help reduce the underlying state space of the MDP <ref type="bibr">[25]</ref>, which has been used for explaining deterministic plans <ref type="bibr">[26]</ref> and to summarize policies as in <ref type="bibr">[27]</ref>. In particular, we will consider state aggregations that replace a set of states in the underlying MDP M R &#8594; with a single state <ref type="bibr">[28]</ref>. In Example 1, it will be a state abstraction that will let us ignore most of the state variables, including those related to motion-level constraints. We will define state abstraction as Definition 13. For a given model M = &#8594;S, A, I, P, C, G&#8593;, M = &#8594; S, A, I, P , C, G&#8593; is said to be an abstraction of M, if there exists a surjective mapping from states and actions in M to M (where F be the mapping), then &#8242;s 1 , s 2 &#8599; S and for any a &#8599; A, if P (s 1 , a, s 2 ) &gt; 0, then P (F(s 1 ), F(a), F(s 2 )) &gt; 0.</p><p>While state abstraction has been a popular topic of investigation in MDP planning/RL topics. We are particularly interested in its use to create sound representation for all the queries. Since we are looking at symbolic representation, it would be useful to have methods that create abstractions that are valid symbolic models. Specifically, we can make use of syntactic projections of the model descriptions, which have previously been used for generating heuristics <ref type="bibr">[29]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.1.">State Abstractions Through Syntactic Projection</head><p>We will define the transformation as Definition 14. For a given model description M D = &#8594;F D , A D , I D , G D &#8593; and a set of propositions $ &#8600; F D the syntactic projection (represented as a function</p><p>Where the new actions A F F \! is given as follows: for each a i &#8599; A D , there existing a corresponding action F F \! (a i ) = &#8594;prec i \ $&#8593; (To simplify discussion we will be overloading the notation F F \! to stand for any mapping from the components of the original model or description to those in the new model)</p><p>Where the new probability of the e!ect p &#8593; is given as Proposition 2. The model M F F \! represented by F F \! (M D ) is a state abstraction (per Definition 13) of the model M. Additionally, for any states i, j &#8599; S for MDP M, then P (i, a mdp ai , j)</p><p>) is a well formed model description, so there exists a unique model M F F \! induced by it. The surjective mapping from state space S of M to S F F \! (Overloading the notation a bit, we will use F F \! to also capture this mapping) is given as</p><p>It should be easy to verify that this is in fact a surjective function. The relationship between the probability functions and cost functions comes directly as a result of the transformation.</p><p>We can use the abstraction function to also define a relation among the states in the model description, such that</p><p>Note that this is an equivalent relation and thus helps partition the state space into disjoint sets that map into the same abstract. If &#349;i is an abstract state for</p><p>, then we will use the notation F &#8600; F \! 1(&#349; i ) to capture the quotient set for the relation &#8712; F F \! that maps the state from the concrete model into the abstract state &#349;i , i.e.,</p><p>Unless specified otherwise, the cost function for the concrete model are denoted as functions over just state indexes (for example J(i)), while that for the abstract model it is denoted with states where an abstraction function has been applied (for example J(F F \! (i))).</p><p>Next we will detail some simple properties of the model and demonstrate the central properties we can use for explanation.</p><p>This takes us to the next result Proposition 3. Probability of a trace (a sequence of action control state tuples) from a given state to a goal state is either preserved or increases over the transformation, i.e., for a trace t = &#8594;s 1 , a 1 , ..., s k &#8593;,</p><p>The result follows from earlier proposition.</p><p>Proposition 4. Probability of an action causing a transition from an abstract state &#238; to another abstract state &#309;, is equal to the sum of the probability of transitions from one of the states in F &#8600;1 F \! ( &#238;) to all the states in F &#8600;1 F \! ( &#309;) under that action.</p><p>The result follows from the fact that given the declaration method, all the states that merged must have had similar e!ects, so the choice of the state</p><p>F \! ( &#238;) doesn't matter. The second part of the proposition follows from the transformation itself.</p><p>With this transformation in hand, we can show that the syntactic transformation results in a state aggregation. We can now show the optimal cost function in the new model is lower than that of the original model.</p><p>Proof Sketch. We can show this by initializing a cost function for the abstract model J F F \! with cost from J &#8594; , such that for any abstract state &#238; as</p><p>If we now apply a bellman operator T , given propositions 2 and 4, we will have</p><p>For P condition the bellman operator is still monotonic function <ref type="bibr">[3]</ref>, and converges to the optimal cost function. Thus the optimal cost function for F F \! (i) must be less than or equal to J &#8594; (i).</p><p>Next, in regard to the probability, we can establish that Lemma 2. For every state i in the model M,</p><p>This result directly follows from Proposition 3 (a similar result was also established in <ref type="bibr">[29]</ref>). With these two propositions we have established that state abstraction does in fact result in new models that underapproximated cost and overapproximates reachability. Thus establishing that the syntactic projection described here results in a valid state abstraction per Definition 13 and the resultant transformation is sound with respect to all three explanatory criteria. In Example 1, consider the original probabilistic e!ects of the mixing action, which says after mixing, you might have cake batter with a probability of 0.5, a cake batter with bubbles in it with a probability of 0.25, and with a probability of 0.25, no cake batter at all.</p><p>(probabilistic 1/2 (and (has-cake-batter))</p><p>1/4 (and (has-cake-batter)</p><p>(cake-batter-has-bubbles)))</p><p>After projecting out (cake-batter-has-bubbles) you get an e!ect that says the probability of having cake batter is 0.75 (probabilistic 3/4 (and (has-cake-batter)))</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">Problem Determinization</head><p>Note that the abstraction operation creates models with probabilistic e!ects unless the e!ects merge into a single e!ect. At least for some of the queries, the system could generate valid responses while using an optimistic determinization of the model that ignores all stochasticity of the model. For example, in Example 1 even if there were no undesired side-e!ects due to stochasticity (the 0.25 probability of the mixing action not forming a cake batter), the mix action could still not be executed as it has a missing precondition (has-cake-whisk). One possible way to create such optimistic determinization could be to generate a new model where action with multiple stochastic e!ects is turned into multiple actions with deterministic e!ects. Determinization belongs to a larger class of transformations (which we will refer to as problem class simplification) that transforms the original problem to simpler decision-making problems for the sake of generating explanations. While there are some instances of problem class simplification for explanations for multi-objective explanations (cf. <ref type="bibr">[30]</ref>), we are unaware of any direct use of determinization to simplify explanations. Where for every a i = &#8594;prec i , {e 1 i , ..., e k i }&#8593;, there exists k actions in F " (A D ) such that a j i = &#8594;prec i , add j i , del j i , 1, c j i &#8593;, i.e., it generate the j th e!ect with probability 1.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.1.">All Outcome Determinization</head><p>Such determinization operations are sometimes referred to as all outcome determinization <ref type="bibr">[31]</ref>. Which is also closely connected to hindsight optimization techniques that have been studied in multiple fields including control theory <ref type="bibr">[32]</ref>.</p><p>The transformation could be used for both cost criterion (&#977; &#949;1 ) and solvability criterion (&#977; &#949;2 ). Also note that for the class of models considered in this paper the transformation of the model description can be carried out e!ectively. The fundamental property of this determinization we can use here is the following Proposition 6. Let &#949; = &#8594;I, a 1 , ...., a k , g&#8593; be the sequence of symbolic states and actions corresponding to a trace from the initial state to a goal state g &#8599; G with non-zero probability for the model defined by the description M D . Then A(&#949; ) (the sequence of actions appearing in &#949; ) is a valid plan for the deterministic planning model F " . That is when the sequence A(&#949; ) is executed in I it will take you to the state g. This property directly comes from the form of the transformation is widely established in the determinization literature. Lemma 3. The cost of best possible plan in F " (M D ) is guaranteed to be less than or equal to J &#8594; (I) for the model M corresponding to M D After all the lowest cost trace that can be sampled from any policy is a plan in the determinized model. As the cost of the policy should be higher than the cost of its lowest cost trace, it should be higher than the cost of the optimal plan in F " (M D ). With Lemma 3 and 4 in place should be clear that this transformation could be used for both cost criteria (&#977; &#949;1 ) and solvability criteria (&#977; &#949;2 ). One can actually in fact make a stronger claim and show that it is even sound for (&#977; &#949;3 ). as it as &#977; &#949;3 for threshold 1 a tautological statement). &#977; &#949;3 limited to threshold 0 corresponds to &#977; &#949;2 , which per our earlier discussion the transformation F " is already sound for. Thus establishing the fact that F " is universally sound.</p><p>With regards to Example 1, such transformation will allow us to ignore the probability of mix action failure when giving the explanation. In particular, we will have two new mixing actions. One with e!ect (has-cake-batter) and the other with e!ect (and (has-cake-batter) (cake-batter-has-bubbles)).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.">Problem Decomposition</head><p>Even after applying all the above transformations, the plans generated from the resultant model could be extremely long. One way to address would be to decompose the original task into smaller subtasks. In Example 1, rather than talking about the problem of baking cake it can focus just on explaining the subproblem of making the cake batter. In this section, we introduce problem decomposition with respect to an initial state that focuses on subproblems that reuse states and actions of the original problem and is either cheaper or it is easier to achieve the goal Definition 16. Given an atomic MDP M = &#8594;S, A, P, I, C, G&#8593; with an optimal cost J &#8594; , an MDP M &#8595; = &#8594;S, A, P, I, C, G &#8593; &#8593; with an optimal cost J &#8593; , is said to be a subproblem if it is guaranteed that J &#8593; (I) &#8734; J &#8594; (I) (where the decomposition is called cost based decomposition) or P &#8593; (I) &#8734; P &#8594; (I) (where the decomposition is called reachability-based decomposition).</p><p>For queries related to &#977; &#949;1 if we can establish that the cost of the subproblem under cost-based decomposition is greater than the limit, then that automatically establishes that the original problem should be worse. For queries related to reachability, we can look for similar arguments for a subproblem under reachability-based decomposition. Now the question is how can we find such decompositions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.1.">Subgoals</head><p>For goal-directed problems with an initial state, a natural idea could be subgoals, where subgoals are representations of intermediate states that the policy may need to achieve before getting to the final goal. A specific subgoal that could be useful here is landmarks, which were recently adapted for SSPs by <ref type="bibr">[33]</ref>. These are propositional facts that need to be satisfied by any path from the initial state to goal with non-zero probability. While that paper introduced these facts as a way to summarize policies, we use them to decompose problems and form simpler problems. Such landmarks can be automatically extracted from the model description M D &#8594; . For unsolvable problems, such landmarks could also be generated from abstractions of the problem where the goal is reachable (cf. <ref type="bibr">[24]</ref>).</p><p>In Example 1, has-cake-batter is a landmark for the goal bake-cake. A problem decomposition using landmarks as the new goal is both a cost and reachability decomposition and is sound for all three explanatory criteria. </p><p>Proof Sketch. Let &#949; = &#8594;I, a 1 , ...., a k , g&#8593; be the sequence of symbolic states and actions corresponding to a trace from the initial state to a goal state g &#8599; G with non-zero probability for the model defined by the description M D . From the definition of landmarks, we know that there must be state s f in the sequence &#949; such that f &#8599; s f . Therefore any trace with nonzero probability can be decomposed into a prefix corresponding to the sequence that leads to a landmark state and then the sequence from landmark state to the final goal. Additionally we can see that the probability of this trace prefix must be greater than or equal to the probability of the full trace. Thus for all the traces sampled from a given policy the total probability of getting to all landmarks states must be higher than that of reaching the final goal. We can use a similar reasoning to show that the total expected cost of reaching the landmark states must be less than or equal to the cost of reaching the final goal state.</p><p>The above lemma establishes the fact that problem decomposition through landmarks underapproximates costs and overapproximates reachability, thus its valid for all three explanatory criteria. This is because you just need to replace the goal description in M D to form the transformed description. Now coming to the complexity of generating a landmark, the problem of finding landmark in general is PSPACE-Complete <ref type="bibr">[34]</ref>.</p><p>However, there are classes of landmarks which can be generated more e!ectively.</p><p>One class of such landmarks is the causal landmarks which can be generated in time polynomial to the size of the model description <ref type="bibr">[35]</ref>.</p><p>For the example, after this transformation the goal condition of the model description will be changed from (bake-cake) to (has-cake-batter).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.4.">Local Approximation</head><p>The next transformation we will consider is that of local approximations.</p><p>Local approximations <ref type="bibr">[36]</ref>, have been successfully used in the context of explaining machine learning decisions. The basic premise being that the model being used for explanation need not accurately reflect the entire decision-space, but only the parts that are relevant to the current query. One could translate the same intuition into the sequential decision-making settings, and look at creating simpler representations of the model that focus only on a part of the transition system. Revisiting Example 1, through local approximation we can establish the facts that the robot doesn't need to talk about any of its skills unrelated to cooking or baking cakes. Based on the fact that it is in a home, it can also figure out that it can skip providing any information to the user about it's ability to use industrial mixers to make cake batter.</p><p>We can describe a local approximation (represented by the transformation F L ) for a set of states &#348; and a set of actions &#194; as a function that generates a new model that conserves the transition probabilities and cost functions related to states and actions that appear in &#348; and &#194;.</p><p>Definition 17. For a given model M = &#8594;S, A, P, C, I, G&#8593; and a subset of states and actions &#348; &#8600; S and &#194; &#8600; A, a well formed MDP model</p><p>) if there exists a mapping from &#348; to S &#8593; and from &#194; to A &#8593; (with a slight abuse of notation we will use f L for both the state to state and action to action mapping), such that for any given states i, j &#8599; &#348; and an action a &#8599; &#194;, we have P (i, a, j)</p><p>there exists a mapping from &#348; &#8715; G to G &#8593; . This is a very permissive definition and not all instantiation of the transformation would necessarily lead to transformations that would lead us to ones that may preserve explanatory criteria. So before, we consider an instance of the transformation, let us consider a specific subclass of the transformation, one that focuses on cases where the subset of states and actions consider form a closed transition system ,i.e., all states reachable from &#348; under the policy is a subset of &#348; Definition 18. A set of states &#348; and set of actions &#194;, is said to be closed for a policy &#969;, if 1. Execution policy in &#348;, will never lead to a transition to a state outside the set, i.e., &#8242;j &#8599; S \ &#348; &#8657; &#8601;i &#8599; &#348; and a &#8599; &#226; such that P (i, a, j) &gt; 0 2. The set &#194; covers all actions assigned to states in &#348;, under the policy &#969;, i.e., &#8242;i &#8599; &#348;, &#969;(i) &#8599; &#194;</p><p>We will now see how any local transformation defined over a closed set, will result in universally sound transformation. More formally, we can state this as; Proposition 8. If the sets &#348; and &#194; are closed for a policy &#969;, then &#8242;i &#8599; &#348;, J &#8593;&#8594; (f L (i)) &#8734; J &#969; (i) and P &#8593;&#8594; (f L (i)) &#8771; P &#8594; (i), where J &#8593;&#8594; and P &#8593;&#8594; are the optimal cost function and the maxprob probability for the model F L (M).</p><p>Proof sketch. This follows directly from the fact that the transformation introduces no new transitions for states and actions that are part of the set &#194; and &#348;. Moreover the transformation conserves both cost and probabilities for those states and actions. This means there exists multiple policies for F L (M) which has the same value and probability for states in &#348; as the original policy &#969;. This means the optimal policy and maxprob policy should lead to policy that are more cheaper and with higher likelihood of getting the goal. Note that this doesn't need to be equal as the local approximations allows for the fact that there may be new actions now applicable in the states that are part of &#348;.</p><p>Proposition 8 ensures that local approximation results in models that underapproximates costs and overapproximates reachability and we can now state the primary theorem related to F L .</p><p>Theorem 4. The transformation F L for the given sets &#348; and &#194; is a universally sound transformation for a model M, if, 1. &#348; and &#194; are closed with respect to an optimal policy and a MAXPROB policy.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">I &#8599; &#348; and F L (I) = I &#8593;</head><p>, where I &#8593; is the new initial state in the model</p><p>The fact that the initial state is in &#348; (and its corresponding image remains the new initial state) and that it satisfies the requirement for Proposition 8 means that all three explanation criteria &#977; &#949;1 , &#977; &#949;2 , and &#977; &#949;3 .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.4.1.">Reachability Analysis For Local Approximation</head><p>One way to create a local approximation is to perform a reachability analysis to remove actions and fluents guaranteed to be not reachable from the initial state. To identify the non-reachable fluents, we will consider the delete-relaxation of the all outcome determinization of the original model and try to identify the reachable fluents and actions by building a relaxed planning graph <ref type="bibr">[37]</ref>. The fluents and actions not present in the planning graph are removed from the model description. We will refer to the model description formed through this procedure as F + L (M D ), where + denotes the fact that the local approximation relies on a delete relaxation of the model. First thing to note is that F + L , can be formed rather e"ciently. In particular, we have Proof. This follows from the fact that an all outcome determinization can be generated in polynomial time. The formation of the relaxed planning graph and subsequent identification of unreachable fluents and actions can again be performed in time polynomial to the size of the determinized model. With the fluents and actions to be removed identified, one can form the model description</p><p>Now the fact we need to show is that this model description transformation actually correspond to a local approximation for a certain subset of states and actions and moreover this approximation meets the requirements for Theorem 4.</p><p>In particularly we will see that the transformation will create a local approximation defined over the original MDP where the subset of states and actions considered correspond to set of states that can be formed by the remaining fluent and remaining actions and it is in fact closed. More formally, we can state Proof Sketch. Note that the mapping here is an identity mapping from states and actions in &#348; and &#194; to those in the model M &#8593; . From the construction of the relaxed planning graph, every fluent that is true in initial state will be conserved and thus I must be part of &#348;. The fact that &#348; and &#194; are closed comes from the fact that we are considering a delete relaxation of an all outcome determinization. This means if the process removes a state from consideration (i.e. removes a fluent f that is part of the state), then there exists no non-zero probability trace that can reach that state from the initial state. Thus the states are closed under any policy not just optimal or MAXPROB ones. Similarly for actions, the procedure will never remove any action that is applicable for a state that is part of &#348; and thus must be closed.</p><p>Proposition 10 thus establishes the fact that F + L also corresponds to a universally sound transformation.</p><p>With regards to Example 1, such transformation will allow us to ignore an action mix-with-industrial-mixer if it has a precondition (in-cake-factory), which can never be made true in this case. As such, the delete-relaxed graph will never allow for the application of this action.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Using Stratified Model Transformation Search to Compute Explanations</head><p>Now let us return back to the question of how to e!ectively generate the transformation sequences. Even if we limit our transformation classes to the one we discussed in Section 5, a search for the minimal transformation sequence using Algorithm 1 would be an expensive problem. For one the search space could be large and even if one were to introduce a more informed version of the search any search heuristic we employ will have to compute the e!ect of a transformation on the computational hardness of the problem. A more useful approach may be to set up a hard priority between the transformation classes.</p><p>If we are able to set a primary transformation class, then we can directly try to find the maximal number of transformations we can apply from that class.</p><p>The other transformations need only be considered to the degree that they can be applied to the models generated by applying a maximal sequence of primary transformations possible. From a computational point of view, an excellent candidate for primary transformation might be state abstractions. After all, removing each binary state fluent always reduces the state space by half. Among the most abstract models that support the given explanatory query, we can look at applying the other transformations provided they can conserve the criterion being established.</p><p>However, one could also additionally exploit the specifics of the transformations to get additional improvement in search. Starting with state abstraction transformation (F F \&#8226; ), there are two immediate properties we can exploit, namely the fact that the transformation is commutative and compositional, or more formally Proposition 11. For any model M D and for any proposition set F1 &#8600; F and F2 &#8600; F , we have 1. F F \ F1 (F F \ F2 (M D )) and F F \ F2 (F F \ F1 (M D ), i.e. the state abstraction transformation is commutative 2. F F \ F1 (F F \ F2 (M D ) and F F \( F1&#8771; F2) (M D ), i.e. the state abstraction transformation is compositional Proposition 11 follows directly from the definition and implies that we can apply state abstraction one fluent at a time and can be applied in any order.</p><p>Next moving onto the next three transformations, we see another fascinating property. Namely that the order in which the determinization (F " ) and subgoal decomposition (F D ) is applied doesn't matter. Similarly, the applying determinization and local approximation via reachability analyses (F + L ) in any order also results in the same model. However, applying local approximation after subgoal decomposition is always guaranteed to result in removal of more elements, more formally, Proposition 12. For any model</p><p>)), then we can guarantee that</p><p>Proof. The first result follows from the facts that (a) determinization will not change the goal description and subgoal decomposition only changes the goal description and thus are independent of each other and (b) the reachability is already calculated on an all outcome determinization of the model. For the second result, remember that the relaxed planning graph is always only built until the goal is achieved, as such using a subgoal that is easier to reach could let us prune out more actions and fluents. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Evaluation</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.1.">User Study</head><p>We performed an ablation study to establish the e!ectiveness of individual transformations to help users understand the explanation.</p><p>Study Objective: Our primary objective with the study is twofold: first, to establish the fact that using model simplification via the framework we introduce is a way to generate explanations that are easier to understand. Secondly, we want to help identify the utility provided by each individual of the four transformation classes.</p><p>For the primary objective, we will compare how the users deal with an explanation where simplifying model transformations are applied as opposed to one where the entire model is given to the user. We will compare the e!ectiveness of an explanation defined over a model containing all four transformations against Algorithm 2 Stratified Search one where one of the transformation operations is missing. For the secondary objective, we will test how the removal of any single transformation a!ects the overall e!ectiveness of the explanations. We recruited a total of 180 participants through Prolific <ref type="bibr">[38]</ref>. The participants were paid $3.30 for 15 minutes. The study took the form of a timed quiz (the quiz is automatically submitted at 15 minutes) where they read an explanatory dialogue and were asked to answer questions related to the explanation. There was also a bonus of $5 o!ered to the top two fastest participants from a group who get all the answers right (thus ensuring that people optimize for both completion time and correctness). We required that the participants be fluent in English and it was their first language.</p><p>For the maximum education degree completed: 30% of all the participants who attempted the test (including those who dropped out in the middle) reported having a Bachelor's degree, 18% a high school degree, 17% having some college credits, and 15% having a master's degree. In the group corresponding to all-but-determinization, 88% of participants reported that they understood probabilities. The study was performed on a simple travel planning domain, where the goal is to let the person board a flight. The domain consists of 11 actions, four of which have stochastic e!ects. We start with a model that contains all the transformations (identified by running an exhaustive search over the transformation space while using the same solvers as in Section 7.2 as proxies for Comp H (,)). It projects out all but six propositional fluents, has a subgoal of getting to the terminal, is determinized, and uses a regression-based localapproximation method to prune out actions that can't possibly contribute to the goal of getting to the terminal. The actual explanatory dialogue consists of a system stating the estimated time taken to reach the final destination (time being a stand-in for cost) and the user in the dialogue asking why it can't be done in an hour. The explanation consists of a description of the corresponding simplified model (generated by filling templates with descriptions of the propositions) and a single execution trace as the explanatory witness. The user is asked to read this dialogue and asked to answer a series of questions. Two questions of particular interest are a filter question that just checks whether the participant read the instructions and then a question that tests whether the user understood the explanation, by asking how they could speed up the travel plan. For the second question, the participant was given five options, only two of which are correct answers. One of which requires the participant to understand the current plan and the second requires them to reason about an alternate initial state. We ignored any participant who got the filter question wrong or whose quizzes timed out. The description of all the models can be found in Appendix A2.</p><p>The six conditions we considered were 1. all -A condition consisting of a model that includes all four transformations, here the main explanatory text (model description plus explanatory witness) included 1493 characters.</p><p>2. all-but-determinization -as the name suggests, all transformations except determinization are applied; the explanatory text included 1805 characters.</p><p>3. all-but-local-approximation -the explanatory text includes 2274 characters.</p><p>4. all-but-decomposition -the explanatory text includes 2429 characters.</p><p>This group has the same model description as all-but-local-approximation, but the explanatory witness is longer (as opposed to just talking about reaching terminal A, the trace takes you all the way to boarding the flight).</p><p>5. all-but-abstraction -the explanatory text includes 3211 characters.</p><p>6. None -this was the baseline group where the user was exposed to the original model and the explanatory text includes 4596 characters.</p><p>We considered 30 participants per condition.</p><p>We measure the e!ectiveness of the explanation both on a subjective level (do people find the explanation satisfying?) and on an objective one (do people find the explanations helpful?). We measure the former by directly asking the participants if they found the answer to the question (i.e., the explanation) satisfying and the latter by checking whether they found correct answers and how long they took to find those answers. Table <ref type="table">1</ref>, presents the results of the user study. The table lists -the number of participants who submitted the answers in time and provided the correct answer to the filter question, the percentage of participants who said they were satisfied with the answer, the number of participants who selected at least one correct answer, the average time taken by participants who answered at least one correct answer (reported along with 95% confidence interval) and the p-value obtained from a T-test comparing the time taken by the group that considered all the transformations against each of the other groups.</p><p>We ran a one-way ANOVA to compare the time taken by the six groups. Here, the null hypothesis was whether there was a di!erence between the groups. We found the p-value to be 0.00268959, which allows us to reject the null hypothesis since it is much lower than the significance level we considered (&#1009; = 0.05). This establishes that the di!erent model transformation does, at the very least, change the time taken.</p><p>Our first objective was to determine whether the transformation does provide an advantage over providing the original model information. To test this, we compare the None group against the others, particularly, with all. To start with, we see that None took the most time per the average. This supports our hypothesis about model simplification providing an advantage. The p-value calculated from the t-test (which can be roughly interpreted as the probability that the samples come from the same population) is 0.00042, which is lower than the standard significance levels (&#1009; = 0.05) used to establish statistical significance. This means we can be confident that the data for the None group is di!erent from those drawn for all group. We also see a pretty drastic reduction in the number of participants who got a correct answer. All these evidences point to the fact that, from an objective point of view, the model simplification transformation does provide an advantage.</p><p>On the other hand, we also see the highest level of satisfaction for this group.</p><p>One potential explanation for this result could be the fact that the participants were impressed by the verbosity of the explanation, even though it may have significantly slowed down and even mislead the participant. This results further reinforces arguments made by various recent works (cf. <ref type="bibr">[39]</ref>) that subjective satisfaction may be a poor indicator of explanation e!ectiveness. Now in regards to the second objective, we see how the removal of each of the individual transformations has some impact on the time taken. Both the groups where abstraction and local approximation were removed reported statistically significant results in regard to the completion time. In terms of the number of participants who correctly provided an answer, we saw the most drastic drop in the group where problem decomposition was removed. Also, it's worth noting that the time taken by the participants isn't simply a function of the verbosity of the explanation. For example, while all-but-abstraction is much more verbose than all-but-local-approximation, participants in the former group took considerably less time than the latter.</p><p>Another interesting point is the di!erence between all-but-local-approximation and all-but-decomposition. They are nearly identical in the size of the explanation, but in one, you are asked to reason about a longer horizon (i.e., all-but-decomposition), and in the other, you have unnecessary actions that don't contribute to the goal being explained against (i.e., reaching the terminal). We see that in all-but-local-approximation, a lot of people can solve the problem but at the cost of much higher time (the p-value returned by the statistical test between the time taken under condition all and all-but-localapproximation is again under the significance level of 0.05). We believe the drop in the number of participants with correct answers could be explained by the increased cognitive demand imposed by the need to reason over a larger horizon. As such, the need to reason over a longer horizon must have caused the participants to draw the wrong conclusions. This could again point to the deficiency of verbosity as being the sole metric determining the e!ectiveness of an explanation.</p><p>Overall Takeaways. The results show the utility of model simplifying transformations. The transformations that made the biggest impact on the overall time where all-but-abstraction and all-but-local-approximation. However, all-but-local-decomposition resulted in a large drop in the number of users who could find the correct answers. Also, the e!ectiveness of explanations isn't just reliant on model-description sizes or verbosity.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.2.">Synthetic Experiments</head><p>Here we will look at the ability of Stratified search to generate simplified models. Here the performance of the simplified model is measured by the time taken to solve the problem using standard solvers and the size of the description. We considered three di!erent solvers, a MAXPROB solver <ref type="bibr">[29]</ref>,</p><p>an implementation of LAO* solver <ref type="bibr">[40]</ref> for SSP problems (for the cost queries) and the fast-downward planner (ran with A* search and LM-cut heuristic) for deterministic problems <ref type="bibr">[41]</ref>. The problems were tested on problems from IPPC problems from 2006 and 2008 <ref type="bibr">[42]</ref> and some additional problems for unsolvability.</p><p>We tested problems corresponding to all three criteria classes. For criterion corresponding to cost (i.e., &#977; &#949;1 ), we only consider domains which is guaranteed to have proper policies (i.e., goal achievement probability is 1). For each domain considered, we selected only problems that could be solved by the solvers within 30 minutes and was appropriate for the specific criteria (i.e. was unsolvable for &#977; &#949;2 and had the max probability of less than 1 for &#977; &#949;3 ). Since we are unaware of any unsolvability benchmarks (for criterion &#977; &#949;2 ) for probabilistic planning,</p><p>Criterion Domain Problem Count Original Prob Size Average Simplified Prob Size Average Solver Time of Original Model Average Solver Time of Simplified Model Average Search Time &#982;&#969; 1 Blocksworld 5 67.4 25.4 1.647 1.22 96.81 Elevators 12 75.36 7.86 58.01 1.09 9.69 Zenotravel 5 81 26.2 534.06 1.26 11357.21 &#982;&#969; 2 Bottleneck 12 145 116.25 60.38 1.62 56.09 &#982;&#969; 3 Horizon Constrained Blocksworld 5 115 63.2 1.03 0.88 6476.61 Drive 15 421.53 51 0.06 0.13 1.28 Exploding Blocksworld 9 77.44 19.11 53.59 0.17 11.64</p><p>Table 2: The results of the simulated experiments</p><p>we took a deterministic domain (from <ref type="bibr">[43]</ref>) and turned them into probabilistic domains by randomly changing some of the add e!ects to stochastic e!ect with probability 0.5. For &#977; &#949;1 ) we created the query by considering the cost threshold to be 5 (note the SSP solver ignores action cost) and for &#977; &#949;3 ) we used a probability threshold of 1. We also used a constrained version of Blocksworld domain for &#977; &#949;3 that was introduced by <ref type="bibr">[29]</ref>.</p><p>As clear from Table <ref type="table">2</ref> shows, in every domain, the transformation results in a smaller domain, and in all but the driver domain, it results in a shorter solution time. Table <ref type="table">3</ref> presents the average time taken by di!erent types of transformation across the various unique domains. One thing to note is that the transformation that takes the most time is the problem decomposition. This is because it involves finding potential landmarks and finding the one that meets the required property from the partially ordered landmark.</p><p>Table 2 also presents the average time taken by the stratified search on each domain. All experiments were run on an Ubuntu 14.04 machine with 12 cores and 64 GB RAM. Domain Transformation Average Time Taken (Secs) Blocksworld F F \! 0.00107 F " 0.002305 F L 0.000331 F D 9.518883 Elevator F F \! 0.000712 F " 0.000758 F L 0.000282 F D 5.780696 Zenotravel F F \! 0.001473 F " 0.001899 F L 0.000358 F D 3.277179 Bottleneck F F \! 0.000781 F " 0.000637 F L 0.000412 F D 21.500897 Drive F F \! 0.002310 F " 0.004960 F L 0.000334 F D 1.124373</p><p>Table 3: Average time taken for the individual transformations across unique domains.</p><p>has been widely recognized as a fundamental process within the field <ref type="bibr">[76]</ref>. Each of our transformations is e!ectively mapping one transition system to another.</p><p>Such problems have been studied in the context of multiple fields including formal methods <ref type="bibr">[77]</ref>. In many of these cases, the mapping may be selected so as to enforce certain properties by ensuring the mapping is a bisimulation <ref type="bibr">[78]</ref>, homomorphism <ref type="bibr">[79]</ref>, etc. Control-theoretic literature also has studied state aggregation as investigated within this paper <ref type="bibr">[80]</ref>. Per <ref type="bibr">[76]</ref>, one of the earliest planning systems to use abstractions was the ABSTRIPS <ref type="bibr">[81]</ref> system which again connects to the state abstraction presented here. Since then planning methods have considered multiple state-based <ref type="bibr">[82]</ref> and temporal abstractions <ref type="bibr">[83]</ref>[84].</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="9.">Conclusion and Discussion</head><p>The paper presents a framework for generating a simplified model representation for the purposes of explanation. In particular, we look at transformations over model descriptions that preserve explanatory criteria. We focus on explaining policies generated using (a) factored MDP models that satisfy P assumption and (b) with respect to some contrastive query. As part of defining this framework, we also establish the space of possible explanatory criteria that can be queried by the user in this setting, perform analysis over some general class of transformations, and formalize the idea of explanatory witness. We perform user studies to validate the specific transformations studied in the paper. Our user study results show that the transformations do help improve the comprehensibility of the explanations. However, we can't just rely on computational intuitions to decide the most useful transformations. Thus, more work needs to be done to identify these transformations' strengths and develop novel transformations better suited for explanations.</p><p>:rewards :equality :typing) (:types elevator floor pos coin) (:predicates (at_home) (has_cash_voucher) (on_board_express_train) (on_board_regular_train) (at_downtown_station) (at_terminal_A) (at_city_station) (checked_in) (luggaged_checked_in) (at_deignated_gate) (board_flight) (gate_confirmed)) (:action walk_to_train_station :parameters () :precondition (and (at_home) ) :effect (and (decrease reward 15) (at_city_station) (not (at_home)) ) ) (:action take_taxi_to_downtown :parameters () :precondition (and (at_home) (has_cash_voucher) ) :effect (and (decrease reward 15) (at_downtown_station) (not (at_home)) (not (has_cash_voucher)) ) ) (:action catch_train_at_city_station :parameters () :precondition (and (at_city_station)) :effect (and (decrease reward 25) (not (at_city_station)) (probabilistic 1/2 (and (on_board_express_train) ) 1/4 (and (on_board_regular_train))) ) ) (:action ride_express_train :parameters () :precondition (and (on_board_express_train)) :effect (and (decrease reward 50) (probabilistic 1/2 (and (at_downtown_station) (not (on_board_express_train)))) ) ) (:action ride_regular_train :parameters () :precondition (and (on_board_regular_train)) :effect (and (decrease reward 85) (probabilistic 1/4 (and (at_downtown_station) (not (on_board_regular_train)))) ) ) (:action catch_shuttle_from_downtown_station_to_terminal_a :parameters () :precondition (and (at_downtown_station) (has_cash_voucher)) :effect (and (decrease reward 15) (not (has_cash_voucher)) (not (at_downtown_station)) (at_terminal_a)) ) (:action check_in_terminal_A :parameters () :precondition (and (at_terminal_A)) :effect (and (decrease reward 15) (not (has_cash_voucher)) (checked_in)) ) (:action check_in_luggage :parameters () :precondition (and (at_terminal_A) (checked_in)) :effect (and (decrease reward 15) (luggaged_checked_in)) ) (:action wait_for_Gate_confirmation :parameters () :precondition (and (at_terminal_A) (checked_in)) :effect (and (decrease reward 15) (probabilistic 1/2 (and (gate_confirmed))) )) (:action go_through_security_clearance_to_gate ;; Possible place to add more stochasticity :parameters () :precondition (and (at_terminal_A) (luggaged_checked_in) (gate_confirmed) (checked_in)) :effect (and (decrease reward 5) (at_deignated_gate)) ) (:action board_flight :parameters () :precondition (and (at_deignated_gate)) :effect (and (decrease reward 5) (board_flight))</p><p>) )</p></div></body>
		</text>
</TEI>
