<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Experimental design and facets of evidence for computational theory of mind</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>2022</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10347957</idno>
					<idno type="doi"></idno>
					<title level='j'>Proceedings of the 8th International Workshop on Artificial Intelligence and Cognition (AIC)</title>
<idno></idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>J. Michelson</author><author>D. Sanyal</author><author>J. Ainooson</author><author>Y. Yang</author><author>M. Kunda</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[The competitive feeding paradigm is one of several experimental setups intended to test whether nonverbal subjects possess skills related to Theory of Mind. Competitive feeding focuses on the relationship between seeing and knowing. In this paper, we describe a highly customizable implementation of the competitive feeding paradigm for computational agents in a gridworld environment. We explore various modifications to the setup, including shared rewards, alternative sequences of timed events, and asymmetrical values, that allow us to replicate a broad range of tests designed to study the social cognition skills of humans and animals. Finally, we describe how this paradigm can be expanded upon and used as a benchmark test to investigate social reasoning in artificially intelligent models.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>One critical element of social cognition research is Theory of Mind (ToM), originally described by Premack and Woodruff in 1978 as a "system of inferences" regarding the mental states of others <ref type="bibr">[1]</ref>. Because mental states are unobservable, both their existence and their relation to observable data can only be inferred. Owing to this subjective nature, ToM skills and the mechanisms that produce them, in humans and other animals alike, are not thoroughly understood, and their detection and measurement remain the subject of a long-running debate. A well-studied example of potential ToM reasoning in the animal kingdom is that of Western scrub-jays, which instinctively cache food to save it for later. They tend to re-cache their food if they believe a competitor, who might try to pilfer the hidden prize, observed them caching. In doing so, they keep track of which individual witnesses are privy to information about different cache sites <ref type="bibr">[2]</ref>. At first glance, such sophisticated behavior seems to imply that the jays can infer their competitors' mental states. On careful examination, however, directly observable information, e.g., "Polly's head was oriented towards this particular cache in the past", is sufficient for a successful re-caching strategy, without any need for mentalization, e.g., "Polly knows there is food here". Jays also display some degree of successful transfer between the roles of hiding and seeking: birds that have been thieves themselves are more likely to re-cache their food when observed by competitors <ref type="bibr">[3]</ref>. Could this pattern be evidence for experience projection? 
Although interesting, this observation also fails to provide strong evidence for any reasoning about competitors' internal states, as it can be explained by ToM-free models <ref type="bibr">[4]</ref>.<fw type="footer">8th International Workshop on Artificial Intelligence and Cognition, June 15-17, 2022, &#214;rebro, Sweden</fw></p><p>While most literature on ToM focuses on humans and non-human animals, there remains a wealth of knowledge to be questioned, tested, and discovered in the realm of artificial intelligence. Michelson et al. <ref type="bibr">[5]</ref> highlight the need for a standardized battery of tests that many researchers can use to evaluate AI models' theory of mind skills. They describe several criteria and desiderata that make social cognition benchmark tests amenable to artificial intelligence researchers. Numerous tests of human and animal cognition examine ToM and related skills, including the popular Sally Anne test <ref type="bibr">[6]</ref>, the knower guesser paradigm <ref type="bibr">[7]</ref> <ref type="bibr">[8]</ref>, and the competitive feeding paradigm <ref type="bibr">[9]</ref>. This paper covers the design, implementation, and use cases of one such test environment, inspired by the competitive feeding paradigm, that serves as a foundation for such a test battery. 
The specific contributions of this paper include:</p><p>&#8226; A brief overview of the competitive feeding paradigm, a test framework designed to study whether non-verbal animals understand the concepts of seeing and knowing, along with criticisms of the paradigm.</p><p>&#8226; A detailed description of the Standoff environment, a gridworld framework for running social cognition tests on computational agents.<ref type="foot">foot_0</ref></p><p>&#8226; Descriptions of how specific modifications of competitive feeding within the Standoff framework allow for the measurement of a breadth of skills beyond those captured by competitive feeding.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Background</head><p>Povinelli and Vonk <ref type="bibr">[10]</ref> point out that existing paradigms for testing social cognition fail because they generally do not distinguish reasoning about observable behavior from reasoning about unobservable mental states. Later, Penn and Povinelli provide a formal definition of ToM so that its presence in a subject can be more systematically measured and falsified <ref type="bibr">[11]</ref>. They describe ToM as the presence of a function, <formula notation="TeX">f_{ToM}</formula>, which a cognitive agent (the subject) may use to infer the mental state of another cognitive agent. As <formula notation="TeX">f_{ToM}</formula> is an inference, its output must be based solely on the perceptual inputs available to the subject. This definition avoids committing to any specific account of how <formula notation="TeX">f_{ToM}</formula> might be implemented or used. Compelling evidence of <formula notation="TeX">f_{ToM}</formula> must take the form of behavior that demonstrates "the necessity of an <formula notation="TeX">f_{ToM}</formula> in addition to and distinct from the cognitive work that could have been performed without such a function." Penn and Povinelli use the competitive feeding paradigm, which we describe in Section 2.1, as a case study of a paradigm that cannot detect ToM <ref type="bibr">[11]</ref>. With a few modifications, however, a new paradigm can be built that satisfies the requirements for proving and falsifying ToM hypotheses.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">The Competitive Feeding Paradigm</head><p>The competitive feeding paradigm is a test setup designed to determine whether a non-verbal subject will change its behavior to account for what it believes a conspecific knows, based on evidence about what the conspecific sees <ref type="bibr">[9]</ref>. The subject and one other participant must have an established social hierarchy, in which the subject is 'subordinate' to the other, 'dominant' participant.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.1.">Setup</head><p>The general setup of a competitive feeding test is as follows. The animals are kept in cages on either side of a central room, with the subject's cage always opposite the cages of its one or more conspecifics. During "baiting" events, large and small food rewards, or treats, are placed or moved in the central room. Although the dominant can sometimes see a treat being placed, after one or both baiting events it is no longer able to see the treats. Eventually, both animals are released. Due to the social hierarchy, the subordinate will not challenge the dominant if both attempt to reach the same treat. So, if the subject believes the dominant will look for food in a particular location, we assume the subject will avoid that location. The subject's initial challenge, then, is to determine where the dominant will decide to go. Once the animals are released, the subject's orientation or movement towards a treat is recorded.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.2.">Baiting events</head><p>During the baiting events, the dominant's door may be either partially open, allowing it to see the baiting, or closed. By closing the dominant's door at specified times, researchers create scenarios in which the dominant knows where the food is, does not know, or holds a false belief about the food's presence or location (i.e., it knows where the food is initially but is unaware that the food has since been moved).</p><p>By carefully observing what the dominant can and cannot see and then reasoning about what the dominant knows, the subject might alter its behavior to secure more food for itself. For example, if the subject believes that the dominant does not know the larger food pile's location, the subject might go there for a greater reward, whereas it would otherwise leave that pile to the dominant.</p></div>
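<div xmlns="http://www.tei-c.org/ns/1.0"><p>The three knowledge states above follow mechanically from which baiting events the dominant witnesses. The Python sketch below makes that bookkeeping explicit. It is an illustration only: the BaitEvent record and the track_beliefs and knowledge_state helpers are hypothetical names, not part of the original paradigm or of any published code, and the model ignores everything except the door state during each event.</p><ab type="code"><![CDATA[
```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class BaitEvent:
    """One baiting step: a treat is placed at (or moved to) a location."""
    treat: str           # e.g. "big" or "small"
    location: int        # index of the hiding spot
    dominant_sees: bool  # True if the dominant's door is open during the event


def track_beliefs(events: List[BaitEvent]) -> Dict[str, Dict[str, Optional[int]]]:
    """Return the true treat locations and the dominant's beliefs about them.

    A belief of None means the dominant never saw where that treat went;
    an unseen move leaves any earlier (now stale) belief untouched.
    """
    truth: Dict[str, Optional[int]] = {}
    belief: Dict[str, Optional[int]] = {}
    for ev in events:
        truth[ev.treat] = ev.location
        if ev.dominant_sees:
            belief[ev.treat] = ev.location
        else:
            belief.setdefault(ev.treat, None)
    return {"truth": truth, "belief": belief}


def knowledge_state(result: Dict[str, Dict[str, Optional[int]]], treat: str) -> str:
    """Classify the dominant's knowledge of one treat's location."""
    believed = result["belief"].get(treat)
    if believed is None:
        return "does not know"
    if believed == result["truth"][treat]:
        return "knows"
    return "false belief"


if __name__ == "__main__":
    # The dominant sees the big treat placed at spot 0, then misses its move to spot 2.
    events = [BaitEvent("big", 0, True), BaitEvent("big", 2, False)]
    print(knowledge_state(track_beliefs(events), "big"))  # false belief
```
]]></ab><p>Under this model, an unseen move is exactly what produces a false belief: the dominant retains its last witnessed location even though the treat is now elsewhere.</p></div>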
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.3.">Variants</head><p>Since its first use in testing chimpanzees <ref type="bibr">[9]</ref>, multiple variants of competitive feeding have been proposed, implemented, and run on various animal species. Hare et al. published a compelling version of the test in 2001 featuring three experiments, "did", "who", and "which", referring to the subjects' beliefs about whether conspecifics witnessed different baiting events <ref type="bibr">[12]</ref>. "Did" refers to the ability to distinguish whether an opponent did or did not observe an event, "who" involves understanding which of multiple opponents observed an event, and "which" involves understanding which of multiple baiting events an opponent observed. That test, and most that followed it, compare the subject's performance across at least four conditions: Informed, Uninformed, Control Misinformed, and Misinformed. The names of these conditions refer to the dominant's awareness of baiting events. In the first two, one baiting event takes place, and the dominant is either aware or unaware of the food's location. In the last two, the dominant is aware during one baiting event but is then either aware or unaware of a second event in which the foods' locations are swapped. The Misinformed and Control Misinformed conditions can be likened to the Sally Anne test, as the subject is tasked with identifying the presence of a change-of-location false belief.</p></div>
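<div xmlns="http://www.tei-c.org/ns/1.0"><p>Because the four conditions differ only in their event schedules, they can be written down as data. The sketch below assumes two hiding spots, a big and a small treat, and an idealized subordinate that takes the big treat unless it predicts the dominant will go there. The encoding and the ideal_subject_choice rule are our own illustrative assumptions, not a reconstruction of the procedure used by Hare et al.</p><ab type="code"><![CDATA[
```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding with two hiding spots, 0 and 1.
# Each event is (location of big treat, location of small treat, dominant_sees).
Event = Tuple[int, int, bool]

CONDITIONS: Dict[str, List[Event]] = {
    "Informed": [(0, 1, True)],
    "Uninformed": [(0, 1, False)],
    "Control Misinformed": [(0, 1, True), (1, 0, True)],  # swap witnessed
    "Misinformed": [(0, 1, True), (1, 0, False)],         # swap unseen
}


def dominant_belief(events: List[Event]) -> Optional[int]:
    """Where the dominant believes the big treat is (None if it never saw it)."""
    belief: Optional[int] = None
    for big, _small, sees in events:
        if sees:
            belief = big
    return belief


def ideal_subject_choice(events: List[Event]) -> str:
    """Take the big treat unless the dominant is expected to go for it there."""
    big_now = events[-1][0]
    if dominant_belief(events) == big_now:
        return "small"
    return "big"


if __name__ == "__main__":
    for name, schedule in CONDITIONS.items():
        print(f"{name}: {ideal_subject_choice(schedule)}")
```
]]></ab><p>In the Misinformed condition the dominant heads for spot 0, where the small treat now lies, leaving the big treat to the subject. In the Uninformed condition the dominant may still guess correctly, so the idealized choice there is a calculated risk.</p></div>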
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Criticism</head><p>Because our subject has access to its own mental state, it is critically important to falsify the null hypothesis that it uses only its own mental state to determine its behavior. Behaviors that could be explained as a learned response to superficial perceptual input, e.g., 'her eyes being pointed toward the food indicates that I should go somewhere else', do not suffice.</p><p>In general, ToM allows an agent to anticipate that some other portion of the environment (read: another embodied agent) will behave in accordance with a false belief. To be convinced that the agent has ToM, we must know and compare its behavior under all alternative assumptions of truth values (and beliefs about truth values). Penn and Povinelli describe two alternatives to the competitive feeding paradigm that might aid in making such a comparison.</p><p>The first, called the opaque visor experiment, is a modification of a task described in <ref type="bibr">[13]</ref>. The opaque visor experiment involves explicit generalization from novel first-person experience to third-person reasoning: the subject is given time to experiment with multiple visors, whose opacity can only be discerned up close, before being evaluated on its understanding of a visor's effect on an experimenter at a distance. Due to its emphasis on few-shot learning, the opaque visor experiment lies beyond the scope of this paper. The second, which motivates this work, adds to Hare et al.'s competitive feeding paradigm <ref type="bibr">[12]</ref> a handful of modifications and variants meant to control for alternative explanations of animals' 'passing' behavior.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.1.">Systematic Competitive Feeding</head><p>The improvements Penn and Povinelli suggest for a systematic competitive feeding paradigm (SCFP) are slightly more complex, but they provide much more satisfying answers to the question of what, exactly, the subjects believe. To ensure that assumptions about the subjects' behavior are justified, they describe a specific training regime with stages that must be passed in sequence, each demonstrating understanding of one of the test's fundamental components.</p><p>In Stage 1, subjects are trained in the absence of dominant competitors until they demonstrate proper goal-seeking behavior. Next, in Stage 2, they are trained to compete with a conspecific for food (as in all other competitive feeding tests), and only those that successfully concede food to dominants are allowed to continue. If our subjects pass the first two stages, we can be confident that they understand the basics of how their reward can be maximized.</p><p>Finally, several variants are presented as test conditions in Stage 3. In this version, there are several buckets (food locations), and food is always placed in two of them during the baiting events. Because the number of buckets is usually greater than two, the SCFP makes no cross-experiment distinction between the "did" and "which" cases of Hare et al. <ref type="bibr">[12]</ref>.</p><p>Instead of the four common test variants described above, the SCFP uses at least eight scenarios to comprehensively judge the subjects' understanding: informed control, partially uninformed, removed informed, removed uninformed, moved, replaced, misinformed, and swapped. Like the four common competitive feeding variants, these scenarios differ from each other only in the schedules of obscuring, baiting, hiding, and releasing events performed by the experimenters. For full descriptions of each scenario, please refer to Section 6b of Penn and Povinelli 2007 <ref type="bibr">[11]</ref>.</p></div>
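<div xmlns="http://www.tei-c.org/ns/1.0"><p>The staged regime works as a filter: a subject reaches the Stage 3 test scenarios only after passing Stages 1 and 2 in order. A minimal Python sketch of that gating follows. The success-rate threshold and the toy stage functions are hypothetical placeholders, not criteria taken from Penn and Povinelli.</p><ab type="code"><![CDATA[
```python
from typing import Callable, Dict, List, Optional

# A training-stage evaluator maps a subject to its success rate in [0, 1].
StageEval = Callable[[str], float]
# The test stage maps a subject to pass/fail results per scenario.
TestEval = Callable[[str], Dict[str, bool]]


def run_curriculum(subjects: List[str],
                   training_stages: List[StageEval],
                   test_stage: TestEval,
                   threshold: float = 0.9) -> Dict[str, Optional[Dict[str, bool]]]:
    """Run the test stage only for subjects that pass every earlier stage, in order.

    all() stops at the first failed stage, so later stages are never run
    for a subject that has already been filtered out; such subjects map to None.
    """
    results: Dict[str, Optional[Dict[str, bool]]] = {}
    for subject in subjects:
        if all(stage(subject) >= threshold for stage in training_stages):
            results[subject] = test_stage(subject)
        else:
            results[subject] = None
    return results


def stage1_goal_seeking(subject: str) -> float:
    return 1.0  # toy stand-in: every subject finds food when alone


def stage2_concedes(subject: str) -> float:
    return 0.95 if subject == "A" else 0.4  # toy stand-in: only "A" yields to dominants


def stage3_scenarios(subject: str) -> Dict[str, bool]:
    return {"informed control": True, "misinformed": False}  # toy results


if __name__ == "__main__":
    print(run_curriculum(["A", "B"], [stage1_goal_seeking, stage2_concedes], stage3_scenarios))
```
]]></ab><p>Keeping the stages as separate evaluators makes it easy to report where a model drops out, which is itself informative about which fundamental component it failed to learn.</p></div>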
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">The Standoff Environment: A Gridworld Platform for Computational Theory of Mind Experiments</head><p>The Standoff Environment is a multiagent gridworld environment implemented as a partially observable Markov decision process using the PettingZoo API <ref type="bibr">[14]</ref>. SuperSuit <ref type="bibr">[15]</ref> wrappers convert the environment's inputs and outputs into formats that interface directly with off-the-shelf reinforcement learning implementations in Stable-Baselines3 <ref type="bibr">[16]</ref> and RLlib <ref type="bibr">[17]</ref>. Standoff replicates all SCFP variants described in <ref type="bibr">[11]</ref> and, as we will see, is capable of testing for ToM skills in a wide variety of settings.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Agents</head><p>Agents' views are bird's-eye representations of their surroundings. These views are either egocentric, in which the agent's body always appears at the same position in the view and the view is rotated to align with the agent's current heading, or allocentric, in which the entire world is displayed in a fixed coordinate system but areas outside the agent's perception are masked. In both cases, our agents' bird's-eye perceptions differ notably from real animals' first-person views, but we opt for this simpler, and possibly easier, perspective for the sake of programmer friendliness. Agents move using one of two action sets: directed (forward, backward, rotate left, and rotate right) or cardinal (North, South, East, and West).</p></div>
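<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two kinds of view and the two action sets can be illustrated on a grid of characters. This is a sketch under stated assumptions: the egocentric window is centered on the agent (other gridworlds place the agent elsewhere, e.g. at the bottom edge), visibility is supplied as a precomputed set of cells rather than computed from walls, and cardinal moves are assumed to also turn the agent. None of the function names come from the Standoff code base.</p><ab type="code"><![CDATA[
```python
from typing import List, Set, Tuple

Grid = List[List[str]]
Pos = Tuple[int, int]

# Unit vectors with y growing downward, so "N" points toward row 0.
HEADINGS = {"N": (0, -1), "E": (1, 0), "S": (0, 1), "W": (-1, 0)}
ORDER = ["N", "E", "S", "W"]  # clockwise order, used for rotations
CARDINAL_ACTIONS = list(ORDER)
DIRECTED_ACTIONS = ["forward", "backward", "rotate_left", "rotate_right"]


def rotate_cw(grid: Grid) -> Grid:
    """Rotate a grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def crop(grid: Grid, x: int, y: int, radius: int, pad: str = "#") -> Grid:
    """Square window centered on (x, y); out-of-bounds cells become pad."""
    h, w = len(grid), len(grid[0])
    return [[grid[j][i] if 0 <= i < w and 0 <= j < h else pad
             for i in range(x - radius, x + radius + 1)]
            for j in range(y - radius, y + radius + 1)]


def egocentric_view(grid: Grid, x: int, y: int, heading: str, radius: int = 2) -> Grid:
    """Agent-centered window, rotated so that the agent's heading points up."""
    view = crop(grid, x, y, radius)
    for _ in range({"N": 0, "W": 1, "S": 2, "E": 3}[heading]):
        view = rotate_cw(view)
    return view


def allocentric_view(grid: Grid, visible: Set[Pos], shadow: str = "?") -> Grid:
    """Whole map in fixed coordinates, with cells outside perception masked."""
    return [[c if (i, j) in visible else shadow for i, c in enumerate(row)]
            for j, row in enumerate(grid)]


def step(x: int, y: int, heading: str, action: str) -> Tuple[int, int, str]:
    """Apply one cardinal or directed action (walls are not checked here)."""
    if action in HEADINGS:  # cardinal: move that way and (by assumption) face that way
        dx, dy = HEADINGS[action]
        return x + dx, y + dy, action
    turn = ORDER.index(heading)
    if action == "rotate_left":
        return x, y, ORDER[(turn - 1) % 4]
    if action == "rotate_right":
        return x, y, ORDER[(turn + 1) % 4]
    if action not in ("forward", "backward"):
        raise ValueError(f"unknown action: {action!r}")
    dx, dy = HEADINGS[heading]
    sign = 1 if action == "forward" else -1
    return x + sign * dx, y + sign * dy, heading
```
]]></ab><p>Cardinal actions never require the agent to track its own heading, which is one reason they may be easier for a learner than directed actions.</p></div>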
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Puppets</head><p>The Standoff environment supports multi-agent reinforcement learning, but its initial purpose is to study the behavior of a single subject. As a starting point, the subject's conspecifics, whether collaborators or competitors, are implemented as hard-coded puppets. These puppets behave according to simple rulesets applied to their perceptions. Puppets look identical to every other agent, the subject included, apart from optional visual features that distinguish their values (see 4.6). Puppets have an explicit memory of relevant information that they witness (namely, treat locations), as well as basic navigation skills. Through this dynamic implementation, changing the sequence of information presented to a puppet causes predictable changes in its behavior.</p><p>Various independent variables and environmental parameters can be edited to create different experimental conditions, to which puppets respond automatically. Puppets' behavior can otherwise be specified by the user at any level of granularity, and puppets can even be controlled by custom artificially intelligent models. Note that the puppets' hard-coded behavior is intended as a starting point in the absence of rational actors, though irrational behavior also warrants study.</p></div>
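<div xmlns="http://www.tei-c.org/ns/1.0"><p>A puppet's memory-driven behavior can be sketched in a few lines. The Puppet class below is hypothetical and much simpler than the environment's puppets: it records a treat's location only when the treat lies in a visible cell, then walks greedily toward the most valuable remembered treat, so withholding information from it changes its behavior predictably.</p><ab type="code"><![CDATA[
```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

Pos = Tuple[int, int]


@dataclass
class Puppet:
    """Hypothetical rule-based puppet: remembers treats it sees, then heads for the best one."""
    pos: Pos
    memory: Dict[str, Pos] = field(default_factory=dict)  # treat name -> last cell seen

    def observe(self, treats: Dict[str, Pos], visible: Set[Pos]) -> None:
        """Update memory only for treats lying in currently visible cells."""
        for name, cell in treats.items():
            if cell in visible:
                self.memory[name] = cell

    def goal(self, values: Dict[str, float]) -> Optional[Pos]:
        """Remembered location of the most valuable treat the puppet knows about."""
        known = [name for name in self.memory if name in values]
        if not known:
            return None
        return self.memory[max(known, key=values.__getitem__)]

    def step(self, values: Dict[str, float]) -> Pos:
        """Greedy one-cell move toward the goal (obstacles are ignored here)."""
        target = self.goal(values)
        if target is None:
            return self.pos
        x, y = self.pos
        tx, ty = target
        if x != tx:
            x += 1 if tx > x else -1
        elif y != ty:
            y += 1 if ty > y else -1
        self.pos = (x, y)
        return self.pos
```
]]></ab><p>If a treat is moved while the puppet cannot see it, the memory goes stale and the puppet walks to the old location, which is the false-belief behavior the Misinformed conditions rely on.</p></div>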
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Tutorial Stages</head><p>All tests of social cognition rest on assumptions about their subjects' goals, knowledge, and abilities. Animal subjects' preference for food is well understood, and fundamental knowledge, such as the fact that doors open and close or how to navigate simple environments, can generally be assumed without question or otherwise taught through repeated exposure. The Standoff environment makes use of numerous 'tiles' with various behaviors and affordances that the subjects are assumed to understand at evaluation time. Curtains and boxes conceal their contents, treats grant rewards, gates (both transparent and opaque) open and close without warning, and other agents move about of their own volition. These 'commonsense' facts (along with many others) are established in the environment's tutorial stages, which expose a subject to various hard-coded and randomized settings so that it can explore the rules of the world, which mirror those of the other Standoff conditions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Competitive Feeding</head><p>As a starting point, we introduce the Standoff implementation of competitive feeding, including all the systematic variants proposed by Penn and Povinelli <ref type="bibr">[11]</ref>. In our computational version of the test paradigm, we closely imitate many aspects of the competitive feeding design: walls are opaque (occluded areas are masked by a special shadow color), gates may be opened or closed (and may be opaque or transparent), and treats are baited according to the same schedules.</p><p>Because the environment is a gridworld, many of these details are abstracted to a large degree. Treats are objects that provide reward to reinforcement learning agents (and often terminate the episode) when reached. The rewards granted by treats are determined dynamically following certain rules that ensure predictable optimal behavior. For example, if there are <formula notation="TeX">n</formula> boxes, the ratio of (positive) rewards between the larger and smaller treat must be greater than <formula notation="TeX">n:1</formula>; otherwise, strategies like "always approach the smaller treat" can achieve maximal total reward in circumstances where they should not. Likewise, a small negative reward placed on empty boxes reduces the expected value of random guesses. Treats may be 'hidden' in boxes, which obscure the treat from the view of the dominant and (conditionally) the subordinate. Note that in the original competitive feeding paradigm design, the subordinate is always able to observe the treats' locations.</p><figure><head>Figure 2</head><figDesc>A partially uninformed scenario of the systematic competitive feeding paradigm showcased in the Standoff environment, pictured at selected timesteps. Both the dominant puppet (top red triangle) and the subordinate subject (bottom red triangle) begin with full views of the environment, and their movement is impeded by transparent blocks (step 0). In view of both agents, the first treat, a small green circle, is baited (step 4) and hidden in a box (step 5). The dominant's view is obscured with an opaque block before the second, larger treat is baited (step 9) and hidden alongside the decoy boxes at each treat location (step 10). Note that the dominant is occluded from the subject's view while the opaque block is present in steps 9 and 10. Both agents are released to the curtained area (red squares) to make decisions in private. Upon the second release event (step 17), the dominant progresses towards the small treat's location: it is unaware of the large treat's location, so, to the best of its knowledge, obtaining the small treat maximizes its expected value.</figDesc></figure></div>
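<div xmlns="http://www.tei-c.org/ns/1.0"><p>The reward rules can be checked with a little arithmetic. The Python sketch below picks a large reward just above n times the small one and charges a small penalty for empty boxes. The margin and penalty values are illustrative assumptions, and blind_large_guess_beats_small encodes only one possible reading of why the n:1 ratio matters, not a derivation given by the environment's authors.</p><ab type="code"><![CDATA[
```python
from typing import Dict


def treat_rewards(n_boxes: int, small: float = 1.0, margin: float = 0.5,
                  empty_penalty: float = 0.1) -> Dict[str, float]:
    """Pick rewards so that large / small exceeds n_boxes, and empty boxes cost a little.

    margin and empty_penalty are illustrative defaults, not Standoff's values.
    """
    if n_boxes < 2:
        raise ValueError("need at least two boxes, one per treat")
    if margin <= 0:
        raise ValueError("margin must be positive for a strict n:1 ratio")
    return {"large": n_boxes * small + margin, "small": small, "empty": -empty_penalty}


def random_guess_value(rewards: Dict[str, float], n_boxes: int) -> float:
    """Expected reward of opening one box uniformly at random, when one box
    holds the large treat, one holds the small treat, and the rest are empty."""
    return (rewards["large"] + rewards["small"]
            + (n_boxes - 2) * rewards["empty"]) / n_boxes


def blind_large_guess_beats_small(rewards: Dict[str, float], n_boxes: int) -> bool:
    """One reading of the n:1 rule: a 1-in-n guess at the large treat should
    be worth more in expectation than settling for the small treat."""
    return rewards["large"] / n_boxes > rewards["small"]


if __name__ == "__main__":
    r = treat_rewards(4)
    print(r, random_guess_value(r, 4), blind_large_guess_beats_small(r, 4))
```
]]></ab><p>With four boxes and these defaults, a random guess is worth 1.325 in expectation, compared with 1.375 when empty boxes carry no penalty.</p></div>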
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Independent Variables</head><p>By modifying small sets of environmental parameters, we can computationally imitate many other social cognition experiments that have been performed on animals. Models' skills can be examined through a variety of lenses to gain insight into their fundamental capabilities and weaknesses.</p><p>All of the following variables can (and should) be investigated for transfer learning in a standardized manner, so that models of ToM can be evaluated for their generalization capacity along various notable axes. As with scrub-jays, does experience with one role help an agent understand another? The test setups themselves may also be studied to determine the extent to which success on a specific set of tests predicts other abilities.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Agent priority</head><p>In the standard competitive feeding tests, the subject is subordinate to its opponent, i.e., all else being equal, the subject is at a competitive disadvantage. In the Standoff environment, this effect is achieved, and signaled to the agents, by placing treats slightly closer to the dominant agent. On its own, letting the subject take the role of the dominant yields a trivially easy task whose solution is identical to that of Stage 1 of the SCFP. In conjunction with other changes discussed below, especially visible decisions (see 4.3), a dominant subject proves quite useful, as its decision can alter the subordinate puppet's behavior.</p><p>A transfer learning experiment across differing agent priorities is similar to the role-reversal experiment in <ref type="bibr">[18]</ref>. Their experiment is collaborative; note that the same experiment could also be run under different conditions of anticipation valence (see 4.2) and reward sharing.</p></div>
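<div xmlns="http://www.tei-c.org/ns/1.0"><p>Priority through proximity can be illustrated with path lengths. This sketch assumes an open room with four-way movement and that ties go to the dominant, both assumptions made for illustration. Treats are placed one row closer to the dominant's side than the midline, so from mirrored starting positions the dominant wins any treat that both agents race for. The function names and layout are hypothetical.</p><ab type="code"><![CDATA[
```python
from typing import List, Tuple

Pos = Tuple[int, int]


def manhattan(a: Pos, b: Pos) -> int:
    """Path length between two cells in an open room with 4-way movement."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def first_to_reach(dominant: Pos, subordinate: Pos, treat: Pos) -> str:
    """Who gets a contested treat; ties go to the dominant (an assumption)."""
    if manhattan(dominant, treat) <= manhattan(subordinate, treat):
        return "dominant"
    return "subordinate"


def treat_row(width: int, dominant_row: int, subordinate_row: int,
              offset: int = 1) -> List[Pos]:
    """Hypothetical layout: treats on a row shifted `offset` cells from the
    midline toward the dominant's side of the room."""
    mid = (dominant_row + subordinate_row) // 2
    row = mid - offset if dominant_row < subordinate_row else mid + offset
    return [(x, row) for x in range(width)]


if __name__ == "__main__":
    for treat in treat_row(5, dominant_row=0, subordinate_row=6):
        print(treat, first_to_reach((2, 0), (2, 6), treat))
```
]]></ab><p>Swapping which agent the subject controls, with the layout unchanged, is then enough to set up the role-reversal comparison described above.</p></div>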
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Anticipation valence</head><p>In the competitive feeding tests, the subject is expected to look for a treat in an area it believes the dominant will not visit. Leslie and Polizzi find a significant difference between positive and negative desires, that is, looking where something is versus where it is not, in the context of Sally Anne tests in human children <ref type="bibr">[19]</ref>. A minor change in the rules of the Standoff environment inverts the negative valence of the competitive feeding paradigm: when the dominant reaches a treat, the treat remains, and its value to the subordinate is increased to the maximum. Now, the task presented to the subject is arguably simpler: infer the dominant's goal, and adopt that goal as its own. There is no longer any need for further decision-making about the best alternative goal once the dominant's decision has been identified. Many other social cognition experiments, including most that involve collaboration, make use of positive anticipation.</p><p>We signal valence using treats' color for RGB inputs and treats' identity for rich inputs. If the subject is dominant (see 4.1), positive valence is achieved via reward sharing, that is, the subject is rewarded when a subordinate puppet successfully completes the task.</p></div>
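<div xmlns="http://www.tei-c.org/ns/1.0"><p>The difference between negative and positive anticipation amounts to a small change in the subject's decision rule. The hypothetical subject_target function below contrasts the two, given an already-inferred dominant goal; inferring that goal, which is the hard part, is assumed to have happened elsewhere.</p><ab type="code"><![CDATA[
```python
from typing import Dict, Optional


def subject_target(treat_values: Dict[str, float], dominant_goal: Optional[str],
                   valence: str) -> str:
    """Choose a treat location given the dominant's inferred goal.

    treat_values maps a location name to that treat's value for the subject.
    "negative": avoid the dominant's goal and take the best remaining treat.
    "positive": adopt the dominant's goal; with no inferred goal, take the best treat.
    """
    if valence == "positive":
        if dominant_goal is not None:
            return dominant_goal
        return max(treat_values, key=treat_values.__getitem__)
    if valence == "negative":
        options = {k: v for k, v in treat_values.items() if k != dominant_goal}
        if not options:
            raise ValueError("no treat left once the dominant's goal is excluded")
        return max(options, key=options.__getitem__)
    raise ValueError(f"unknown valence: {valence!r}")


if __name__ == "__main__":
    values = {"left": 1.0, "right": 5.0}
    print(subject_target(values, "right", "negative"))  # left
    print(subject_target(values, "right", "positive"))  # right
```
]]></ab><p>In the positive case the best alternative never needs to be computed once the goal is known, which is the sense in which that task is arguably simpler.</p></div>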
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Decision visibility</head><p>In all competitive feeding tests, it is critically important that the dominant (and, if the dominant has ToM, the subordinate) be given privacy while deciding which route to take. Otherwise, one agent could use the behavior of the other to inform its decision, a strategy that is clearly relevant to social cognition but interferes with our tests for attribution of already-established beliefs.</p><p>Allowing subordinate agents to make decisions while informed of other agents' decisions opens the possibility of testing imitation and emulation. By allowing the subordinate to view the dominant's decision before that decision is finalized, we can study the subordinate's ability to imitate (or to avoid imitating, in the negative anticipation valence case). When the subordinate and dominant differ in their abilities (whether perceptual, mental, or motor), imitation may be directly compared with emulation. For example, a subject in an empty room emulating a teacher slowed by clutter could navigate the room more efficiently than the teacher, whereas an imitating subject would inefficiently copy the teacher's path.</p><p>When a dominant subject's decisions are visible, it might behave in a manner that strategically influences the subordinate puppet's decision. In the shared-reward, positive-anticipation version of this test, the subject's goal is to lead its conspecific to the treat. This altruistic variant, especially in conjunction with multiple value alignments, roughly evokes the Yummy-Yucky test described by <ref type="bibr">[20]</ref>, in which a subject is tasked with using knowledge of preferences to assist an experimenter.</p></div>
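<div xmlns="http://www.tei-c.org/ns/1.0"><p>The imitation versus emulation contrast in the example above can be made concrete with breadth-first search. The sketch assumes two tiny hypothetical rooms: the teacher's room contains clutter that forces a detour, while the subject's room is empty. An emulator re-plans a shortest path to the teacher's goal in its own room; an imitator replays the teacher's moves verbatim.</p><ab type="code"><![CDATA[
```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Pos = Tuple[int, int]
MOVES = {"N": (0, -1), "S": (0, 1), "E": (1, 0), "W": (-1, 0)}

# Hypothetical rooms: '.' is open floor, '#' is clutter.
TEACHER_ROOM = ["...",
                "##.",
                "..."]
EMPTY_ROOM = ["...",
              "...",
              "..."]


def shortest_actions(room: List[str], start: Pos, goal: Pos) -> Optional[List[str]]:
    """Breadth-first search over open cells; returns cardinal moves or None."""
    h, w = len(room), len(room[0])
    parent: Dict[Pos, Optional[Tuple[Pos, str]]] = {start: None}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            actions: List[str] = []
            link = parent[cur]
            while link is not None:  # walk back to the start, collecting moves
                cur, move = link
                actions.append(move)
                link = parent[cur]
            return actions[::-1]
        for move, (dx, dy) in MOVES.items():
            nxt = (cur[0] + dx, cur[1] + dy)
            inside = 0 <= nxt[0] < w and 0 <= nxt[1] < h
            if inside and room[nxt[1]][nxt[0]] == "." and nxt not in parent:
                parent[nxt] = (cur, move)
                queue.append(nxt)
    return None


def replay(start: Pos, actions: List[str]) -> Pos:
    """Imitation: copy the teacher's moves verbatim (no wall checks)."""
    x, y = start
    for a in actions:
        dx, dy = MOVES[a]
        x, y = x + dx, y + dy
    return (x, y)


if __name__ == "__main__":
    teacher = shortest_actions(TEACHER_ROOM, (0, 0), (0, 2)) or []
    emulator = shortest_actions(EMPTY_ROOM, (0, 0), (0, 2))  # re-plan for the same goal
    print(len(teacher), emulator, replay((0, 0), teacher))
```
]]></ab><p>Both strategies reach the goal here, but the imitator takes six moves where the emulator needs two.</p></div>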
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4.">Population size</head><p>An agent might solve the SCFP as defined simply by labeling events as 'seen by opponent' or 'unseen by opponent'. In this case, although an opponent's perception must be correctly inferred, it is unclear whether the agent's <formula notation="TeX">f_{ToM}</formula> keeps track of each opponent's knowledge separately. In other words, an agent might pass all SCFP tests while operating under the assumption that all embodied opponents share a single mental state. Note that this assumption could be correct in cases where opponents communicate with each other. To rule out this hypothesis, we must test for the "who" ability.</p><p>By increasing the population of puppets (each with its own vision-obscuring events), we ensure that the subject can only succeed by keeping track of who sees each baiting event. To accomplish this, multiple puppets are initialized, each in a separate starting room. Any number of puppets might be informed during the baiting events. During the release event, only one of the puppets is able to leave its cage. To pass these scenarios, the subject must determine whether or not the released puppet specifically was made privy to the pertinent information. In scenarios with more than one baiting event, the "informed" puppet may or may not be informed of the irrelevant event(s). In conjunction with positive anticipation and visible decisions, the Standoff task becomes similar to the knower guesser paradigm <ref type="bibr">[7]</ref>, another popular test of social reasoning in animals.</p></div>
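<div xmlns="http://www.tei-c.org/ns/1.0"><p>The difference between a compartmentalized and a pooled model of opponents' knowledge shows up as soon as the released puppet is one that did not witness the relevant event. The sketch below is a hypothetical illustration of that contrast, not Standoff code: events record which puppets had an unobstructed view, and each model is asked whether the released puppet is informed.</p><ab type="code"><![CDATA[
```python
from typing import Dict, List, Set, Tuple

# (event name, puppets whose view was unobstructed during that event)
Event = Tuple[str, Set[str]]


def individual_knowledge(events: List[Event], puppets: List[str]) -> Dict[str, Set[str]]:
    """Compartmentalized model: each puppet knows only the events it witnessed."""
    knows: Dict[str, Set[str]] = {p: set() for p in puppets}
    for name, seen_by in events:
        for p in seen_by:
            knows[p].add(name)
    return knows


def pooled_knowledge(events: List[Event]) -> Set[str]:
    """'Shared mind' shortcut: an event counts as known if any opponent saw it."""
    return {name for name, seen_by in events if seen_by}


def released_is_informed(events: List[Event], puppets: List[str], released: str,
                         relevant: str, compartmentalize: bool = True) -> bool:
    """Does the released puppet know about the relevant baiting event?"""
    if compartmentalize:
        return relevant in individual_knowledge(events, puppets)[released]
    return relevant in pooled_knowledge(events)


if __name__ == "__main__":
    # Only puppet A watched the big treat being baited, but B is the one released.
    events = [("bait_big", {"A"}), ("bait_small", {"A", "B"})]
    for model in (True, False):
        print(model, released_is_informed(events, ["A", "B"], "B", "bait_big", model))
```
]]></ab><p>In this scenario the compartmentalized model answers False and the pooled model answers True, so the two hypotheses predict different behavior from the subject.</p></div>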
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.5.">Obscuring source</head><p>In the competitive feeding paradigm (both real-life and Standoff), participants' vision is obscured using opaque doors that occlude baiting events. These doors may be replaced by any other object whose opacity (or transparency) has been established during the agent's training. Numerous other methods may be devised for causing (and signaling) unawareness. Gaze, for one, has been extensively studied in humans and animals. By instructing puppets (with directional vision) to face away from the food during baiting, we can evoke a rudimentary replication of experiments involving gaze-originated unawareness.</p></div>
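<div xmlns="http://www.tei-c.org/ns/1.0"><p>Gaze-based unawareness replaces a door with the viewer's own orientation. A minimal sketch of directional vision follows; the 45-degree half-angle, the range limit, and the omission of occlusion by walls are illustrative assumptions rather than Standoff's actual visibility rules.</p><ab type="code"><![CDATA[
```python
import math
from typing import Tuple

Pos = Tuple[int, int]
# Unit vectors with y growing downward, so "N" points toward row 0.
HEADINGS = {"N": (0, -1), "E": (1, 0), "S": (0, 1), "W": (-1, 0)}


def in_view(viewer: Pos, heading: str, target: Pos,
            half_angle_deg: float = 45.0, max_range: float = 10.0) -> bool:
    """True if target lies within the viewer's vision cone (walls are ignored)."""
    dx, dy = target[0] - viewer[0], target[1] - viewer[1]
    dist = math.hypot(dx, dy)
    if dist == 0:
        return True  # the viewer's own cell
    if dist > max_range:
        return False
    hx, hy = HEADINGS[heading]
    cos_angle = (dx * hx + dy * hy) / dist  # cosine of angle between gaze and target
    return cos_angle >= math.cos(math.radians(half_angle_deg))


if __name__ == "__main__":
    bait_cell = (2, 1)
    print(in_view((2, 5), "N", bait_cell))  # facing the baiting: witnessed
    print(in_view((2, 5), "S", bait_cell))  # turned away: not witnessed
```
]]></ab><p>Feeding the result of such a check into each puppet's memory update is all that is needed to swap doors for gaze as the obscuring source.</p></div>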
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.6.">Value alignment</head><p>A core assumption of the previously described experiments is that all agents value treats similarly, yet a fundamental ToM skill involves empathizing with individuals who have different preferences. We provide two alternative sets of preferences inspired by <ref type="bibr">Leslie and Polizzi 1998 [19]</ref>: a negative-value agent prefers smaller treats to larger ones, and an avoidant agent prefers to search boxes that contain no treats at all. Like anticipation valence, value alignment is signaled through alternative agent color or numeric identity schemes.</p></div>
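<div xmlns="http://www.tei-c.org/ns/1.0"><p>The three value alignments reduce to different target-selection rules over the same box contents. In the sketch below, the label "standard" for ordinary preferences and the function name preferred_box are hypothetical.</p><ab type="code"><![CDATA[
```python
from typing import Dict, Optional

# Box contents: a treat size, or None for an empty box.
Contents = Dict[str, Optional[float]]


def preferred_box(contents: Contents, alignment: str) -> str:
    """Box a puppet heads for under a given value alignment.

    "standard": largest treat; "negative": smallest treat; "avoidant": an empty box.
    """
    full = {b: v for b, v in contents.items() if v is not None}
    if alignment == "standard":
        return max(full, key=full.__getitem__)
    if alignment == "negative":
        return min(full, key=full.__getitem__)
    if alignment == "avoidant":
        empty = [b for b, v in contents.items() if v is None]
        if not empty:
            raise ValueError("no empty box to search")
        return empty[0]
    raise ValueError(f"unknown alignment: {alignment!r}")


if __name__ == "__main__":
    boxes = {"a": 5.0, "b": 1.0, "c": None}
    for alignment in ("standard", "negative", "avoidant"):
        print(alignment, preferred_box(boxes, alignment))
```
]]></ab><p>Because only the selection rule changes, a subject must read the alignment cue (color or identity) to predict where a given puppet will go.</p></div>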
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.7.">Scenario complexity</head><p>Just as we would like to investigate our subjects' ability to apply their <formula notation="TeX">f_{ToM}</formula> separately to multiple embodied agents (or to distinguish between multiple inferred mental states), we might also test the complexity of <formula notation="TeX">f_{ToM}</formula> itself. Under what conditions, and to what extent, can it represent and distinguish between multiple goal states? In the multiple desires test <ref type="bibr">[21]</ref>, children are tested on their comprehension of three different aspects of multiple desires. We can imitate this test by releasing the subject only after the puppet reaches its first goal, giving the puppet a chance to also reach a second goal before the subject.</p><p>Memory robustness is a closely related skill that is fundamental to successful attribution. As the environment grows more complex, an agent needs a more robust memory to succeed. The environment's scale or the number of potential treat locations can be trivially increased to achieve this effect. We may also increase the amount of time between baiting and releasing, as well as the number of relevant and irrelevant events, to stymie our agents' efforts to retain relevant information.</p></div>
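<div xmlns="http://www.tei-c.org/ns/1.0"><p>Memory demands can be scaled by generating longer and more cluttered schedules. The generator below is a hypothetical sketch: it baits two treats at random locations, appends distractor events that move nothing, and delays the release, so each of the three parameters can be increased independently. The event names and the tuple format are assumptions, not the environment's own.</p><ab type="code"><![CDATA[
```python
import random
from typing import List, Tuple

Event = Tuple[int, str, int]  # (timestep, kind, location); location -1 means none


def make_schedule(n_locations: int, delay: int, n_distractors: int,
                  seed: int = 0) -> List[Event]:
    """Hypothetical schedule: bait two treats, add distractor events, then release.

    Larger n_locations, delay, and n_distractors all increase memory demands.
    """
    rng = random.Random(seed)  # seeded so that scenarios are reproducible
    big, small = rng.sample(range(n_locations), 2)
    events: List[Event] = [(0, "bait_big", big), (1, "bait_small", small)]
    t = 2
    for _ in range(n_distractors):
        # A distractor touches a location (e.g. an empty box) without moving a treat.
        events.append((t, "distractor", rng.randrange(n_locations)))
        t += 1
    events.append((t + delay, "release", -1))
    return events


if __name__ == "__main__":
    for event in make_schedule(n_locations=5, delay=10, n_distractors=3, seed=1):
        print(event)
```
]]></ab><p>Sweeping these parameters while holding the baiting logic fixed separates failures of memory from failures of attribution.</p></div>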
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Future work</head><p>Although the Standoff environment can be used to systematically investigate a wide variety of skills, many aspects of social cognition lie beyond its reach. As mentioned previously, this environment is one of the set called for in <ref type="bibr">[5]</ref>. Much work also remains to be done on building models that solve our social reasoning tests.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Independent variables not covered by Standoff</head><p>Further environments, likely with different fundamental setups, will be required to replicate the design of other social reasoning tests from the comparative cognition and developmental psychology literature.</p><p>Several classes of social cognition tests are not easily represented in the Standoff environment. One notable example is the goggles test (see the opaque visor test described in 2.2), which demonstrates projection from first-person experience <ref type="bibr">[11]</ref> <ref type="bibr">[22]</ref>. An environment capable of replicating this task would need to support both first-person viewpoints and memory sustained across repeated sessions, to allow for testing one- and few-shot learning.</p><p>With significant modification, we hope to eventually cover a diverse set of tests that differentiate imitation and emulation. Just as competitive feeding implementations make use of multiple vision-obscuring sources, tests of emulation include several sources of inefficient or unexpected conspecific behavior. These include irrationality or temporary inability <ref type="bibr">[23]</ref> <ref type="bibr">[24]</ref>, accidents <ref type="bibr">[25]</ref>, and even moral transgressions <ref type="bibr">[26]</ref>.</p><p>Of particular note are tests involving deception beyond what is allowed in 'decision viewing' scenarios. Despite their label, these tests involve hiding and communicating treats' locations in both collaborative and competitive settings. The box-locking task, for example, asks its participant to aid or thwart a puppet by misinforming it or by physically preventing it from reaching its goal <ref type="bibr">[27]</ref>. 
Other tasks involving deceptive behavior tend to require repeated sessions; these include penny hiding <ref type="bibr">[28]</ref> and, as mentioned in Section 1, hiding food from onlooking competitors.</p><p>Similarly, most of the inference that the Standoff environment tests is deductive in nature, although testing for abductive ToM reasoning is theoretically possible. An accurate model of another agent's mental state should answer not only what the agent will do, but also how and why it produced the behavior already observed. In the Standoff environment, such how and why questions are generally answered by visible attributes of the environment: e.g., the opponent pursued the smaller goal because its body is colored blue, and prior experience indicates that a blue opponent has a negative-value nature. Such abductive reasoning will likely prove necessary for successful one- and few-shot learning in ToM scenarios, a powerful but difficult skill to master.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">From generating baselines to solving the ToM riddle</head><p>The overall difficulty of the various Standoff tasks is an important question whose answer lies beyond the scope of this paper. Several environmental settings were included for practical ease of implementation; for example, allocentric perception and cardinal movement actions may reduce the demands on agents' spatial memory, which is a complex skill in its own right.</p><p>Many researchers have already made substantial headway toward artificial ToM, some with their own versions of the social cognition tests mentioned above. For example, Rabinowitz et al. test their models on a gridworld implementation of the Sally-Anne test <ref type="bibr">[29]</ref>. A wide variety of models and strategies have been employed, including deep reinforcement learning, Bayesian inference <ref type="bibr">[30]</ref>, and cognitive models <ref type="bibr">[31]</ref>. A review of algorithms designed for ToM reasoning can be found in Hernandez-Leal et al. (2019) <ref type="bibr">[32]</ref>.</p><p>Competitive feeding subjects might lack a proper understanding of their rivals' mental states, but we, as scientists, must empathize with their struggle. We, too, have a long journey ahead as we try to overcome our own lack of understanding, not just of mental states, but of how mental states are understood. By continuing along this path of tests, grounded in the comparative cognition literature, we hope to help uncover the mechanisms that allow us to understand other minds.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0"><p>The Standoff environment, along with instructions for generating the tests described in this paper, can be accessed at http://github.com/aivaslab/standoff</p></note>
		</body>
		</text>
</TEI>
