Title: Establishing Human Observer Criterion in Evaluating Artificial Social Intelligence Agents in a Search and Rescue Task
Abstract

Artificial social intelligence (ASI) agents have great potential to aid the success of individuals, human–human teams, and human–artificial intelligence teams. To develop helpful ASI agents, we created an urban search and rescue task environment in Minecraft to evaluate ASI agents’ ability to infer participants’ knowledge training conditions and to predict the type of victim each participant would rescue next. We evaluated ASI agents’ capabilities in three ways: (a) comparison to ground truth, that is, the actual knowledge training condition and participant actions; (b) comparison among different ASI agents; and (c) comparison to a human observer criterion, whose accuracy served as a reference point. The human observers used video data and the ASI agents used timestamped event messages from the testbed to make inferences about the same participants and topic (knowledge training condition) and the same instances of participant actions (rescue of victims). Overall, ASI agents performed better than human observers in inferring knowledge training conditions and predicting actions. Refining the human criterion can guide the design and evaluation of ASI agents for complex task environments and team composition.

 
Award ID(s):
1828010
PAR ID:
10514498
Author(s) / Creator(s):
Publisher / Repository:
Cognitive Science Society
Date Published:
Journal Name:
Topics in Cognitive Science
ISSN:
1756-8757
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In the natural world, Swarm Intelligence (SI) is a well-known phenomenon that enables groups of organisms to make collective decisions with significantly greater accuracy than the individuals could do on their own. In recent years, a new AI technology called Artificial Swarm Intelligence (ASI) has been developed that enables similar benefits for human teams. It works by connecting networked teams into real-time systems modeled on natural swarms. Referred to commonly as “human swarms” or “hive minds,” these closed-loop systems have been shown to amplify group performance across a wide range of tasks, from financial forecasting to strategic decision-making. The current study explores the ability of ASI technology to amplify the IQ of small teams. Five small teams answered a series of questions from a commonly used intelligence test known as the Raven’s Standard Progressive Matrices (RSPM) test. Participants took the test first as individuals, and then as groups moderated by swarming algorithms (i.e. “swarms”). The average individual achieved 53.7% correct, while the average swarm achieved 76.7% correct, corresponding to an estimated IQ increase of 14 points. When the individual responses were aggregated by majority vote, the groups scored 56.7% correct, still 12 IQ points less than the real-time swarming method. 
  2. Individuals with intellectual and developmental disabilities (IDD) face many barriers to meaningful inclusion, including limited language and communication skills. Professionals, such as speech-language pathologists (SLPs), can provide personalized instruction to promote skill development and inclusion. Providing opportunities for individuals to express preferences and choice within these programs, such as through the multiple stimulus without replacement preference assessment (MSWO; DeLeon & Iwata, 1996), further increases skill acquisition and social interaction. However, limitations in professionals’ knowledge and skills in performing assessments can be another barrier to meaningful inclusion for individuals with IDD, and traditional training methods can be challenging and time-consuming. The purpose of the current study was to compare the use of artificial intelligence with traditional pen-and-paper self-instructional MSWO training methods for five preservice SLPs. Fidelity of implementation and duration of assessment were measured. Results demonstrated a large increase in implementation fidelity for two participants, a moderate increase for two participants, and a slight increase for the remaining participant while using artificial intelligence. All participants demonstrated a decrease in scoring errors using artificial intelligence. Regarding duration of implementation, artificial intelligence resulted in a significant reduction for four participants and a moderate reduction for the remaining participant. Results of the follow-up survey suggest that all adult participants and both child participants found that artificial intelligence had a higher treatment acceptability and was more effective at producing socially significant outcomes than traditional methods. Recommendations for clinicians and future research are discussed.

     
  3. This study evaluated how a robot demonstrating a Theory of Mind (ToM) influenced human perception of social intelligence and animacy in a human-robot interaction. Data was gathered through an online survey where participants watched a video depicting a NAO robot either failing or passing the Sally-Anne false-belief task. Participants (N = 60) were randomly assigned to either the Pass or Fail condition. A Perceived Social Intelligence Survey and the Perceived Intelligence and Animacy subsections of the Godspeed Questionnaire were used as measures. The Godspeed was given before viewing the task to measure participant expectations, and again after to test changes in opinion. Our findings show that robots demonstrating ToM significantly increase perceived social intelligence, while robots demonstrating ToM deficiencies are perceived as less socially intelligent. 
  4. Team member inclusion is vital in collaborative teams. In this work, we explore two strategies to increase the inclusion of human team members in a human-robot team: 1) giving a person in the group a specialized role (the 'robot liaison') and 2) having the robot verbally support human team members. In a human subjects experiment (N = 26 teams, 78 participants), groups of three participants completed two rounds of a collaborative task. In round one, two participants (ingroup) completed a task with a robot in one room, and one participant (outgroup) completed the same task with a robot in a different room. In round two, all three participants and one robot completed a second task in the same room, where one participant was designated as the robot liaison. During round two, the robot verbally supported each participant 6 times on average. Results show that participants with the robot liaison role had a lower perceived group inclusion than the other group members. Additionally, when outgroup members were the robot liaison, the group was less likely to incorporate their ideas into the group's final decision. In response to the robot's supportive utterances, outgroup members, and not ingroup members, showed an increase in the proportion of time they spent talking to the group. Our results suggest that specialized roles may hinder human team member inclusion, whereas supportive robot utterances show promise in encouraging contributions from individuals who feel excluded. 
  5. In this paper we describe a virtual reality training simulation designed to help police officers learn use of force policies. Our goal is to test a training simulation prototype by measuring improvements to presence and performance. If successful, this can lead to creating a full-scale virtual reality narrative training simulation. The simulation uses a planner-based experience manager to determine the actions of agents other than the participant. Participants’ actions were logged, physiological data was recorded, and the participants filled out questionnaires. Player knowledge attributes were authored to measure participants’ understanding of teaching materials. We demonstrate that when participants interact with the simulation using virtual reality they experience greater presence than when using traditional screen and keyboard controls. We also demonstrate that participants’ performance improves over repeated sessions. 