Introduction

This dataset was gathered during the Vid2Real online video-based study, which investigates humans' perception of robot intelligence in the context of an incidental human-robot encounter. The dataset contains participants' questionnaire responses to four video study conditions: Baseline, Verbal, Body Language, and Body Language + Verbal. The videos depict a scenario in which a pedestrian incidentally encounters a quadruped robot trying to enter a building. Depending on the condition, the robot uses verbal commands or body language to ask the pedestrian for help; the differences between conditions were manipulated using the robot's verbal and expressive movement functionalities.

Dataset Purpose

The dataset includes human subjects' responses about the robot's social intelligence, used to test the hypothesis that perceived robot social intelligence is positively correlated with human compliance in an incidental human-robot encounter (see the analysis sketch below). The video-based dataset was also developed to obtain empirical evidence that can inform the design of future real-world HRI studies.

Dataset Contents
- Four videos, each corresponding to a study condition.
- Four sets of Perceived Social Intelligence Scale (PSI) data, one per study condition.
- Four sets of compliance-likelihood questions; each set includes one Likert question and one free-form question.
- One set of Godspeed questionnaire data.
- One set of Anthropomorphism questionnaire data.
- A CSV file containing the participants' demographic data, Likert-scale data, and text responses.
- A data dictionary explaining the meaning of each field in the CSV file.

Study Conditions

There are four videos (i.e., study conditions), with the following scenarios:
- Baseline: The robot walks up to the entrance and waits for the pedestrian to open the door, without any additional behaviors. This is the "control" condition.
- Verbal: The robot walks up to the entrance, says "Can you please open the door for me?" to the pedestrian while facing the same direction, then waits for the pedestrian to open the door.
- Body Language: The robot walks up to the entrance, turns its head to look at the pedestrian, then turns its head back to face the door and waits for the pedestrian to open it.
- Body Language + Verbal: The robot walks up to the entrance, turns its head to look at the pedestrian, says "Can you open the door for me?", then waits for the pedestrian to open the door.

(Images in the original record show the Verbal and Body Language conditions.)

A within-subject design was adopted, so all participants experienced all conditions. The order of the videos, as well as of the PSI scales, was randomized. After providing consent, participants were presented with one video, followed by the PSI questions and the two exploratory (compliance-likelihood) questions described above. This set was repeated four times, after which participants reported their general perceptions of the robot via the Godspeed and AMPH questionnaires. Each video was around 20 seconds long, and the total study time was around 10 minutes.

Video as a Study Method

Video-based studies are a common data-collection method in human-robot interaction research. Videos can easily be distributed via online participant-recruiting platforms and can reach a larger sample than in-person or lab-based studies, making them a fast and practical way to gather empirical evidence.
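As a concrete example of the compliance hypothesis described under Dataset Purpose, the sketch below loads the released CSV and correlates per-condition PSI scores with compliance-likelihood ratings. The file name and column names are assumptions for illustration only; the actual field names are given in the data dictionary shipped with the dataset.

```python
# Minimal analysis sketch (hedged: the filename and column names below are
# assumptions -- consult the released data dictionary for the real fields).
import pandas as pd
from scipy.stats import spearmanr

df = pd.read_csv("vid2real_responses.csv")  # hypothetical filename

conditions = ["Baseline", "Verbal", "BodyLanguage", "BodyLanguageVerbal"]
for cond in conditions:
    psi = df[f"PSI_{cond}"]          # assumed: aggregated PSI score per condition
    comp = df[f"Compliance_{cond}"]  # assumed: 5-point compliance Likert item
    rho, p = spearmanr(psi, comp, nan_policy="omit")
    print(f"{cond}: Spearman rho = {rho:.2f}, p = {p:.3f}")
```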
Video Filming

The videos were filmed from a first-person point of view to maximize the alignment between the video and real-world settings. The recording device was an iPhone 12 Pro, and the videos were shot in 4K at 60 fps. For better accessibility, the videos have been converted to lower resolutions.

Instruments

The questionnaires used in the study include the Perceived Social Intelligence Scale (PSI), the Godspeed Questionnaire, and the Anthropomorphism Questionnaire (AMPH). In addition, a 5-point Likert question and a free-text response measuring human compliance were added for the purposes of the video-based study. Participant demographic data was also collected. The questionnaire items are attached as part of this dataset.

Human Subjects

Participants were recruited through Prolific and are therefore Prolific users. They were restricted to people currently living in the United States who are fluent in English and have no hearing or visual impairments; no other restrictions were imposed. Among the 385 participants, 194 identified as female and 191 as male; ages ranged from 19 to 75 (M = 38.53, SD = 12.86). Human subjects remained anonymous. Participants were compensated with $4 upon submission approval. This study was reviewed and approved by the UT Austin Institutional Review Board.

Robot

The dataset contains data about humans' perceived social intelligence of a Boston Dynamics quadruped robot, Spot (Explorer model). The robot was selected because quadruped robots are gradually being adopted to provide services such as delivery, surveillance, and rescue. However, there are still obstacles that robots cannot easily overcome by themselves, in which case they must ask nearby humans for help. It is therefore important to understand how humans react to a quadruped robot that they incidentally encounter. For the purposes of this video study, the robot operation was semi-autonomous: navigation was manually teleoperated by an operator, supplemented by a few standalone autonomous modules.

Data Collection

The data was collected through Qualtrics, a survey development platform. After data collection was complete, the data was downloaded as a CSV file.

Data Quality Control

Qualtrics automatically flags suspected bots, and any flagged responses were discarded. All incomplete and duplicate responses were also discarded (see the filtering sketch below).

Data Usage

This dataset can be used to conduct a meta-analysis on robots' perceived intelligence. Note that the data are tied to this particular study design; users interested in reuse should assess whether the dataset fits their own study design.

Acknowledgement

This study was funded through NSF Award #2219236, GCR: Community-Embedded Robotics: Understanding Sociotechnical Interactions with Long-term Autonomous Deployments.
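The quality-control step described above can be approximated as follows. This is a minimal sketch: the column names used ("Finished", "Q_RecaptchaScore", "PROLIFIC_PID") are typical of Qualtrics exports with Prolific integration and are not confirmed fields of this dataset.

```python
# Sketch of the quality-control filtering described under Data Quality Control.
# Column names are assumptions typical of a Qualtrics export; verify against
# the data dictionary before use.
import pandas as pd

raw = pd.read_csv("qualtrics_export.csv")  # hypothetical filename

clean = raw[raw["Finished"] == 1]                      # drop incomplete responses
if "Q_RecaptchaScore" in clean.columns:                # drop likely bots, if flagged
    clean = clean[clean["Q_RecaptchaScore"] >= 0.5]
clean = clean.drop_duplicates(subset="PROLIFIC_PID")   # drop duplicate submissions

print(f"{len(raw) - len(clean)} responses discarded, {len(clean)} retained")
```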
Community Embedded Robotics: A Multimodal Dataset on Perceived Safety during Indoor Mobile Robot Encounters
Introduction

As mobile robots proliferate in communities, designers must consider the impacts these systems have on the users, onlookers, and places they encounter. It becomes increasingly necessary to study situations where humans and robots coexist in common spaces, even if they are not directly interacting. This dataset presents a multidisciplinary approach to studying human-robot encounters between participants and two mobile robots in an indoor, apartment-like setting. Participants complete questionnaires, wear sensors for physiological measures, and take part in a focus group after the experiments finish. The dataset contains raw time-series data from the sensors and robots, plus qualitative results from the focus groups. The data can be used to analyze human physiological responses to varied encounter conditions and to gain insight into human preferences and comfort during community encounters with mobile robots.

Dataset Contents
- A dictionary of terms found in the dataset ("Data-Dictionary.pdf").
- Synchronized XDF files from every trial with raw data from electrodermal activity (EDA), electrocardiography (ECG), photoplethysmography (PPG), and seismocardiography (SCG). These synchronized files also contain robot pose data and microphone data.
- Results from analysis of two important features derived from heart rate variability (HRV) and EDA. Specifically, HRV_CMSEn and nsEDRfreq are computed for each participant over each trial. These results also include Robot Confidence, a classification score representing the confidence that the 80 physiological features considered originate from a subject in a robot encounter; the higher the score, the higher the confidence.
- A vectormap of the environment used during testing ("AHG_vectormap.txt") and a CSV with the locations of participant seating within the map ("Participant-Seating-Coordinates.csv"). Each line of the vectormap represents the two endpoints of a line segment: x1,y1,x2,y2. The participant seating coordinates are x,y positions plus a rotation about the vertical axis in radians (a parsing sketch follows the Human Subjects paragraph below).
- Anonymized videos captured using two static cameras placed in the environment, located in the living room and the small room, respectively.
- Animations visualized from the XDF files that show participant location, robot behaviors, and additional characteristics such as participant-robot line of sight and relative audio volume.
- Quotes associated with themes taken from the focus group data. These quotes demonstrate and justify the results of the thematic analysis. Raw focus group text is not included due to privacy concerns.
- Quantitative results from the focus groups associated with factors influencing perceived safety, demonstrating the findings from the deductive content analysis. The deductive codebook is also included.
- Results from the pre-experiment and between-trial questionnaires.
- Copies of both questionnaires and the semi-structured focus group protocol.

Human Subjects

This dataset contains de-identified information for 24 subjects over 13 experiment sessions. The study population is the students, faculty, and staff of the University of Texas at Austin. Of the 24 participants, 18 are students and 6 are staff at the university. Ages range from 19 to 48, and 10 males and 14 females participated. Published data has been de-identified in coordination with the university Institutional Review Board. All participants signed informed consent to participate in the study and for the distribution of this data.
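The file formats stated above (vectormap lines of x1,y1,x2,y2; seating rows of x, y, and rotation in radians) are enough to load both files. The sketch below is a minimal parser; the seating CSV column names used here are assumptions, since the header is not documented in this record.

```python
# Minimal parsers for AHG_vectormap.txt and Participant-Seating-Coordinates.csv.
# The vectormap line format (x1,y1,x2,y2) is stated in the record; the seating
# CSV column names ("x", "y", "rotation") are assumptions -- check the header.
import csv

def load_vectormap(path="AHG_vectormap.txt"):
    """Return a list of wall segments as ((x1, y1), (x2, y2)) tuples."""
    segments = []
    with open(path) as f:
        for line in f:
            if line.strip():
                x1, y1, x2, y2 = map(float, line.split(","))
                segments.append(((x1, y1), (x2, y2)))
    return segments

def load_seating(path="Participant-Seating-Coordinates.csv"):
    """Return seating positions as (x, y, rotation_rad) tuples."""
    with open(path) as f:
        rows = list(csv.DictReader(f))
    return [(float(r["x"]), float(r["y"]), float(r["rotation"])) for r in rows]

walls = load_vectormap()
seats = load_seating()
print(f"{len(walls)} wall segments, {len(seats)} seating positions")
```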
Access Restrictions

Transcripts from the focus groups are not published due to privacy concerns. Videos including participants are de-identified with overlays. All other data is labeled only by participant ID, which is not associated with any identifying characteristics.

Experiment Design

Robots

This study considers indoor encounters with two quadruped mobile robots, namely the Boston Dynamics Spot and the Unitree Go1. These mobile robots are capable of everyday movement tasks such as inspection, search, or mapping, which may be common tasks for autonomous agents in university communities. The study focuses on the perceived safety of bystanders during encounters with these platforms.

Control Conditions and Experiment Session Layout

Three variables are controlled in this study:
- Participant seating: social (together in the living room) vs. isolated (one in the living room, the other in the small room)
- Robots: together vs. separate
- Robot behavior: navigation vs. search

A visual representation of the three control variables is shown in panels (a)-(d) of the accompanying figure, including the robot behaviors and the participant seating locations (marked as X's; blue marks social seating and yellow marks isolated seating). Panel (a) shows the single-robot navigation path, (b) the two-robot navigation paths, (c) the single-robot search path, and (d) the two-robot search paths. The order of behaviors and seating locations is randomized and then inserted into the experiment session as overviewed in (e).

These experiments are designed to gain insight into human responses to encounters with robots. The first step is receiving consent from the participants, followed by a pre-experiment questionnaire that documents demographics, baseline stress information, and Big Five personality traits. A nature video is played before and after the experimental session to establish a relaxed baseline physiological state. Experiments take place over 8 individual trials, each defined by a subject seating arrangement, a search or navigation behavior, and robots together or separate. After each of the 8 trials, participants take the between-trial questionnaire, a 7-point Likert-scale questionnaire designed to assess perceived safety during the preceding trial. After the experiments and sensor removal, participants take part in a focus group.

Synchronized Data Acquisition

Data from the physiological sensors, environment microphones, and robots is synchronized using the acquisition architecture shown in the accompanying figure. The raw XDF files are named using the following conventions (a parsing sketch is given at the end of this record):
- Trials where participants sit together in the living room: [Session number]-[trial number]-social-[robots together or separate]-[search or navigation behavior].xdf
- Trials where participants are isolated: [Session number]-[trial number]-isolated-[subject ID living room]-[subject ID small room]-[robots together or separate]-[search or navigation behavior].xdf

Qualitative Data

Qualitative data is obtained from focus groups conducted with participants after the experiments. Typically two participants take part; however, two sessions included only one participant. The semi-structured focus group protocol can be found in the dataset. Two different research methods were applied to the focus group transcripts (note: the full transcripts are not provided due to privacy concerns). First, a qualitative content analysis was performed using deductive codes drawn from an existing model of perceived safety during HRI (Akalin et al. 2023). The quantitative results from this analysis are reported as frequencies of references to the various factors of perceived safety; the codebook describing these factors is included in the dataset. Second, an inductive thematic analysis was performed on the data to identify emergent themes. The resulting themes and associated quotes taken from the focus groups are also included.

Data Organization

Data is organized into separate folders, namely:
- animation-videos
- anonymized-session-videos
- focus-group-results
- questionnaire-responses
- research-materials
- signal-analysis-results
- synchronized-xdf-data

Data Quality Statement

In a limited number of trials, participant EDA or ECG signals or robot pose information may be missing due to connectivity issues during data acquisition. Additionally, the questionnaires for Participant ID0 and ID1 are incomplete due to an error in the implementation of the Qualtrics survey instrument used.
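The file-naming convention described under Synchronized Data Acquisition can be unpacked programmatically, and the synchronized files can be read with the pyxdf package. This is a sketch under stated assumptions: the regular expressions follow the naming convention given above, and the example filename is illustrative rather than taken from the dataset.

```python
# Sketch: parse the XDF trial-file naming convention and inspect a file's
# streams with pyxdf (pip install pyxdf). The example filename is illustrative.
import re
import pyxdf

SOCIAL = re.compile(
    r"(?P<session>\d+)-(?P<trial>\d+)-social-"
    r"(?P<robots>together|separate)-(?P<behavior>search|navigation)\.xdf")
ISOLATED = re.compile(
    r"(?P<session>\d+)-(?P<trial>\d+)-isolated-"
    r"(?P<id_living>\w+)-(?P<id_small>\w+)-"
    r"(?P<robots>together|separate)-(?P<behavior>search|navigation)\.xdf")

def parse_trial_name(filename):
    """Return the session/trial metadata encoded in a trial filename."""
    for pattern in (SOCIAL, ISOLATED):
        match = pattern.fullmatch(filename)
        if match:
            return match.groupdict()
    raise ValueError(f"unrecognized trial filename: {filename}")

print(parse_trial_name("3-5-social-together-navigation.xdf"))

# Load one synchronized file and list its streams (EDA, ECG, robot pose, ...).
streams, header = pyxdf.load_xdf("3-5-social-together-navigation.xdf")
for stream in streams:
    print(stream["info"]["name"][0], len(stream["time_series"]), "samples")
```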
- Award ID(s):
- 2219236
- PAR ID:
- 10548718
- Publisher / Repository:
- Texas Data Repository
- Date Published:
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
Human-robot interactions that involve multiple robots are becoming common. It is crucial to understand how multiple robots should transfer information and transition users between them. To investigate this, we designed a 3 x 3 mixed-design study in which participants took part in a navigation task. Participants interacted with a stationary robot that summoned a functional (not explicitly social) mobile robot to guide them. Each participant experienced all three types of robot-robot interaction: representative (the stationary robot spoke to the participant on behalf of the mobile robot), direct (the stationary robot delivered the request to the mobile robot in a straightforward manner), and social (the stationary robot delivered the request to the mobile robot in a social manner). Each participant witnessed only one type of robot-robot communication: silent (the robots covertly communicated), explicit (the robots acknowledged that they were communicating), or reciting (the stationary robot said the request aloud). Our results show that it is possible to instill socialness in, and improve the likability of, a functional robot by having a social robot interact socially with it. We also found that covertly exchanging information is less desirable than reciting information aloud.
This study evaluated how a robot demonstrating a Theory of Mind (ToM) influenced human perception of social intelligence and animacy in a human-robot interaction. Data was gathered through an online survey in which participants watched a video depicting a NAO robot either failing or passing the Sally-Anne false-belief task. Participants (N = 60) were randomly assigned to either the Pass or Fail condition. A Perceived Social Intelligence Survey and the Perceived Intelligence and Animacy subsections of the Godspeed Questionnaire were used as measures. The Godspeed was given before viewing the task to measure participant expectations, and again afterward to test changes in opinion. Our findings show that robots demonstrating ToM significantly increase perceived social intelligence, while robots demonstrating ToM deficiencies are perceived as less socially intelligent.
In Human-Robot Interaction, researchers typically rely on in-person studies to collect subjective perceptions of a robot. In addition, videos of interactions and interactive simulations (where participants control an avatar that interacts with a robot in a virtual world) have been used to quickly collect human feedback at scale. How would human perceptions of robots compare between these methodologies? To investigate this question, we conducted a 2x2 between-subjects study (N = 160), which evaluated the effect of the interaction environment (Real vs. Simulated environment) and participants' interactivity during human-robot encounters (Interactive participation vs. Video observation) on perceptions of a robot (competence, discomfort, social presentation, and social information processing) for the task of navigating in concert with people. We also studied participants' workload across the experimental conditions. Our results revealed a significant difference in perceptions of the robot between the real environment and the simulated environment. Furthermore, our results showed differences in human perceptions when people watched a video of an encounter versus taking part in the encounter. Finally, we found that simulated interactions and videos of the simulated encounter resulted in a higher workload than real-world encounters and videos thereof. Our results suggest that findings from video and simulation methodologies may not always translate to real-world human-robot interactions. To allow practitioners to leverage learnings from this study and future researchers to expand our knowledge in this area, we provide guidelines for weighing the tradeoffs between different methodologies.
{"Abstract":["The PoseASL dataset consists of color and depth videos collected from ASL signers at the Linguistic and Assistive Technologies Laboratory under the direction of Matt Huenerfauth, as part of a collaborative research project with researchers at the Rochester Institute of Technology, Boston University, and the University of Pennsylvania.\n\nAccess: After becoming an authorized user of Databrary, please contact Matt Huenerfauth if you have difficulty accessing this volume. \n\nWe have collected a new dataset consisting of color and depth videos of fluent American Sign Language signers performing sequences ASL signs and sentences. Given interest among sign-recognition and other computer-vision researchers in red-green-blue-depth (RBGD) video, we release this dataset for use by the research community. In addition to the video files, we share depth data files from a Kinect v2 sensor, as well as additional motion-tracking files produced through post-processing of this data.\n\nOrganization of the Dataset: The dataset is organized into sub-folders, with codenames such as "P01" or "P16" etc. These codenames refer to specific human signers who were recorded in this dataset. Please note that there was no participant P11 nor P14; those numbers were accidentally skipped during the process of making appointments to collect video stimuli.\n\nTask: During the recording session, the participant was met by a member of our research team who was a native ASL signer. No other individuals were present during the data collection session. After signing the informed consent and video release document, participants responded to a demographic questionnaire. Next, the data-collection session consisted of English word stimuli and cartoon videos. The recording session began with showing participants stimuli consisting of slides that displayed English word and photos of items, and participants were asked to produce the sign for each (PDF included in materials subfolder). Next, participants viewed three videos of short animated cartoons, which they were asked to recount in ASL:\n- Canary Row, Warner Brothers Merrie Melodies 1950 (the 7-minute video divided into seven parts)\n- Mr. Koumal Flies Like a Bird, Studio Animovaneho Filmu 1969\n- Mr. Koumal Battles his Conscience, Studio Animovaneho Filmu 1971\nThe word list and cartoons were selected as they are identical to the stimuli used in the collection of the Nicaraguan Sign Language video corpora - see: Senghas, A. (1995). Children\u2019s Contribution to the Birth of Nicaraguan Sign Language. Doctoral dissertation, Department of Brain and Cognitive Sciences, MIT.\n\nDemographics: All 14 of our participants were fluent ASL signers. As screening, we asked our participants: Did you use ASL at home growing up, or did you attend a school as a very young child where you used ASL? All the participants responded affirmatively to this question. A total of 14 DHH participants were recruited on the Rochester Institute of Technology campus. Participants included 7 men and 7 women, aged 21 to 35 (median = 23.5). All of our participants reported that they began using ASL when they were 5 years old or younger, with 8 reporting ASL use since birth, and 3 others reporting ASL use since age 18 months. \n\nFiletypes:\n\n*.avi, *_dep.bin: The PoseASL dataset has been captured by using a Kinect 2.0 RGBD camera. The output of this camera system includes multiple channels which include RGB, depth, skeleton joints (25 joints for every video frame), and HD face (1,347 points). 
The video resolution produced in 1920 x 1080 pixels for the RGB channel and 512 x 424 pixels for the depth channels respectively. Due to limitations in the acceptable filetypes for sharing on Databrary, it was not permitted to share binary *_dep.bin files directly produced by the Kinect v2 camera system on the Databrary platform. If your research requires the original binary *_dep.bin files, then please contact Matt Huenerfauth.\n\n*_face.txt, *_HDface.txt, *_skl.txt: To make it easier for future researchers to make use of this dataset, we have also performed some post-processing of the Kinect data. To extract the skeleton coordinates of the RGB videos, we used the Openpose system, which is capable of detecting body, hand, facial, and foot keypoints of multiple people on single images in real time. The output of Openpose includes estimation of 70 keypoints for the face including eyes, eyebrows, nose, mouth and face contour. The software also estimates 21 keypoints for each of the hands (Simon et al, 2017), including 3 keypoints for each finger, as shown in Figure 2. Additionally, there are 25 keypoints estimated for the body pose (and feet) (Cao et al, 2017; Wei et al, 2016).\n\nReporting Bugs or Errors:\n\nPlease contact Matt Huenerfauth to report any bugs or errors that you identify in the corpus. We appreciate your help in improving the quality of the corpus over time by identifying any errors.\n\nAcknowledgement: This material is based upon work supported by the National Science Foundation under award 1749376: "Collaborative Research: Multimethod Investigation of Articulatory and Perceptual Constraints on Natural Language Evolution.""]}more » « less