Introduction

As mobile robots proliferate in communities, designers must consider the impacts these systems have on the users, onlookers, and places they encounter. It becomes increasingly necessary to study situations where humans and robots coexist in common spaces, even if they are not directly interacting. This dataset presents a multidisciplinary approach to studying human-robot encounters between participants and two mobile robots in an indoor, apartment-like setting. Participants complete questionnaires, wear sensors for physiological measures, and take part in a focus group after the experiments finish. The dataset contains raw time-series data from the sensors and robots, along with qualitative results from the focus groups. The data can be used to analyze human physiological responses to varied encounter conditions and to gain insights into human preferences and comfort during community encounters with mobile robots.

Dataset Contents

- A dictionary of terms found in the dataset ("Data-Dictionary.pdf").
- Synchronized XDF files from every trial with raw data from electrodermal activity (EDA), electrocardiography (ECG), photoplethysmography (PPG), and seismocardiography (SCG). These synchronized files also contain robot pose data and microphone data.
- Results from analysis of two important features derived from heart rate variability (HRV) and EDA. Specifically, HRV_CMSEn and nsEDRfreq are computed for each participant over each trial. These results also include Robot Confidence, a classification score representing the confidence that the 80 physiological features considered originate from a subject in a robot encounter; the higher the score, the higher the confidence.
- A vectormap of the environment used during testing ("AHG_vectormap.txt") and a CSV with the locations of participant seating within the map ("Participant-Seating-Coordinates.csv"). Each line of the vectormap gives the two endpoints of a line segment: x1,y1,x2,y2. Each participant seating location is an x,y position plus a rotation about the vertical axis in radians (see the parsing sketch after this list).
- Anonymized videos captured using two static cameras placed in the environment, located in the living room and the small room, respectively.
- Animations visualized from the XDF files that show participant location, robot behaviors, and additional characteristics such as participant-robot line-of-sight and relative audio volume.
- Quotes associated with themes taken from the focus group data. These quotes demonstrate and justify the results of the thematic analysis. Raw text from the focus groups is not included due to privacy concerns.
- Quantitative results from the focus groups associated with factors influencing perceived safety. These results demonstrate the findings from the deductive content analysis. The deductive codebook is also included.
- Results from the pre-experiment and between-trial questionnaires.
- Copies of both questionnaires and the semi-structured focus group protocol.
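As a quick illustration of working with the map files described above, the sketch below parses the vectormap and seating coordinates into Python structures. It assumes plain comma-separated lines as described; the header-row tolerance in the CSV reader is an assumption, not something the dataset documentation specifies.

```python
import csv

def load_vectormap(path="AHG_vectormap.txt"):
    """Parse the environment vectormap: each line is x1,y1,x2,y2."""
    segments = []
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines, if any
            x1, y1, x2, y2 = map(float, line.strip().split(","))
            segments.append(((x1, y1), (x2, y2)))
    return segments

def load_seating(path="Participant-Seating-Coordinates.csv"):
    """Parse seating locations: x,y position and rotation (radians)."""
    seats = []
    with open(path) as f:
        for row in csv.reader(f):
            try:
                x, y, theta = (float(v) for v in row[:3])
            except ValueError:
                continue  # tolerate a header row, if present (assumption)
            seats.append({"x": x, "y": y, "rotation_rad": theta})
    return seats
```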
Human Subjects

This dataset contains de-identified information for 24 subjects over 13 experiment sessions. The study population is the students, faculty, and staff of the University of Texas at Austin. Of the 24 participants, 18 are students and 6 are staff at the university. Ages range from 19 to 48, and 10 males and 14 females participated. Published data has been de-identified in coordination with the university Institutional Review Board. All participants signed informed consent to participate in the study and for the distribution of this data.

Access Restrictions

Transcripts from focus groups are not published due to privacy concerns. Videos including participants are de-identified with overlays on the videos. All other data is labeled only by participant ID, which is not associated with any identifying characteristics.

Experiment Design

Robots

This study considers indoor encounters with two quadruped mobile robots: the Boston Dynamics Spot and the Unitree Go1. These mobile robots are capable of everyday movement tasks like inspection, search, or mapping, which may be common tasks for autonomous agents in university communities. The study focuses on the perceived safety of bystanders during encounters with these platforms.

Control Conditions and Experiment Session Layout

We control three variables in this study:

- Participant seating: social (together in the living room) vs. isolated (one in the living room, the other in the small room)
- Robots: together vs. separate
- Robot behavior: navigation vs. search

A visual representation of the three control variables is shown on the left in (a)-(d), including the robot behaviors and participant seating locations, shown as X's. Blue X's represent social seating and yellow X's represent isolated seating. (a) shows the single-robot navigation path, (b) the two-robot navigation paths, (c) the single-robot search path, and (d) the two-robot search paths. The order of behaviors and seating locations is randomized and then inserted into the experiment session as overviewed in (e).

These experiments are designed to gain insights into human responses to encounters with robots. The first step is receiving informed consent from the participants, followed by a pre-experiment questionnaire that documents demographics, baseline stress information, and Big Five personality traits. A nature video is played before and after the experimental session to establish a relaxed baseline physiological state. Experiments take place over 8 individual trials, each defined by a subject seating arrangement, a search or navigation behavior, and robots together or separate. After each of the 8 trials, participants take the between-trial questionnaire, a 7-point Likert scale questionnaire designed to assess perceived safety during the preceding trial. After the experiments and sensor removal, participants take part in a focus group.

Synchronized Data Acquisition

Data from the physiological sensors, environment microphones, and the robots is synchronized using the architecture shown. The raw XDF files follow this naming convention (a filename-parsing sketch follows below):

- Trials where participants sit together in the living room: [session number]-[trial number]-social-[robots together or separate]-[search or navigation behavior].xdf
- Trials where participants are isolated: [session number]-[trial number]-isolated-[subject ID living room]-[subject ID small room]-[robots together or separate]-[search or navigation behavior].xdf
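The sketch below splits a trial filename into its metadata fields and loads the synchronized streams. The example filename is hypothetical, and reading XDF with the pyxdf library is an assumption: pyxdf is a common reader for XDF files, but the dataset does not prescribe a tool.

```python
import pyxdf  # pip install pyxdf (an assumed, commonly used XDF reader)

def parse_trial_filename(name):
    """Split a trial filename into its metadata fields."""
    parts = name.removesuffix(".xdf").split("-")
    if parts[2] == "social":
        keys = ["session", "trial", "seating", "robots", "behavior"]
    else:  # isolated trials carry the two subject IDs as extra fields
        keys = ["session", "trial", "seating", "id_living_room",
                "id_small_room", "robots", "behavior"]
    return dict(zip(keys, parts))

# Hypothetical example filename following the convention above.
fname = "3-5-isolated-ID4-ID5-together-search.xdf"
print(parse_trial_filename(fname))

# Load the synchronized streams (EDA, ECG, PPG, SCG, robot pose, audio).
streams, header = pyxdf.load_xdf(fname)
for s in streams:
    print(s["info"]["name"][0], len(s["time_stamps"]))
```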
Qualitative Data

Qualitative data is obtained from focus groups with participants after the experiments. Typically, two participants take part; however, two sessions included only one participant. The semi-structured focus group protocol can be found in the dataset. Two different research methods are applied to the focus group transcripts (note: the full transcripts are not provided due to privacy concerns). First, we performed a qualitative content analysis using deductive codes drawn from an existing model of perceived safety during HRI (Akalin et al. 2023). The quantitative results from this analysis are reported as frequencies of references to the various factors of perceived safety. The codebook describing these factors is included in the dataset. Second, an inductive thematic analysis was performed on the data to identify emergent themes. The resulting themes and associated quotes taken from the focus groups are also included.

Data Organization

Data is organized in separate folders, namely:

- animation-videos
- anonymized-session-videos
- focus-group-results
- questionnaire-responses
- research-materials
- signal-analysis-results
- synchronized-xdf-data

Data Quality Statement

In limited trials, participant EDA or ECG signals or robot pose information may be missing due to connectivity issues during data acquisition. Additionally, the questionnaires for Participants ID0 and ID1 are incomplete due to an error in the implementation of the Qualtrics survey instrument used.
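Given the possibility of missing signals noted above, a completeness check before analysis can save time. The sketch below scans the synchronized-xdf-data folder and reports files that lack an expected stream. The stream names here are placeholders (assumptions); consult "Data-Dictionary.pdf" for the actual names used in the dataset.

```python
from pathlib import Path
import pyxdf

# Placeholder stream names -- see Data-Dictionary.pdf for the real ones.
EXPECTED = {"EDA", "ECG", "PPG", "SCG", "RobotPose", "Microphone"}

for xdf_file in sorted(Path("synchronized-xdf-data").glob("*.xdf")):
    streams, _ = pyxdf.load_xdf(str(xdf_file))
    found = {s["info"]["name"][0] for s in streams}
    missing = EXPECTED - found
    if missing:
        print(f"{xdf_file.name}: missing {sorted(missing)}")
```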
Avoiding the Ordering Trap in Systems Performance Measurement
It is common for performance studies of computer systems to make the assumption—either explicitly or implicitly—that results from each trial are independent. One place this assumption manifests is in experiment design, specifically in the order in which trials are run: if trials do not affect each other, the order in which they are run is unimportant. If, however, the execution of one trial does affect system state in ways that alter the results of future trials, this assumption does not hold, and ordering must be taken into account in experiment design. In the simplest example, if all trials with system setting A are run before all trials with setting B, this can systematically bias experiment results, leading to the incorrect conclusion that “A is better than B” or vice versa. In this paper, we: (a) explore, via a literature and artifact survey, whether experiment ordering is taken into consideration at top computer systems conferences; (b) devise a methodology for studying the effects of ordering on performance experiments, including statistical tests for order dependence; and (c) conduct the largest-scale empirical study to date on experiment ordering, using a dataset we collected over 9 months comprising nearly 2.3M measurements from over 1,700 servers. Our analysis shows that ordering effects are a hidden but dangerous trap that published performance experiments are not typically designed to avoid. We describe OrderSage, a tool that we have built to help detect and mitigate these effects, and use it on a number of case studies, including finding previously unknown ordering effects in an artifact from a published paper.
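OrderSage's actual statistical machinery is described in the paper; as a generic illustration of the underlying idea only (not the authors' method), the sketch below uses a permutation test to ask whether measurements from early trials differ systematically from later ones, which is one symptom of order dependence. All names and data here are invented.

```python
import random

def order_dependence_pvalue(measurements, n_perm=10_000, seed=0):
    """Permutation test: do early trials differ from late trials?

    A small p-value suggests the measured value depends on when the
    trial was run, i.e., a possible ordering effect.
    """
    rng = random.Random(seed)
    half = len(measurements) // 2

    def halves_gap(xs):
        return abs(sum(xs[:half]) / half
                   - sum(xs[half:]) / (len(xs) - half))

    observed = halves_gap(measurements)
    hits = 0
    for _ in range(n_perm):
        shuffled = measurements[:]
        rng.shuffle(shuffled)
        if halves_gap(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Toy example: latencies that drift upward over the run.
latencies = [10.1, 10.0, 10.2, 9.9, 11.8, 12.1, 11.9, 12.0]
print(order_dependence_pvalue(latencies))  # small -> likely ordering effect
```

Randomizing trial order in the experiment design remains the primary mitigation; a test like this only flags the symptom after the fact.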
- Award ID(s):
- 2027208
- PAR ID:
- 10436963
- Date Published:
- Journal Name:
- Proceedings of the USENIX Annual Technical Conference (ATC) 2023
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
Introduction

Java Multi-Version Execution (JMVX) is a tool for performing Multi-Version Execution (MVX) and Record Replay (RR) in Java. Most tools for MVX and RR observe the behavior of a program at a low level, e.g., by looking at system calls. Unfortunately, this approach fails for high-level language virtual machines due to benign divergences (differences in behavior that accomplish the same result) introduced by the virtual machine, particularly by garbage collection and just-in-time compilation. In other words, the management of the virtual machine creates differing sequences of system calls that lead existing tools to believe a program has diverged when, in practice, the application running on top of the VM has not. JMVX takes a different approach, opting instead to add MVX and RR logic into the bytecode of compiled programs running in the VM to avoid benign divergences related to VM management.

This artifact is a docker image that will create a container holding our source code, compiled system, and experiments with JMVX. The image allows you to run the experiments we used to address the research questions from the paper (Section 4). This artifact is designed to show:

- [Supported] JMVX performs MVX for Java
- [Supported] JMVX performs RR for Java
- [Supported] JMVX is performant

In the "Step by Step" section, we will point out how to run experiments to generate data supporting these claims. The third claim is supported; however, it may not be easily reproducible. For the paper we measured performance on bare metal rather than in a docker container, and when testing the containerized artifact on a MacBook (Sonoma v14.5), JMVX ran slower than expected. Similarly, see the section "Differences From Experiment" for properties of the artifact that were altered (and could affect runtime results). Thanks for taking the time to explore our artifact.

Hardware Requirements

- x86 machine running Linux, preferably Ubuntu 22.04 (Jammy)
- 120 GB of storage
- About 10 GB of RAM to spare
- 2+ cores

Getting Started Guide

This section is broken into two parts: setting up the docker container and running a quick experiment to test that everything is working.

Container Setup

- Download the container image (DOI 10.5281/zenodo.12637140).
- If using Docker Desktop, increase the size of the virtual disk to 120 GB. In the GUI, go to Settings > Resources > Virtual Disk (should be a slider). From the terminal, modify the `diskSizeMiB` field in docker's `settings.json` and restart docker. Linux location: ~/.docker/desktop/settings.json. Mac location: ~/Library/Group Containers/group.com.docker/settings.json.
- Install with: docker load -i java-mvx-image.tar.gz. This process can take 30 minutes to 1 hour.
- Start the container via: docker run --name jmvx -it --shm-size="10g" java-mvx. The `--shm-size` parameter is important, as JMVX will crash the JVM if not enough space is available (detected via a SIGBUS error).

Quick Start

The container starts you off in an environment with JMVX already prepared, i.e., JMVX has been built and the instrumentation is done. The script test-quick.sh will test all of JMVX's features on DaCapo's avrora benchmark. The script has comments explaining each command and should take about 10 minutes to run.

The script starts by running our system call tracer tool. This phase of the script will create the directory /java-mvx/artifact/trace, which will contain:

- natives-avrora.log -- a (serialized) map from methods that resulted in system calls to the stack traces that generated the calls. /java-mvx/artifact/scripts/tracer/analyze2.sh is used to analyze this log and generate the other files in this directory.
- table.txt -- a table showing how many unique stack traces led to the invocation of a native method that called a system call.
- recommended.txt -- a list of methods JMVX recommends to instrument for the benchmark.
- dump.txt -- a textual dump of the last 8 methods from every stack trace logged. This allows us to reduce the number of methods we need to instrument by choosing a wrapper that can handle multiple system calls; `FileSystemProvider.checkAccess` is an example of this.

JMVX recommends functions to instrument; these are included in recommended.txt. If you inspect the file, you'll see some simple candidates for instrumentation, e.g., available, open, and read from FileInputStream. The instrumentation code for FileInputStream can be found in /java-mvx/src/main/java/edu/uic/cs/jmvx/bytecode/FileInputStreamClassVisitor.java. The recommendations work in many cases, but for some, e.g., FileDescriptor.closeAll, we chose a different method (e.g., FileInputStream.close) by manually inspecting dump.txt.

After tracing, runtime data is gathered, starting with measuring the overhead caused by instrumentation. Next the script moves on to gathering data on MVX, and finally RR. The raw output of the benchmark runs for these phases is saved in /java-mvx/artifact/data/quick. Tables showing the benchmark's runtime performance will be placed in /java-mvx/artifact/tables/quick. That directory will contain:

- instr.txt -- measures the overhead of instrumentation.
- mvx.txt -- performance for multi-version execution mode.
- rec.txt -- performance for recording.
- rep.txt -- performance for replaying.

This script captures data for research claims 1-3, albeit for a single benchmark with a single iteration. Note: data is captured for the benchmark's memory usage, but the txt tables only display runtime data. For more, see readme.pdf or readme.md.
Abstract

Humans often experience striking performance deficits when their outcomes are determined by their own performance, colloquially referred to as “choking under pressure.” Physiological stress responses that have been linked to both choking and thriving are well-conserved in primates, but it is unknown whether other primates experience similar effects of pressure. Understanding whether this occurs and, if so, its physiological correlates, will help clarify the evolution and proximate causes of choking in humans. To address this, we trained capuchin monkeys on a computer game that had clearly denoted high- and low-pressure trials, then tested them on trials with the same signals of high pressure, but no difference in task difficulty. Monkeys significantly varied in whether they performed worse or better on high-pressure testing trials, and performance improved as monkeys gained experience with performing under pressure. Baseline levels of cortisol were significantly negatively related to performance on high-pressure trials as compared to low-pressure trials. Taken together, this indicates that less experience with pressure may interact with long-term stress to produce choking behavior in early sessions of a task. Our results suggest that performance deficits (or improvements) under pressure are not solely due to human specific factors but are rooted in evolutionarily conserved biological factors.
Optimal treatment regimes (OTRs) have been widely employed in computer science and personalized medicine to provide data-driven, optimal recommendations to individuals. However, previous research on OTRs has primarily focused on settings that are independent and identically distributed, with little attention given to the unique characteristics of educational settings, where students are nested within schools and there are hierarchical dependencies. The goal of this study is to propose a framework for designing OTRs from multisite randomized trials, a commonly used experimental design in education and psychology to evaluate educational programs. We investigate modifications to popular OTR methods, specifically Q-learning and weighting methods, in order to improve their performance in multisite randomized trials. A total of 12 modifications, 6 for Q-learning and 6 for weighting, are proposed by utilizing different multilevel models, moderators, and augmentations. Simulation studies reveal that all Q-learning modifications improve performance in multisite randomized trials and the modifications that incorporate random treatment effects show the most promise in handling cluster-level moderators. Among weighting methods, the modification that incorporates cluster dummies into moderator variables and augmentation terms performs best across simulation conditions. The proposed modifications are demonstrated through an application to estimate an OTR of conditional cash transfer programs using a multisite randomized trial in Colombia to maximize educational attainment.
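For readers unfamiliar with OTRs, the sketch below shows generic single-stage Q-learning on toy data: fit an outcome model with a treatment-by-moderator interaction, then recommend whichever treatment maximizes the predicted outcome. This is only a baseline illustration under invented data; it includes none of the paper's multilevel modifications.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Invented toy data: X = a student covariate, A = program assignment (0/1),
# Y = educational outcome with a treatment-by-moderator interaction.
n = 500
X = rng.normal(size=n)
A = rng.integers(0, 2, size=n)
Y = 1.0 + 0.5 * X + A * (0.8 - 1.2 * X) + rng.normal(size=n)

# Q-learning (single stage): model the outcome given covariate, treatment,
# and their interaction, then compare predicted outcomes under A=0 vs. A=1.
features = np.column_stack([X, A, A * X])
q_model = LinearRegression().fit(features, Y)

def recommend(x):
    """Recommend the treatment with the higher predicted outcome."""
    q0 = q_model.predict([[x, 0, 0.0]])[0]
    q1 = q_model.predict([[x, 1, x]])[0]
    return int(q1 > q0)

print(recommend(-1.0), recommend(1.0))  # treat only where the moderator favors it
```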
Abstract

Humans and other animals are capable of reasoning. However, there are overwhelming examples of errors or anomalies in reasoning. In two experiments, we studied whether rats, like humans, estimate the conjunction of two events as more likely than each event independently, a phenomenon that has been called the conjunction fallacy. In both experiments, rats learned through food reinforcement to press a lever under some cue conditions but not others. Sound B was rewarded whereas sound A was not; however, when B was presented with the visual cue Y it was not rewarded, whereas the compound AX was rewarded (i.e., A-, AX+, B+, BY-). Both visual cues were presented in the same bulb. After training, rats received test sessions in which A and B were presented with the bulb explicitly off or occluded by a metal piece. Thus, in the occluded condition, it was ambiguous whether the trials were of the elements alone (A or B) or of the compounds (AX or BY). Rats responded in the occluded condition as if the compound cues were most likely present. The second experiment investigated whether this error in probability estimation in Experiment 1 could be due to a conjunction fallacy, and whether it could be attenuated by increasing the ratio of element/compound trials from the original 50-50 to 70-30 and 90-10. Only the 90-10 condition (where 90% of the training trials were of just A or just B) did not show a conjunction fallacy, though it emerged in all groups with additional training. These findings open new avenues for exploring the mechanisms behind the conjunction fallacy effect.