skip to main content


Title: HARMONIC: A multimodal dataset of assistive human–robot collaboration

We present the Human And Robot Multimodal Observations of Natural Interactive Collaboration (HARMONIC) dataset. This is a large multimodal dataset of human interactions with a robotic arm in a shared autonomy setting designed to imitate assistive eating. The dataset provides human, robot, and environmental data views of 24 different people engaged in an assistive eating task with a 6-degree-of-freedom (6-DOF) robot arm. From each participant, we recorded video of both eyes, egocentric video from a head-mounted camera, joystick commands, electromyography from the forearm used to operate the joystick, third-person stereo video, and the joint positions of the 6-DOF robot arm. Also included are several features that come as a direct result of these recordings, such as eye gaze projected onto the egocentric video, body pose, hand pose, and facial keypoints. These data streams were collected specifically because they have been shown to be closely related to human mental states and intention. This dataset could be of interest to researchers studying intention prediction, human mental state modeling, and shared autonomy. Data streams are provided in a variety of formats such as video and human-readable CSV and YAML files.

 
more » « less
Award ID(s):
1943072
NSF-PAR ID:
10361751
Author(s) / Creator(s):
 ;  ;  ;  ;  
Publisher / Repository:
SAGE Publications
Date Published:
Journal Name:
The International Journal of Robotics Research
Volume:
41
Issue:
1
ISSN:
0278-3649
Page Range / eLocation ID:
p. 3-11
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Shared autonomy provides an effective framework for human-robot collaboration that takes advantage of the complementary strengths of humans and robots to achieve common goals. Many existing approaches to shared autonomy make restrictive assumptions that the goal space, environment dynamics, or human policy are known a priori, or are limited to discrete action spaces, preventing those methods from scaling to complicated real world environments. We propose a model-free, residual policy learning algorithm for shared autonomy that alleviates the need for these assumptions. Our agents are trained to minimally adjust the human’s actions such that a set of goal-agnostic constraints are satisfied. We test our method in two continuous control environments: Lunar Lander, a 2D flight control domain, and a 6-DOF quadrotor reaching task. In experiments with human and surrogate pilots, our method significantly improves task performance without any knowledge of the human’s goal beyond the constraints. These results highlight the ability of model-free deep reinforcement learning to realize assistive agents suited to continuous control settings with little knowledge of user intent. 
    more » « less
  2. null (Ed.)
    This paper addresses the problem of autonomously deploying an unmanned aerial vehicle in non-trivial settings, by leveraging a manipulator arm mounted on a ground robot, acting as a versatile mobile launch platform. As real-world deployment scenarios for micro aerial vehicles such as searchand- rescue operations often entail exploration and navigation of challenging environments including uneven terrain, cluttered spaces, or even constrained openings and passageways, an often arising problem is that of ensuring a safe take-off location, or safely fitting through narrow openings while in flight. By facilitating launching from the manipulator end-effector, a 6- DoF controllable take-off pose within the arm workspace can be achieved, which allows to properly position and orient the aerial vehicle to initialize the autonomous flight portion of a mission. To accomplish this, we propose a sampling-based planner that respects a) the kinematic constraints of the ground robot / manipulator / aerial robot combination, b) the geometry of the environment as autonomously mapped by the ground robots perception systems, and c) accounts for the aerial robot expected dynamic motion during takeoff. The goal of the proposed planner is to ensure autonomous collision-free initialization of an aerial robotic exploration mission, even within a cluttered constrained environment. At the same time, the ground robot with the mounted manipulator can be used to appropriately position the take-off workspace into areas of interest, effectively acting as a carrier launch platform. We experimentally demonstrate this novel robotic capability through a sequence of experiments that encompass a micro aerial vehicle platform carried and launched from a 6-DoF manipulator arm mounted on a four-wheel robot base. 
    more » « less
  3. Observing how infants and mothers coordinate their behaviors can highlight meaningful patterns in early communication and infant development. While dyads often differ in the modalities they use to communicate, especially in the first year of life, it remains unclear how to capture coordination across multiple types of behaviors using existing computational models of interpersonal synchrony. This paper explores Dynamic Mode Decomposition with control (DMDc) as a method of integrating multiple signals from each communicating partner into a model of multimodal behavioral coordination. We used an existing video dataset to track the head pose, arm pose, and vocal fundamental frequency of infants and mothers during the Face-to-Face Still-Face (FFSF) procedure, a validated 3-stage interaction paradigm. For each recorded interaction, we fit both unimodal and multimodal DMDc models to the extracted pose data. The resulting dynamic characteristics of the models were analyzed to evaluate trends in individual behaviors and dyadic processes across infant age and stages of the interactions. Results demonstrate that observed trends in interaction dynamics across stages of the FFSF protocol were stronger and more significant when models incorporated both head and arm pose data, rather than a single behavior modality. Model output showed significant trends across age, identifying changes in infant movement and in the relationship between infant and mother behaviors. Models that included mothers’ audio data demonstrated similar results to those evaluated with pose data, confirming that DMDc can leverage different sets of behavioral signals from each interacting partner. Taken together, our results demonstrate the potential of DMDc toward integrating multiple behavioral signals into the measurement of multimodal interpersonal coordination. 
    more » « less
  4. null (Ed.)
    Interest in physical therapy and individual exercises such as yoga/dance has increased alongside the well-being trend, and people globally enjoy such exercises at home/office via video streaming platforms. However, such exercises are hard to follow without expert guidance. Even if experts can help, it is almost impossible to give personalized feedback to every trainee remotely. Thus, automated pose correction systems are required more than ever, and we introduce a new captioning dataset named FixMyPose to address this need. We collect natural language descriptions of correcting a “current” pose to look like a “target” pose. To support a multilingual setup, we collect descriptions in both English and Hindi. The collected descriptions have interesting linguistic properties such as egocentric relations to the environment objects, analogous references, etc., requiring an understanding of spatial relations and commonsense knowledge about postures. Further, to avoid ML biases, we maintain a balance across characters with diverse demographics, who perform a variety of movements in several interior environments (e.g., homes, offices). From our FixMyPose dataset, we introduce two tasks: the pose-correctional-captioning task and its reverse, the target-pose-retrieval task. During the correctional-captioning task, models must generate the descriptions of how to move from the current to the target pose image, whereas in the retrieval task, models should select the correct target pose given the initial pose and the correctional description. We present strong cross-attention baseline models (uni/multimodal, RL, multilingual) and also show that our baselines are competitive with other models when evaluated on other image-difference datasets. We also propose new task-specific metrics (object-match, body-part-match, direction-match) and conduct human evaluation for more reliable evaluation, and we demonstrate a large human-model performance gap suggesting room for promising future work. Finally, to verify the sim-to-real transfer of our FixMyPose dataset, we collect a set of real images and show promising performance on these images. Data and code are available: https://fixmypose-unc.github.io. 
    more » « less
  5. Research on robotic wheelchairs covers a broad range from complete autonomy to shared autonomy to manual navigation by a joystick or other means. Shared autonomy is valuable because it allows the user and the robot to complement each other, to correct each other's mistakes and to avoid collisions. In this paper, we present an approach that can learn to replicate path selection according to the wheelchair user's individual, often subjective, criteria in order to reduce the number of times the user has to intervene during automatic navigation. This is achieved by learning to rank paths using a support vector machine trained on selections made by the user in a simulator. If the classifier's confidence in the top ranked path is high, it is executed without requesting confirmation from the user. Otherwise, the choice is deferred to the user. Simulations and laboratory experiments using two path generation strategies demonstrate the effectiveness of our approach. 
    more » « less