

Title: HARMONIC: A multimodal dataset of assistive human–robot collaboration

We present the Human And Robot Multimodal Observations of Natural Interactive Collaboration (HARMONIC) dataset. This is a large multimodal dataset of human interactions with a robotic arm in a shared autonomy setting designed to imitate assistive eating. The dataset provides human, robot, and environmental data views of 24 different people engaged in an assistive eating task with a 6-degree-of-freedom (6-DOF) robot arm. From each participant, we recorded video of both eyes, egocentric video from a head-mounted camera, joystick commands, electromyography from the forearm used to operate the joystick, third-person stereo video, and the joint positions of the 6-DOF robot arm. Also included are several features that come as a direct result of these recordings, such as eye gaze projected onto the egocentric video, body pose, hand pose, and facial keypoints. These data streams were collected specifically because they have been shown to be closely related to human mental states and intention. This dataset could be of interest to researchers studying intention prediction, human mental state modeling, and shared autonomy. Data streams are provided in a variety of formats such as video and human-readable CSV and YAML files.

 
Award ID(s):
1943072
NSF-PAR ID:
10361751
Author(s) / Creator(s):
 ;  ;  ;  ;  
Publisher / Repository:
SAGE Publications
Date Published:
Journal Name:
The International Journal of Robotics Research
Volume:
41
Issue:
1
ISSN:
0278-3649
Page Range / eLocation ID:
p. 3-11
Sponsoring Org:
National Science Foundation
More Like this
  1. Shared autonomy provides an effective framework for human-robot collaboration that takes advantage of the complementary strengths of humans and robots to achieve common goals. Many existing approaches to shared autonomy make restrictive assumptions that the goal space, environment dynamics, or human policy are known a priori, or are limited to discrete action spaces, preventing those methods from scaling to complicated real world environments. We propose a model-free, residual policy learning algorithm for shared autonomy that alleviates the need for these assumptions. Our agents are trained to minimally adjust the human’s actions such that a set of goal-agnostic constraints are satisfied. We test our method in two continuous control environments: Lunar Lander, a 2D flight control domain, and a 6-DOF quadrotor reaching task. In experiments with human and surrogate pilots, our method significantly improves task performance without any knowledge of the human’s goal beyond the constraints. These results highlight the ability of model-free deep reinforcement learning to realize assistive agents suited to continuous control settings with little knowledge of user intent. 
  2. Interest in physical therapy and individual exercises such as yoga/dance has increased alongside the well-being trend, and people globally enjoy such exercises at home/office via video streaming platforms. However, such exercises are hard to follow without expert guidance. Even if experts can help, it is almost impossible to give personalized feedback to every trainee remotely. Thus, automated pose correction systems are required more than ever, and we introduce a new captioning dataset named FixMyPose to address this need. We collect natural language descriptions of correcting a “current” pose to look like a “target” pose. To support a multilingual setup, we collect descriptions in both English and Hindi. The collected descriptions have interesting linguistic properties such as egocentric relations to the environment objects, analogous references, etc., requiring an understanding of spatial relations and commonsense knowledge about postures. Further, to avoid ML biases, we maintain a balance across characters with diverse demographics, who perform a variety of movements in several interior environments (e.g., homes, offices). From our FixMyPose dataset, we introduce two tasks: the pose-correctional-captioning task and its reverse, the target-pose-retrieval task. During the correctional-captioning task, models must generate the descriptions of how to move from the current to the target pose image, whereas in the retrieval task, models should select the correct target pose given the initial pose and the correctional description. We present strong cross-attention baseline models (uni/multimodal, RL, multilingual) and also show that our baselines are competitive with other models when evaluated on other image-difference datasets.
    We also propose new task-specific metrics (object-match, body-part-match, direction-match) and conduct human evaluation for more reliable evaluation, and we demonstrate a large human-model performance gap suggesting room for promising future work. Finally, to verify the sim-to-real transfer of our FixMyPose dataset, we collect a set of real images and show promising performance on these images. Data and code are available: https://fixmypose-unc.github.io.
  3. Observing how infants and mothers coordinate their behaviors can highlight meaningful patterns in early communication and infant development. While dyads often differ in the modalities they use to communicate, especially in the first year of life, it remains unclear how to capture coordination across multiple types of behaviors using existing computational models of interpersonal synchrony. This paper explores Dynamic Mode Decomposition with control (DMDc) as a method of integrating multiple signals from each communicating partner into a model of multimodal behavioral coordination. We used an existing video dataset to track the head pose, arm pose, and vocal fundamental frequency of infants and mothers during the Face-to-Face Still-Face (FFSF) procedure, a validated 3-stage interaction paradigm. For each recorded interaction, we fit both unimodal and multimodal DMDc models to the extracted pose data. The resulting dynamic characteristics of the models were analyzed to evaluate trends in individual behaviors and dyadic processes across infant age and stages of the interactions. Results demonstrate that observed trends in interaction dynamics across stages of the FFSF protocol were stronger and more significant when models incorporated both head and arm pose data, rather than a single behavior modality. Model output showed significant trends across age, identifying changes in infant movement and in the relationship between infant and mother behaviors. Models that included mothers’ audio data demonstrated similar results to those evaluated with pose data, confirming that DMDc can leverage different sets of behavioral signals from each interacting partner. Taken together, our results demonstrate the potential of DMDc toward integrating multiple behavioral signals into the measurement of multimodal interpersonal coordination. 
  4. This paper addresses the problem of autonomously deploying an unmanned aerial vehicle in non-trivial settings by leveraging a manipulator arm mounted on a ground robot, acting as a versatile mobile launch platform. As real-world deployment scenarios for micro aerial vehicles, such as search-and-rescue operations, often entail exploration and navigation of challenging environments including uneven terrain, cluttered spaces, or even constrained openings and passageways, a frequently arising problem is ensuring a safe take-off location, or safely fitting through narrow openings while in flight. By launching from the manipulator end-effector, a 6-DoF controllable take-off pose within the arm workspace can be achieved, which allows the aerial vehicle to be properly positioned and oriented to initialize the autonomous flight portion of a mission. To accomplish this, we propose a sampling-based planner that respects a) the kinematic constraints of the ground robot / manipulator / aerial robot combination and b) the geometry of the environment as autonomously mapped by the ground robot's perception systems, and c) accounts for the aerial robot's expected dynamic motion during take-off. The goal of the proposed planner is to ensure autonomous collision-free initialization of an aerial robotic exploration mission, even within a cluttered, constrained environment. At the same time, the ground robot with the mounted manipulator can be used to appropriately position the take-off workspace in areas of interest, effectively acting as a carrier launch platform. We experimentally demonstrate this novel robotic capability through a sequence of experiments in which a micro aerial vehicle is carried and launched from a 6-DoF manipulator arm mounted on a four-wheeled robot base.
  5. More than 1 billion people in the world are estimated to experience significant disability. These disabilities can impact people's ability to independently conduct activities of daily living, including ambulating, eating, dressing, taking care of personal hygiene, and more. Mobile and manipulator robots, which can move about human environments and physically interact with objects and people, have the potential to assist people with disabilities in activities of daily living. Although the vision of physically assistive robots has motivated research across subfields of robotics for decades, such robots have only recently become feasible in terms of capabilities, safety, and price. More and more research involves end-to-end robotic systems that interact with people with disabilities in real-world settings. In this article, we survey papers about physically assistive robots intended for people with disabilities from top conferences and journals in robotics, human–computer interaction, and accessible technology, to identify the general trends and research methodologies. We then dive into three specific research themes (interaction interfaces, levels of autonomy, and adaptation) and present frameworks for how these themes manifest across physically assistive robot research. We conclude with directions for future research.
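The third related abstract fits Dynamic Mode Decomposition with control (DMDc) models to behavioral signals. As a rough illustration only, the sketch below fits the standard DMDc linear model x[k+1] ≈ A x[k] + B u[k] by least squares on synthetic data. It omits the SVD rank truncation usually applied in practice, and which behavioral signals (e.g., head pose, arm pose, vocal pitch) play the role of the state x and which play the input u is an assumption, since the abstract does not specify it. The function name `fit_dmdc`, the example system, and all numbers are hypothetical.

```python
import numpy as np


def fit_dmdc(X, U):
    """Least-squares DMDc fit of x[k+1] ~= A x[k] + B u[k] (no rank truncation).

    X: (n, m+1) array whose columns are state snapshots x[0] .. x[m]
    U: (q, m) array whose columns are inputs u[0] .. u[m-1]
    Returns (A, B) with shapes (n, n) and (n, q).
    """
    X = np.asarray(X, dtype=float)
    U = np.asarray(U, dtype=float)
    n = X.shape[0]
    X_now, X_next = X[:, :-1], X[:, 1:]
    # Stack current states and inputs so one linear map predicts the next state.
    omega = np.vstack([X_now, U])
    # [A B] = X_next * pinv(omega): the minimum-norm least-squares solution.
    G = X_next @ np.linalg.pinv(omega)
    return G[:, :n], G[:, n:]


# Synthetic example (hypothetical system): a stable 2-D state driven by one input.
rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.1], [-0.1, 0.8]])
B_true = np.array([[0.5], [0.2]])
steps = 50
U = rng.normal(size=(1, steps))
X = np.zeros((2, steps + 1))
for k in range(steps):
    X[:, k + 1] = A_true @ X[:, k] + B_true @ U[:, k]

A_hat, B_hat = fit_dmdc(X, U)
# Eigenvalues of A describe the fitted dynamics; |lambda| < 1 means decaying modes.
eigs = np.linalg.eigvals(A_hat)
print(np.round(A_hat, 6))
print(np.round(np.abs(eigs), 3))
```

With noise-free data and a randomly varying input, the recovered A and B match the generating system. Real behavioral recordings are noisy, so the fitted matrices (and their eigenvalues, one possible summary of the "dynamic characteristics" the abstract mentions) would only approximate the underlying coordination dynamics.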