This content will become publicly available on July 1, 2023

Title: Robotic Telekinesis: Learning a Robotic Hand Imitator by Watching Humans on Youtube
We build a system that enables any human to control a robot hand and arm, simply by demonstrating motions with their own hand. The robot observes the human operator via a single RGB camera and imitates their actions in real-time. Human hands and robot hands differ in shape, size, and joint structure, and performing this translation from a single uncalibrated camera is a highly underconstrained problem. Moreover, the retargeted trajectories must effectively execute tasks on a physical robot, which requires them to be temporally smooth and free of self-collisions. Our key insight is that while paired human-robot correspondence data is expensive to collect, the internet contains a massive corpus of rich and diverse human hand videos. We leverage this data to train a system that understands human hands and retargets a human video stream into a robot hand-arm trajectory that is smooth, swift, safe, and semantically similar to the guiding demonstration. We demonstrate that it enables previously untrained people to teleoperate a robot on various dexterous manipulation tasks. Our low-cost, glove-free, marker-free remote teleoperation system makes robot teaching more accessible and we hope that it can aid robots that learn to act autonomously in the real world.
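The abstract states that retargeted trajectories must be temporally smooth, but it does not say how the system achieves this. As a minimal sketch of one common way to smooth a stream of per-frame joint-angle estimates, the example below applies an exponential moving average. This is not the authors' method: the function name `smooth_trajectory`, the parameter `alpha`, and the list-of-joint-angles representation are all assumptions made for illustration.

```python
from typing import List, Sequence


def smooth_trajectory(
    frames: Sequence[Sequence[float]], alpha: float = 0.3
) -> List[List[float]]:
    """Exponentially smooth a sequence of joint-angle vectors.

    Hypothetical helper, not taken from the paper. `frames` holds one
    joint-angle vector per video frame; `alpha` in (0, 1] weights the
    newest frame (smaller alpha gives a smoother, laggier output).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")

    smoothed: List[List[float]] = []
    for frame in frames:
        if not smoothed:
            # The first frame has no history, so it passes through unchanged.
            smoothed.append(list(frame))
            continue
        prev = smoothed[-1]
        # Blend each joint angle with its previous smoothed value.
        smoothed.append(
            [alpha * x + (1.0 - alpha) * p for x, p in zip(frame, prev)]
        )
    return smoothed


if __name__ == "__main__":
    # Two joints jump from 0 to their targets; the output approaches gradually.
    raw = [[0.0, 0.0], [1.0, 2.0], [1.0, 2.0]]
    for step in smooth_trajectory(raw, alpha=0.5):
        print(step)
```

A filter like this trades responsiveness for smoothness, which is in tension with the "swift" requirement in the abstract, so a real system would need to tune or replace it.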
Journal Name:
Robotics: Science and Systems
Sponsoring Org:
National Science Foundation
More Like this
  1. Motivated by the need to improve the quality of life for the elderly and disabled individuals who rely on wheelchairs for mobility, and who may have limited or no hand functionality at all, we propose an egocentric computer vision based co-robot wheelchair to enhance their mobility without hand usage. The robot is built using a commercially available powered wheelchair modified to be controlled by head motion. Head motion is measured by tracking an outward-facing egocentric camera mounted on the user’s head. Compared with previous approaches to hands-free mobility, our system provides a more natural human-robot interface because it enables the user to control the speed and direction of motion in a continuous fashion, as opposed to providing a small number of discrete commands. This article presents three usability studies, which were conducted on 37 subjects. The first two usability studies focus on comparing the proposed control method with existing solutions, while the third study was conducted to assess the effectiveness of training subjects to operate the wheelchair over several sessions. A limitation of our studies is that they have been conducted with healthy participants. Our findings, however, pave the way for further studies with subjects with disabilities.
  2. In modern industrial manufacturing processes, robotic manipulators are routinely used in assembly, packaging, and material handling operations. During production, changing end-of-arm tooling is frequently necessary for process flexibility and reuse of robotic resources. In conventional operation, a tool changer is sometimes employed to load and unload end-effectors; however, the robot must be manually taught to locate the tool changers by operators via a teach pendant. During tool change teaching, the operator takes considerable effort and time to align the master and tool side of the coupler by adjusting the motion speed of the robotic arm and observing the alignment from different viewpoints. In this paper, a custom robotic system, the NeXus, was programmed to locate and change tools automatically via an RGB-D camera. The NeXus was configured as a multi-robot system for multiple tasks including assembly, bonding, and 3D printing of sensor arrays, solar cells, and microrobot prototypes. Thus, different tools are employed by an industrial robotic arm to position grippers, printers, and other types of end-effectors in the workspace. To improve the precision and cycle time of the robotic tool change, we mounted an eye-in-hand RGB-D camera and employed visual servoing to automate the tool change process. We then compared the teaching time of the tool location using this system, and its cycle time, with those of 6 human operators in the manual mode. We concluded that the tool location time in automated mode is, on average, more than two times lower than that of the expert human operators.
  3. The growing number of applications in Cyber-Physical Systems (CPS) involving different types of robots while maintaining interoperability and trust is an ongoing challenge faced by traditional centralized systems. This paper presents what is, to the best of our knowledge, the first integration of the Robot Operating System (ROS) with the Ethereum blockchain using physical robots. We implement a specialized smart contract framework called “Swarm Contracts” that relies on blockchain technology in real-world applications for robotic agents with human interaction to perform collaborative tasks, while ensuring trust by motivating the agents with incentives using a token economy with a self-governing structure. The use of open-source technologies, including robot hardware platforms such as TurtleBot3, Universal Robot arm, and ROS, makes it possible to connect a wide range of robot types to the framework we propose. Going beyond simulations, we demonstrate the robustness of the proposed system in real-world conditions with actual hardware robots.
  4. The objective of this research is to evaluate vision-based pose estimation methods for on-site construction robots. The prospect of human-robot collaborative work on construction sites introduces new workplace hazards that must be mitigated to ensure safety. Human workers working on tasks alongside construction robots must perceive the interaction to be safe to ensure team identification and trust. Detecting the robot pose in real time is thus a key requirement in order to inform the workers and to enable autonomous operation. Vision-based (marker-less, marker-based) and sensor-based (IMU, UWB) are two of the main methods for estimating robot pose. The marker-based and sensor-based methods require some additional preinstalled sensors or markers, whereas the marker-less method only requires an on-site camera system, which is common on modern construction sites. In this research, we develop a marker-less pose estimation system, which is based on a convolutional neural network (CNN) human pose estimation algorithm: stacked hourglass networks. The system is trained with image data collected from a factory setup environment and labels of excavator pose. We use a KUKA robot arm with a bucket mounted on the end-effector to represent a robotic excavator in our experiment. We evaluate the marker-less method and compare the result with the robot’s ground truth pose. The preliminary results show that the marker-less method is capable of estimating the pose of the excavator based on a state-of-the-art human pose estimation algorithm.
  5. The PoseASL dataset consists of color and depth videos collected from ASL signers at the Linguistic and Assistive Technologies Laboratory under the direction of Matt Huenerfauth, as part of a collaborative research project with researchers at the Rochester Institute of Technology, Boston University, and the University of Pennsylvania. Access: After becoming an authorized user of Databrary, please contact Matt Huenerfauth if you have difficulty accessing this volume. We have collected a new dataset consisting of color and depth videos of fluent American Sign Language signers performing sequences of ASL signs and sentences. Given interest among sign-recognition and other computer-vision researchers in red-green-blue-depth (RGBD) video, we release this dataset for use by the research community. In addition to the video files, we share depth data files from a Kinect v2 sensor, as well as additional motion-tracking files produced through post-processing of this data. Organization of the Dataset: The dataset is organized into sub-folders, with codenames such as "P01" or "P16" etc. These codenames refer to specific human signers who were recorded in this dataset. Please note that there was no participant P11 nor P14; those numbers were accidentally skipped during the process of making appointments to collect video stimuli. Task: During…