
This content will become publicly available on January 1, 2023

Title: MACHINE LEARNING-BASED ROBOTIC OBJECT DETECTION AND GRASPING FOR COLLABORATIVE ASSEMBLY
An integral part of information-centric smart manufacturing is the adaptation of industrial robots to complement human workers in a collaborative manner. While advances in sensing have enabled real-time monitoring of the workspace, understanding the semantic information in the workspace, such as parts and tools, remains a challenge for seamless robot integration. The resulting lack of adaptivity in a dynamic workspace has limited robots to tasks with pre-defined actions. In this paper, a machine learning-based robotic object detection and grasping method is developed to improve the adaptivity of robots. Specifically, object detection based on the concept of single-shot detection (SSD) and convolutional neural network (CNN) is investigated to recognize and localize objects in the workspace. Subsequently, the extracted information from object detection, such as the type, position, and orientation of the object, is fed into a multi-layer perceptron (MLP) to generate the desired joint angles of the robotic arm for proper object grasping and handover to the human worker. Network training is guided by forward kinematics of the robotic arm in a self-supervised manner to mitigate issues such as singularity in computation. The effectiveness of the developed method is validated on an eDo robotic arm in a human-robot collaborative assembly case study.
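The idea of guiding network training with forward kinematics can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a hypothetical planar two-link arm (link lengths `L1`, `L2`) instead of the eDo arm, uses only the target position as input (no object type or orientation), and replaces the paper's network with a small one-hidden-layer MLP written in NumPy. All function names are illustrative. The point it shows is that the loss compares the forward kinematics of the predicted joint angles with the target position, so no inverse-kinematics labels are required.

```python
import numpy as np

L1, L2 = 1.0, 0.8  # hypothetical link lengths of a planar 2-link arm


def forward_kinematics(q):
    """Map joint angles q of shape (N, 2) to end-effector positions (N, 2)."""
    a, b = q[:, 0], q[:, 0] + q[:, 1]
    return np.stack([L1 * np.cos(a) + L2 * np.cos(b),
                     L1 * np.sin(a) + L2 * np.sin(b)], axis=1)


def fk_jacobian(q):
    """Analytic Jacobian d(position)/d(q) for each sample, shape (N, 2, 2)."""
    a, b = q[:, 0], q[:, 0] + q[:, 1]
    J = np.empty((q.shape[0], 2, 2))
    J[:, 0, 0] = -L1 * np.sin(a) - L2 * np.sin(b)
    J[:, 0, 1] = -L2 * np.sin(b)
    J[:, 1, 0] = L1 * np.cos(a) + L2 * np.cos(b)
    J[:, 1, 1] = L2 * np.cos(b)
    return J


def init_params(rng, hidden=16):
    """Small random weights for a 2 -> hidden -> 2 MLP (assumed architecture)."""
    return {"W1": rng.normal(0.0, 0.5, (2, hidden)), "b1": np.zeros(hidden),
            "W2": rng.normal(0.0, 0.5, (hidden, 2)), "b2": np.zeros(2)}


def mlp(params, targets):
    """Predict joint angles from target positions; also return hidden units."""
    h = np.tanh(targets @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"], h


def loss_and_grads(params, targets):
    """Self-supervised loss: mean squared distance between FK(q) and target."""
    q, h = mlp(params, targets)
    err = forward_kinematics(q) - targets
    loss = np.mean(np.sum(err ** 2, axis=1))
    # Backpropagate through the forward kinematics, then through the MLP.
    d_pos = 2.0 * err / len(targets)
    d_q = np.einsum("nij,ni->nj", fk_jacobian(q), d_pos)
    d_z = (d_q @ params["W2"].T) * (1.0 - h ** 2)
    grads = {"W2": h.T @ d_q, "b2": d_q.sum(axis=0),
             "W1": targets.T @ d_z, "b1": d_z.sum(axis=0)}
    return loss, grads


def train(steps=500, lr=0.005, seed=0):
    """Plain gradient descent; returns (initial_loss, final_loss, params)."""
    rng = np.random.default_rng(seed)
    # Sample reachable targets; the angles that produced them are discarded.
    q_hidden = rng.uniform([0.0, 0.3], [np.pi / 2, 2.0], size=(256, 2))
    targets = forward_kinematics(q_hidden)
    params = init_params(rng)
    initial_loss, _ = loss_and_grads(params, targets)
    for _ in range(steps):
        _, grads = loss_and_grads(params, targets)
        for name in params:
            params[name] -= lr * grads[name]
    final_loss, _ = loss_and_grads(params, targets)
    return initial_loss, final_loss, params
```

Calling `train()` returns the loss before and after training together with the learned parameters. Because the network only ever evaluates the forward map, it never has to invert the Jacobian, which is one plausible reading of how this style of training sidesteps singularity issues.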
Authors:
Editors:
Hideki Aoyama; Keiichi Shirase
Award ID(s):
1830295
Publication Date:
NSF-PAR ID:
10353048
Journal Name:
Proc. 2022 International Symposium on Flexible Automation (ISFA)
Page Range or eLocation-ID:
180 - 187
Sponsoring Org:
National Science Foundation
More Like this
  1. In this paper, an optimization-based dynamic modeling method is used for human-robot lifting motion prediction. The three-dimensional (3D) human arm model has 13 degrees of freedom (DOFs) and the 3D robotic arm (Sawyer robotic arm) has 10 DOFs. The human arm and robotic arm are built in Denavit-Hartenberg (DH) representation. In addition, the 3D box is modeled as a floating-base rigid body with 6 global DOFs. The interactions between human arm and box, and robot and box are modeled as a set of grasping forces which are treated as unknowns (design variables) in the optimization formulation. The inverse dynamic optimization is used to simulate the lifting motion where the summation of joint torque squares of the human arm is minimized subject to physical and task constraints. The design variables are control points of cubic B-splines of joint angle profiles of the human arm, robotic arm, and box, and the box grasping forces at each time point. A numerical example is simulated for human-robot lifting with a 10 kg box. The human and robotic arms’ joint angle, joint torque, and grasping force profiles are reported. These optimal outputs can be used as references to control the human-robot collaborative lifting task.

  2. The objective of this research is to evaluate vision-based pose estimation methods for on-site construction robots. The prospect of human-robot collaborative work on construction sites introduces new workplace hazards that must be mitigated to ensure safety. Human workers working on tasks alongside construction robots must perceive the interaction to be safe to ensure team identification and trust. Detecting the robot pose in real-time is thus a key requirement in order to inform the workers and to enable autonomous operation. Vision-based (marker-less, marker-based) and sensor-based (IMU, UWB) are two of the main methods for estimating robot pose. The marker-based and sensor-based methods require some additional preinstalled sensors or markers, whereas the marker-less method only requires an on-site camera system, which is common on modern construction sites. In this research, we develop a marker-less pose estimation system, which is based on a convolutional neural network (CNN) human pose estimation algorithm: stacked hourglass networks. The system is trained with image data collected from a factory setup environment and labels of excavator pose. We use a KUKA robot arm with a bucket mounted on the end-effector to represent a robotic excavator in our experiment. We evaluate the marker-less method and compare the result with the robot’s ground truth pose. The preliminary results show that the marker-less method is capable of estimating the pose of the excavator based on a state-of-the-art human pose estimation algorithm.
  3. This paper presents the design of a wearable robotic forearm for close-range human-robot collaboration. The robot's function is to serve as a lightweight supernumerary third arm for shared workspace activities. We present a functional prototype resulting from an iterative design process including several user studies. An analysis of the robot's kinematics shows an increase in reachable workspace by 246% compared to the natural human reach. The robot's degrees of freedom and range of motion support a variety of usage scenarios with the robot as a collaborative tool, including self-handovers, fetching objects while the human's hands are occupied, assisting human-human collaboration, and stabilizing an object. We analyze the bio-mechanical loads for these scenarios and find that the design is able to operate within human ergonomic wear limits. We then report on a pilot human-robot interaction study that indicates robot autonomy is more task-time efficient and preferred by users when compared to direct voice control. These results suggest that the design presented here is a promising configuration for a lightweight wearable robotic augmentation device, and can serve as a basis for further research into human-wearable collaboration.
  4. Although general purpose robotic manipulators are becoming more capable at manipulating various objects, their ability to manipulate millimeter-scale objects is usually limited. On the other hand, ultrasonic levitation devices have been shown to levitate a large range of small objects, from polystyrene balls to living organisms. By controlling the acoustic force fields, ultrasonic levitation devices can compensate for robot manipulator positioning uncertainty and control the grasping force exerted on the target object. The material-agnostic nature of acoustic levitation devices and their ability to dexterously manipulate millimeter-scale objects make them appealing as a grasping mode for general purpose robots. In this work, we present an ultrasonic, contact-less manipulation device that can be attached to or picked up by any general purpose robotic arm, enabling millimeter-scale manipulation with little to no modification to the robot itself. This device is capable of performing the very first phase-controlled picking action on acoustically reflective surfaces. With the manipulator placed around the target object, the manipulator can grasp objects smaller in size than the robot's positioning uncertainty, trap the object to resist air currents during robot movement, and dexterously hold a small and fragile object, like a flower bud. Due to the contact-less nature of the ultrasound-based gripper, a camera positioned to look into the cylinder can inspect the object without occlusion, facilitating accurate visual feature extraction.
  5. The goal of this article is to enable robots to perform robust task execution following human instructions in partially observable environments. A robot’s ability to interpret and execute commands is fundamentally tied to its semantic world knowledge. Commonly, robots use exteroceptive sensors, such as cameras or LiDAR, to detect entities in the workspace and infer their visual properties and spatial relationships. However, semantic world properties are often visually imperceptible. We posit the use of non-exteroceptive modalities including physical proprioception, factual descriptions, and domain knowledge as mechanisms for inferring semantic properties of objects. We introduce a probabilistic model that fuses linguistic knowledge with visual and haptic observations into a cumulative belief over latent world attributes to infer the meaning of instructions and execute the instructed tasks in a manner robust to erroneous, noisy, or contradictory evidence. In addition, we provide a method that allows the robot to communicate knowledge dissonance back to the human as a means of correcting errors in the operator’s world model. Finally, we propose an efficient framework that anticipates possible linguistic interactions and infers the associated groundings for the current world state, thereby bootstrapping both language understanding and generation. We present experiments on manipulators for tasks that require inference over partially observed semantic properties, and evaluate our framework’s ability to exploit expressed information and knowledge bases to facilitate convergence, and generate statements to correct declared facts that were observed to be inconsistent with the robot’s estimate of object properties.