skip to main content


Title: Stacked Hourglass Networks for Markerless Pose Estimation of Articulated Construction Robots
The objective of this research is to evaluate vision-based pose estimation methods for on-site construction robots. The prospect of human-robot collaborative work on construction sites introduces new workplace hazards that must be mitigated to ensure safety. Human workers working on tasks alongside construction robots must perceive the interaction to be safe to ensure team identification and trust. Detecting the robot pose in real-time is thus a key requirement in order to inform the workers and to enable autonomous operation. Vision-based (marker-less, marker-based) and sensor-based (IMU, UWB) are two of the main methods for estimating robot pose. The marker-based and sensor-based methods require some additional preinstalled sensors or markers, whereas the marker-less method only requires an on-site camera system, which is common on modern construction sites. In this research, we develop a marker-less pose estimation system, which is based on a convolutional neural network (CNN) human pose estimation algorithm: stacked hourglass networks. The system is trained with image data collected from a factory setup environment and labels of excavator pose. We use a KUKA robot arm with a bucket mounted on the end-effector to represent a robotic excavator in our experiment. We evaluate the marker-less method and compare the result with the robot’s ground truth pose. The preliminary results show that the marker-less method is capable of estimating the pose of the excavator based on a state-of-the-art human pose estimation algorithm.  more » « less
Award ID(s):
1734266
NSF-PAR ID:
10072688
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
35th International Symposium on Automation and Robotics in Construction
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Struck-by accidents are potential safety concerns on construction sites and require a robust machine pose estimation. The development of deep learning methods has enhanced the human pose estimation that can be adapted for articulated machines. These methods require abundant dataset for training, which is challenging and time-consuming to obtain on-site. This paper proposes a fast data collection approach to build the dataset for excavator pose estimation. It uses two industrial robot arms as the excavator and the camera monopod to collect different excavator pose data. The 3D annotation can be obtained from the robot's embedded encoders. The 2D pose is annotated manually. For evaluation, 2,500 pose images were collected and trained with the stacked hourglass network. The results showed that the dataset is suitable for the excavator pose estimation network training in a controlled environment, which leads to the potential of the dataset augmenting with real construction site images. 
    more » « less
  2. Construction robots have drawn increased attention as a potential means of improving construction safety and productivity. However, it is still challenging to ensure safe human-robot collaboration on dynamic and unstructured construction workspaces. On construction sites, multiple entities dynamically collaborate with each other and the situational context between them evolves continually. Construction robots must therefore be equipped to visually understand the scene’s contexts (i.e., semantic relations to surrounding entities), thereby safely collaborating with humans, as a human vision system does. Toward this end, this study builds a unique deep neural network architecture and develops a construction-specialized model by experimenting multiple fine-tuning scenarios. Also, this study evaluates its performance on real construction operations data in order to examine its potential toward real-world applications. The results showed the promising performance of the tuned model: the recall@5 on training and validation dataset reached 92% and 67%, respectively. The proposed method, which supports construction co-robots with the holistic scene understanding, is expected to contribute to promoting safer human-robot collaboration in construction. 
    more » « less
  3. Tensegrity robots, which are composed of compressive elements (rods) and flexible tensile elements (e.g., cables), have a variety of advantages, including flexibility, low weight, and resistance to mechanical impact. Nevertheless, the hybrid soft-rigid nature of these robots also complicates the ability to localize and track their state. This work aims to address what has been recognized as a grand challenge in this domain, i.e., the state estimation of tensegrity robots through a markerless, vision-based method, as well as novel, onboard sensors that can measure the length of the robot's cables. In particular, an iterative optimization process is proposed to track the 6-DoF pose of each rigid element of a tensegrity robot from an RGB-D video as well as endcap distance measurements from the cable sensors. To ensure that the pose estimates of rigid elements are physically feasible, i.e., they are not resulting in collisions between rods or with the environment, physical constraints are introduced during the optimization. Real-world experiments are performed with a 3-bar tensegrity robot, which performs locomotion gaits. Given ground truth data from a motion capture system, the proposed method achieves less than 1~cm translation error and 3 degrees rotation error, which significantly outperforms alternatives. At the same time, the approach can provide accurate pose estimation throughout the robot's motion, while motion capture often fails due to occlusions. 
    more » « less
  4. Robots working in human environments often encounter a wide range of articulated objects, such as tools, cabinets, and other jointed objects. Such articulated objects can take an infinite number of possible poses, as a point in a potentially high-dimensional continuous space. A robot must perceive this continuous pose to manipulate the object to a desired pose. This problem of perception and manipulation of articulated objects remains a challenge due to its high dimensionality and multimodal uncertainty. Here, we describe a factored approach to estimate the poses of articulated objects using an efficient approach to nonparametric belief propagation. We consider inputs as geometrical models with articulation constraints and observed RGBD (red, green, blue, and depth) sensor data. The described framework produces object-part pose beliefs iteratively. The problem is formulated as a pairwise Markov random field (MRF), where each hidden node (continuous pose variable) is an observed object-part’s pose and the edges denote the articulation constraints between the parts. We describe articulated pose estimation by a “pull” message passing algorithm for nonparametric belief propagation (PMPNBP) and evaluate its convergence properties over scenes with articulated objects. Robot experiments are provided to demonstrate the necessity of maintaining beliefs to perform goal-driven manipulation tasks. 
    more » « less
  5. Robots working in human environments often encounter a wide range of articulated objects, such as tools, cabinets, and other jointed objects. Such articulated objects can take an infinite number of possible poses, as a point in a potentially high-dimensional continuous space. A robot must perceive this continuous pose in order to manipulate the object to a desired pose. This problem of perception and manipulation of articulated objects remains a challenge due to its high dimensionality and multi-modal uncertainty. In this paper, we propose a factored approach to estimate the poses of articulated objects using an efficient non-parametric belief propagation algorithm. We consider inputs as geometrical models with articulation constraints, and observed 3D sensor data. The proposed framework produces object-part pose beliefs iteratively. The problem is formulated as a pairwise Markov Random Field (MRF) where each hidden node (continuous pose variable) models an observed object-part's pose and each edge denotes an articulation constraint between a pair of parts. We propose articulated pose estimation by a Pull Message Passing algorithm for Nonparametric Belief Propagation (PMPNBP) and evaluate its convergence properties over scenes with articulated objects. 
    more » « less