This paper presents a novel strategy to train keypoint detection models for robotics applications. Our goal is to develop methods that can robustly detect and track natural features on robotic manipulators. Such features can be used for vision-based control and pose estimation when placing artificial markers (e.g., ArUco) on the robot’s body is not possible or practical at runtime. Prior methods require accurate camera calibration and robot kinematic models in order to label the keypoint locations in training images. In this paper, we remove these dependencies by utilizing inpainting methods. In the training phase, we attach ArUco markers along the robot’s body and label the keypoint locations as the centers of those markers. We then use an inpainting method to reconstruct the parts of the robot occluded by the ArUco markers. As such, the markers are artificially removed from the training images, and labeled data is obtained to train markerless keypoint detection algorithms without the need for camera calibration or robot models. Using this approach, we trained a model for real-time keypoint detection and used the inferred keypoints as control features for an adaptive visual servoing scheme. We obtained successful control results with this fully model-free control strategy, which uses natural robot features at runtime and requires neither camera calibration nor robot models at any stage of the process.
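The labeling step described above can be illustrated with a minimal sketch. It assumes the four corner pixels of each marker are already available from some ArUco detector; the function names `marker_center` and `marker_mask` are hypothetical and not taken from the paper. The center is approximated as the mean of the corners, and the mask is a padded bounding box that a separate inpainting routine (the abstract does not say which one the authors use) could fill in.

```python
import math


def marker_center(corners):
    """Approximate a marker's center as the mean of its four corner pixels.

    This is an illustrative approximation; under strong perspective the
    true center is the intersection of the diagonals instead.
    """
    n = len(corners)
    return (sum(p[0] for p in corners) / n, sum(p[1] for p in corners) / n)


def marker_mask(corners, width, height, pad=2):
    """Build a binary mask (255 inside, 0 outside) over the marker's padded
    bounding box, clipped to the image. The mask marks pixels to inpaint."""
    x0 = max(0, math.floor(min(p[0] for p in corners)) - pad)
    x1 = min(width - 1, math.ceil(max(p[0] for p in corners)) + pad)
    y0 = max(0, math.floor(min(p[1] for p in corners)) - pad)
    y1 = min(height - 1, math.ceil(max(p[1] for p in corners)) + pad)
    mask = [[0] * width for _ in range(height)]
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            mask[y][x] = 255
    return mask


# Hypothetical detection: one axis-aligned 10x10 px marker in a 32x32 image.
corners = [(10.0, 10.0), (20.0, 10.0), (20.0, 20.0), (10.0, 20.0)]
center = marker_center(corners)           # keypoint label for this marker
mask = marker_mask(corners, 32, 32, pad=2)  # region to hand to an inpainter
```

In this sketch the keypoint label is stored before the marker is erased, so the inpainted image and the label stay paired. A bounding-box mask is a simplification; a polygon mask following the marker outline would remove less of the surrounding robot surface.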
Keypoints-Based Adaptive Visual Servoing for Control of Robotic Manipulators in Configuration Space
This paper presents a visual servoing method for controlling a robot in the configuration space purely by using its natural features. We first created a data collection pipeline that uses camera intrinsics, extrinsics, and forward kinematics to generate 2D projections of a robot's joint locations (keypoints) in image space. Using this pipeline, we are able to collect large sets of real-robot data, which we use to train real-time keypoint detectors. The keypoints inferred by the trained model are used as control features in an adaptive visual servoing scheme that estimates, at runtime, the Jacobian relating changes in the keypoints to the joint velocities. We compared the 2D configuration control performance of this method to the skeleton-based visual servoing method (the only other algorithm for purely vision-based configuration space visual servoing), and demonstrated that the keypoints provide more robust and less noisy features, which result in a better transient response. We also demonstrate the first vision-based 3D configuration space control results in the literature, and discuss their limitations. Our data collection pipeline is available at https://github.com/JaniC-WPI/KPDataGenerator.git and can be used to collect image datasets and train real-time keypoint detectors for various robots and environments.
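The two technical steps named in the abstract can be sketched as follows, under stated assumptions. `project_point` is a standard pinhole projection with zero skew and no lens distortion, which is an assumption about the camera model rather than a detail from the paper. `broyden_update` is Broyden's rank-one secant update, a common choice for estimating an image Jacobian online in uncalibrated visual servoing; the abstract does not state which estimator the authors use, so it is a stand-in. Both function names are hypothetical.

```python
def project_point(K, R, t, p_world):
    """Project a 3D point (world frame) to pixel coordinates.

    K: 3x3 intrinsics, R: 3x3 rotation, t: translation (world -> camera).
    Assumes zero skew and no lens distortion.
    """
    p_cam = [sum(R[i][j] * p_world[j] for j in range(3)) + t[i] for i in range(3)]
    if p_cam[2] <= 0:
        raise ValueError("point is behind the camera")
    u = K[0][0] * p_cam[0] / p_cam[2] + K[0][2]
    v = K[1][1] * p_cam[1] / p_cam[2] + K[1][2]
    return (u, v)


def broyden_update(J, dq, ds, alpha=1.0, eps=1e-9):
    """Rank-one update of the Jacobian estimate J so that J @ dq tracks ds.

    dq: change in joint positions, ds: observed change in keypoint features.
    alpha in (0, 1] damps the update; tiny joint motions are ignored.
    """
    denom = sum(d * d for d in dq)
    if denom < eps:
        return [row[:] for row in J]
    rows, cols = len(ds), len(dq)
    residual = [ds[i] - sum(J[i][j] * dq[j] for j in range(cols)) for i in range(rows)]
    return [[J[i][j] + alpha * residual[i] * dq[j] / denom for j in range(cols)]
            for i in range(rows)]


# Hypothetical values: a joint 2 m in front of a camera at the world origin.
K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]
uv = project_point(K, R, t, [0.1, -0.2, 2.0])

# One Jacobian update from an observed joint step and keypoint displacement.
J0 = [[0.0, 0.0], [0.0, 0.0]]
J1 = broyden_update(J0, dq=[0.5, -0.25], ds=[2.0, 3.0])
```

In a labeling pipeline of this kind, the 3D joint positions passed to `project_point` would come from forward kinematics. With `alpha=1.0` the updated estimate satisfies the secant condition exactly for the latest step; smaller values trade responsiveness for robustness to keypoint noise.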
- Award ID(s):
- 1900953
- PAR ID:
- 10481954
- Publisher / Repository:
- IEEE
- Date Published:
- Journal Name:
- Proceedings of 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
- ISSN:
- 2153-0866
- ISBN:
- 978-1-6654-9190-7
- Page Range / eLocation ID:
- 6387 to 6394
- Subject(s) / Keyword(s):
- Keypoint tracking, vision-based control, visual servoing
- Format(s):
- Medium: X
- Location:
- Detroit, MI, USA
- Sponsoring Org:
- National Science Foundation
More Like this
-
This paper presents a novel visual servoing method that controls a robotic manipulator in the configuration space, as opposed to classical vision-based control methods that focus solely on the end effector pose. We first extract the robot's shape from depth images using a skeletonization algorithm and represent it using parametric curves. We then adopt an adaptive visual servoing scheme that estimates online the Jacobian relating changes in the curve parameters to the joint velocities. The proposed scheme not only enables controlling a manipulator in the configuration space, but also demonstrates a better transient response while converging to the goal configuration compared to classical adaptive visual servoing methods. We present simulations and real robot experiments that demonstrate the capabilities of the proposed method and analyze its performance, robustness, and repeatability compared to the classical algorithms.
-
Quantifying motion in 3D is important for studying the behavior of humans and other animals, but manual pose annotations are expensive and time-consuming to obtain. Self-supervised keypoint discovery is a promising strategy for estimating 3D poses without annotations. However, current keypoint discovery approaches commonly process single 2D views and do not operate in the 3D space. We propose a new method to perform self-supervised keypoint discovery in 3D from multi-view videos of behaving agents, without any keypoint or bounding box supervision in 2D or 3D. Our method uses an encoder-decoder architecture with a 3D volumetric heatmap, trained to reconstruct spatiotemporal differences across multiple views, in addition to joint length constraints on a learned 3D skeleton of the subject. In this way, we discover keypoints without requiring manual supervision in videos of humans and rats, demonstrating the potential of 3D keypoint discovery for studying behavior.
-
Reconstructing 4D vehicular activity (3D space and time) from cameras is useful for autonomous vehicles, commuters, and local authorities to plan for smarter and safer cities. Traffic is inherently repetitious over long periods, yet current deep learning-based 3D reconstruction methods have not considered such repetitions and have difficulty generalizing to new intersection-installed cameras. We present a novel approach exploiting longitudinal (long-term) repetitious motion as self-supervision to reconstruct 3D vehicular activity from a video captured by a single fixed camera. Starting from off-the-shelf 2D keypoint detections, our algorithm optimizes 3D vehicle shapes and poses, and then clusters their trajectories in 3D space. The 2D keypoints and trajectory clusters accumulated over the long term are later used to improve the 2D and 3D keypoints via self-supervision without any human annotation. Our method improves reconstruction accuracy over the state of the art on scenes with a significant visual difference from the keypoint detector’s training data, and has many applications, including velocity estimation, anomaly detection, and vehicle counting. We demonstrate results on traffic videos captured at multiple city intersections, collected using our smartphones, YouTube, and other public datasets.
-
The advances in deep reinforcement learning recently revived interest in data-driven, learning-based approaches to navigation. In this paper we propose to learn viewpoint-invariant and target-invariant visual servoing for local mobile robot navigation; given an initial view and the goal view or an image of a target, we train a deep convolutional network controller to reach the desired goal. We present a new architecture for this task, which rests on the ability to establish correspondences between the initial and goal views and on a novel reward structure motivated by the traditional feedback control error. The advantage of the proposed model is that it does not require calibration or depth information and achieves robust visual servoing in a variety of environments and targets without any parameter fine-tuning. We present a comprehensive evaluation of the approach and a comparison with other deep learning architectures as well as classical visual servoing methods in a visually realistic simulation environment [1]. The presented model overcomes the brittleness of classical visual servoing methods and achieves significantly higher generalization capability compared to previous learning approaches.