This paper presents a novel strategy to train keypoint detection models for robotics applications. Our goal is to develop methods that can robustly detect and track natural features on robotic manipulators. Such features can be used for vision-based control and pose estimation when placing artificial markers (e.g., ArUco) on the robot's body is not possible or practical at runtime. Prior methods require accurate camera calibration and robot kinematic models in order to label training images with the keypoint locations. In this paper, we remove these dependencies by utilizing inpainting methods: in the training phase, we attach ArUco markers along the robot's body and label the keypoint locations as the centers of those markers. We then use an inpainting method to reconstruct the parts of the robot occluded by the ArUco markers. As such, the markers are artificially removed from the training images, and labeled data is obtained to train markerless keypoint detection algorithms without the need for camera calibration or robot models. Using this approach, we trained a model for real-time keypoint detection and used the inferred keypoints as control features in an adaptive visual servoing scheme. We obtained successful control results with this fully model-free control strategy, using natural robot features at runtime and requiring no camera calibration or robot models at any stage of the process.
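To make the labeling idea concrete, below is a minimal, hypothetical sketch of the marker-removal step: detect ArUco markers, record their centers as keypoint labels, and inpaint the marker regions so the training image appears markerless. The marker dictionary, inpainting radius, and use of OpenCV's classical inpainting (the OpenCV ≥ 4.7 ArUco API is assumed) are illustrative assumptions, not details taken from the paper; a learned inpainting model could be substituted for higher-fidelity reconstruction.

```python
import cv2
import numpy as np

def label_and_inpaint(image_bgr):
    """Return a markerless training image plus (marker_id, u, v) keypoint labels."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Dictionary choice is an assumption; match it to the printed markers.
    aruco_dict = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
    detector = cv2.aruco.ArucoDetector(aruco_dict)
    corners, ids, _ = detector.detectMarkers(gray)

    keypoints = []
    mask = np.zeros(gray.shape, dtype=np.uint8)
    if ids is not None:
        for marker_corners, marker_id in zip(corners, ids.flatten()):
            pts = marker_corners.reshape(-1, 2)
            center = pts.mean(axis=0)  # keypoint label = marker center
            keypoints.append((int(marker_id), float(center[0]), float(center[1])))
            cv2.fillConvexPoly(mask, pts.astype(np.int32), 255)  # region to remove

    # Reconstruct the robot surface occluded by the markers.
    markerless = cv2.inpaint(image_bgr, mask, inpaintRadius=5, flags=cv2.INPAINT_TELEA)
    return markerless, keypoints
```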
Keypoints-Based Adaptive Visual Servoing for Control of Robotic Manipulators in Configuration Space
This paper presents a visual servoing method for controlling a robot in the configuration space purely using its natural features. We first created a data collection pipeline that uses camera intrinsics, extrinsics, and forward kinematics to generate 2D projections of a robot's joint locations (keypoints) in image space. Using this pipeline, we are able to collect large sets of real-robot data, which we use to train real-time keypoint detectors. The inferred keypoints from the trained model are used as control features in an adaptive visual servoing scheme that estimates, at runtime, the Jacobian relating the changes of the keypoints to the joint velocities. We compared the 2D configuration control performance of this method to the skeleton-based visual servoing method (the only other algorithm for purely vision-based configuration space visual servoing), and demonstrated that the keypoints provide more robust and less noisy features, which result in a better transient response. We also demonstrate the first vision-based 3D configuration space control results in the literature and discuss its limitations. Our data collection pipeline is available at https://github.com/JaniC-WPI/KPDataGenerator.git, and it can be used to collect image datasets and train real-time keypoint detectors for various robots and environments.
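The core of the data collection pipeline is projecting a joint's 3D position, obtained from forward kinematics, into the image to produce a keypoint label. The short sketch below shows a standard pinhole projection using the camera extrinsics and intrinsic matrix; the variable names and the assumption of an undistorted pinhole model are illustrative, not taken from the released pipeline.

```python
import numpy as np

def project_joint(p_joint_base, T_cam_base, K):
    """p_joint_base: 3D joint position in the robot base frame (from forward kinematics).
    T_cam_base: 4x4 extrinsic transform mapping base-frame points into the camera frame.
    K: 3x3 camera intrinsic matrix. Returns the pixel coordinates (u, v)."""
    p_h = np.append(p_joint_base, 1.0)   # homogeneous coordinates
    p_cam = T_cam_base @ p_h             # point expressed in the camera frame
    uvw = K @ p_cam[:3]                  # pinhole projection
    return uvw[:2] / uvw[2]              # normalize by depth
```

Repeating this projection for every joint at every recorded configuration yields the 2D keypoint labels used to train the real-time detector.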
- Award ID(s):
- 1900953
- PAR ID:
- 10481954
- Publisher / Repository:
- IEEE
- Date Published:
- Journal Name:
- Proceedings of 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
- ISSN:
- 2153-0866
- ISBN:
- 978-1-6654-9190-7
- Page Range / eLocation ID:
- 6387 to 6394
- Subject(s) / Keyword(s):
- Keypoint tracking, vision-based control, visual servoing
- Format(s):
- Medium: X
- Location:
- Detroit, MI, USA
- Sponsoring Org:
- National Science Foundation
More Like this
This paper presents a novel visual servoing method that controls a robotic manipulator in the configuration space, as opposed to classical vision-based control methods that focus solely on the end-effector pose. We first extract the robot's shape from depth images using a skeletonization algorithm and represent it using parametric curves. We then adopt an adaptive visual servoing scheme that estimates, online, the Jacobian relating the changes of the curve parameters to the joint velocities. The proposed scheme not only enables controlling a manipulator in the configuration space, but also demonstrates a better transient response while converging to the goal configuration compared to classical adaptive visual servoing methods. We present simulations and real robot experiments that demonstrate the capabilities of the proposed method and analyze its performance, robustness, and repeatability compared to the classical algorithms.
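Both this skeleton-based method and the keypoint-based method above rely on estimating the feature Jacobian online. One common way to realize such an adaptive loop is a Broyden-style rank-one update combined with a damped pseudo-inverse control law; the sketch below illustrates that pattern under assumed gains and update rules, and is not the papers' exact estimator.

```python
import numpy as np

class AdaptiveVisualServo:
    def __init__(self, n_features, n_joints, gain=0.5, alpha=0.1, damping=1e-3):
        self.J = np.zeros((n_features, n_joints))  # estimated feature Jacobian
        self.gain = gain        # proportional control gain (assumed value)
        self.alpha = alpha      # Broyden update step size (assumed value)
        self.damping = damping  # damped least-squares regularization

    def update_jacobian(self, delta_features, delta_q):
        """Rank-one (Broyden) correction from observed feature and joint increments."""
        denom = float(delta_q @ delta_q)
        if denom > 1e-9:
            residual = delta_features - self.J @ delta_q
            self.J += self.alpha * np.outer(residual, delta_q) / denom

    def control(self, features, target_features):
        """Joint-velocity command driving the features toward their targets."""
        error = target_features - features
        JT = self.J.T
        H = JT @ self.J + self.damping * np.eye(self.J.shape[1])
        return self.gain * np.linalg.solve(H, JT @ error)
```

In use, each control cycle feeds the measured feature change and commanded joint increment into `update_jacobian`, then applies the velocity returned by `control`.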
This work presents a motion planning framework for robotic manipulators that computes collision-free paths directly in image space. The generated paths can then be tracked using vision-based control, eliminating the need for an explicit robot model or proprioceptive sensing. At the core of our approach is the construction of a roadmap entirely in image space. To achieve this, we explicitly define sampling, nearest-neighbor selection, and collision checking based on visual features rather than geometric models. We first collect a set of image-space samples by moving the robot within its workspace, capturing keypoints along its body at different configurations. These samples serve as nodes in the roadmap, which we construct using either learned or predefined distance metrics. At runtime, the roadmap generates collision-free paths directly in image space, removing the need for a robot model or joint encoders. We validate our approach through an experimental study in which a robotic arm follows planned paths using an adaptive vision-based control scheme to avoid obstacles. The results show that paths generated with the learned-distance roadmap achieved 100% success in control convergence, whereas the predefined image-space distance roadmap enabled faster transient responses but had a lower success rate in convergence.
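The roadmap construction can be pictured as a k-nearest-neighbor graph over keypoint observations. The following sketch shows that structure under stated assumptions: each node stores the flattened keypoint vector captured at one configuration, the distance function stands in for either the learned or predefined metric, and the edge check is a placeholder for the paper's image-space collision test.

```python
import numpy as np
import networkx as nx

def build_image_space_roadmap(samples, k=5, distance=None, edge_is_free=None):
    """samples: list of flattened keypoint vectors, one per visited configuration."""
    if distance is None:
        distance = lambda a, b: np.linalg.norm(a - b)  # simple predefined metric
    if edge_is_free is None:
        edge_is_free = lambda a, b: True               # placeholder collision check

    graph = nx.Graph()
    for i, s in enumerate(samples):
        graph.add_node(i, keypoints=s)

    # Connect each node to its k nearest neighbors under the chosen metric.
    for i, s in enumerate(samples):
        dists = [(distance(s, t), j) for j, t in enumerate(samples) if j != i]
        for d, j in sorted(dists)[:k]:
            if edge_is_free(s, samples[j]):
                graph.add_edge(i, j, weight=d)
    return graph

# A path between two observed configurations could then be queried with, e.g.,
# nx.shortest_path(graph, start_node, goal_node, weight="weight").
```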
Generalization to novel object configurations and instances across diverse tasks and environments is a critical challenge in robotics. Keypoint-based representations have been proven effective as a succinct representation for capturing essential object features, and for establishing a reference frame in action prediction, enabling data-efficient learning of robot skills. However, their manual design nature and reliance on additional human labels limit their scalability. In this paper, we propose KALM, a framework that leverages large pre-trained vision-language models (LMs) to automatically generate task-relevant and cross-instance consistent keypoints. KALM distills robust and consistent keypoints across views and objects by generating proposals using LMs and verifying them against a small set of robot demonstration data. Based on the generated keypoints, we can train keypoint-conditioned policy models that predict actions in keypoint-centric frames, enabling robots to generalize effectively across varying object poses, camera views, and object instances with similar functional shapes. Our method demonstrates strong performance in the real world, adapting to different tasks and environments from only a handful of demonstrations while requiring no additional labels.
We introduce Visual Inverse Kinematics (VIK), which finds kinematically feasible joint configurations that satisfy vision-based constraints, bridging the gap between inverse kinematics (IK) and visual servoing (VS). Unlike IK, no explicit end-effector pose is given, and unlike VS, exact image measurements may not be available. In this work, we develop a formulation of the VIK problem with a field of view (FoV) constraint, enforcing the visibility of an object from a camera on the robot. Our proposed solution introduces a virtual kinematic chain that connects the physical robot and the object, transforming the FoV constraint into a joint angle kinematic constraint. Along the way, we introduce multiple vision-based cost functions to fulfill different objectives. We solve this formulation of the VIK problem using a method that involves a semidefinite program (SDP) constraint followed by a rank minimization algorithm. The performance of this method for solving the VIK problem is validated through simulations.
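As a rough illustration of the kind of vision-based cost such a formulation can use (not the paper's actual constraint or solver), the sketch below penalizes the angle between the camera's optical axis and the bearing to the object whenever it exceeds a half field-of-view limit; the camera pose would come from forward kinematics of the candidate configuration, and all names and the quadratic penalty form are assumptions.

```python
import numpy as np

def fov_cost(T_world_cam, p_object_world, half_fov_rad):
    """T_world_cam: 4x4 camera pose; p_object_world: 3D object position (world frame)."""
    R, t = T_world_cam[:3, :3], T_world_cam[:3, 3]
    bearing = p_object_world - t
    bearing /= np.linalg.norm(bearing)
    optical_axis = R[:, 2]  # camera looks along its z-axis (convention assumed)
    angle = np.arccos(np.clip(optical_axis @ bearing, -1.0, 1.0))
    return max(0.0, angle - half_fov_rad) ** 2  # zero when the object is in view
```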
