- Award ID(s):
- 1925231
- PAR ID:
- 10190833
- Date Published:
- Journal Name:
- IEEE International Conference on Robotics and Automation
- ISSN:
- 1049-3492
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
This paper presents a novel visual servoing method that controls a robotic manipulator in the configuration space, as opposed to classical vision-based control methods that focus solely on the end-effector pose. We first extract the robot's shape from depth images using a skeletonization algorithm and represent it with parametric curves. We then adopt an adaptive visual servoing scheme that estimates online the Jacobian relating changes in the curve parameters to the joint velocities. The proposed scheme not only enables controlling a manipulator in the configuration space, but also demonstrates a better transient response while converging to the goal configuration than classical adaptive visual servoing methods. We present simulations and real-robot experiments that demonstrate the capabilities of the proposed method and analyze its performance, robustness, and repeatability against the classical algorithms.
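The core of an adaptive visual servoing scheme of this kind is a Jacobian estimated online from observed motion, then inverted to turn feature error into joint velocities. A minimal sketch of one common realization (a rank-1 Broyden update followed by a resolved-rate step); the function names and gains are illustrative, not the paper's actual implementation:

```python
import numpy as np

def broyden_update(J, dq, ds, alpha=0.1):
    """Rank-1 Broyden update of the Jacobian estimate J (shape m x n)
    relating joint displacements dq (n,) to observed changes in the
    curve parameters ds (m,). alpha in (0, 1] is the update gain."""
    dq = dq.reshape(-1, 1)
    ds = ds.reshape(-1, 1)
    denom = float(dq.T @ dq)
    if denom < 1e-12:          # no motion: keep the current estimate
        return J
    return J + alpha * ((ds - J @ dq) @ dq.T) / denom

def servo_step(J, s, s_goal, gain=0.5):
    """One resolved-rate step: joint velocity command from the
    feature error via the pseudo-inverse of the estimated Jacobian."""
    return gain * np.linalg.pinv(J) @ (s_goal - s)
```

With exciting (non-degenerate) motion, the Broyden estimate converges to the true linear map, after which the servo step contracts the feature error geometrically.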
-
Visual terrain-relative navigation (VTRN) is a localization method based on registering a source image taken from a robotic vehicle against a georeferenced target image. With high-resolution imagery databases of Earth and other planets now available, VTRN offers accurate, drift-free navigation for air and space robots even in the absence of external positioning signals. Despite its potential for high accuracy, however, VTRN remains extremely fragile to common and predictable seasonal effects, such as lighting, vegetation changes, and snow cover. Engineered registration algorithms are mature and have provable geometric advantages but cannot accommodate the content changes caused by seasonal effects and have poor matching skill. Approaches based on deep learning can accommodate image content changes but produce opaque position estimates that either lack an interpretable uncertainty or require tedious human annotation. In this work, we address these issues with targeted use of deep learning within an image transform architecture, which converts seasonal imagery to a stable, invariant domain that can be used by conventional algorithms without modification. Our transform preserves the geometric structure and uncertainty estimates of legacy approaches and demonstrates superior performance under extreme seasonal changes while also being easy to train and highly generalizable. We show that classical registration methods perform exceptionally well for robotic visual navigation when stabilized with the proposed architecture and are able to consistently anticipate reliable imagery. Gross mismatches were nearly eliminated in challenging and realistic visual navigation tasks that also included topographic and perspective effects.
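The "conventional algorithms" that such a stabilizing transform feeds are classical registration methods. A minimal sketch of one of the standard ones, FFT-based phase correlation for translation estimation, which would operate unmodified on transform-stabilized imagery (the function here is an illustrative stand-in, not the paper's pipeline):

```python
import numpy as np

def phase_correlate(src, tgt):
    """Estimate the integer (dy, dx) translation aligning tgt to src
    via the normalized cross-power spectrum (classical phase
    correlation). Returns signed shifts such that np.roll(tgt, shifts)
    matches src."""
    cross = np.fft.fft2(src) * np.conj(np.fft.fft2(tgt))
    cross /= np.abs(cross) + 1e-12          # keep phase only
    corr = np.real(np.fft.ifft2(cross))
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    shifts = np.array(peak, dtype=float)
    # wrap peak coordinates to signed offsets
    for i, n in enumerate(corr.shape):
        if shifts[i] > n // 2:
            shifts[i] -= n
    return shifts
```

The correlation surface itself is what gives engineered methods their interpretable uncertainty: a sharp, dominant peak signals a confident match, while a flat or multi-modal surface flags an unreliable one.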
-
In this paper, we present CaveSeg, the first visual learning pipeline for semantic segmentation and scene parsing for AUV navigation inside underwater caves. We address the problem of scarce annotated training data by preparing a comprehensive dataset for semantic segmentation of underwater cave scenes. It contains pixel annotations for important navigation markers (e.g., caveline, arrows), obstacles (e.g., ground plane and overhead layers), scuba divers, and open areas for servoing. Through comprehensive benchmark analyses on cave systems in the USA, Mexico, and Spain, we demonstrate that robust deep visual models can be developed based on CaveSeg for fast semantic scene parsing of underwater cave environments. In particular, we formulate a novel transformer-based model that is computationally light and offers near real-time execution in addition to achieving state-of-the-art performance. Finally, we explore the design choices and implications of semantic segmentation for visual servoing by AUVs inside underwater caves. The proposed model and benchmark dataset open up promising opportunities for future research in autonomous underwater cave exploration and mapping.
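Benchmark analyses for semantic segmentation of this kind are typically scored with per-class intersection-over-union averaged over the classes present in the ground truth (mIoU). A minimal sketch of that metric over integer label maps; the class indexing is illustrative and not tied to CaveSeg's actual label set:

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union between predicted and ground-truth
    label maps, averaged over classes that appear in the ground truth."""
    ious = []
    for c in range(num_classes):
        if not np.any(gt == c):
            continue                       # class absent: skip it
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        ious.append(inter / union)
    return float(np.mean(ious))
```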
-
Tool tip visualization is an essential component of multiple robotic surgical and interventional procedures. In this paper, we introduce a real-time photoacoustic visual servoing system that processes information directly from raw acoustic sensor data, without requiring image formation or segmentation, in order to make robot path planning decisions to track and maintain visualization of tool tips. The performance of this novel deep learning-based visual servoing system is compared to that of a visual servoing system which relies on image formation followed by segmentation to make and execute robot path planning decisions. Experiments were conducted with a plastisol phantom, ex vivo tissue, and a needle as the interventional tool. Needle tip tracking performance with the deep learning-based approach outperformed that of the image-based segmentation approach by 67.7% and 55.3% in phantom and ex vivo tissue, respectively. In addition, the deep learning-based system operated within the frame-rate-limiting 10 Hz laser pulse repetition frequency, with mean execution times of 75.2 ms and 73.9 ms per acquisition frame with phantom and ex vivo tissue, respectively. These results highlight the benefits of our new approach to integrate deep learning with robotic systems for improved automation and visual servoing of tool tips.
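Once a tip position estimate is available per acquisition frame (however it is obtained), the servoing itself can be as simple as a rate-limited proportional controller that recenters the probe on the estimated tip each frame. A minimal sketch of that control step; the gain, step limit, and 1-D lateral axis are illustrative assumptions, not the paper's controller:

```python
def track_step(probe_x, tip_x, gain=0.5, max_step=1.0):
    """One proportional servoing step: move the probe toward the
    estimated tool-tip lateral position, clamped to a maximum
    per-frame step so the robot motion stays within safe limits."""
    error = tip_x - probe_x
    step = max(-max_step, min(max_step, gain * error))
    return probe_x + step
```

At a 10 Hz pulse repetition frequency, each such step must complete (together with the tip estimation) within the 100 ms frame budget, which is why the reported ~75 ms per-frame execution times keep the system real-time.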
-
In reinforcement learning for visual navigation, it is common to develop a model for each new task and train that model from scratch with task-specific interactions in 3D environments. However, this process is expensive; massive amounts of interactions are needed for the model to generalize well. Moreover, this process is repeated whenever there is a change in the task type or the goal modality. We present a unified approach to visual navigation using a novel modular transfer learning model. Our model can effectively leverage its experience from one source task and apply it to multiple target tasks (e.g., ObjectNav, RoomNav, ViewNav) with various goal modalities (e.g., image, sketch, audio, label). Furthermore, our model enables zero-shot experience learning, whereby it can solve the target tasks without receiving any task-specific interactive training. Our experiments on multiple photorealistic datasets and challenging tasks show that our approach learns faster, generalizes better, and outperforms SoTA models by a significant margin.
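The modularity described above rests on a simple structural idea: a shared navigation policy consumes a fixed-size goal embedding, and each goal modality gets its own encoder that maps raw goals into that common space. A minimal dispatch sketch of that pattern; the dictionary-of-encoders interface and the toy encoders in the test are hypothetical, not the paper's architecture:

```python
def make_goal_embedding(goal, encoders):
    """Route a goal of any supported modality to its encoder, producing
    a fixed-size embedding for the shared navigation policy. `goal` is
    a dict with 'modality' and 'data' keys; `encoders` maps modality
    names to embedding functions."""
    modality = goal["modality"]
    if modality not in encoders:
        raise ValueError(f"unsupported goal modality: {modality}")
    return encoders[modality](goal["data"])
```

Because the policy only ever sees the shared embedding space, adding a new goal modality means training one new encoder rather than retraining the whole navigation model, which is what makes zero-shot transfer to new target tasks possible.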