Methods for state estimation that rely on visual information are challenging on legged robots due to rapid changes in the viewing angle of onboard cameras. In this work, we show that by leveraging structure in the way that the robot locomotes, the accuracy of visual-inertial SLAM in these challenging scenarios can be increased. We present a method that takes advantage of the underlying periodic predictability often present in the motion of legged robots to improve the performance of the feature tracking module within a visual-inertial SLAM system. Our method performs multi-session SLAM on a single robot, where each session is responsible for mapping during a distinct portion of the robot’s gait cycle. Our method produces lower absolute trajectory error than several state-of-the-art methods for visual-inertial SLAM in both a simulated environment and on data collected on a quadrupedal robot executing dynamic gaits. On real-world bounding gaits, our median trajectory error was less than 35% of the error of the next best estimate provided by state-of-the-art methods.
more »
« less
IV-SLAM: Introspective Vision for Simultaneous Localization and Mapping
Existing solutions to visual simultaneous localization and mapping (VSLAM) assume that errors in feature extraction and matching are independent and identically distributed (i.i.d), but this assumption is known to not be true – features extracted from low-contrast regions of images exhibit wider error distributions than features from sharp corners. Furthermore, V-SLAM algorithms are prone to catastrophic tracking failures when sensed images include challenging conditions such as specular reflections, lens flare, or shadows of dynamic objects. To address such failures, previous work has focused on building more robust visual frontends, to filter out challenging features. In this paper, we present introspective vision for SLAM (IV-SLAM), a fundamentally different approach for addressing these challenges. IV-SLAM explicitly models the noise process of reprojection errors from visual features to be context-dependent, and hence non-i.i.d. We introduce an autonomously supervised approach for IV-SLAM to collect training data to learn such a context-aware noise model. Using this learned noise model, IV-SLAM guides feature extraction to select more features from parts of the image that are likely to result in lower noise, and further incorporate the learned noise model into the joint maximum likelihood estimation, thus making it robust to the aforementioned types of errors. We present empirical results to demonstrate that IV-SLAM 1) is able to accurately predict sources of error in input images, 2) reduces tracking error compared to V-SLAM, and 3) increases the mean distance between tracking failures by more than 70% on challenging real robot data compared to V-SLAM.
more »
« less
- PAR ID:
- 10222640
- Date Published:
- Journal Name:
- Conference on Robot Learning
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Visual simultaneous localization and mapping (SLAM) is an emerging technology that enables low-power devices with a single camera to perform robotic navigation. However, most visual SLAM algorithms are tuned for images produced through the image sensor processing (ISP) pipeline optimized for highly aesthetic photography. In this paper, we investigate the feasibility of varying sensor quantization on RAW images directly from the sensor to save energy for visual SLAM. In particular, we compare linear and logarithmic image quantization and show visual SLAM is robust to the latter. Further, we introduce a new gradient-based image quantization scheme that outperforms logarithmic quantization’s energy savings while preserving accuracy for feature-based visual SLAM algorithms. This work opens a new direction in energy-efficient image sensing for SLAM in the futuremore » « less
-
We present a method for 3D non-rigid motion tracking and structure reconstruction from 2D points and curve segments from a sequence of perspective images. The 3D locations of features in the first frame are known. The 3D affine motion model is used to describe the nonrigid motion. The results from synthetic and real data are presented. The applications include: lip tracking, MPEG4 face player, and burn scar assessment. The results show that: 1) curve segments are more robust under noise (observed from synthetic data with different Gaussian noise level); and 2) using both feature yields a significant performance gain in real data.more » « less
-
Recovering rigid registration between successive camera poses lies at the heart of 3D reconstruction, SLAM and visual odometry. Registration relies on the ability to compute discriminative 2D features in successive camera images for determining feature correspondences, which is very challenging in feature-poor environments, i.e. low-texture and/or low-light environments. In this paper, we aim to address the challenge of recovering rigid registration between successive camera poses in feature-poor environments in a Visual Inertial Odometry (VIO) setting. In addition to inertial sensing, we instrument a small aerial robot with an RGBD camera and propose a framework that unifies the incorporation of 3D geometric entities: points, lines, and planes. The tracked 3D geometric entities provide constraints in an Extended Kalman Filtering framework. We show that by directly exploiting 3D geometric entities, we can achieve improved registration. We demonstrate our approach on different texture-poor environments, with some containing only flat texture-less surfaces providing essentially no 2D features for tracking. In addition, we evaluate how the addition of different 3D geometric entities contributes to improved pose estimation by comparing an estimated pose trajectory to a ground truth pose trajectory obtained from a motion capture system. We consider computationally efficient methods for detecting 3D points, lines and planes, since our goal is to implement our approach on small mobile robots, such as drones.more » « less
-
Edge computing is increasingly proposed as a solution for reducing resource consumption of mobile devices running simultaneous localization and mapping (SLAM) algorithms, with most edge-assisted SLAM systems assuming the communication resources between the mobile device and the edge server to be unlimited, or relying on heuristics to choose the information to be transmitted to the edge. This paper presents AdaptSLAM, an edge-assisted visual (V) and visual-inertial (VI) SLAM system that adapts to the available communication and computation resources, based on a theoretically grounded method we developed to select the subset of keyframes (the representative frames) for constructing the best local and global maps in the mobile device and the edge server under resource constraints. We implemented AdaptSLAM to work with the state-of-the-art open-source V-and VI-SLAM ORB-SLAM3 framework, and demonstrated that, under constrained network bandwidth, AdaptSLAM reduces the tracking error by 62% compared to the best baseline method.more » « less