skip to main content

Title: Model Quality Aware RANSAC: A Robust Camera Motion Estimator
Robust estimation of camera motion under the presence of outlier noise is a fundamental problem in robotics and computer vision. Despite existing efforts that focus on detecting motion and scene degeneracies, the best existing approach that builds on Random Consensus Sampling (RANSAC) still has non-negligible failure rate. Since a single failure canlead to the failure of the entire visual simultaneous localization and mapping, it is important to further improve the robust estimation algorithm. We propose a new robust camera motion estimator (RCME) by incorporating two main changes: a model-sample consistency test at the model instantiation stepand an inlier set quality test that verifies model-inlier consistency using differential entropy. We have implemented our RCME algorithm and tested it under many public datasets. The results have shown a consistent reduction in failure rate when comparing to the RANSAC-based Gold Standard approach and two recent variations of RANSAC methods.
Authors:
Award ID(s):
1925037
Publication Date:
NSF-PAR ID:
10206853
Journal Name:
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, Oct. 25-29, 2020
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper reports on a method for robust selection of inter-map loop closures in multi-robot simultaneous localization and mapping (SLAM). Existing robust SLAM methods assume a good initialization or an “odometry backbone” to classify inlier and outlier loop closures. In the multi-robot case, these assumptions do not always hold. This paper presents an algorithm called Pairwise Consistency Maximization (PCM) that estimates the largest pairwise internally consistent set of measurements. Finding the largest pairwise internally consistent set can be transformed into an instance of the maximum clique problem from graph theory, and by leveraging the associated literature it can be solved in real time. This paper evaluates how well PCM approximates the combinatorial gold standard using simulated data. It also evaluates the performance of PCM on synthetic and real-world data sets in comparison with DCS, SCGP, and RANSAC, and shows that PCM significantly outperforms these methods.
  2. We present an unsupervised learning framework for simultaneously training single-view depth prediction and optical flow estimation models using unlabeled video sequences. Existing unsupervised methods often exploit brightness constancy and spatial smoothness priors to train depth or flow models. In this paper, we propose to leverage geometric consistency as additional supervisory signals. Our core idea is that for rigid regions we can use the predicted scene depth and camera motion to synthesize 2D optical flow by backprojecting the induced 3D scene flow. The discrepancy between the rigid flow (from depth prediction and camera motion) and the estimated flow (from optical flow model) allows us to impose a cross-task consistency loss. While all the networks are jointly optimized during training, they can be applied independently at test time. Extensive experiments demonstrate that our depth and flow models compare favorably with state-of-the-art unsupervised methods.
  3. Current deep neural network approaches for camera pose estimation rely on scene structure for 3D motion estimation, but this decreases the robustness and thereby makes cross-dataset generalization difficult. In contrast, classical approaches to structure from motion estimate 3D motion utilizing optical flow and then compute depth. Their accuracy, however, depends strongly on the quality of the optical flow. To avoid this issue, direct methods have been proposed, which separate 3D motion from depth estimation, but compute 3D motion using only image gradients in the form of normal flow. In this paper, we introduce a network NFlowNet, for normal flow estimation which is used to enforce robust and direct constraints. In particular, normal flow is used to estimate relative camera pose based on the cheirality (depth positivity) constraint. We achieve this by formulating the optimization problem as a differentiable cheirality layer, which allows for end-to-end learning of camera pose. We perform extensive qualitative and quantitative evaluation of the proposed DiffPoseNet’s sensitivity to noise and its generalization across datasets. We compare our approach to existing state-of-the-art methods on KITTI, TartanAir, and TUM-RGBD datasets.
  4. The spiked covariance model has gained increasing popularity in high-dimensional data analysis. A fundamental problem is determination of the number of spiked eigenvalues, K. For estimation of K, most attention has focused on the use of top eigenvalues of sample covariance matrix, and there is little investigation into proper ways of using bulk eigenvalues to estimate K. We propose a principled approach to incorporating bulk eigenvalues in the estimation of K. Our method imposes a working model on the residual covariance matrix, which is assumed to be a diagonal matrix whose entries are drawn from a gamma distribution. Under this model, the bulk eigenvalues are asymptotically close to the quantiles of a fixed parametric distribution. This motivates us to propose a two-step method: the first step uses bulk eigenvalues to estimate parameters of this distribution, and the second step leverages these parameters to assist the estimation of K. The resulting estimator Kˆ aggregates information in a large number of bulk eigenvalues. We show the consistency of Kˆ under a standard spiked covariance model. We also propose a confidence interval estimate for K. Our extensive simulation studies show that the proposed method is robust and outperforms the existing methods in amore »range of scenarios. We apply the proposed method to analysis of a lung cancer microarray dataset and the 1000 Genomes dataset.« less
  5. Disentangling the sources of visual motion in a dynamic scene during self-movement or ego motion is important for autonomous navigation and tracking. In the dynamic image segments of a video frame containing independently moving objects, optic flow relative to the next frame is the sum of the motion fields generated due to camera and object motion. The traditional ego-motion estimation methods assume the scene to be static, and the recent deep learning-based methods do not separate pixel velocities into object- and ego-motion components. We propose a learning-based approach to predict both ego-motion parameters and object-motion field (OMF) from image sequences using a convolutional autoencoder while being robust to variations due to the unconstrained scene depth. This is achieved by: 1) training with continuous ego-motion constraints that allow solving for ego-motion parameters independently of depth and 2) learning a sparsely activated overcomplete ego-motion field (EMF) basis set, which eliminates the irrelevant components in both static and dynamic segments for the task of ego-motion estimation. In order to learn the EMF basis set, we propose a new differentiable sparsity penalty function that approximates the number of nonzero activations in the bottleneck layer of the autoencoder and enforces sparsity more effectively than L1-more »and L2-norm-based penalties. Unlike the existing direct ego-motion estimation methods, the predicted global EMF can be used to extract OMF directly by comparing it against the optic flow. Compared with the state-of-the-art baselines, the proposed model performs favorably on pixelwise object- and ego-motion estimation tasks when evaluated on real and synthetic data sets of dynamic scenes.« less