skip to main content


Title: Learning To Find Good Correspondences Of Multiple Objects
Given a set of 3D to 2D putative matches, labeling the correspondences as inliers or outliers plays a critical role in a wide range of computer vision applications including the Perspective-n-Point (PnP) and object recognition. In this paper, we study a more generalized problem which allows the matches to belong to multiple objects with distinct poses. We propose a deep architecture to simultaneously label the correspondences as inliers or outliers and classify the inliers into multiple objects. Specifically, we discretize the 3D rotation space into twenty convex cones based on the facets of a regular icosahedron. For each facet, a facet classifier is trained to predict the probability of a correspondence being an inlier for a pose whose rotation normal vector points towards this facet. An efficient RANSAC-based post-processing algorithm is also proposed to further process the prediction results and detect the objects. Experiments demonstrate that our method is very efficient compared to existing methods and is capable of simultaneously labeling and classifying the inliers of multiple objects with high precision.  more » « less
Award ID(s):
1704204
NSF-PAR ID:
10253358
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
2020 25th International Conference on Pattern Recognition (ICPR)
Page Range / eLocation ID:
2779 to 2786
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    We present MultiBodySync, a novel, end-to-end trainable multi-body motion segmentation and rigid registration framework for multiple input 3D point clouds. The two non-trivial challenges posed by this multi-scan multibody setting that we investigate are: (i) guaranteeing correspondence and segmentation consistency across multiple input point clouds capturing different spatial arrangements of bodies or body parts; and (ii) obtaining robust motion-based rigid body segmentation applicable to novel object categories. We propose an approach to address these issues that incorporates spectral synchronization into an iterative deep declarative network, so as to simultaneously recover consistent correspondences as well as motion segmentation. At the same time, by explicitly disentangling the correspondence and motion segmentation estimation modules, we achieve strong generalizability across different object categories. Our extensive evaluations demonstrate that our method is effective on various datasets ranging from rigid parts in articulated objects to individually moving objects in a 3D scene, be it single-view or full point clouds. 
    more » « less
  2. The current study examined the neural correlates of spatial rotation in eight engineering undergraduates. Mastering engineering graphics requires students to mentally visualize in 3D and mentally rotate parts when developing 2D drawings. Students’ spatial rotation skills play a significant role in learning and mastering engineering graphics. Traditionally, the assessment of students’ spatial skills involves no measurements of neural activity during student performance of spatial rotation tasks. We used electroencephalography (EEG) to record neural activity while students performed the Revised Purdue Spatial Visualization Test: Visualization of Rotations (Revised PSVT:R). The two main objectives were to 1) determine whether high versus low performers on the Revised PSVT:R show differences in EEG oscillations and 2) identify EEG oscillatory frequency bands sensitive to item difficulty on the Revised PSVT:R.  Overall performance on the Revised PSVT:R determined whether participants were considered high or low performers: students scoring 90% or higher were considered high performers (5 students), whereas students scoring under 90% were considered low performers (3 students). Time-frequency analysis of the EEG data quantified power in several oscillatory frequency bands (alpha, beta, theta, gamma, delta) for comparison between low and high performers, as well as between difficulty levels of the spatial rotation problems.   Although we did not find any significant effects of performance type (high, low) on EEG power, we observed a trend in reduced absolute delta and gamma power for hard problems relative to easier problems. Decreases in delta power have been reported elsewhere for difficult relative to easy arithmetic calculations, and attributed to greater external attention (e.g., attention to the stimuli/numbers), and consequently, reduced internal attention (e.g., mentally performing the calculation). In the current task, a total of three spatial objects are presented. An example rotation stimulus is presented, showing a spatial object before and after rotation. A target stimulus, or spatial object before rotation is then displayed. Students must choose one of five stimuli (multiple choice options) that indicates the correct representation of the object after rotation. Reduced delta power in the current task implies that students showed greater attention to the example and target stimuli for the hard problem, relative to the moderate and easy problems. Therefore, preliminary findings suggest that students are less efficient at encoding the target stimuli (external attention) prior to mental rotation (internal attention) when task difficulty increases.  Our findings indicate that delta power may be used to identify spatial rotation items that are especially challenging for students. We may then determine the efficacy of spatial rotation interventions among engineering education students, using delta power as an index for increases in internal attention (e.g., increased delta power). Further, in future work, we will also use eye-tracking to assess whether our intervention decreases eye fixation (e.g., time spent viewing) toward the target stimulus on the Revised PSVT:R. By simultaneously using EEG and eye-tracking, we may identify changes in internal attention and encoding of the target stimuli that are predictive of improvements in spatial rotation skills among engineering education students.  
    more » « less
  3. We consider a category-level perception problem, where one is given 3D sensor data picturing an object of a given category (e.g., a car), and has to reconstruct the pose and shape of the object despite intra-class variability (i.e., different car models have different shapes). We consider an active shape model, where —for an object category— we are given a library of potential CAD models describing objects in that category, and we adopt a standard formulation where pose and shape estimation are formulated as a non-convex optimization. Our first contribution is to provide the first certifiably optimal solver for pose and shape estimation. In particular, we show that rotation estimation can be decoupled from the estimation of the object translation and shape, and we demonstrate that (i) the optimal object rotation can be computed via a tight (small-size) semidefinite relaxation, and (ii) the translation and shape parameters can be computed in closed-form given the rotation. Our second contribution is to add an outlier rejection layer to our solver, hence making it robust to a large number of misdetections. Towards this goal, we wrap our optimal solver in a robust estimation scheme based on graduated non-convexity. To further enhance robustness to outliers, we also develop the first graph-theoretic formulation to prune outliers in category-level perception, which removes outliers via convex hull and maximum clique computations; the resulting approach is robust to 70 − 90% outliers. Our third contribution is an extensive experimental evaluation. Besides providing an ablation study on a simulated dataset and on the PASCAL3D+ dataset, we combine our solver with a deep-learned keypoint detector, and show that the resulting approach improves over the state of the art in vehicle pose estimation in the ApolloScape datasets. 
    more » « less
  4. We consider the problem of clustering data sets in the presence of arbitrary outliers. Traditional clustering algorithms such as k-means and spectral clustering are known to perform poorly for data sets contaminated with even a small number of outliers. In this paper, we develop a provably robust spectral clustering algorithm that applies a simple rounding scheme to denoise a Gaussian kernel matrix built from the data points and uses vanilla spectral clustering to recover the cluster labels of data points. We analyze the performance of our algorithm under the assumption that the “good” data points are generated from a mixture of sub-Gaussians (we term these “inliers”), whereas the outlier points can come from any arbitrary probability distribution. For this general class of models, we show that the misclassification error decays at an exponential rate in the signal-to-noise ratio, provided the number of outliers is a small fraction of the inlier points. Surprisingly, this derived error bound matches with the best-known bound for semidefinite programs (SDPs) under the same setting without outliers. We conduct extensive experiments on a variety of simulated and real-world data sets to demonstrate that our algorithm is less sensitive to outliers compared with other state-of-the-art algorithms proposed in the literature. Funding: G. A. Hanasusanto was supported by the National Science Foundation Grants NSF ECCS-1752125 and NSF CCF-2153606. P. Sarkar gratefully acknowledges support from the National Science Foundation Grants NSF DMS-1713082, NSF HDR-1934932 and NSF 2019844. Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2022.2317 . 
    more » « less
  5. This paper presents a neural incremental Structure-from-Motion (SfM) approach, Level-S2fM, which estimates the camera poses and scene geometry from a set of uncalibrated images by learning coordinate MLPs for the implicit surfaces and the radiance fields from the established key-point correspondences. Our novel formulation poses some new challenges due to inevitable two-view and few-view configurations in the incremental SfM pipeline, which complicates the optimization of coordinate MLPs for volumetric neural rendering with unknown camera poses. Nevertheless, we demonstrate that the strong inductive basis conveying in the 2D correspondences is promising to tackle those challenges by exploiting the relationship between the ray sampling schemes. Based on this, we revisit the pipeline of incremental SfM and renew the key components, including two-view geometry initialization, the camera poses registration, the 3D points triangulation, and Bundle Adjustment, with a fresh perspective based on neural implicit surfaces. By unifying the scene geometry in small MLP networks through coordinate MLPs, our Level-S2fM treats the zero-level set of the implicit surface as an informative top-down regularization to manage the reconstructed 3D points, reject the outliers in correspondences via querying SDF, and refine the estimated geometries by NBA (Neural BA). Not only does our Level-S2fM lead to promising results on camera pose estimation and scene geometry reconstruction, but it also shows a promising way for neural implicit rendering without knowing camera extrinsic beforehand. 
    more » « less