
Title: Learning To Find Good Correspondences Of Multiple Objects
Given a set of 3D-to-2D putative matches, labeling the correspondences as inliers or outliers plays a critical role in a wide range of computer vision applications, including Perspective-n-Point (PnP) pose estimation and object recognition. In this paper, we study a more general problem in which the matches may belong to multiple objects with distinct poses. We propose a deep architecture that simultaneously labels the correspondences as inliers or outliers and classifies the inliers into multiple objects. Specifically, we discretize the 3D rotation space into twenty convex cones based on the facets of a regular icosahedron. For each facet, a facet classifier is trained to predict the probability that a correspondence is an inlier for a pose whose rotation normal vector points towards this facet. An efficient RANSAC-based post-processing algorithm is also proposed to further process the predictions and detect the objects. Experiments demonstrate that our method is very efficient compared to existing methods and is capable of simultaneously labeling and classifying the inliers of multiple objects with high precision.
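The facet-based discretization can be illustrated with a small sketch. It assumes that the "rotation normal vector" of a pose can be represented as a unit direction in 3D (here, purely as an illustrative choice, the rotation axis extracted from a rotation matrix) and assigns that direction to one of the 20 icosahedron facets by picking the face whose centre direction has the largest dot product with it. The helper names (`icosahedron_vertices`, `facet_index`, `rotation_axis`) are hypothetical; this is not the authors' implementation.

```python
import itertools

import numpy as np

PHI = (1.0 + 5.0 ** 0.5) / 2.0  # golden ratio


def icosahedron_vertices() -> np.ndarray:
    """The 12 vertices of a regular icosahedron with edge length 2:
    cyclic permutations of (0, +-1, +-PHI)."""
    verts = []
    for s1 in (-1.0, 1.0):
        for s2 in (-1.0, 1.0):
            verts.append((0.0, s1, s2 * PHI))
            verts.append((s1, s2 * PHI, 0.0))
            verts.append((s2 * PHI, 0.0, s1))
    return np.array(verts)


def icosahedron_face_normals() -> np.ndarray:
    """Unit vectors through the centres of the 20 triangular faces."""
    v = icosahedron_vertices()
    # A face is any vertex triple whose three pairwise distances equal the edge length 2.
    faces = [
        (i, j, k)
        for i, j, k in itertools.combinations(range(len(v)), 3)
        if all(abs(np.linalg.norm(v[a] - v[b]) - 2.0) < 1e-9
               for a, b in ((i, j), (j, k), (i, k)))
    ]
    centres = np.array([v[list(f)].mean(axis=0) for f in faces])
    return centres / np.linalg.norm(centres, axis=1, keepdims=True)


FACE_NORMALS = icosahedron_face_normals()  # shape (20, 3)


def facet_index(direction: np.ndarray) -> int:
    """Index (0-19) of the facet cone that contains `direction`."""
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    return int(np.argmax(FACE_NORMALS @ d))


def rotation_axis(R: np.ndarray) -> np.ndarray:
    """Unit rotation axis of a 3x3 rotation matrix.

    Assumption: the rotation angle lies strictly between 0 and pi; the
    skew-symmetric part vanishes (axis undefined) at 0 and at pi.
    """
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return w / np.linalg.norm(w)


if __name__ == "__main__":
    print(FACE_NORMALS.shape)  # (20, 3)
    rng = np.random.default_rng(0)
    print(facet_index(rng.normal(size=3)))  # some facet index in 0..19
```

Because the icosahedron is regular, the plane separating two adjacent facet cones passes through their shared edge and also bisects the two face-centre directions, so the argmax over face centres reproduces the cone partition exactly; ties occur only on cone boundaries.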
Journal Name: 2020 25th International Conference on Pattern Recognition (ICPR)
Page Range or eLocation-ID: 2779 to 2786
Sponsoring Org: National Science Foundation
More Like this
  1. We present MultiBodySync, a novel, end-to-end trainable multi-body motion segmentation and rigid registration framework for multiple input 3D point clouds. The two non-trivial challenges posed by this multi-scan, multi-body setting that we investigate are: (i) guaranteeing correspondence and segmentation consistency across multiple input point clouds capturing different spatial arrangements of bodies or body parts; and (ii) obtaining robust motion-based rigid body segmentation applicable to novel object categories. To address these issues, we propose an approach that incorporates spectral synchronization into an iterative deep declarative network, so as to simultaneously recover consistent correspondences and motion segmentation. At the same time, by explicitly disentangling the correspondence and motion segmentation estimation modules, we achieve strong generalizability across different object categories. Our extensive evaluations demonstrate that our method is effective on various datasets, ranging from rigid parts in articulated objects to individually moving objects in a 3D scene, for both single-view and full point clouds.
  2. The current study examined the neural correlates of spatial rotation in eight engineering undergraduates. Mastering engineering graphics requires students to mentally visualize in 3D and mentally rotate parts when developing 2D drawings, so students’ spatial rotation skills play a significant role in learning and mastering engineering graphics. Traditionally, assessments of students’ spatial skills do not measure neural activity while students perform spatial rotation tasks. We used electroencephalography (EEG) to record neural activity while students performed the Revised Purdue Spatial Visualization Test: Visualization of Rotations (Revised PSVT:R). The two main objectives were to 1) determine whether high versus low performers on the Revised PSVT:R show differences in EEG oscillations and 2) identify EEG oscillatory frequency bands sensitive to item difficulty on the Revised PSVT:R. Overall performance on the Revised PSVT:R determined whether participants were considered high or low performers: students scoring 90% or higher were considered high performers (5 students), whereas students scoring under 90% were considered low performers (3 students). Time-frequency analysis of the EEG data quantified power in several oscillatory frequency bands (alpha, beta, theta, gamma, delta) for comparison between low and high performers, as well as between difficulty levels of the spatial rotation problems. Although we did not find any significant effects of performance type (high, low) on EEG power, we observed a trend of reduced absolute delta and gamma power for hard problems relative to easier problems. Decreases in delta power have been reported elsewhere for difficult relative to easy arithmetic calculations and attributed to greater external attention (e.g., attention to the stimuli/numbers) and, consequently, reduced internal attention (e.g., mentally performing the calculation).
In the current task, three spatial objects are presented in total. An example rotation stimulus shows a spatial object before and after rotation. A target stimulus, that is, a spatial object before rotation, is then displayed. Students must choose the one of five stimuli (multiple-choice options) that correctly represents the object after rotation. Reduced delta power in the current task implies that students paid greater attention to the example and target stimuli for the hard problem than for the moderate and easy problems. Therefore, preliminary findings suggest that students are less efficient at encoding the target stimuli (external attention) prior to mental rotation (internal attention) as task difficulty increases. Our findings indicate that delta power may be used to identify spatial rotation items that are especially challenging for students. We may then determine the efficacy of spatial rotation interventions among engineering education students, using increases in delta power as an index of increased internal attention. In future work, we will also use eye-tracking to assess whether our intervention decreases eye fixation (e.g., time spent viewing) toward the target stimulus on the Revised PSVT:R. By simultaneously using EEG and eye-tracking, we may identify changes in internal attention and encoding of the target stimuli that are predictive of improvements in spatial rotation skills among engineering education students.
  3. We consider a category-level perception problem, where one is given 3D sensor data picturing an object of a given category (e.g., a car) and has to reconstruct the pose and shape of the object despite intra-class variability (i.e., different car models have different shapes). We consider an active shape model, where, for an object category, we are given a library of potential CAD models describing objects in that category, and we adopt a standard formulation in which pose and shape estimation are posed as a non-convex optimization. Our first contribution is the first certifiably optimal solver for pose and shape estimation. In particular, we show that rotation estimation can be decoupled from the estimation of the object translation and shape, and we demonstrate that (i) the optimal object rotation can be computed via a tight (small-size) semidefinite relaxation, and (ii) the translation and shape parameters can be computed in closed form given the rotation. Our second contribution is to add an outlier rejection layer to our solver, making it robust to a large number of misdetections. Towards this goal, we wrap our optimal solver in a robust estimation scheme based on graduated non-convexity. To further enhance robustness to outliers, we also develop the first graph-theoretic formulation to prune outliers in category-level perception, which removes outliers via convex hull and maximum clique computations; the resulting approach is robust to 70–90% outliers. Our third contribution is an extensive experimental evaluation. Besides providing an ablation study on a simulated dataset and on the PASCAL3D+ dataset, we combine our solver with a deep-learned keypoint detector and show that the resulting approach improves over the state of the art in vehicle pose estimation on the ApolloScape datasets.
  4. We present a new weakly supervised learning-based method for generating novel category-specific 3D shapes from unoccluded image collections. Our method only requires silhouette annotations from unoccluded, category-specific objects. It does not require access to the object's 3D shape, multiple observations per object from different views, intra-image pixel correspondences, or any view annotations. Key to our method is a novel multi-projection generative adversarial network (MP-GAN) that trains a 3D shape generator to be consistent with multiple 2D projections of the 3D shapes, without direct access to these 3D shapes. This is achieved through multiple discriminators that encode the distribution of 2D projections of the 3D shapes seen from different views. Additionally, to determine the view information for each silhouette image, we also train a view prediction network on visualizations of 3D shapes synthesized by the generator. We iteratively alternate between training the generator and training the view prediction network. We validate our multi-projection GAN on both synthetic and real image datasets. Furthermore, we show that multi-projection GANs can aid in learning other high-dimensional distributions from lower-dimensional training datasets, such as material-class-specific spatially varying reflectance properties from images.
  5. In many real-life image analysis applications, particularly in biomedical research domains, the objects of interest undergo multiple transformations that alter their visual properties while keeping the semantic content unchanged. Disentangling images into semantic content factors and transformations can significantly benefit many domain-specific image analysis tasks. To this end, we propose a generic unsupervised framework, Harmony, that simultaneously and explicitly disentangles semantic content from multiple parameterized transformations. Harmony leverages a simple cross-contrastive learning framework with multiple explicitly parameterized latent representations to disentangle content from transformations. To demonstrate the efficacy of Harmony, we apply it to disentangle image semantic content from several parameterized transformations (rotation, translation, scaling, and contrast). Harmony achieves significantly improved disentanglement over the baseline models on several image datasets of diverse domains. With such disentanglement, Harmony is shown to facilitate bioimage analysis research by modeling structural heterogeneity of macromolecules from cryo-ET images and by learning transformation-invariant representations of protein particles from single-particle cryo-EM images. Harmony also performs well in disentangling content from 3D transformations and can perform coarse and fast alignment of 3D cryo-ET subtomograms. Therefore, Harmony is generalizable to many other imaging domains and can potentially be extended to domains beyond imaging as well.
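Item 1 in this list mentions spectral synchronization. MultiBodySync applies it to correspondences and motion segmentation inside a deep declarative network; the sketch below shows only a classic, textbook instance of the general technique, spectral rotation synchronization, and is not that paper's method. It assumes noise-free pairwise rotations R_i R_j^T for every pair of views (a complete graph); with missing pairs, a degree-normalized variant would be needed. The function names are hypothetical.

```python
import numpy as np


def random_rotation(rng: np.random.Generator) -> np.ndarray:
    """Draw a random 3x3 rotation matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q @ np.diag(np.sign(np.diag(r)))  # make the factorization unique
    if np.linalg.det(q) < 0:              # force det = +1 (proper rotation)
        q[:, 0] = -q[:, 0]
    return q


def project_to_so3(m: np.ndarray) -> np.ndarray:
    """Nearest rotation (in Frobenius norm) to a 3x3 matrix, via SVD."""
    u, _, vt = np.linalg.svd(m)
    d = np.sign(np.linalg.det(u @ vt))
    return u @ np.diag([1.0, 1.0, d]) @ vt


def spectral_sync(relative: dict, n: int) -> list:
    """Recover absolute rotations R_i, up to one global rotation, from
    pairwise measurements relative[(i, j)] ~= R_i @ R_j.T with i < j."""
    m = np.zeros((3 * n, 3 * n))
    for i in range(n):
        m[3 * i:3 * i + 3, 3 * i:3 * i + 3] = np.eye(3)
    for (i, j), rij in relative.items():
        m[3 * i:3 * i + 3, 3 * j:3 * j + 3] = rij
        m[3 * j:3 * j + 3, 3 * i:3 * i + 3] = rij.T
    # eigh returns eigenvalues in ascending order; keep the top three eigenvectors.
    _, vecs = np.linalg.eigh(m)
    u = vecs[:, -3:] * np.sqrt(n)
    # Remove a possible global reflection so every block has det = +1.
    if np.linalg.det(u[:3]) < 0:
        u[:, 0] = -u[:, 0]
    return [project_to_so3(u[3 * i:3 * i + 3]) for i in range(n)]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 5
    truth = [random_rotation(rng) for _ in range(n)]
    rel = {(i, j): truth[i] @ truth[j].T for i in range(n) for j in range(i + 1, n)}
    est = spectral_sync(rel, n)
    # Relative rotations are recovered even though absolute ones differ by a global rotation.
    print(np.allclose(est[0] @ est[1].T, truth[0] @ truth[1].T))
```

With exact data the block matrix equals G G^T, where G stacks the absolute rotations, so its top three eigenvectors span G up to a 3×3 orthogonal factor; that factor is why the output is defined only up to a common rotation. With noisy measurements, the final SVD step rounds each block back onto SO(3).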