Title: Monocular Vision-Based Localization and Pose Estimation with a Nudged Particle Filter and Ellipsoidal Confidence Tubes
This paper proposes a nudged particle filter for estimating the pose of a camera mounted on a flying robot that collects a video sequence. The nudged particle filter leverages two neural networks, image-to-pose and pose-to-image, trained in an auto-encoder fashion on a dataset of pose-labeled images. Given an image, the camera pose retrieved by the image-to-pose network serves as a special particle that nudges the set of particles generated by the particle filter, while the pose-to-image network computes the likelihood of each particle. We demonstrate that this nudging scheme effectively mitigates low-likelihood samples during the particle propagation step. Ellipsoidal confidence tubes are constructed from the set of particles to provide a computationally efficient bound on the localization error. When an ellipsoidal tube self-intersects, the probability volume of the intersection can be significantly shrunk using a novel Dempster–Shafer probability mass assignment algorithm. Starting from the intersection, a loop closure procedure moves backward in time to shrink the volumes of the entire ellipsoidal tube. Experimental results using the Georgia Tech Miniature Autonomous Blimp platform demonstrate the feasibility and effectiveness of the proposed algorithms in providing localization and pose estimation based on monocular vision.
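As a reading aid, the predict/nudge/update cycle described above can be summarized in a short sketch. The Python code below is a minimal, illustrative implementation, not the authors' implementation: the `image_to_pose`, `pose_to_image`, and `motion_model` callables, the nudge fraction, and the Gaussian photometric likelihood are all assumptions made for the example.

```python
import numpy as np

def nudged_pf_step(particles, weights, image, image_to_pose, pose_to_image,
                   motion_model, nudge_fraction=0.1, noise_std=0.05):
    """One predict/nudge/update cycle of a nudged particle filter (illustrative sketch)."""
    n, dim = particles.shape

    # 1. Predict: propagate every particle through the (assumed) motion model.
    particles = np.array([motion_model(p) for p in particles])

    # 2. Nudge: the pose retrieved by the image-to-pose network acts as a special
    #    particle; the lowest-weight particles are redrawn around it so the set
    #    does not drift into low-likelihood regions during propagation.
    anchor = image_to_pose(image)
    k = max(1, int(nudge_fraction * n))
    worst = np.argsort(weights)[:k]
    particles[worst] = anchor + noise_std * np.random.randn(k, dim)

    # 3. Update: weight each particle by comparing the image rendered by the
    #    pose-to-image network with the observed image (Gaussian photometric error).
    errors = np.array([np.mean((pose_to_image(p) - image) ** 2) for p in particles])
    weights = np.exp(-(errors - errors.min()) / (2.0 * noise_std ** 2))  # shift for stability
    weights /= weights.sum()

    # 4. Resample (systematic resampling) and reset to uniform weights.
    positions = (np.arange(n) + np.random.rand()) / n
    idx = np.searchsorted(np.cumsum(weights), positions)
    return particles[idx], np.full(n, 1.0 / n)
```

In the paper's setting, an ellipsoidal confidence region would then be fit to the resampled particle set at each time step to form the confidence tube.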
Award ID(s):
1924978
PAR ID:
10392657
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Unmanned Systems
Volume:
11
Issue:
01
ISSN:
2301-3850
Page Range / eLocation ID:
109 to 120
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In this work, we tackle the problem of active camera localization, which actively controls the camera's movements to achieve an accurate camera pose. Past solutions are mostly based on Markov Localization, which reduces position-wise camera uncertainty for localization. These approaches localize the camera in a discrete pose space and are agnostic to localization-driven scene properties, which restricts camera pose accuracy to a coarse scale. We propose to overcome these limitations with a novel active camera localization algorithm composed of a passive and an active localization module. The former optimizes the camera pose in the continuous pose space by establishing point-wise camera-world correspondences. The latter explicitly models the scene and camera uncertainty components to plan the right path for accurate camera pose estimation. We validate our algorithm on challenging localization scenarios from both synthetic and scanned real-world indoor scenes. Experimental results demonstrate that our algorithm outperforms both the state-of-the-art Markov Localization based approach and other compared approaches in fine-scale camera pose accuracy.
  2. We present EgoRenderer, a system for rendering full-body neural avatars of a person captured by a wearable, egocentric fisheye camera mounted on a cap or a VR headset. Our system renders photorealistic novel views of the actor and her motion from arbitrary virtual camera locations. Rendering full-body avatars from such egocentric images comes with unique challenges due to the top-down view and large distortions. We tackle these challenges by decomposing the rendering process into several steps, including texture synthesis, pose construction, and neural image translation. For texture synthesis, we propose Ego-DPNet, a neural network that infers dense correspondences between the input fisheye images and an underlying parametric body model and extracts textures from the egocentric inputs. In addition, to encode dynamic appearances, our approach also learns an implicit texture stack that captures detailed appearance variation across poses and viewpoints. For correct pose generation, we first estimate body pose from the egocentric view using a parametric model. We then synthesize an external free-viewpoint pose image by projecting the parametric model to the user-specified target viewpoint. Next, we combine the target pose image and the textures into a feature image, which is transformed into the output color image using a neural image translation network. Experimental evaluations show that EgoRenderer is capable of generating realistic free-viewpoint avatars of a person wearing an egocentric camera. Comparisons to several baselines demonstrate the advantages of our approach.
  3. Recovering multi-person 3D poses and shapes with absolute scales from a single RGB image is a challenging task due to the inherent depth and scale ambiguity of a single view. Current works on 3D pose and shape estimation mainly focus on estimating the 3D joint locations relative to the root joint, usually defined as the joint closest to the shape centroid, which for humans is the pelvis joint. In this paper, we build upon an existing multi-person 3D mesh predictor network, ROMP, to create Absolute-ROMP. By adding absolute root joint localization in the camera coordinate frame, we are able to estimate multi-person 3D poses and shapes with absolute scales from a single RGB image. Such a single-shot approach allows the system to better learn and reason about the inter-person depth relationship, thus improving multi-person 3D estimation. In addition to this end-to-end network, we also train a CNN and transformer hybrid network, called TransFocal, to predict the focal length of the image's camera. Absolute-ROMP estimates the 3D mesh coordinates of all persons in the image and their root joint locations normalized by the focal point. We then use TransFocal to obtain the focal length and recover absolute depth information for all joints in the camera coordinate frame. We evaluate Absolute-ROMP on the root joint localization and root-relative 3D pose estimation tasks on publicly available multi-person 3D pose datasets, and we evaluate TransFocal on a dataset created from the Pano360 dataset. Both are applicable to in-the-wild images and videos due to their real-time performance.
  4. One of the grand challenges in computer vision is to recover the 3D poses and shapes of multiple human bodies with absolute scales from a single RGB image. The challenge stems from the inherent depth and scale ambiguity of a single view. The state of the art in 3D human pose and shape estimation mainly focuses on estimating the 3D joint locations relative to the root joint, defined as the pelvis joint. In this paper, a novel approach called Absolute-ROMP is proposed, which builds upon a one-stage multi-person 3D mesh predictor network, ROMP, to estimate multi-person 3D poses and shapes with absolute scales from a single RGB image. To achieve this, we introduce absolute root joint localization in the camera coordinate frame, which enables the estimation of the 3D mesh coordinates of all persons in the image and their root joint locations normalized by the focal point. Moreover, a CNN and transformer hybrid network, called TransFocal, is proposed to predict the focal length of the image's camera. This enables Absolute-ROMP to obtain absolute depth information for all joints in the camera coordinate frame, further improving the accuracy of the proposed method (see the numerical sketch after this list). Absolute-ROMP is evaluated on the root joint localization and root-relative 3D pose estimation tasks on publicly available multi-person 3D pose datasets, and TransFocal is evaluated on a dataset created from the Pano360 dataset. Our approach achieves state-of-the-art or competitive results on these tasks. Due to its real-time performance, our method is applicable to in-the-wild images and videos.
  5. We studied the evaporation-induced formation of supraparticles from dispersions of elongated colloidal particles using experiments and computer simulations. Aqueous droplets containing a dispersion of ellipsoidal and spherical polystyrene particles were dried on superamphiphobic surfaces at different humidity values that led to varying evaporation rates. Supraparticles made from only ellipsoidal particles showed short-range lateral ordering at the supraparticle surface and random orientations in the interior regardless of the evaporation rate. Particle-based simulations corroborated the experimental observations in the evaporation-limited regime and showed an increase in the local nematic ordering as the diffusion-limited regime was reached. A thin shell of ellipsoids was observed at the surface when supraparticles were made from binary mixtures of ellipsoids and spheres. Image analysis revealed that the supraparticle porosity increased with an increasing aspect ratio of the ellipsoids. 
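For entries 3 and 4 above, the step that converts a focal-normalized root depth into an absolute depth admits a brief numerical illustration. The snippet below is a minimal sketch under the assumption that "normalized by the focal point" means the root depth divided by the focal length in pixels, followed by standard pinhole back-projection; all variable names and numbers are hypothetical and are not taken from the papers.

```python
# Illustrative only: recovering absolute root depth from a focal-normalized
# prediction, assuming "normalized by the focal point" means depth / focal length.
focal_length_px = 1500.0        # hypothetical TransFocal-style focal length prediction (pixels)
normalized_root_depth = 0.002   # hypothetical focal-normalized root depth (meters per pixel)

absolute_root_depth_m = normalized_root_depth * focal_length_px
print(f"absolute root depth = {absolute_root_depth_m:.2f} m")  # -> 3.00 m

# With a pinhole camera model, the pelvis pixel (u, v) then back-projects to
# camera-frame coordinates using the same focal length and the principal point.
u, v = 640.0, 512.0             # hypothetical pelvis pixel location
cx, cy = 960.0, 540.0           # hypothetical principal point
X = (u - cx) * absolute_root_depth_m / focal_length_px
Y = (v - cy) * absolute_root_depth_m / focal_length_px
print(f"root joint in camera frame: ({X:.3f}, {Y:.3f}, {absolute_root_depth_m:.3f}) m")
```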