Title: HoloSet - A Dataset for Visual-Inertial Pose Estimation in Extended Reality
There is a lack of datasets for visual-inertial odometry applications in Extended Reality (XR). To the best of our knowledge, no available dataset is captured from an XR headset with a human as a carrier. To bridge this gap, we present a novel pose estimation dataset, called HoloSet, collected using the Microsoft HoloLens 2, a state-of-the-art head-mounted device for XR. Potential applications for HoloSet include visual-inertial odometry, simultaneous localization and mapping (SLAM), and additional applications in XR that leverage visual-inertial data. HoloSet captures both macro and micro movements. For macro movements, the dataset consists of more than 66,000 samples of visual, inertial, and depth camera data in a variety of environments (indoor, outdoor) and scene setups (trails, suburbs, downtown) under multiple user action scenarios (walk, jog). For micro movements, the dataset consists of more than 12,000 samples of additional articulated hand depth camera images while a user plays games that exercise fine motor skills and hand-eye coordination. We present basic visualizations and high-level statistics of the data and outline potential research use cases for HoloSet.
Award ID(s): 2230143
PAR ID: 10465111
Author(s) / Creator(s): ; ;
Date Published:
Journal Name: ACM Conference on Embedded Networked Sensor Systems
Page Range / eLocation ID: 1014 to 1019
Format(s): Medium: X
Sponsoring Org: National Science Foundation
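The HoloSet record above describes synchronized visual, inertial, and depth streams captured under different scenarios. As a rough illustration of how such a capture might be consumed, here is a minimal Python sketch that loads IMU samples and camera frames and time-aligns them. The directory layout, file names (imu.csv, frames/), and CSV column names are illustrative assumptions, not HoloSet's published schema.

    # Minimal sketch of iterating a visual-inertial capture. The layout,
    # file names, and CSV columns are assumptions, not HoloSet's schema.
    import csv
    from pathlib import Path

    def load_imu(capture_dir: Path):
        """Yield (timestamp_s, accel_xyz, gyro_xyz) from a CSV log."""
        with open(capture_dir / "imu.csv", newline="") as f:
            for row in csv.DictReader(f):
                t = float(row["t"])
                accel = tuple(float(row[k]) for k in ("ax", "ay", "az"))
                gyro = tuple(float(row[k]) for k in ("gx", "gy", "gz"))
                yield t, accel, gyro

    def load_frames(capture_dir: Path):
        """Yield (timestamp_s, image_path), assuming '<t_ns>.png' names."""
        for p in sorted((capture_dir / "frames").glob("*.png")):
            yield int(p.stem) * 1e-9, p

    def sync_imu_to_frames(imu, frames, tol=0.005):
        """Pair each frame with IMU samples within +/- tol seconds."""
        imu = list(imu)
        for t_frame, path in frames:
            window = [s for s in imu if abs(s[0] - t_frame) <= tol]
            yield t_frame, path, window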
More Like This
  1. This paper presents the Brown Pedestrian Odometry Dataset (BPOD) for benchmarking visual odometry algorithms on data from head-mounted sensors. This dataset was captured with stereo and RGB streams from RealSense cameras with rolling and global shutters in 12 diverse indoor and outdoor locations on Brown University's campus. Its associated ground-truth trajectories were generated from third-person videos that documented the recorded pedestrians' positions relative to stick-on markers placed along their paths. We evaluate the performance of canonical approaches representative of direct, feature-based, and learning-based visual odometry methods on BPOD. Our finding is that current methods which are successful on other benchmarks fail on BPOD. The failure modes correspond in part to rapid pedestrian rotation, erratic body movements, etc. We hope this dataset will play a significant role in the identification of these failure modes and in the design, development, and evaluation of pedestrian odometry algorithms.
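Benchmarking VO methods on a dataset like BPOD ultimately comes down to comparing an estimated trajectory against its ground-truth counterpart. As a sketch of that evaluation step, the following computes the absolute trajectory error (ATE) RMSE after a standard Umeyama similarity alignment; this is the generic metric, not BPOD's official evaluation code.

    # Sketch: absolute trajectory error (ATE) after Umeyama alignment.
    # Generic VO evaluation, not BPOD's official benchmark code.
    import numpy as np

    def umeyama_align(est, gt):
        """Fit scale s, rotation R, translation t mapping est onto gt.
        est, gt: (N, 3) arrays of corresponding positions."""
        mu_e, mu_g = est.mean(0), gt.mean(0)
        E, G = est - mu_e, gt - mu_g
        U, S, Vt = np.linalg.svd(G.T @ E / len(est))
        D = np.eye(3)
        if np.linalg.det(U @ Vt) < 0:
            D[2, 2] = -1.0                  # guard against reflections
        R = U @ D @ Vt
        s = np.trace(np.diag(S) @ D) / E.var(0).sum()
        t = mu_g - s * R @ mu_e
        return s, R, t

    def ate_rmse(est, gt):
        """RMSE of position error after similarity alignment."""
        s, R, t = umeyama_align(est, gt)
        aligned = (s * (R @ est.T)).T + t
        return np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1)))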
  2. Distance estimation from vision is fundamental for a myriad of robotic applications such as navigation, manipulation, and planning. Inspired by the mammalian visual system, which gazes at specific objects, we develop two novel constraints relating time-to-contact, acceleration, and distance that we call the τ-constraint and Φ-constraint. They allow an active (moving) camera to estimate depth efficiently and accurately while using only a small portion of the image. The constraints are applicable to range sensing, sensor fusion, and visual servoing. We successfully validate the proposed constraints with two experiments. The first applies both constraints in a trajectory estimation task with a monocular camera and an Inertial Measurement Unit (IMU). Our methods achieve 30-70% less average trajectory error while running 25× and 6.2× faster than the popular Visual-Inertial Odometry methods VINS-Mono and ROVIO, respectively. The second experiment demonstrates that when the constraints are used for feedback with efference copies, the resulting closed-loop system's eigenvalues are invariant to scaling of the applied control signal. We believe these results indicate the potential of the τ- and Φ-constraints as the basis of robust and efficient algorithms for a multitude of robotic applications.
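The exact τ- and Φ-constraints are defined in the paper; the sketch below only illustrates the classical time-to-contact idea they build on. For an approaching frontoparallel patch, image scale s grows as ṡ/s = 1/τ with τ = Z/(-Ż), so τ is observable from images alone, and an approach speed (for example, integrated from IMU acceleration) converts it to metric depth. Variable names here are hypothetical.

    # Sketch of the classical time-to-contact relation, with hypothetical
    # names; the paper's tau- and Phi-constraints refine this basic idea.
    def time_to_contact(scale_prev, scale_curr, dt):
        """Estimate tau from the growth of a tracked patch's image scale.
        For an approaching patch, s ~ 1/Z, so s_dot / s = 1 / tau."""
        s_dot = (scale_curr - scale_prev) / dt
        return scale_curr / s_dot      # seconds to contact at current speed

    def depth_from_tau(tau, approach_speed):
        """Metric depth Z = v * tau, with v e.g. from integrated IMU accel."""
        return approach_speed * tau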
  3. Rolling shutter distortion is highly undesirable for photography and computer vision algorithms (e.g., visual SLAM) because pixels can be potentially captured at different times and poses. In this paper, we propose a deep neural network to predict depth and row-wise pose from a single image for rolling shutter correction. Our contribution in this work is to incorporate inertial measurement unit (IMU) data into the pose refinement process, which, compared to the state-of-the-art, greatly enhances the pose prediction. The improved accuracy and robustness make it possible for numerous vision algorithms to use imagery captured by rolling shutter cameras and produce highly accurate results. We also extend a dataset to have real rolling shutter images, IMU data, depth maps, camera poses, and corresponding global shutter images for rolling shutter correction training. We demonstrate the efficacy of the proposed method by evaluating the performance of Direct Sparse Odometry (DSO) algorithm on rolling shutter imagery corrected using the proposed approach. Results show marked improvements of the DSO algorithm over using uncorrected imagery, validating the proposed approach. 
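The geometric core of rolling shutter correction is that each image row has its own capture time, and therefore its own camera pose; given a per-row pose and per-pixel depth (here, the network's outputs), every row can be reprojected into a single reference pose to synthesize a global-shutter image. Below is a minimal sketch of that reprojection for one row under a pinhole model; it is not the paper's network, just the geometric step its predictions enable.

    # Sketch: reproject one image row, captured at its own pose, into the
    # reference (first-row) pose. Pinhole model; names are illustrative.
    import numpy as np

    def reproject_row(us, v, depth_row, K, R_row, t_row):
        """us: (N,) column coords of row v; depth_row: (N,) z-depths;
        (R_row, t_row): row-to-reference rigid transform; K: intrinsics."""
        ones = np.ones_like(us, dtype=float)
        pix = np.stack([us.astype(float), v * ones, ones])   # (3, N)
        rays = np.linalg.inv(K) @ pix        # normalized rays, z = 1
        pts_row = rays * depth_row           # 3D points in the row's frame
        pts_ref = R_row @ pts_row + t_row[:, None]
        proj = K @ pts_ref
        return proj[:2] / proj[2]            # corrected pixel coordinates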
  4. Visual odometry (VO) is a method used to estimate the self-motion of a mobile robot using visual sensors. Unlike odometry based on integrating differential measurements that can accumulate errors, such as inertial sensors or wheel encoders, VO is not compromised by drift. However, image-based VO is computationally demanding, limiting its application in use cases with low-latency, low-memory, and low-energy requirements. Neuromorphic hardware offers low-power solutions to many vision and artificial intelligence problems, but designing such solutions is complicated and they often have to be assembled from scratch. Here we propose the use of vector symbolic architecture (VSA) as an abstraction layer to design algorithms compatible with neuromorphic hardware. Building from a VSA model for scene analysis, described in our companion paper, we present a modular neuromorphic algorithm that achieves state-of-the-art performance on two-dimensional VO tasks. Specifically, the proposed algorithm stores and updates a working memory of the presented visual environment. Based on this working memory, a resonator network estimates the changing location and orientation of the camera. We experimentally validate the neuromorphic VSA-based approach to VO with two benchmarks: one based on an event-camera dataset and the other in a dynamic scene with a robotic task.
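One common VSA mechanism for representing locations, fractional power encoding, makes the "working memory plus resonator" idea above concrete: a location (x, y) is encoded by raising fixed random unit phasors to fractional powers, so elementwise multiplication (binding) shifts the encoded location, which is exactly the operation a camera translation induces. The sketch below shows this general mechanism, not the paper's exact resonator network.

    # Sketch of fractional power encoding (FPE), a standard VSA mechanism
    # for spatial coordinates; not the paper's exact resonator network.
    import numpy as np

    rng = np.random.default_rng(0)
    D = 1024                                  # hypervector dimensionality
    phase_x = rng.uniform(-np.pi, np.pi, D)   # fixed random base phasors
    phase_y = rng.uniform(-np.pi, np.pi, D)

    def encode(x, y):
        """Encode a 2D location as a complex unit-phasor hypervector."""
        return np.exp(1j * (x * phase_x + y * phase_y))

    def shift(v, dx, dy):
        """Binding (elementwise product) translates the encoded location."""
        return v * encode(dx, dy)

    def similarity(a, b):
        """~1 when the encoded locations match, near 0 otherwise."""
        return np.abs(np.vdot(a, b)) / len(a)

    v = encode(2.0, -1.0)
    assert similarity(shift(v, 0.5, 0.5), encode(2.5, -0.5)) > 0.99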
  5. Recovering rigid registration between successive camera poses lies at the heart of 3D reconstruction, SLAM and visual odometry. Registration relies on the ability to compute discriminative 2D features in successive camera images for determining feature correspondences, which is very challenging in feature-poor environments, i.e. low-texture and/or low-light environments. In this paper, we aim to address the challenge of recovering rigid registration between successive camera poses in feature-poor environments in a Visual Inertial Odometry (VIO) setting. In addition to inertial sensing, we instrument a small aerial robot with an RGBD camera and propose a framework that unifies the incorporation of 3D geometric entities: points, lines, and planes. The tracked 3D geometric entities provide constraints in an Extended Kalman Filtering framework. We show that by directly exploiting 3D geometric entities, we can achieve improved registration. We demonstrate our approach on different texture-poor environments, with some containing only flat texture-less surfaces providing essentially no 2D features for tracking. In addition, we evaluate how the addition of different 3D geometric entities contributes to improved pose estimation by comparing an estimated pose trajectory to a ground truth pose trajectory obtained from a motion capture system. We consider computationally efficient methods for detecting 3D points, lines and planes, since our goal is to implement our approach on small mobile robots, such as drones. 
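In an EKF formulation like the one described, each tracked 3D entity contributes a measurement residual; for a plane, a point observed in the camera frame should satisfy the plane equation once mapped into the world frame. A minimal sketch of such a point-on-plane residual and the standard EKF update follows; the state layout, Jacobian, and noise values are illustrative, not the paper's exact formulation.

    # Sketch: a point-on-plane residual feeding a standard EKF update.
    # State layout, Jacobian H, and noise R_meas are illustrative.
    import numpy as np

    def plane_residual(p_cam, n_world, d_world, R_wc, t_wc):
        """Signed distance of a camera-frame point to the world plane
        n . x = d, after mapping the point into the world frame."""
        p_world = R_wc @ p_cam + t_wc
        return n_world @ p_world - d_world

    def ekf_update(x, P, r, H, R_meas):
        """Generic EKF measurement update with residual r, Jacobian H."""
        S = H @ P @ H.T + R_meas              # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)        # Kalman gain
        x_new = x + K @ r
        P_new = (np.eye(len(x)) - K @ H) @ P
        return x_new, P_new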