Title: ARMSAINTS: An AR-based Real-time Mobile System for Assistive Indoor Navigation with Target Segmentation
This paper proposes an AR-based real-time mobile system for assistive indoor navigation with target segmentation (ARMSAINTS) that lets both sighted and blind or low-vision (BLV) users safely explore and navigate indoor environments. The solution comprises four major components: graph construction, hybrid modeling, real-time navigation, and target segmentation. The system uses an automatic graph construction method to generate a graph from a 2D floorplan and a Delaunay triangulation-based localization method to provide precise localization with negligible error. The 3D obstacle detection method integrates the existing capabilities of AR with a 2D object detector and a semantic target segmentation model to detect and track 3D bounding boxes of obstacles and people, increasing BLV users' safety and understanding when traveling indoors. The entire system does not require the installation and maintenance of expensive infrastructure, runs in real time on a smartphone, and can easily adapt to environmental changes.
Award ID(s):
1827505 1737533 2131186
NSF-PAR ID:
10346700
Journal Name:
2022 IEEE International Conference on Advanced Robotics and Its Social Impacts (ARSO)
Page Range / eLocation ID:
1 to 6
Sponsoring Org:
National Science Foundation
More Like this
  1. Vision-based localization approaches now underpin newly emerging navigation pipelines for myriad use cases, from robotics to assistive technologies. Compared to sensor-based solutions, vision-based localization does not require pre-installed sensor infrastructure, which is costly, time-consuming, and/or often infeasible at scale. Herein, we propose a novel vision-based localization pipeline for a specific use case: navigation support for end users with blindness and low vision. Given a query image taken by an end user on a mobile application, the pipeline leverages a visual place recognition (VPR) algorithm to find similar images in a reference image database of the target space. The geolocations of these similar images are utilized in a downstream task that employs a weighted-average method to estimate the end user’s location. Another downstream task utilizes the perspective-n-point (PnP) algorithm to estimate the end user’s direction by exploiting the 2D–3D point correspondences between the query image and the 3D environment, as extracted from matched images in the database. Additionally, this system implements Dijkstra’s algorithm to calculate a shortest path based on a navigable map that includes the trip origin and destination. The topometric map used for localization and navigation is built using a customized graphical user interface that projects a 3D reconstructed sparse map, built from a sequence of images, to the corresponding a priori 2D floor plan. Sequential images used for map construction can be collected in a pre-mapping step or scavenged through public databases/citizen science. The end-to-end system can be installed on any internet-accessible device with a camera that hosts a custom mobile application. For evaluation purposes, mapping and localization were tested in a complex hospital environment. 
The evaluation results demonstrate that our system can achieve localization with an average error of less than 1 m without knowledge of the camera’s intrinsic parameters, such as focal length. 
  2. Agaian, Sos S. ; DelMarco, Stephen P. ; Asari, Vijayan K. (Ed.)
    High-accuracy localization and user position tracking are critical to improving the quality of augmented reality environments. The biggest challenge facing developers is localizing the user based on visible surroundings. Current solutions rely on the Global Positioning System (GPS) for tracking and orientation. However, GPS receivers have an accuracy of about 10 to 30 meters, which is not accurate enough for augmented reality, where precision must be measured in millimeters or smaller. This paper describes the development and demonstration of a head-worn augmented reality (AR) based vision-aid indoor navigation system, which localizes the user without relying on a GPS signal. Commercially available augmented reality headsets allow individuals to capture their field of vision in real time using the front-facing camera. Using captured image features as navigation-related landmarks allows the user to be localized in the absence of a GPS signal. The proposed method involves three steps: (1) detailed front-scene camera data is collected and generated for landmark recognition; (2) the individual's current position is detected and located using feature matching; and (3) arrows are displayed to indicate areas that require more data collection, if needed. Computer simulations indicate that the proposed augmented reality-based vision-aid indoor navigation system can provide precise simultaneous localization and mapping in a GPS-denied environment. Keywords: augmented reality, navigation, GPS, HoloLens, vision, positioning system, localization
  3. Smart health applications have received significant attention in recent years. Novel applications hold significant promise to overcome many of the inconveniences faced by persons with disabilities throughout daily living. For people with blindness and low vision (BLV), environmental perception is compromised, creating myriad difficulties. Precise localization is still a gap in the field and is critical to safe navigation. Conventional GNSS positioning cannot provide satisfactory performance in urban canyons. 3D mapping-aided (3DMA) GNSS may serve as an urban GNSS solution, since the availability of 3D city models has widely increased. As a result, this study developed a real-time 3DMA GNSS-positioning system based on state-of-the-art 3DMA GNSS algorithms. Shadow matching was integrated with likelihood-based ranging 3DMA GNSS, generating positioning hypothesis candidates. To increase robustness, the 3DMA GNSS solution was then optimized with Doppler measurements using factor graph optimization (FGO) in a loosely-coupled fashion. This study also evaluated positioning performance using an advanced wearable system’s recorded data in New York City. The real-time forward-processed FGO can provide a root-mean-square error (RMSE) of about 21 m. The RMSE drops to 16 m when the data is post-processed with FGO in a combined direction. Overall results show that the proposed loosely-coupled 3DMA FGO algorithm can provide a better and more robust positioning performance for the multi-sensor integration approach used by this wearable for persons with BLV. 
  4. The Georgia Tech Miniature Autonomous Blimp (GT-MAB) needs localization algorithms to navigate to waypoints in an indoor environment without relying on an external motion capture system. Indoor aerial robots often require a motion capture system for localization or employ simultaneous localization and mapping (SLAM) algorithms for navigation. The proposed strategy for GT-MAB localization can be accomplished using lightweight sensors on a weight-constrained platform like the GT-MAB. We train an end-to-end convolutional neural network (CNN) that predicts the horizontal position and heading of the GT-MAB using video collected by an onboard monocular RGB camera. The height of the GT-MAB, in turn, is estimated from measurements by a time-of-flight (ToF) single-beam laser sensor. The monocular camera and the single-beam laser sensor are sufficient for the localization algorithm to localize the GT-MAB in real time, with average 3D positioning errors of less than 20 cm and average heading errors of less than 3 degrees. With the accuracy of our proposed localization method, we are able to use simple proportional-integral-derivative controllers to control the GT-MAB for waypoint navigation. Experimental results on waypoint following are provided; they demonstrate the use of a CNN as the primary localization method for estimating the pose of an indoor robot, successfully enabling navigation to specified waypoints.
  5. Localizing the camera in a known indoor environment is a key building block for scene mapping, robot navigation, AR, etc. Recent advances estimate the camera pose via optimization over the 2D/3D-3D correspondences established between the coordinates in 2D/3D camera space and 3D world space. Such a mapping is estimated with either a convolutional neural network or a decision tree using only the static input image sequence, which makes these approaches vulnerable to dynamic indoor environments, which are common yet challenging in the real world. To address these issues, we propose a novel outlier-aware neural tree that bridges the two worlds of deep learning and decision tree approaches. It builds on three important blocks: (a) a hierarchical space partition over the indoor scene to construct the decision tree; (b) a neural routing function, implemented as a deep classification network, employed for better 3D scene understanding; and (c) an outlier rejection module used to filter out dynamic points during the hierarchical routing process. Our proposed algorithm is evaluated on the RIO-10 benchmark developed for camera relocalization in dynamic indoor environments. It achieves robust neural routing through space partitions and outperforms the state-of-the-art approaches by around 30% on camera pose accuracy, while running comparably fast for evaluation.
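
The first related abstract above mentions two building blocks that are easy to illustrate: a weighted-average estimate of the user's location from the geolocations of retrieved reference images, and Dijkstra's algorithm for a shortest path on a navigable map. The following Python sketch shows one plausible form of each. It is not the authors' implementation: the function names, the use of similarity scores as weights, and the toy graph are all assumptions made for illustration.

```python
import heapq
import math


def weighted_average_location(matches):
    """Estimate a 2D position from (x, y, weight) triples.

    Each triple is the known location of a retrieved reference image and
    a non-negative weight (assumed here to be a similarity score).
    """
    total = sum(w for _, _, w in matches)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    x = sum(px * w for px, _, w in matches) / total
    y = sum(py * w for _, py, w in matches) / total
    return x, y


def dijkstra(graph, source, target):
    """Return (distance, path) for the shortest path from source to target.

    graph maps each node to a dict {neighbor: non-negative edge cost}.
    If target is unreachable, returns (math.inf, []).
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry
        visited.add(node)
        if node == target:
            break
        for nbr, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in dist:
        return math.inf, []
    # Walk predecessor links back from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path


# Hypothetical data: three matched reference images and a tiny floor graph.
matches = [(0.0, 0.0, 1.0), (2.0, 0.0, 1.0), (1.0, 3.0, 2.0)]
floor_graph = {
    "lobby": {"hall": 4.0, "stairs": 10.0},
    "hall": {"lobby": 4.0, "stairs": 3.0},
    "stairs": {},
}
estimate = weighted_average_location(matches)
route_length, route = dijkstra(floor_graph, "lobby", "stairs")
```

In this toy setup the position estimate is pulled toward the third match because it carries twice the weight, and the route goes through the hall because the two short edges together cost less than the direct edge.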