
Title: An Integrated Mobile Vision System for Enhancing the Interaction of Blind and Low Vision Users with Their Surroundings
This paper presents a mobile-based solution that integrates 3D vision and voice interaction to help people who are blind or have low vision explore and interact with their surroundings. The key components of the system are two 3D vision modules: a 3D object detection module, which integrates a deep-learning-based 2D object detector with ARKit-based point cloud generation, and an interest direction recognition module, which integrates hand/finger recognition with ARKit-based 3D direction estimation. The integrated system also includes a voice interface, a task scheduler, and an instruction generator. The voice interface contains a customized user request mapping module that maps the user's spoken input to one of the four primary system operation modes (exploration, search, navigation, and settings adjustment). The task scheduler coordinates with the two web services that host the vision modules and allocates computational resources based on the user request and network connectivity strength. Finally, the instruction generator computes the corresponding instructions based on the user request and the results from the two vision modules. The system is capable of running in real time on mobile devices. We present preliminary experimental results on the performance of the voice-to-user-request mapping module and the two vision modules.
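One way to picture how a 2D detection can be combined with a point cloud to obtain a 3D position is sketched below. This is a minimal illustration under stated assumptions, not the method used in the paper: it assumes each point-cloud point is already paired with its pixel projection, and the names `Box2D`, `ProjectedPoint`, and `locate_in_3d`, as well as the median heuristic and the `min_points` threshold, are hypothetical.

```python
from statistics import median
from typing import NamedTuple, Optional, Sequence, Tuple


class Box2D(NamedTuple):
    """Axis-aligned 2D detection in pixel coordinates (hypothetical type)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


class ProjectedPoint(NamedTuple):
    """A 3D point (x, y, z) paired with its pixel projection (u, v).

    Assumption: the projection has already been computed elsewhere,
    for example from the camera pose and intrinsics.
    """
    u: float
    v: float
    x: float
    y: float
    z: float


def locate_in_3d(
    box: Box2D,
    points: Sequence[ProjectedPoint],
    min_points: int = 3,
) -> Optional[Tuple[float, float, float]]:
    """Estimate a 3D position for a 2D detection.

    Keeps the points whose projection falls inside the box and returns
    the per-axis median of their 3D coordinates, or None when too few
    points support the detection.
    """
    inside = [
        p for p in points
        if box.x_min <= p.u <= box.x_max and box.y_min <= p.v <= box.y_max
    ]
    if len(inside) < min_points:
        return None
    return (
        median(p.x for p in inside),
        median(p.y for p in inside),
        median(p.z for p in inside),
    )
```

The per-axis median is chosen here only because it is less sensitive than the mean to stray background points that happen to project inside the box; a real system would likely need further filtering.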
Award ID(s):
2131186 1827505 1737533
Journal Name:
Proceedings of the 3rd International Conference on Image Processing and Vision Engineering
Page Range / eLocation ID:
180 to 187
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Most pedestrian detection datasets assume that pedestrians stand upright with respect to the image coordinate system. This assumption, however, does not always hold for vision-equipped mobile platforms such as mobile phones, UAVs, or construction vehicles on rugged terrain. In these situations, camera motion can cause images of pedestrians to be captured at extreme angles, which can lead to very poor detection performance with standard pedestrian detectors. To address this issue, we propose a Rotational Rectification Network (R2N) that can be inserted into any CNN-based pedestrian (or object) detector to adapt it to significant changes in camera rotation. The rotational rectification network uses a 2D rotation estimation module that passes rotational information to a spatial transformer network to undistort image features. To enable robust rotation estimation, we propose a Global Polar Pooling (GP-Pooling) operator to capture rotational shifts in convolutional features. Through our experiments, we show how our rotational rectification network can improve the performance of the state-of-the-art pedestrian detector under heavy image rotation by up to 45%.
  2. The ubiquity of millimeter-wave (mmWave) technology could bring through-obstruction imaging to portable, mobile systems. Existing through-obstruction imaging systems rely on the Synthetic Aperture Radar (SAR) technique, but emulating the SAR principle on hand-held devices has been challenging. We propose ViSAR, a portable platform that integrates an optical camera and an mmWave radar to emulate the SAR principle and enable through-obstruction 3D imaging. ViSAR synchronizes the devices at the software level and uses the Time Domain Backprojection algorithm to generate vision-augmented mmWave images. We have experimentally evaluated ViSAR by imaging several indoor objects.
  3. Real-time detection of 3D obstacles and recognition of humans and other objects is essential for blind or low-vision people to travel not only safely and independently but also confidently and interactively, especially in a cluttered indoor environment. Most existing 3D obstacle detection techniques, which are widely applied in robotic applications and outdoor environments, require high-end devices to ensure real-time performance. There is a strong need to develop a low-cost and highly efficient technique for 3D obstacle detection and object recognition in indoor environments. This paper proposes an integrated 3D obstacle detection system implemented on a smartphone, which uses deep-learning-based pre-trained 2D object detectors and ARKit-based point cloud data acquisition to predict and track the 3D positions of multiple objects (obstacles, humans, and other objects) and then provides alerts to users in real time. The system consists of four modules: 3D obstacle detection, 3D object tracking, 3D object matching, and information filtering. Preliminary tests in a small house setting indicated that this application could reliably detect large obstacles and their 3D positions and sizes in the real world, as well as small obstacles' positions, without any expensive devices besides an iPhone.
  4. This paper proposes an AR-based real-time mobile system for assistive indoor navigation with target segmentation (ARMSAINTS) for both sighted and blind or low-vision (BLV) users to safely explore and navigate in an indoor environment. The solution comprises four major components: graph construction, hybrid modeling, real-time navigation, and target segmentation. The system uses an automatic graph construction method to generate a graph from a 2D floorplan and a Delaunay triangulation-based localization method to provide precise localization with negligible error. The 3D obstacle detection method integrates the existing capability of AR with a 2D object detector and a semantic target segmentation model to detect and track 3D bounding boxes of obstacles and people, increasing the safety and understanding of BLV users when traveling in the indoor environment. The entire system does not require the installation and maintenance of expensive infrastructure, runs in real time on a smartphone, and can easily adapt to environmental changes.
  5. Ultra-large mesoscopic imaging advances in the cortex open new pathways to develop neuroprosthetics to restore foveal vision in blind patients. Using targeted optogenetic activation, an optical prosthetic can focally stimulate spatially localized lateral geniculate nucleus (LGN) synaptic boutons within the primary visual cortex (V1). If we localize a cluster within a specific hypercolumn's input layer, we will find that activation of a subset of these boutons is perceptually fungible with the activation of a different subset of boutons from the same hypercolumn input module. By transducing these LGN neurons with light-sensitive proteins, we make them sensitive to light and can optogenetically stimulate them in a pattern mimicking naturalistic visual input. Optogenetic targeting of these purely glutamatergic inputs is free from unwanted co-activation of inhibitory neurons (a common problem in electrode-based prosthetic devices, which results in diminished contrast perception). We must prosthetically account for rapidly changing cortical activity and gain control, so our system integrates a real-time cortical read-out mechanism to continually assess and provide feedback to modify stimulation levels, just as the natural visual system does. We accomplish this by reading out a multi-colored array of genetically encoded and transduced bioluminescent calcium responses in V1 neurons. This hyperspectral array of colors can achieve single-cell resolution. By tracking eye movements in blind patients, we will account for oculomotor effects by adjusting the contemporaneous stimulation of the LGN boutons to mimic the effects of natural vision, including those from eye movements. This system, called the Optogenetic Brain System (OBServ), is designed to function by optimally activating visual responses in V1 from a fully implantable coplanar emitter array coupled with a video camera and a bioluminescent read-out system.
It follows that if we stimulate the LGN input modules in the same pattern as natural vision, the recipient should perceive naturalistic prosthetic vision. As such, the system holds the promise of restoring vision in the blind at the highest attainable acuity, with maximal contrast sensitivity, using an integrated nanophotonic implantable device that receives eye-tracked video input from a head-mounted video camera, using relatively non-invasive prosthetic technology that does not cross the pia mater of the brain. 