
Search for: All records

Creators/Authors contains: "Zhu, Zhigang"


  1. In this work, a storefront accessibility image dataset is collected from Google Street View and labeled with the three main objects relevant to storefront accessibility: doors (store entrances), doorknobs (for opening the entrances), and stairs (leading to the entrances). MultiCLU, a new multi-stage context learning and utilization approach, is then proposed with four stages: Context in Labeling (CIL), Context in Training (CIT), Context in Detection (CID), and Context in Evaluation (CIE). The CIL stage automatically extends the label of each knob to include more local contextual information. In the CIT stage, a deep learning method projects the visual information extracted by a Faster R-CNN based object detector into a semantic space generated by a Graph Convolutional Network. The CID stage uses spatial relation reasoning between categories to refine confidence scores. Finally, the CIE stage proposes a new loose evaluation metric for storefront accessibility, especially for the knob category, to help BLV users efficiently find estimated knob locations. Experimental results show that the proposed MultiCLU framework achieves significantly better performance than the baseline Faster R-CNN detector, with gains of +13.4% in mAP and +15.8% in recall. The new evaluation metric also introduces a new way to evaluate storefront accessibility objects, which could benefit the BLV community in real life. A minimal code sketch of the CIL and CIE ideas follows this entry.
    Free, publicly-accessible full text available June 27, 2023
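    The following minimal Python sketch (a hedged illustration, not the paper's code) shows the flavor of two of the stages above: a CIL-style step that grows each knob label to include local door context, and a CIE-style loose metric that scores a knob detection by center distance rather than strict IoU. The margin and radius values and the function names are assumptions.

      # Illustrative sketch only: the exact margins and matching rule used in
      # MultiCLU's CIL and CIE stages are assumptions, not published values.

      def extend_knob_label(box, margin=0.5):
          """CIL-style label extension: grow a knob box (x1, y1, x2, y2) by a
          fraction of its size so the label also covers nearby door context."""
          x1, y1, x2, y2 = box
          w, h = x2 - x1, y2 - y1
          return (x1 - margin * w, y1 - margin * h, x2 + margin * w, y2 + margin * h)

      def loose_knob_hit(pred_box, gt_box, radius_factor=1.0):
          """CIE-style loose metric: a predicted knob counts as a hit if its
          center lies within radius_factor times the ground-truth box diagonal
          of the ground-truth center, instead of requiring a strict IoU."""
          px, py = (pred_box[0] + pred_box[2]) / 2, (pred_box[1] + pred_box[3]) / 2
          gx, gy = (gt_box[0] + gt_box[2]) / 2, (gt_box[1] + gt_box[3]) / 2
          diag = ((gt_box[2] - gt_box[0]) ** 2 + (gt_box[3] - gt_box[1]) ** 2) ** 0.5
          dist = ((px - gx) ** 2 + (py - gy) ** 2) ** 0.5
          return dist <= radius_factor * diag

    For example, extend_knob_label((100, 120, 110, 135)) widens a 10x15-pixel knob box so that the detector can also learn from the surrounding door hardware region.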
  2. This paper proposes an AR-based real-time mobile system for assistive indoor navigation with target segmentation (ARMSAINTS) that helps both sighted and blind or low-vision (BLV) users safely explore and navigate an indoor environment. The solution comprises four major components: graph construction, hybrid modeling, real-time navigation, and target segmentation. The system uses an automatic graph construction method to generate a graph from a 2D floorplan and a Delaunay triangulation-based localization method to provide precise localization with negligible error. The 3D obstacle detection method integrates the existing capability of AR with a 2D object detector and a semantic target segmentation model to detect and track 3D bounding boxes of obstacles and people, increasing BLV users' safety and understanding when traveling in the indoor environment. The entire system requires no installation or maintenance of expensive infrastructure, runs in real time on a smartphone, and can easily adapt to environmental changes. A small sketch of the Delaunay-based localization step follows this entry.
    Free, publicly-accessible full text available May 28, 2023
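    A minimal sketch of a Delaunay-based localization step is given below, assuming known 2D anchor coordinates on the floorplan; it uses SciPy's triangulation to find the triangle enclosing a raw position estimate and the barycentric weights of its vertices. The anchor layout and the snapping logic are illustrative assumptions; the actual ARMSAINTS graph construction and hybrid modeling are not reproduced here.

      # Hedged sketch of Delaunay-triangulation-based 2D localization.
      import numpy as np
      from scipy.spatial import Delaunay

      # Assumed floorplan reference points (meters); not from the paper.
      anchors = np.array([[0.0, 0.0], [5.0, 0.0], [5.0, 4.0], [0.0, 4.0], [2.5, 2.0]])
      tri = Delaunay(anchors)

      def locate(measured_xy):
          """Return the enclosing triangle's vertex indices and barycentric
          weights for a raw position estimate, or None if it falls outside."""
          p = np.asarray(measured_xy, dtype=float)
          simplex = tri.find_simplex(p)
          if simplex < 0:
              return None
          T = tri.transform[simplex]            # affine map to barycentric coords
          b = T[:2].dot(p - T[2])
          weights = np.append(b, 1.0 - b.sum())
          return tri.simplices[simplex], weights

      print(locate([1.0, 1.0]))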
  3. Free, publicly-accessible full text available February 1, 2023
  4. Real-time detection of 3D obstacles and recognition of humans and other objects is essential for blind or low-vision people to travel not only safely and independently but also confidently and interactively, especially in a cluttered indoor environment. Most existing 3D obstacle detection techniques, widely applied in robotic applications and outdoor environments, often require high-end devices to ensure real-time performance. There is a strong need for a low-cost, highly efficient technique for 3D obstacle detection and object recognition in indoor environments. This paper proposes an integrated 3D obstacle detection system implemented on a smartphone that utilizes deep-learning-based pre-trained 2D object detectors and ARKit-based point cloud data acquisition to predict and track the 3D positions of multiple objects (obstacles, humans, and other objects), and then provides alerts to users in real time. The system consists of four modules: 3D obstacle detection, 3D object tracking, 3D object matching, and information filtering. Preliminary tests in a small house setting indicated that the application could reliably detect large obstacles and their 3D positions and sizes in the real world, as well as small obstacles' positions, without any expensive devices besides an iPhone. A short sketch of the 2D-to-3D lifting step follows this entry.
    Free, publicly-accessible full text available January 1, 2023
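    The core 2D-to-3D lifting step described above can be illustrated with a short sketch: collect the point cloud points whose image projections fall inside a detected 2D box and summarize them into a rough 3D position and extent. The median-based summary and the function signature are assumptions for illustration, not the app's exact implementation.

      import numpy as np

      def lift_box_to_3d(box_2d, points_3d, points_2d):
          """box_2d: (x1, y1, x2, y2) in pixels; points_3d: (N, 3) world points;
          points_2d: (N, 2) pixel projections of the same points."""
          x1, y1, x2, y2 = box_2d
          inside = ((points_2d[:, 0] >= x1) & (points_2d[:, 0] <= x2) &
                    (points_2d[:, 1] >= y1) & (points_2d[:, 1] <= y2))
          pts = points_3d[inside]
          if len(pts) == 0:
              return None
          center = np.median(pts, axis=0)           # robust 3D position estimate
          size = pts.max(axis=0) - pts.min(axis=0)  # rough 3D extent
          return center, size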
  5. Gatherings of thousands to millions of people frequently occur for an enormous variety of educational, social, sporting, and political events, and automated counting of these high-density crowds is useful for safety, management, and measuring the significance of an event. In this work, we show that the commonly accepted labeling scheme of crowd density maps for training deep neural networks may not be the most effective one. We propose an alternative inverse k-nearest neighbor (ikNN) map mechanism that, even when used directly in existing state-of-the-art network structures, shows superior performance. We also provide new network architecture mechanisms, demonstrated in our own MUD-ikNN network architecture, which uses multi-scale drop-in replacement upsampling via transposed convolutions to take full advantage of the provided ikNN labeling. This upsampling combined with the ikNN maps further improves crowd counting accuracy. We further analyze several variations of the ikNN labeling mechanism, which apply transformations on the kNN measure before generating the map, in order to consider the impact of camera perspective views, image resolutions, and the changing rates of the mapping functions. To alleviate the effects of crowd density changes in each image, we also introduce an attenuation mechanism in the ikNN mapping. Experimentally, we show that the inverse square root kNN map variation (iRkNN) provides the best performance. Discussions are provided on computational complexity, label resolutions, the gains in mapping and upsampling, and details of critical cases such as various crowd counts, uneven crowd densities, and crowd occlusions. A sketch of the basic ikNN map construction follows this entry.
    Free, publicly-accessible full text available December 30, 2022
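    A small sketch of the basic inverse k-nearest-neighbor (ikNN) map construction is given below, assuming the map value at each pixel is 1 / (1 + d_k), where d_k is the pixel's distance to its k-th nearest annotated head; the exact mapping function and the attenuated and inverse-square-root variants discussed in the paper may differ in form and constants.

      import numpy as np
      from scipy.spatial import cKDTree

      def iknn_map(head_points, height, width, k=3):
          """head_points: (N, 2) array of (row, col) head annotations."""
          tree = cKDTree(head_points)
          rows, cols = np.mgrid[0:height, 0:width]
          pixels = np.stack([rows.ravel(), cols.ravel()], axis=1)
          k_eff = min(k, len(head_points))
          dists, _ = tree.query(pixels, k=k_eff)
          d_k = dists if k_eff == 1 else dists[:, -1]   # distance to k-th neighbor
          return (1.0 / (1.0 + d_k)).reshape(height, width)

    Unlike a sparse density map, this labeling assigns a smoothly decaying, informative value to every pixel, which is the kind of dense signal the multi-scale upsampling described above is meant to exploit.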
  6. Scene reconstruction using Monodepth2 (monocular depth inference), which provides depth maps from a single RGB camera, produces outputs filled with noise and inconsistencies, while instance segmentation using Mask R-CNN (Region-Based Convolutional Neural Networks) provides object segmentation results in 2D but lacks 3D information. In this paper, we propose to integrate instance segmentation from Mask R-CNN, CAD-model car shape alignment, and depth from Monodepth2, together with classical dynamic vision techniques, to create a high-level semantic model with separability, robustness, consistency, and saliency. The model is useful for virtualized rendering, semantic augmented reality, and autonomous driving. Experimental results are provided to validate the approach.
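    As a rough illustration of the fusion described above, the sketch below combines an instance mask from a 2D segmenter with a monocular depth map to back-project per-instance 3D points through an assumed pinhole camera model; the paper's noise filtering, CAD-based car shape alignment, and dynamic vision steps are not reproduced.

      import numpy as np

      def instance_points_3d(mask, depth, fx, fy, cx, cy):
          """mask: (H, W) boolean instance mask; depth: (H, W) metric depth map;
          fx, fy, cx, cy: assumed pinhole intrinsics."""
          v, u = np.nonzero(mask)                 # pixel coordinates of the instance
          z = depth[v, u]
          valid = np.isfinite(z) & (z > 0)        # drop holes in the depth map
          u, v, z = u[valid], v[valid], z[valid]
          x = (u - cx) * z / fx                   # back-project to camera space
          y = (v - cy) * z / fy
          return np.stack([x, y, z], axis=1)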
  7. Building an annotated damage image database is the first step toward supporting AI-assisted hurricane impact analysis. To date, annotated datasets for model training have been insufficient at the local level, despite abundant raw data collected over decades. This paper provides a systematic approach for establishing an annotated hurricane-damaged-building image database to support AI-assisted damage assessment and analysis. Optimal rectilinear images were generated from panoramic images collected from Hurricane Harvey (Texas, 2017). Then, deep learning models, including Amazon Web Services (AWS) Rekognition and Mask R-CNN (Region-Based Convolutional Neural Networks), were retrained on the data to develop a pipeline for building detection and structural component extraction. A web-based dashboard was developed for building data management and for visualizing processed images along with detected structural components and their damage ratings. The proposed AI-assisted labeling tool and trained models can intelligently and rapidly assist potential users such as hazard researchers, practitioners, and government agencies in natural disaster damage management.
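    One common way to generate rectilinear views from an equirectangular panorama is sketched below using NumPy and OpenCV remapping; the field of view, yaw sampling, and panorama conventions are assumptions here, and the paper's specific procedure for selecting optimal views is not reproduced.

      import numpy as np
      import cv2

      def rectilinear_view(pano, yaw_deg=0.0, fov_deg=90.0, out_size=640):
          """Sample a square perspective crop from an equirectangular panorama
          (column 0 assumed to correspond to longitude 0, wrapping horizontally)."""
          h_p, w_p = pano.shape[:2]
          f = (out_size / 2) / np.tan(np.radians(fov_deg) / 2)   # focal length, px
          j, i = np.meshgrid(np.arange(out_size), np.arange(out_size))
          x, y = j - out_size / 2, i - out_size / 2
          z = np.full_like(x, f, dtype=np.float64)
          lon = np.arctan2(x, z) + np.radians(yaw_deg)           # horizontal angle
          lat = np.arctan2(y, np.hypot(x, z))                    # vertical angle
          map_x = ((lon / (2 * np.pi)) % 1.0) * w_p
          map_y = (lat / np.pi + 0.5) * h_p
          return cv2.remap(pano, map_x.astype(np.float32),
                           map_y.astype(np.float32), cv2.INTER_LINEAR)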
  8. Manually annotating complex scene point cloud datasets is both costly and error-prone. To reduce the reliance on labeled data, we propose a snapshot-based self-supervised method to enable direct feature learning on the unlabeled point cloud of a complex 3D scene. A snapshot is defined as a collection of points sampled from the point cloud scene; it could be a real view from a local 3D scan directly captured from the real scene, or a virtual view sampled from a large 3D point cloud dataset. First, the snapshots go through a self-supervised pipeline that includes both part contrasting and snapshot clustering for feature learning. Then, a weakly-supervised approach is implemented by training a standard SVM classifier on the learned features with a small fraction of labeled data. We evaluate the weakly-supervised approach for point cloud classification by using varying amounts of labeled data and study the minimal amount of labeled data needed for successful classification. Experiments are conducted on three public point cloud datasets, and the results show that our method is capable of learning effective features from complex scene data without any labels.
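    The weakly-supervised step can be illustrated with a short scikit-learn sketch: train a standard SVM on the self-supervised features using only a small labeled fraction and evaluate on the rest. The feature and label files, the 5% split, and the linear kernel are illustrative assumptions, not the paper's exact protocol.

      import numpy as np
      from sklearn.svm import SVC
      from sklearn.model_selection import train_test_split

      # Assumed files holding the learned (N, D) features and (N,) class ids.
      features = np.load("snapshot_features.npy")
      labels = np.load("snapshot_labels.npy")

      # Keep only 5% of the labels for training; evaluate on the remainder.
      X_small, X_rest, y_small, y_rest = train_test_split(
          features, labels, train_size=0.05, stratify=labels, random_state=0)

      clf = SVC(kernel="linear").fit(X_small, y_small)
      print("accuracy with 5% labels:", clf.score(X_rest, y_rest))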
  9. Though virtual reality (VR) has advanced to a certain level of maturity in recent years, the general public, especially the population of blind and visually impaired (BVI) people, still cannot enjoy the benefits provided by VR. Current VR accessibility applications have been developed either for expensive head-mounted displays or with extra accessories and mechanisms, which are either not accessible or inconvenient for BVI individuals. In this paper, we present a mobile VR app that enables BVI users to access a virtual environment on an iPhone in order to build their skills of perception and recognition of the virtual environment and the virtual objects in it. The app uses the iPhone on a selfie stick to simulate a long cane in VR and applies Augmented Reality (AR) techniques to track the iPhone's real-time pose in an empty space of the real world, which is then synchronized to the long cane in the VR environment. Due to this use of mixed reality (the integration of VR and AR), we call it the Mixed Reality Cane (MR Cane), which provides BVI users with auditory and vibrotactile feedback whenever the virtual cane comes into contact with objects in VR. Thus, the MR Cane allows BVI individuals to interact with virtual objects and identify the approximate sizes and locations of objects in the virtual environment. We performed preliminary user studies with blindfolded participants to investigate the effectiveness of the proposed mobile approach, and the results indicate that the proposed MR Cane could be effective in helping BVI individuals understand interaction with virtual objects and explore 3D virtual environments. The MR Cane concept can be extended to new applications in navigation, training, and entertainment for BVI individuals without significant additional effort.
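    The MR Cane itself runs on ARKit/iOS, but the contact-test-and-feedback logic described above can be sketched in a few lines of Python: extend a virtual cane from the tracked phone pose and trigger feedback when the tip comes within a small radius of a scene object. The object representation, cane length, and feedback stub are assumptions for illustration.

      import numpy as np

      def cane_tip(phone_pose, cane_length=1.2):
          """phone_pose: (4, 4) AR pose matrix; the virtual cane is assumed to
          extend cane_length meters along the phone's forward (-z) axis."""
          origin = phone_pose[:3, 3]
          forward = -phone_pose[:3, 2]
          return origin + cane_length * forward

      def feedback_step(phone_pose, objects, radius=0.05):
          """objects: list of (name, center, size) axis-aligned virtual boxes."""
          tip = cane_tip(phone_pose)
          for name, center, size in objects:
              lo, hi = center - size / 2, center + size / 2
              closest = np.clip(tip, lo, hi)               # nearest point on the box
              if np.linalg.norm(tip - closest) <= radius:
                  print(f"vibrate + announce: touching {name}")   # feedback stub
                  return name
          return None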
  10. This paper describes the interface and testing of an indoor navigation app, ASSIST, which guides blind and visually impaired (BVI) individuals through an indoor environment with high accuracy while augmenting their understanding of the surrounding environment. ASSIST features personalized interfaces by considering the unique experiences that BVI individuals have in indoor wayfinding and offers multiple levels of multimodal feedback. After an overview of the technical approach and implementation of the first prototype of the ASSIST system, the results of two pilot studies performed with BVI individuals are presented: a performance study to collect data on mobility (walking speed, collisions, and navigation errors) while using the app, and a usability study to collect user evaluations of the perceived helpfulness, safety, ease of use, and overall experience while using the app. Our studies show that ASSIST is useful in providing users with navigational guidance, improving their efficiency and, more significantly, their safety and accuracy in wayfinding indoors. Findings and user feedback from the studies confirm some of the previous results while also providing new insights into the creation of such an app, including the use of customized user interfaces and expanding the types of information provided.