
Search for: All records

Creators/Authors contains: "Tang, Hao"


  1. Free, publicly-accessible full text available October 1, 2023
  2. In this work, a storefront accessibility image dataset is collected from Google Street View and labeled with three main objects for storefront accessibility: doors (for store entrances), doorknobs (for accessing the entrances), and stairs (for leading to the entrances). Then MultiCLU, a new multi-stage context learning and utilization approach, is proposed with the following four stages: Context in Labeling (CIL), Context in Training (CIT), Context in Detection (CID), and Context in Evaluation (CIE). The CIL stage automatically extends the label for each knob to include more local contextual information. In the CIT stage, a deep learning method is used to project the visual information extracted by a Faster R-CNN based object detector into a semantic space generated by a Graph Convolutional Network. The CID stage uses spatial relation reasoning between categories to refine the confidence scores. Finally, in the CIE stage, a new loose evaluation metric for storefront accessibility, especially for the knob category, is proposed to efficiently help BLV users find estimated knob locations. Our experimental results show that the proposed MultiCLU framework achieves significantly better performance than the baseline Faster R-CNN detector, with +13.4% on mAP and +15.8% on recall, respectively. Our new evaluation metric also introduces a new way to evaluate storefront accessibility objects, which could benefit the BLV group in real life. (A sketch of the CID-style confidence refinement appears after this list.)
    Free, publicly-accessible full text available June 27, 2023
  3. Gatherings of thousands to millions of people frequently occur for an enormous variety of educational, social, sporting, and political events, and automated counting of these high-density crowds is useful for safety, management, and measuring the significance of an event. In this work, we show that the commonly accepted labeling scheme of crowd density maps for training deep neural networks may not be the most effective one. We propose an alternative inverse k-nearest neighbor (ikNN) map mechanism that, even when used directly in existing state-of-the-art network structures, shows superior performance. We also provide new network architecture mechanisms that we demonstrate in our own MUD-ikNN network architecture, which uses multi-scale drop-in replacement upsampling via transposed convolutions to take full advantage of the provided ikNN labeling. This upsampling, combined with the ikNN maps, further improves crowd counting accuracy. We further analyze several variations of the ikNN labeling mechanism, which apply transformations to the kNN measure before generating the map, in order to consider the impact of camera perspective views, image resolutions, and the changing rates of the mapping functions. To alleviate the effects of crowd density changes within each image, we also introduce an attenuation mechanism in the ikNN mapping. Experimentally, we show that the inverse square root kNN map variation (iRkNN) provides the best performance. Discussions are provided on computational complexity, label resolutions, the gains in mapping and upsampling, and details of critical cases such as various crowd counts, uneven crowd densities, and crowd occlusions. (A sketch of the ikNN map construction appears after this list.)
    Free, publicly-accessible full text available December 30, 2022
  4. Free, publicly-accessible full text available January 1, 2023
  5. Though virtual reality (VR) has advanced to a certain level of maturity in recent years, the general public, especially the population of blind and visually impaired (BVI) people, still cannot enjoy the benefits provided by VR. Current VR accessibility applications have been developed either on expensive head-mounted displays or with extra accessories and mechanisms, which are either inaccessible or inconvenient for BVI individuals. In this paper, we present a mobile VR app that enables BVI users to access a virtual environment on an iPhone in order to build their skills of perception and recognition of the virtual environment and the virtual objects in it. The app uses the iPhone on a selfie stick to simulate a long cane in VR, and applies Augmented Reality (AR) techniques to track the iPhone’s real-time poses in an empty space of the real world, which are then synchronized to the long cane in the VR environment. Due to this use of mixed reality (the integration of VR and AR), we call it the Mixed Reality cane (MR Cane); it provides BVI users with auditory and vibrotactile feedback whenever the virtual cane comes in contact with objects in VR. Thus, the MR Cane allows BVI individuals to interact with the virtual objects and identify approximate sizes and locations of the objects in the virtual environment. We performed preliminary user studies with blindfolded participants to investigate the effectiveness of the proposed mobile approach, and the results indicate that the proposed MR Cane could be effective in helping BVI individuals understand the interaction with virtual objects and explore 3D virtual environments. The MR Cane concept can be extended to new applications in navigation, training, and entertainment for BVI individuals without significant additional effort. (A sketch of the cane-contact feedback loop appears after this list.)
  6. The iASSIST is an iPhone-based assistive sensor solution for independent and safe travel for people who are blind or visually impaired, or for those who simply face challenges navigating an unfamiliar indoor environment. The solution integrates information from Bluetooth beacons, data connectivity, visual models, and user preferences. Hybrid models of interiors are created in a modeling stage from these multimodal data, which are collected and mapped to the floor plan as the modeler walks through the building. A client-server architecture allows scaling to large areas by lazy-loading models according to beacon signals and/or adjacent-region proximity. During the navigation stage, a user with the navigation app is localized within the floor plan using visual, connectivity, and user preference data, and is guided along an optimal route to their destination. User interfaces for both modeling and navigation use multimedia channels, including visual, audio, and haptic feedback, for the targeted users. The design of the human subject test experiments is also described, in addition to some preliminary experimental results.
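
The CID stage in the MultiCLU entry (item 2) is described only at a high level. The sketch below is a minimal illustration, not the authors' implementation, of how spatial relation reasoning between categories could refine confidence scores: a knob detection that overlaps a detected door keeps or gains confidence, while an unsupported knob detection is down-weighted. The box format, class names, and boost/penalty factors are all assumptions.

```python
# Hypothetical sketch of context-in-detection (CID) style score refinement:
# boost "knob" detections that are spatially consistent with a "door" detection.
# Box format, class names, and all thresholds/factors are illustrative assumptions.

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def refine_scores(detections, boost=1.2, penalty=0.8):
    """detections: list of dicts {"label": str, "box": (x1, y1, x2, y2), "score": float}."""
    doors = [d["box"] for d in detections if d["label"] == "door"]
    for det in detections:
        if det["label"] != "knob":
            continue
        # A knob is contextually plausible if its box overlaps a detected door.
        supported = any(iou(det["box"], door) > 0.0 for door in doors)
        det["score"] = min(1.0, det["score"] * (boost if supported else penalty))
    return detections
```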
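For the crowd-counting entry (item 3), the ikNN labeling idea can be made concrete with a short sketch: for every pixel, take the distance to the k-th nearest annotated head position and pass it through an inverse mapping; the inverse-square-root (iRkNN) variant only changes the mapping function. The abstract does not give the exact mapping constants, so the 1/(d+1) and 1/sqrt(d+1) forms below are assumptions.

```python
# Illustrative inverse k-nearest-neighbor (ikNN) label map for crowd counting.
# For each pixel, take the distance to the k-th nearest annotated head and pass
# it through an inverse mapping. The exact mapping used by the paper is not
# stated in this abstract; 1/(d+1) and 1/sqrt(d+1) below are assumed forms.
import numpy as np
from scipy.spatial import cKDTree

def iknn_map(head_points, height, width, k=3, variant="inverse"):
    """head_points: iterable of (x, y) head annotations; returns an (height, width) map."""
    ys, xs = np.mgrid[0:height, 0:width]
    pixels = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(np.float32)
    tree = cKDTree(np.asarray(head_points, dtype=np.float32))
    # Distance from every pixel to its k-th nearest annotated head position.
    d_k = tree.query(pixels, k=k)[0]
    d_k = d_k[:, -1] if k > 1 else d_k
    if variant == "inverse":            # ikNN-style map
        m = 1.0 / (d_k + 1.0)
    elif variant == "inverse_sqrt":     # iRkNN-style (inverse square root) variant
        m = 1.0 / np.sqrt(d_k + 1.0)
    else:
        raise ValueError(f"unknown variant: {variant}")
    return m.reshape(height, width)
```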
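The MR Cane entry (item 5) describes synchronizing the AR-tracked phone pose to a virtual long cane and giving auditory plus vibrotactile feedback on contact with virtual objects. The sketch below illustrates only that contact-and-feedback loop; the bounding-sphere objects and the play_sound/vibrate callbacks are hypothetical stand-ins for the app's ARKit-based implementation, which is not shown here.

```python
# Illustrative contact-feedback loop for a virtual long cane (not the MR Cane
# source code). The cane tip position would come from AR tracking on the phone;
# here it is just a 3D point, and the feedback callbacks are hypothetical.
from dataclasses import dataclass

@dataclass
class VirtualObject:
    name: str
    center: tuple      # (x, y, z) position in meters
    radius: float      # approximate bounding-sphere radius

def check_cane_contact(cane_tip, objects, play_sound, vibrate):
    """cane_tip: (x, y, z) of the tracked cane tip in the virtual scene."""
    for obj in objects:
        dist = sum((a - b) ** 2 for a, b in zip(cane_tip, obj.center)) ** 0.5
        if dist <= obj.radius:
            # Contact: give the BVI user both auditory and vibrotactile cues,
            # vibrating more strongly the deeper the cane penetrates the object.
            play_sound(obj.name)
            vibrate(intensity=max(0.1, 1.0 - dist / obj.radius))
            return obj
    return None
```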
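The iASSIST entry (item 6) mentions lazy-loading hybrid region models according to beacon signals and adjacent-region proximity. Below is a minimal sketch of that idea, with an assumed RSSI threshold and assumed data structures; it is not the actual iASSIST client code.

```python
# Illustrative lazy-loading of region models keyed by Bluetooth beacon signals,
# as sketched from the iASSIST abstract. Region layout, the RSSI threshold, and
# the data structures are assumptions, not the actual iASSIST implementation.
RSSI_THRESHOLD = -75  # assumed cutoff in dBm: ignore beacons weaker than this

def regions_to_load(beacon_rssi, beacon_to_region, adjacency, loaded):
    """beacon_rssi: {beacon_id: rssi}; returns region ids that still need loading."""
    wanted = set()
    for beacon_id, rssi in beacon_rssi.items():
        if rssi >= RSSI_THRESHOLD and beacon_id in beacon_to_region:
            region = beacon_to_region[beacon_id]
            wanted.add(region)
            wanted.update(adjacency.get(region, ()))  # prefetch adjacent regions
    return wanted - set(loaded)                       # only what is not yet resident

# Example: with beacon b2 heard strongly near region "lobby", the lobby model and
# its adjacent "hallway_1" model would be fetched if they are not already loaded.
```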