
Search for: All records

Award ID contains: 1737533

Note: Clicking on a Digital Object Identifier (DOI) number will take you to an external site maintained by the publisher. Some full-text articles may not yet be available without charge during the publisher's embargo period.

Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.

  1. Free, publicly-accessible full text available August 1, 2023
  2. In this work, a storefront accessibility image dataset is collected from Google Street View and labeled with the three main objects for storefront accessibility: doors (store entrances), doorknobs (for opening the entrances), and stairs (leading to the entrances). MultiCLU, a new multi-stage context learning and utilization approach, is then proposed with four stages: Context in Labeling (CIL), Context in Training (CIT), Context in Detection (CID), and Context in Evaluation (CIE). The CIL stage automatically extends the label for each knob to include more local contextual information (a minimal sketch of this box extension follows this item). In the CIT stage, a deep learning method projects the visual features extracted by a Faster R-CNN based object detector into a semantic space generated by a Graph Convolutional Network. The CID stage uses spatial relation reasoning between categories to refine the confidence scores. Finally, the CIE stage proposes a new loose evaluation metric for storefront accessibility, especially for the knob category, to efficiently help blind or low-vision (BLV) users find estimated knob locations. Our experimental results show that the proposed MultiCLU framework achieves significantly better performance than the baseline Faster R-CNN detector, with gains of 13.4% in mAP and 15.8% in recall. Our new evaluation metric also introduces a new way to evaluate storefront accessibility objects, which could benefit the BLV community in real life.
    Free, publicly-accessible full text available June 27, 2023
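The CIL idea of enlarging each knob label to capture surrounding context can be illustrated with a minimal sketch. The margin factor, image dimensions, and box format below are illustrative assumptions, not the paper's actual parameters.

```python
def extend_knob_label(box, margin=0.5, img_w=1920, img_h=1080):
    """Enlarge a knob bounding box (x1, y1, x2, y2) by a relative margin
    so the label also covers local context (e.g., part of the door frame).
    The 0.5 margin and the image dimensions are illustrative assumptions."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    dx, dy = w * margin, h * margin
    return (max(0, x1 - dx), max(0, y1 - dy),
            min(img_w, x2 + dx), min(img_h, y2 + dy))

# Example: a 40x60-pixel knob box grows to cover nearby context.
print(extend_knob_label((500, 300, 540, 360)))
```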
  3. This paper proposes an AR-based real-time mobile system for assistive indoor navigation with target segmentation (ARMSAINTS) for both sighted and blind or low-vision (BLV) users to safely explore and navigate an indoor environment. The solution comprises four major components: graph construction, hybrid modeling, real-time navigation, and target segmentation. The system utilizes an automatic graph construction method to generate a graph from a 2D floorplan and a Delaunay triangulation-based localization method to provide precise localization with negligible error (a minimal sketch of this localization step follows this item). The 3D obstacle detection method integrates the existing capability of AR with a 2D object detector and a semantic target segmentation model to detect and track 3D bounding boxes of obstacles and people, increasing BLV users' safety and understanding when traveling in the indoor environment. The entire system requires no installation or maintenance of expensive infrastructure, runs in real time on a smartphone, and adapts easily to environmental changes.
    Free, publicly-accessible full text available May 28, 2023
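One plausible reading of the Delaunay-based localization step: triangulate known anchor points on the 2D floorplan and interpolate a position correction from the vertices of the triangle containing the current estimate. The anchor coordinates and correction offsets below are invented for illustration; the paper's actual formulation may differ.

```python
import numpy as np
from scipy.spatial import Delaunay

# Hypothetical floorplan anchor points (meters) with measured position
# corrections at each anchor; all values are invented for illustration.
anchors = np.array([[0.0, 0.0], [10.0, 0.0], [10.0, 8.0], [0.0, 8.0], [5.0, 4.0]])
corrections = np.array([[0.10, 0.00], [0.00, 0.20], [-0.10, 0.10],
                        [0.00, -0.10], [0.05, 0.05]])
tri = Delaunay(anchors)

def localize(p):
    """Refine a raw position estimate p by barycentric interpolation of
    per-anchor corrections over the Delaunay triangle containing p."""
    p = np.asarray(p, dtype=float)
    s = int(tri.find_simplex(p))
    if s < 0:
        return p                        # outside the triangulation: no correction
    verts = tri.simplices[s]            # indices of the triangle's three anchors
    T = tri.transform[s]                # affine map to barycentric coordinates
    b = T[:2].dot(p - T[2])
    bary = np.append(b, 1.0 - b.sum())  # weights of the three vertices
    return p + bary.dot(corrections[verts])

print(localize([3.0, 2.0]))
```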
  4. Free, publicly-accessible full text available February 1, 2023
  5. Free, publicly-accessible full text available January 1, 2023
  6. Real-time detection of 3D obstacles and recognition of humans and other objects is essential for blind or low-vision people to travel not only safely and independently but also confidently and interactively, especially in cluttered indoor environments. Most existing 3D obstacle detection techniques, widely applied in robotic applications and outdoor environments, require high-end devices to ensure real-time performance. There is a strong need for a low-cost, highly efficient technique for 3D obstacle detection and object recognition in indoor environments. This paper proposes an integrated 3D obstacle detection system implemented on a smartphone that utilizes deep-learning-based pre-trained 2D object detectors and ARKit-based point cloud data acquisition to predict and track the 3D positions of multiple objects (obstacles, humans, and other objects) and provide alerts to users in real time (a minimal sketch of the 2D-box-to-3D-position step follows this item). The system consists of four modules: 3D obstacle detection, 3D object tracking, 3D object matching, and information filtering. Preliminary tests in a small house setting indicated that the application could reliably detect large obstacles, along with their real-world 3D positions and sizes, as well as the positions of small obstacles, without any expensive device besides an iPhone.
    Free, publicly-accessible full text available January 1, 2023
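One way the detector-plus-point-cloud fusion could work: gather the cloud points whose image projections fall inside a detected 2D box and take a robust center as the object's 3D position. The function, camera data, and toy arrays below are assumptions for illustration; the real system uses ARKit's point cloud.

```python
import numpy as np

def box_to_3d(points_3d, points_2d, box):
    """Estimate an object's 3D position from the cloud points whose image
    projections fall inside its 2D detection box (x1, y1, x2, y2).
    points_3d: (N, 3) camera-space points; points_2d: (N, 2) pixel coords.
    The median is used for robustness to stray background points."""
    x1, y1, x2, y2 = box
    inside = ((points_2d[:, 0] >= x1) & (points_2d[:, 0] <= x2) &
              (points_2d[:, 1] >= y1) & (points_2d[:, 1] <= y2))
    if not inside.any():
        return None
    return np.median(points_3d[inside], axis=0)   # robust 3D center estimate

# Toy data: five points, three of which project into the detection box.
pts3d = np.array([[0.10, 0.0, 2.0], [0.20, 0.1, 2.1], [0.15, -0.1, 1.9],
                  [3.00, 1.0, 6.0], [-2.00, 0.5, 5.0]])
pts2d = np.array([[320, 240], [330, 250], [310, 230], [600, 400], [50, 100]])
print(box_to_3d(pts3d, pts2d, (300, 220, 350, 260)))
```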
  7. Gatherings of thousands to millions of people frequently occur for an enormous variety of educational, social, sporting, and political events, and automated counting of these high-density crowds is useful for safety, management, and measuring the significance of an event. In this work, we show that the regularly accepted labeling scheme of crowd density maps for training deep neural networks may not be the most effective one. We propose an alternative inverse k-nearest neighbor (ikNN) map mechanism that, even when used directly in existing state-of-the-art network structures, shows superior performance. We also provide new network architecture mechanisms that we demonstrate in our own MUD-ikNN network architecture, which uses multi-scale drop-in replacement upsampling via transposed convolutions to take full advantage of the provided ikNN labeling. This upsampling combined with the ikNN maps further improves crowd counting accuracy. We further analyze several variations of the ikNN labeling mechanism, which apply transformations on the kNN measure before generating the map, in order to consider the impact of camera perspective views, image resolutions, and the changing rates of the mapping functions. To alleviate the effects of crowd density changes in each image, we also introduce an attenuation mechanism in the ikNN mapping. Experimentally, we show that the inverse square root kNN map variation (iRkNN) provides the best performance. Discussions are provided on computational complexity, label resolutions, the gains in mapping and upsampling, and details of critical cases such as various crowd counts, uneven crowd densities, and crowd occlusions. (A minimal sketch of the ikNN map construction follows this item.)
    Free, publicly-accessible full text available December 30, 2022
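The ikNN labeling can be sketched directly: for every pixel, find the distance to its k-th nearest annotated head and invert it. The mapping 1/(d + 1) used below is one plausible form, assumed for illustration; the abstract notes the paper studies several transformations, including an inverse square root variant.

```python
import numpy as np
from scipy.spatial import cKDTree

def iknn_map(head_points, shape, k=3):
    """Inverse k-nearest-neighbor label map for crowd counting.
    head_points: (N, 2) array of (row, col) head annotations.
    Each pixel gets 1 / (d_k + 1), where d_k is the distance to its
    k-th nearest head; 1/(d + 1) is an assumed mapping function."""
    tree = cKDTree(head_points)
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    pixels = np.column_stack([rows.ravel(), cols.ravel()])
    d, _ = tree.query(pixels, k=k)
    d_k = d[:, -1] if k > 1 else d        # distance to the k-th neighbor
    return (1.0 / (d_k + 1.0)).reshape(shape)

# Toy example: three head annotations on a 64x64 image.
heads = np.array([[10, 12], [40, 50], [42, 48]])
label = iknn_map(heads, shape=(64, 64), k=2)
print(label.shape, label.max())
```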
  8. Scene reconstruction using Monodepth2, a monocular depth inference network that produces depth maps from a single RGB camera, yields outputs filled with noise and inconsistencies. Instance segmentation using a Mask R-CNN (Region-Based Convolutional Neural Network) deep model provides object segmentation results in 2D but lacks 3D information. In this paper we propose to integrate the results of instance segmentation via Mask R-CNN, CAD-model car shape alignment, and depth from Monodepth2, together with classical dynamic vision techniques, to create a High-level Semantic Model with separability, robustness, consistency, and saliency. The model is useful for virtualized rendering, semantic augmented reality, and autonomous driving. Experimental results are provided to validate the approach. (A minimal sketch of the depth-and-mask fusion step follows this item.)
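The core fusion step, lifting each instance mask into camera-space 3D points via the depth map, can be sketched as below. The intrinsics and arrays are placeholders standing in for Monodepth2 and Mask R-CNN outputs.

```python
import numpy as np

def mask_to_points(depth, mask, fx, fy, cx, cy):
    """Back-project the pixels of one instance mask into camera-space 3D
    points using a (Monodepth2-style) depth map and pinhole intrinsics."""
    v, u = np.nonzero(mask)            # pixel coordinates inside the mask
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.column_stack([x, y, z])

# Placeholder outputs: a 4x4 depth map and a mask covering one "car".
depth = np.full((4, 4), 5.0)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
print(mask_to_points(depth, mask, fx=2.0, fy=2.0, cx=2.0, cy=2.0))
```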
  9. Building an annotated damage image database is the first step in supporting AI-assisted hurricane impact analysis. To date, annotated datasets for model training remain insufficient at the local level, despite abundant raw data collected for decades. This paper provides a systematic approach for establishing an annotated hurricane-damaged building image database to support AI-assisted damage assessment and analysis. Optimal rectilinear images were generated from panoramic images collected from Hurricane Harvey (Texas, 2017); a minimal sketch of this reprojection step follows this item. Then, deep learning models, including Amazon Web Services (AWS) Rekognition and Mask R-CNN (Region-Based Convolutional Neural Networks), were retrained on the data to develop a pipeline for building detection and structural component extraction. A web-based dashboard was developed for building data management and visualization of processed images along with detected structural components and their damage ratings. The proposed AI-assisted labeling tool and trained models can intelligently and rapidly assist potential users such as hazard researchers, practitioners, and government agencies in natural disaster damage management.
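One plausible reading of the rectilinear-image generation step is a gnomonic reprojection of each equirectangular panorama. The sketch below uses nearest-neighbor sampling with illustrative parameters; it is not the paper's "optimal" view-selection procedure, and the function name and defaults are assumptions.

```python
import numpy as np

def rectilinear_view(pano, yaw_deg, fov_deg=90, out_size=256):
    """Sample a rectilinear (gnomonic) view from an equirectangular panorama.
    pano: (H, W, 3) image; yaw_deg picks the viewing direction on the horizon.
    Nearest-neighbor sampling; pitch is fixed at 0 to keep the sketch short."""
    H, W = pano.shape[:2]
    f = (out_size / 2) / np.tan(np.radians(fov_deg) / 2)   # focal length (pixels)
    j, i = np.meshgrid(np.arange(out_size), np.arange(out_size))
    # Ray directions in camera coordinates (z forward).
    x, y = j - out_size / 2, i - out_size / 2
    z = np.full_like(j, f, dtype=float)
    yaw = np.radians(yaw_deg)
    xr = x * np.cos(yaw) + z * np.sin(yaw)                 # rotate rays by yaw
    zr = -x * np.sin(yaw) + z * np.cos(yaw)
    lon = np.arctan2(xr, zr)                               # longitude of each ray
    lat = np.arctan2(y, np.hypot(xr, zr))                  # latitude of each ray
    u = ((lon / np.pi + 1) / 2 * (W - 1)).astype(int)      # map to pano pixels
    v = ((lat / np.pi + 0.5) * (H - 1)).astype(int)
    return pano[np.clip(v, 0, H - 1), np.clip(u, 0, W - 1)]

pano = np.random.randint(0, 255, (512, 1024, 3), dtype=np.uint8)
print(rectilinear_view(pano, yaw_deg=45).shape)   # (256, 256, 3)
```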
  10. M. Hadwiger, M. Larsen (Eds.)
    In this work, we present Unity Point-Cloud Interactive Core, a novel interactive point cloud rendering pipeline for the Unity Development Platform. The goal of the proposed pipeline is to expedite the development process for point cloud applications by encapsulating the rendering process as a standalone component, while maintaining flexibility through an implementable interface. The proposed pipeline allows for rendering arbitrarily large point clouds with improved performance and visual quality. First, a novel dynamic batching scheme is proposed to address the adaptive point sizing problem for level-of-detail (LOD) point cloud structures (a minimal sketch of budgeted LOD selection follows this item). Then, an approximate rendering algorithm is proposed to reduce overdraw by minimizing the overall number of fragment operations through an intermediate occlusion culling pass. For the purpose of analysis, the visual quality of the renderings is quantified by comparison against a high-quality baseline. In the experiments, the proposed pipeline maintains above 90 FPS for a 20-million-point budget while achieving greater than 90% visual quality during interaction when rendering a point cloud with more than 20 billion points.
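The point-budget idea behind such LOD schemes can be sketched language-agnostically (here in Python rather than Unity C#, to match the other sketches): greedily refine the most screen-relevant octree nodes until a global point budget is exhausted. The node fields and the size-over-distance priority heuristic are assumptions, not the paper's actual batching scheme.

```python
import heapq

class Node:
    def __init__(self, point_count, size, distance, children=()):
        self.point_count = point_count   # points stored at this LOD node
        self.size = size                 # world-space extent of the node
        self.distance = distance         # distance from the camera
        self.children = list(children)

    def priority(self):
        # Larger projected size on screen = refine first (assumed heuristic).
        return -self.size / max(self.distance, 1e-6)

def select_nodes(root, budget):
    """Greedy coarse-to-fine traversal: keep refining the most
    screen-relevant nodes until the point budget is spent."""
    heap = [(root.priority(), id(root), root)]
    selected, used = [], 0
    while heap and used < budget:
        _, _, node = heapq.heappop(heap)
        if used + node.point_count > budget:
            continue                      # skip nodes that would overflow
        selected.append(node)
        used += node.point_count
        for c in node.children:
            heapq.heappush(heap, (c.priority(), id(c), c))
    return selected, used

# Toy octree: one coarse root with four detailed children.
leaves = [Node(5000, 1.0, 10.0) for _ in range(4)]
root = Node(2000, 8.0, 10.0, children=leaves)
nodes, used = select_nodes(root, budget=15000)
print(len(nodes), used)   # 3 nodes selected within the 15k-point budget
```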