Search for: All records

Award ID contains: 2131186


  1. Free, publicly-accessible full text available May 8, 2026
  2. Free, publicly-accessible full text available February 26, 2026
  3. One of the grand challenges in computer vision is to recover the 3D poses and shapes of multiple human bodies with absolute scales from a single RGB image. The challenge stems from the inherent depth and scale ambiguity of a single view. State-of-the-art work on 3D human pose and shape estimation mainly focuses on estimating 3D joint locations relative to the root joint, defined as the pelvis joint. In this paper, a novel approach called Absolute-ROMP is proposed, which builds upon a one-stage multi-person 3D mesh predictor network, ROMP, to estimate multi-person 3D poses and shapes with absolute scales from a single RGB image. To achieve this, we introduce absolute root joint localization in the camera coordinate frame, which enables the estimation of 3D mesh coordinates of all persons in the image and their root joint locations normalized by the focal point. Moreover, a CNN and transformer hybrid network, called TransFocal, is proposed to predict the focal length of the image's camera. This enables Absolute-ROMP to obtain absolute depth information for all joints in the camera coordinate frame, further improving the accuracy of our proposed method. Absolute-ROMP is evaluated on the root joint localization and root-relative 3D pose estimation tasks on publicly available multi-person 3D pose datasets, and TransFocal is evaluated on a dataset created from the Pano360 dataset. Our proposed approach achieves state-of-the-art results on these tasks, either outperforming existing methods or performing competitively with them. Because it runs in real time, our method is applicable to in-the-wild images and videos.
  4. Navigating safely and independently presents considerable challenges for people who are blind or have low vision (BLV), as it requires a comprehensive understanding of their neighborhood environments. Our user study reveals that understanding sidewalk materials and objects on the sidewalks plays a crucial role in navigation tasks. This paper presents a pioneering study in the field of navigational aids for BLV individuals. We investigate the feasibility of using auditory data, specifically the sounds produced by cane tips against various sidewalk materials, to achieve material identification. Our approach utilizes machine learning and deep learning techniques to classify sidewalk materials solely based on audio cues, marking a significant step towards empowering BLV individuals with greater autonomy in their navigation. This study contributes in two major ways: Firstly, a lightweight and practical method is developed for volunteers or BLV individuals to autonomously collect auditory data of sidewalk materials using a microphone-equipped white cane. This innovative approach transforms routine cane usage into an effective data-collection tool. Secondly, a deep learning-based classifier algorithm is designed that leverages a dual architecture to enhance audio feature extraction. This includes a pre-trained Convolutional Neural Network (CNN) for regional feature extraction from two-dimensional Mel-spectrograms and a booster module for global feature enrichment. Experimental results indicate that the optimal model achieves an accuracy of 80.96% using audio data only, which can effectively recognize sidewalk materials.
  5. Navigating safely and independently presents considerable challenges for people who are blind or have low vision (BLV), as it requires a comprehensive understanding of their neighborhood environments. Our user study reveals that understanding sidewalk materials and objects on the sidewalks plays a crucial role in navigation tasks. This paper presents a pioneering study in the field of navigational aids for BLV individuals. We investigate the feasibility of using auditory data, specifically the sounds produced by cane tips against various sidewalk materials, to achieve material identification. Our approach utilizes machine learning and deep learning techniques to classify sidewalk materials solely based on audio cues, marking a significant step towards empowering BLV individuals with greater autonomy in their navigation. This study contributes in two major ways: Firstly, a lightweight and practical method is developed for volunteers or BLV individuals to autonomously collect auditory data of sidewalk materials using a microphone-equipped white cane. This innovative approach transforms routine cane usage into an effective data-collection tool. Secondly, a deep learning-based classifier algorithm is designed that leverages a dual architecture to enhance audio feature extraction. This includes a pre-trained Convolutional Neural Network (CNN) for regional feature extraction from two-dimensional Mel-spectrograms and a booster module for global feature enrichment.
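
Illustrative sketch for record 3 (Absolute-ROMP). The abstract describes root joint locations normalized by a focal quantity and a separately predicted focal length that yields absolute depth. The Python below is not the authors' implementation; it only shows the standard pinhole-camera relationship that makes such a normalization useful. All names (backproject_root, depth_over_focal, to_absolute) and the example numbers are assumptions introduced for illustration.

```python
from typing import List, NamedTuple, Tuple


class Point3D(NamedTuple):
    x: float
    y: float
    z: float


def backproject_root(u: float, v: float, cx: float, cy: float,
                     focal_px: float, depth_over_focal: float) -> Point3D:
    """Recover a camera-frame root position under a pinhole model.

    (u, v) is the root joint in pixels, (cx, cy) the principal point,
    focal_px the focal length in pixels, and depth_over_focal the
    hypothetical focal-normalized depth Z / f.
    """
    if focal_px <= 0:
        raise ValueError("focal length must be positive")
    z = depth_over_focal * focal_px      # absolute depth needs the focal length
    x = (u - cx) * z / focal_px          # pinhole model: X = (u - cx) * Z / f
    y = (v - cy) * z / focal_px          # pinhole model: Y = (v - cy) * Z / f
    return Point3D(x, y, z)


def to_absolute(root: Point3D,
                relative_joints: List[Tuple[float, float, float]]) -> List[Point3D]:
    """Shift root-relative joint offsets by the absolute root position."""
    return [Point3D(root.x + dx, root.y + dy, root.z + dz)
            for dx, dy, dz in relative_joints]


if __name__ == "__main__":
    root = backproject_root(u=960, v=540, cx=640, cy=360,
                            focal_px=1000.0, depth_over_focal=0.003)
    print(root)
    print(to_absolute(root, [(0.0, 0.0, 0.0), (0.0, -0.5, 0.0)]))
```

Under this model the lateral coordinates depend only on the pixel offset and the normalized depth, while the absolute depth scales with the focal length, which is one way to see why an estimate of the focal length matters for absolute scale.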
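
Illustrative sketch for records 4 and 5 (sidewalk material classification from cane-tip audio). The abstracts mention two-dimensional Mel-spectrograms as the classifier input. The Python below is not the authors' pipeline; it only sketches the Mel frequency scale that such spectrograms are built on, using the widely used HTK-style conversion formula. The function names, the band count, and the frequency range in the example are assumptions chosen for illustration.

```python
import math
from typing import List


def hz_to_mel(freq_hz: float) -> float:
    """HTK-style Hz to Mel conversion."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_mels: int, f_min: float, f_max: float) -> List[float]:
    """Return n_mels + 2 frequencies (Hz) spaced evenly on the Mel scale.

    Band i uses edges[i], edges[i + 1], edges[i + 2] as its lower edge,
    center, and upper edge.
    """
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels + 2)]


def triangle_weight(freq_hz: float, lower: float, center: float,
                    upper: float) -> float:
    """Triangular filter response: 0 at the edges, 1 at the center."""
    if freq_hz <= lower or freq_hz >= upper:
        return 0.0
    if freq_hz <= center:
        return (freq_hz - lower) / (center - lower)
    return (upper - freq_hz) / (upper - center)


if __name__ == "__main__":
    # Hypothetical setup: 8 Mel bands covering 0 Hz to 8 kHz.
    edges = mel_band_edges(n_mels=8, f_min=0.0, f_max=8000.0)
    for i in range(8):
        print(f"band {i}: {edges[i]:8.1f} .. {edges[i + 2]:8.1f} Hz")
```

A Mel-spectrogram applies such triangular bands to each short-time power spectrum, so the bands are narrow at low frequencies and wide at high frequencies; the resulting time-by-band array is what an image-style CNN can consume.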