NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Absolute-ROMP: Recovering Multi-person 3D Poses and Shapes with Absolute Scales from a Single RGB Image

Abdulrahman, B; Zhu, Z (August 2024, springer)

One of the grand challenges in computer vision is to recover 3D poses and shapes of multiple human bodies with absolute scales from a single RGB image. The challenge stems from the inherent depth and scale ambiguity from a single view. The state of the art on 3D human pose and shape estimation mainly focuses on estimating the 3D joint locations relative to the root joint, defined as the pelvis joint. In this paper, a novel approach called Absolute-ROMP is proposed, which builds upon a one-stage multi-person 3D mesh predictor network, ROMP, to estimate multi-person 3D poses and shapes, but with absolute scales from a single RGB image. To achieve this, we introduce absolute root joint localization in the camera coordinate frame, which enables the estimation of 3D mesh coordinates of all persons in the image and their root joint locations normalized by the focal point. Moreover, a CNN and transformer hybrid network, called TransFocal, is proposed to predict the focal length of the image’s camera. This enables Absolute-ROMP to obtain absolute depth information of all joints in the camera coordinate frame, further improving the accuracy of our proposed method. The Absolute-ROMP is evaluated on the root joint localization and root-relative 3D pose estimation tasks on publicly available multi-person 3D pose datasets, and TransFocal is evaluated on a dataset created from the Pano360 dataset. Our proposed approach achieves state-of-the-art results on these tasks, outperforming existing methods or has competitive performance. Due to its real-time performance, our method is applicable to in-the-wild images and videos.
more » « less
Full Text Available
GMC: A general framework of multi-stage context learning and utilization for visual detection tasks

https://doi.org/10.1016/j.cviu.2024.103944

Wang, Xuan; Tang, Hao; Zhu, Zhigang (April 2024, Computer Vision and Image Understanding)

Full Text Available
Surveying Sidewalk Materials for and by Individuals Who Are Blind or Have Low Vision: Audio Data Collection and Classification

Liu, J; Lam, W P; Zhu, Z; Tang, H (March 2024, International Conference on SMART MULTIMEDIA)

Navigating safely and independently presents considerable challenges for people who are blind or have low vision (BLV), as it re- quires a comprehensive understanding of their neighborhood environments. Our user study reveals that understanding sidewalk materials and objects on the sidewalks plays a crucial role in navigation tasks. This paper presents a pioneering study in the field of navigational aids for BLV individuals. We investigate the feasibility of using auditory data, specifically the sounds produced by cane tips against various sidewalk materials, to achieve material identification. Our approach utilizes ma- chine learning and deep learning techniques to classify sidewalk materials solely based on audio cues, marking a significant step towards empowering BLV individuals with greater autonomy in their navigation. This study contributes in two major ways: Firstly, a lightweight and practical method is developed for volunteers or BLV individuals to autonomously collect auditory data of sidewalk materials using a microphone-equipped white cane. This innovative approach transforms routine cane usage into an effective data-collection tool. Secondly, a deep learning-based classifier algorithm is designed that leverages a dual architecture to enhance audio feature extraction. This includes a pre-trained Convolutional Neural Network (CNN) for regional feature extraction from two-dimensional Mel-spectrograms and a booster module for global feature enrichment.
more » « less
Full Text Available
Creating and Analyzing a Multimedia Dataset for Building Energy Efficiency Estimation

Chen, G Q; Kashyap, N; Zhang, Z; Kucheva, Y; Bobker, M; Zhu, Z (March 2024, International Conference on SMART MULTIMEDIA)

This paper presents the results of a research that created and analyzed a Multimedia dataset for building energy efficiency estimation. First a new Multimedia Building Energy Efficiency (MMBEE) dataset was created from publicly available data. This work then explored the use of the window-to-wall ratio (WWR) information from building facade images and integrated it with traditional tabular data to create new training data, in order to predict building energy efficiency measures. Finally, we discuss potential applications and future research directions in using the MMBEE dataset for building energy efficiency prediction. Throughout the paper, a number of important processes and analyses were performed, which include feature selection, data correlation analysis, WWR extraction, and comparison of deep network and random forest models in building energy efficiency estimation. From this first attempt at using the Multimedia dataset for building energy efficiency estimation, we found the performances of deep models were better than traditional models such as random forest. We also found that there was an optimal point of what features shall be used for the prediction. Nonetheless, the incorporation of the current WWR estimation results did not yield the anticipated enhancement in estimation performance. Subsequently, a comprehensive investigation was conducted to ascertain potential contributing factors, and several avenues for future research were identified to enhance the predictive utility of the WWR feature.
more » « less
Full Text Available
Creating and Analyzing a Multimedia Dataset for Building Energy Efficiency Estimation

Chen, G Q; Kashyap, N; Zhang, Z; Kucheva, Y; Bobker, M; Zhu, Z (March 2024, International Conference on SMART MULTIMEDIA)

This paper presents the results of a research that created and analyzed a Multimedia dataset for building energy efficiency estimation. First a new Multimedia Building Energy Efficiency (MMBEE) dataset was created from publicly available data. This work then explored the use of the window-to-wall ratio (WWR) information from building facade images and integrated it with traditional tabular data to create new training data, in order to predict building energy efficiency measures. Finally, we discuss potential applications and future research directions in using the MMBEE dataset for building energy efficiency prediction. Throughout the paper, a number of important processes and analyses were performed, which include feature selection, data correlation analysis, WWR extraction, and comparison of deep network and random forest models in building energy efficiency estimation. From this first attempt at using the Multimedia dataset for building energy efficiency estimation, we found the performances of deep models were better than traditional models such as random forest. We also found that there was an optimal point of what features shall be used for the prediction. Nonetheless, the incorporation of the current WWR estimation results did not yield the anticipated enhancement in estimation performance. Subsequently, a comprehensive investigation was conducted to ascertain potential contributing factors, and several avenues for future research were identified to enhance the predictive utility of the WWR feature.
more » « less
Full Text Available
Surveying Sidewalk Materials for and by Individuals Who Are Blind or Have Low Vision: Audio Data Collection and Classification

Liu, J; Lam, W P; Zhu, Z; Tang, H (March 2024, International Conference on SMART MULTIMEDIA)

Navigating safely and independently presents considerable challenges for people who are blind or have low vision (BLV), as it re- quires a comprehensive understanding of their neighborhood environments. Our user study reveals that understanding sidewalk materials and objects on the sidewalks plays a crucial role in navigation tasks. This paper presents a pioneering study in the field of navigational aids for BLV individuals. We investigate the feasibility of using auditory data, specifically the sounds produced by cane tips against various sidewalk materials, to achieve material identification. Our approach utilizes ma- chine learning and deep learning techniques to classify sidewalk materials solely based on audio cues, marking a significant step towards empowering BLV individuals with greater autonomy in their navigation. This study contributes in two major ways: Firstly, a lightweight and practical method is developed for volunteers or BLV individuals to autonomously collect auditory data of sidewalk materials using a microphone-equipped white cane. This innovative approach transforms routine cane usage into an effective data-collection tool. Secondly, a deep learning-based classifier algorithm is designed that leverages a dual architecture to enhance audio feature extraction. This includes a pre-trained Convolutional Neural Network (CNN) for regional feature extraction from two-dimensional Mel-spectrograms and a booster module for global feature enrichment. Experimental results indicate that the optimal model achieves an accuracy of 80.96% using audio data only, which can effectively recognize sidewalk materials.
more » « less
Full Text Available
Pedestrian-Accessible Infrastructure Inventory: Enabling and Assessing Zero-Shot Segmentation on Multi-Mode Geospatial Data for All Pedestrian Types

https://doi.org/10.3390/jimaging10030052

Xia, Jiahao; Gong, Gavin; Liu, Jiawei; Zhu, Zhigang; Tang, Hao (March 2024, Journal of Imaging)

In this paper, a Segment Anything Model (SAM)-based pedestrian infrastructure segmentation workflow is designed and optimized, which is capable of efficiently processing multi-sourced geospatial data, including LiDAR data and satellite imagery data. We used an expanded definition of pedestrian infrastructure inventory, which goes beyond the traditional transportation elements to include street furniture objects that are important for accessibility but are often omitted from the traditional definition. Our contributions lie in producing the necessary knowledge to answer the following three questions. First, how can mobile LiDAR technology be leveraged to produce comprehensive pedestrian-accessible infrastructure inventory? Second, which data representation can facilitate zero-shot segmentation of infrastructure objects with SAM? Third, how well does the SAM-based method perform on segmenting pedestrian infrastructure objects? Our proposed method is designed to efficiently create pedestrian-accessible infrastructure inventory through the zero-shot segmentation of multi-sourced geospatial datasets. Through addressing three research questions, we show how the multi-mode data should be prepared, what data representation works best for what asset features, and how SAM performs on these data presentations. Our findings indicate that street-view images generated from mobile LiDAR point-cloud data, when paired with satellite imagery data, can work efficiently with SAM to create a scalable pedestrian infrastructure inventory approach with immediate benefits to GIS professionals, city managers, transportation owners, and walkers, especially those with travel-limiting disabilities, such as individuals who are blind, have low vision, or experience mobility disabilities.
more » « less
Full Text Available
MAC-You-Vision: A Progressive Training Application for Patients with Age-Related Macular Degeneration

https://doi.org/10.1109/URTC60662.2023.10534931

Tepoxtecatl, Alan; Yang, Crystal; Sehaumpai, Max; Seiple, William H; Zhu, Zhigang (October 2023, IEEE)

Full Text Available
Real-Time 3D Object Detection, Recognition and Presentation Using a Mobile Device for Assistive Navigation

https://doi.org/10.1007/s42979-023-01881-3

Chen, Jin; Zhu, Zhigang (September 2023, SN Computer Science)

Full Text Available
Improving Building Energy Efficiency through Data Analysis

https://doi.org/10.1145/3599733.3600244

Phillip, DiAndra; Chen, Jin; Maksakuli, Fani; Ruci, Arber; Sturdivant, E'edresha; Zhu, Zhigang (June 2023, The 14th ACM International Conference on Future Energy Systems (e-Energy ’23 Companion))

For many lawmakers, energy-efficient buildings have been the main focus in large cities across the United States. Buildings consume the largest amount of energy and produce the highest amounts of greenhouse emissions. This is especially true for New York City (NYC)’s public and private buildings, which alone emit more than two-thirds of the city’s total greenhouse emissions. Therefore, improvements in building energy efficiency have become an essential target to reduce the amount of greenhouse gas emissions and fossil fuel consumption. NYC’s buildings’ historical energy consumption data was used in machine learning models to determine their ENERGY STAR scores for time series analysis and future pre- diction. Machine learning models were used to predict future energy use and answer the question of how to incorporate machine learning for effective decision-making to optimize energy usage within the largest buildings in a city. The results show that grouping buildings by property type, rather than by location, provides better predictions for ENERGY STAR scores.
more » « less
Full Text Available

« Prev Next »

Search for: All records