NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Real-Time 3D Object Detection, Recognition and Presentation Using a Mobile Device for Assistive Navigation

https://doi.org/10.1007/s42979-023-01881-3

Chen, Jin; Zhu, Zhigang (September 2023, SN Computer Science)

Full Text Available
Improving Building Energy Efficiency through Data Analysis

https://doi.org/10.1145/3599733.3600244

Phillip, DiAndra; Chen, Jin; Maksakuli, Fani; Ruci, Arber; Sturdivant, E'edresha; Zhu, Zhigang (June 2023, The 14th ACM International Conference on Future Energy Systems (e-Energy ’23 Companion))

For many lawmakers, energy-efficient buildings have been the main focus in large cities across the United States. Buildings consume the largest amount of energy and produce the highest amounts of greenhouse emissions. This is especially true for New York City (NYC)’s public and private buildings, which alone emit more than two-thirds of the city’s total greenhouse emissions. Therefore, improvements in building energy efficiency have become an essential target to reduce the amount of greenhouse gas emissions and fossil fuel consumption. NYC’s buildings’ historical energy consumption data was used in machine learning models to determine their ENERGY STAR scores for time series analysis and future pre- diction. Machine learning models were used to predict future energy use and answer the question of how to incorporate machine learning for effective decision-making to optimize energy usage within the largest buildings in a city. The results show that grouping buildings by property type, rather than by location, provides better predictions for ENERGY STAR scores.
more » « less
Full Text Available
An AI-enabled Annotation Platform for Storefront Accessibility and Localization

Wang, X.; Liu, J.; Tang, H.; Zhu, Z.; Seiple, W. H. (June 2023, Journal on technology and persons with disabilities)
Robles, A. (Ed.)
Although various navigation apps are available, people who are blind or have low vision (PVIB) still face challenges to locate store entrances due to missing geospatial information in existing map services. Previously, we have developed a crowdsourcing platform to collect storefront accessibility and localization data to address the above challenges. In this paper, we have significantly improved the efficiency of data collection and user engagement in our new AI-enabled Smart DoorFront platform by designing and developing multiple important features, including a gamified credit ranking system, a volunteer contribution estimator, an AI-based pre-labeling function, and an image gallery feature. For achieving these, we integrate a specially designed deep learning model called MultiCLU into the Smart DoorFront. We also introduce an online machine learning mechanism to iteratively train the MultiCLU model, by using newly labeled storefront accessibility objects and their locations in images. Our new DoorFront platform not only significantly improves the efficiency of storefront accessibility data collection, but optimizes user experience. We have conducted interviews with six adults who are blind to better understand their daily travel challenges and their feedback indicated that the storefront accessibility data collected via the DoorFront platform would be very beneficial for them.
more » « less
Full Text Available
Context understanding in computer vision: A survey

https://doi.org/10.1016/j.cviu.2023.103646

Wang, Xuan; Zhu, Zhigang (March 2023, Computer Vision and Image Understanding)

Full Text Available
Exploring an Affective and Responsive Virtual Environment to Improve Remote Learning

https://doi.org/10.3390/virtualworlds2010004

Qi, Jianing; Tang, Hao; Zhu, Zhigang (March 2023, Virtual Worlds)

Online classes are typically conducted by using video conferencing software such as Zoom, Microsoft Teams, and Google Meet. Research has identified drawbacks of online learning, such as “Zoom fatigue”, characterized by distractions and lack of engagement. This study presents the CUNY Affective and Responsive Virtual Environment (CARVE) Hub, a novel virtual reality hub that uses a facial emotion classification model to generate emojis for affective and informal responsive interaction in a 3D virtual classroom setting. A web-based machine learning model is employed for facial emotion classification, enabling students to communicate four basic emotions live through automated web camera capture in a virtual classroom without activating their cameras. The experiment is conducted in undergraduate classes on both Zoom and CARVE, and the results of a survey indicate that students have a positive perception of interactions in the proposed virtual classroom compared with Zoom. Correlations between automated emojis and interactions are also observed. This study discusses potential explanations for the improved interactions, including a decrease in pressure on students when they are not showing faces. In addition, video panels in traditional remote classrooms may be useful for communication but not for interaction. Students favor features in virtual reality, such as spatial audio and the ability to move around, with collaboration being identified as the most helpful feature.
more » « less
Full Text Available
Real-time pedestrian pose estimation, tracking and localization for social distancing

https://doi.org/10.1007/s00138-022-01356-0

Abdulrahman, Bilal; Zhu, Zhigang (January 2023, Machine Vision and Applications)

Full Text Available
An Integrated Mobile Vision System for Enhancing the Interaction of Blind and Low Vision Users with Their Surroundings [An Integrated Mobile Vision System for Enhancing the Interaction of Blind and Low Vision Users with Their Surroundings]

https://doi.org/10.5220/0011984400003497

Chen, Jin; Ramnath, Satesh; Samaroo, Tyron; Maksakuli, Fani; Ruci, Arber; Sturdivant, E’edresha; Zhu, Zhigang (January 2023, Proceedings of the 3rd International Conference on Image Processing and Vision Engineering)

This paper presents a mobile-based solution that integrates 3D vision and voice interaction to assist people who are blind or have low vision to explore and interact with their surroundings. The key components of the system are the two 3D vision modules: the 3D object detection module integrates a deep-learning based 2D object detector with ARKit-based point cloud generation, and an interest direction recognition module integrates hand/finger recognition and ARKit-based 3D direction estimation. The integrated system consists of a voice interface, a task scheduler, and an instruction generator. The voice interface contains a customized user request mapping module that maps the user’s input voice into one of the four primary system operation modes (exploration, search, navigation, and settings adjustment). The task scheduler coordinates with two web services that host the two vision modules to allocate resources for computation based on the user request and network connectivity strength. Finally, the instruction generator computes the corresponding instructions based on the user request and results from the two vision modules. The system is capable of running in real time on mobile devices. We have shown preliminary experimental results on the performance of the voice to user request mapping module and the two vision modules.
more » « less
Full Text Available
Precise indoor localization with 3D facility scan data

https://doi.org/10.1111/mice.12795

Xia, Jiahao; Gong, Jie (August 2022, Computer-Aided Civil and Infrastructure Engineering)

Full Text Available
MultiCLU: Multi-stage Context Learning and Utilization for Storefront Accessibility Detection and Evaluation

https://doi.org/10.1145/3512527.3531361

Wang, Xuan; Chen, Jiajun; Tang, Hao; Zhu, Zhigang (June 2022, ICMR '22: Proceedings of the 2022 International Conference on Multimedia Retrieval)

In this work, a storefront accessibility image dataset is collected from Google street view and is labeled with three main objects for storefront accessibility: doors (for store entrances), doorknobs (for accessing the entrances) and stairs (for leading to the entrances). Then MultiCLU, a new multi-stage context learning and utilization approach, is proposed with the following four stages: Context in Labeling (CIL), Context in Training (CIT), Context in Detection (CID) and Context in Evaluation (CIE). The CIL stage automatically extends the label for each knob to include more local contextual information. In the CIT stage, a deep learning method is used to project the visual information extracted by a Faster R-CNN based object detector to semantic space generated by a Graph Convolutional Network. The CID stage uses the spatial relation reasoning between categories to refine the confidence score. Finally in the CIE stage, a new loose evaluation metric for storefront accessibility, especially for knob category, is proposed to efficiently help BLV users to find estimated knob locations. Our experiment results show that the proposed MultiCLU framework can achieve significantly better performance than the baseline detector using Faster R-CNN, with +13.4% on mAP and +15.8% on recall, respectively. Our new evaluation metric also introduces a new way to evaluate storefront accessibility objects, which could benefit BLV group in real life.
more » « less
Full Text Available
ARMSAINTS: An AR-based Real-time Mobile System for Assistive Indoor Navigation with Target Segmentation

https://doi.org/10.1109/ARSO54254.2022.9802970

Chen, Jin; Ruci, Arber; Sturdivant, E'edresha; Zhu, Zhigang (May 2022, 2022 IEEE International Conference on Advanced Robotics and Its Social Impacts (ARSO))

This paper proposes an AR-based real-time mobile system for assistive indoor navigation with target segmentation (ARMSAINTS) for both sighted and blind or low-vision (BLV) users to safely explore and navigate in an indoor environment. The solution comprises four major components: graph construction, hybrid modeling, real-time navigation and target segmentation. The system utilizes an automatic graph construction method to generate a graph from a 2D floorplan and the Delaunay triangulation-based localization method to provide precise localization with negligible error. The 3D obstacle detection method integrates the existing capability of AR with a 2D object detector and a semantic target segmentation model to detect and track 3D bounding boxes of obstacles and people to increase BLV safety and understanding when traveling in the indoor environment. The entire system does not require the installation and maintenance of expensive infrastructure, run in real-time on a smartphone, and can easily adapt to environmental changes.
more » « less
Full Text Available

« Prev Next »

Search for: All records