
Award ID contains: 2131186

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).


  1. Free, publicly accessible full text available September 1, 2024
  2. For many lawmakers in large cities across the United States, energy-efficient buildings have become a central policy focus. Buildings consume the most energy and produce the most greenhouse gas emissions. This is especially true in New York City (NYC), where public and private buildings alone emit more than two-thirds of the city’s total greenhouse gas emissions. Improving building energy efficiency has therefore become an essential target for reducing greenhouse gas emissions and fossil fuel consumption. Historical energy consumption data from NYC’s buildings was used in machine learning models to determine their ENERGY STAR scores, both for time series analysis and for future prediction. The models were used to predict future energy use and to answer the question of how machine learning can support effective decision-making to optimize energy usage in a city’s largest buildings. The results show that grouping buildings by property type, rather than by location, yields better predictions of ENERGY STAR scores.
    Free, publicly accessible full text available June 20, 2024
  3. Robles, A. (Ed.)
    Although various navigation apps are available, people who are blind or have low vision (PVIB) still face challenges in locating store entrances because existing map services lack the necessary geospatial information. We previously developed a crowdsourcing platform to collect storefront accessibility and localization data to address these challenges. In this paper, we significantly improve the efficiency of data collection and user engagement in our new AI-enabled Smart DoorFront platform by designing and developing several important features: a gamified credit ranking system, a volunteer contribution estimator, an AI-based pre-labeling function, and an image gallery. To achieve this, we integrate a specially designed deep learning model called MultiCLU into Smart DoorFront. We also introduce an online machine learning mechanism that iteratively trains the MultiCLU model using newly labeled storefront accessibility objects and their locations in images. The new DoorFront platform not only significantly improves the efficiency of storefront accessibility data collection but also improves the user experience. We interviewed six adults who are blind to better understand their daily travel challenges; their feedback indicated that the storefront accessibility data collected via the DoorFront platform would be very beneficial to them.
    Free, publicly accessible full text available June 1, 2024
  4. Online classes are typically conducted using video conferencing software such as Zoom, Microsoft Teams, and Google Meet. Research has identified drawbacks of online learning, such as “Zoom fatigue”, characterized by distractions and a lack of engagement. This study presents the CUNY Affective and Responsive Virtual Environment (CARVE) Hub, a novel virtual reality hub that uses a facial emotion classification model to generate emojis for affective and informal responsive interaction in a 3D virtual classroom. A web-based machine learning model performs the facial emotion classification, enabling students to communicate four basic emotions live through automated webcam capture in the virtual classroom without sharing their camera video. The experiment was conducted in undergraduate classes on both Zoom and CARVE, and survey results indicate that students perceive interactions in the proposed virtual classroom more positively than in Zoom. Correlations between automated emojis and interactions were also observed. This study discusses potential explanations for the improved interactions, including reduced pressure on students when they are not showing their faces. In addition, video panels in traditional remote classrooms may be useful for communication but not for interaction. Students favor virtual reality features such as spatial audio and the ability to move around, and they identified collaboration as the most helpful feature.
  5. This paper presents a mobile-based solution that integrates 3D vision and voice interaction to help people who are blind or have low vision explore and interact with their surroundings. The key components of the system are two 3D vision modules: a 3D object detection module, which integrates a deep-learning-based 2D object detector with ARKit-based point cloud generation, and an interest direction recognition module, which integrates hand/finger recognition with ARKit-based 3D direction estimation. The integrated system consists of a voice interface, a task scheduler, and an instruction generator. The voice interface contains a customized user request mapping module that maps the user’s spoken input to one of four primary operation modes (exploration, search, navigation, and settings adjustment). The task scheduler coordinates with two web services that host the two vision modules, allocating computational resources based on the user request and the strength of the network connection. Finally, the instruction generator computes the corresponding instructions based on the user request and the results from the two vision modules. The system can run in real time on mobile devices. We present preliminary experimental results on the performance of the voice-to-user-request mapping module and the two vision modules.
  6. Contextual information has been widely used in many computer vision tasks. However, existing approaches design a separate contextual information mechanism for each task. In this work, we propose a general context learning and reasoning framework for object detection tasks with three components: local contextual labeling, contextual graph generation, and spatial contextual reasoning. With simple user-defined parameters, local contextual labeling automatically enlarges small object labels to include more local contextual information. A Graph Convolutional Network learns over the generated contextual graph to build a semantic space. A general spatial relation is used in spatial contextual reasoning to optimize the detection results. All three components can easily be added to or removed from a standard object detector. In addition, our approach automates the training process to find the optimal combination of user-defined parameters. The general framework can easily be adapted to different tasks. In this paper, we compare our framework with a previous multistage context learning framework specifically designed for storefront accessibility detection, and with a state-of-the-art detector for pedestrian detection. Experimental results on two urban scene datasets demonstrate that our general framework achieves the same performance as the specifically designed multistage framework on storefront accessibility detection and outperforms the state-of-the-art detector on pedestrian detection.
  7. Recovering multi-person 3D poses and shapes with absolute scales from a single RGB image is challenging because of the inherent depth and scale ambiguity of a single view. Current work on 3D pose and shape estimation tends to focus mainly on estimating 3D joint locations relative to the root joint, usually defined as the joint closest to the shape centroid; for humans, this is the pelvis joint. In this paper, we build on an existing multi-person 3D mesh predictor network, ROMP, to create Absolute-ROMP. By adding absolute root joint localization in the camera coordinate frame, we are able to estimate multi-person 3D poses and shapes with absolute scales from a single RGB image. Such a single-shot approach allows the system to better learn and reason about inter-person depth relationships, thus improving multi-person 3D estimation. In addition to this end-to-end network, we also train a hybrid CNN-transformer network, called TransFocal, to predict the focal length of the image’s camera. Absolute-ROMP estimates the 3D mesh coordinates of all persons in the image and their root joint locations normalized by the focal length. We then use TransFocal to obtain the focal length and recover absolute depth information for all joints in the camera coordinate frame. We evaluate Absolute-ROMP on root joint localization and root-relative 3D pose estimation using publicly available multi-person 3D pose datasets, and we evaluate TransFocal on a dataset created from the Pano360 dataset. Because both networks run in real time, they are applicable to in-the-wild images and videos.
  8. This paper proposes a computer vision-based workflow that analyzes Google 360-degree Street View imagery to assess the quality of urban spaces in terms of vegetation coverage and the accessibility of urban amenities such as benches. Image segmentation methods were used to produce an annotated image showing the amount of vegetation, sky, and street coloration. Two deep learning models, Monodepth2 for depth estimation and YOLOv5 for object detection, were used to create a 360-degree diagram of vegetation and benches at a given location. The automated workflow allows non-expert users such as planners, designers, and communities to analyze and evaluate urban environments with Google Street View. The workflow consists of three components: (1) a user interface for location selection; (2) vegetation analysis, bench detection, and depth estimation; and (3) visualization of vegetation coverage and amenities. The analysis and visualization could inform better urban design outcomes.
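For item 2, the comparison between grouping buildings by property type and grouping them by location can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' method: it fits one least-squares linear trend of ENERGY STAR score against year per group, then scores each grouping by mean absolute error on held-out records. The record layout and all function names are hypothetical, since the abstract does not say which models were used.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical record layout: (property_type, borough, year, energy_star_score).
Record = Tuple[str, str, int, float]


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        return y_bar, 0.0  # only one distinct year: predict the group mean
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b * x_bar, b


def group_models(train: List[Record],
                 key: Callable[[Record], str]) -> Dict[str, Tuple[float, float]]:
    """Fit one score-versus-year trend per group chosen by `key`."""
    groups: Dict[str, Tuple[List[float], List[float]]] = defaultdict(lambda: ([], []))
    for rec in train:
        xs, ys = groups[key(rec)]
        xs.append(rec[2])
        ys.append(rec[3])
    return {g: fit_line(xs, ys) for g, (xs, ys) in groups.items()}


def grouped_mae(train: List[Record], test: List[Record],
                key: Callable[[Record], str]) -> float:
    """Mean absolute error of per-group trend predictions on held-out records.
    Every group in `test` must also appear in `train`."""
    models = group_models(train, key)
    errors = []
    for rec in test:
        a, b = models[key(rec)]
        errors.append(abs(a + b * rec[2] - rec[3]))
    return mean(errors)


def by_property_type(rec: Record) -> str:
    return rec[0]


def by_location(rec: Record) -> str:
    return rec[1]
```

Calling `grouped_mae(train, test, by_property_type)` and `grouped_mae(train, test, by_location)` on the same split gives two errors to compare; whichever grouping has the lower error predicts better on that split. A real study would swap the linear trend for its own time series model.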
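The online learning mechanism in item 3 (iteratively retraining MultiCLU on newly labeled storefront objects) might be organized as a buffer that triggers a training round once enough volunteer labels arrive. The sketch is hypothetical: the label format, the `batch_size` threshold, and the `train_fn` hook are assumptions, and the abstract does not describe how or when retraining is scheduled.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical label: (image_id, object_class, (x_min, y_min, x_max, y_max)).
Label = Tuple[str, str, Tuple[float, float, float, float]]


@dataclass
class OnlineTrainer:
    """Buffers newly labeled objects and triggers a retraining round
    whenever `batch_size` new labels have accumulated."""
    train_fn: Callable[[List[Label]], None]  # hypothetical hook that fine-tunes the detector
    batch_size: int = 100
    buffer: List[Label] = field(default_factory=list)
    rounds: int = 0

    def add_label(self, label: Label) -> bool:
        """Store one volunteer label; return True if it triggered retraining."""
        self.buffer.append(label)
        if len(self.buffer) >= self.batch_size:
            self.train_fn(list(self.buffer))  # pass a copy so the hook may keep it
            self.buffer.clear()
            self.rounds += 1
            return True
        return False
```

Batching labels this way, rather than retraining after every submission, trades freshness of the pre-labeling model for fewer, larger training rounds; the right threshold would depend on how fast volunteers label.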
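Item 4 describes turning live facial emotion classifications into emojis. A minimal sketch of that last step, assuming the classifier emits one probability per emotion for each webcam frame, is shown below. The four label names, the confidence threshold, and the majority-vote smoothing window are all assumptions added for illustration; the abstract names none of them.

```python
from collections import Counter, deque
from typing import Deque, Dict, Optional

# Hypothetical label set; the abstract mentions four basic emotions but does not name them.
EMOJI = {"happy": "😀", "sad": "😢", "surprised": "😮", "neutral": "😐"}


class EmojiSmoother:
    """Turns per-frame emotion probabilities into a stable emoji by majority
    vote over the most recent `window` confident frames."""

    def __init__(self, window: int = 5, min_confidence: float = 0.6):
        self.recent: Deque[str] = deque(maxlen=window)
        self.min_confidence = min_confidence

    def update(self, probs: Dict[str, float]) -> Optional[str]:
        label, p = max(probs.items(), key=lambda kv: kv[1])
        if p >= self.min_confidence:
            self.recent.append(label)  # low-confidence frames are ignored
        if not self.recent:
            return None  # nothing confident seen yet: show no emoji
        # Ties go to the label that appears earliest in the window.
        winner, _ = Counter(self.recent).most_common(1)[0]
        return EMOJI[winner]
```

Smoothing over a short window keeps the displayed emoji from flickering when single frames are misclassified, at the cost of a small delay before a change in expression shows up.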
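The 3D object detection module in item 5 combines a 2D detector with an ARKit point cloud. One common way to fuse the two, sketched here as an assumption rather than the paper's procedure, is to project each 3D point into the image with a pinhole camera model and take the median of the points that land inside a detected bounding box. The intrinsics (`fx`, `fy`, `cx`, `cy`), the axis convention, and the function names are hypothetical.

```python
import math
from statistics import median
from typing import Iterable, Optional, Tuple

# Camera-frame point: x right, y down, z forward, in meters. This is the usual
# computer-vision pinhole convention, assumed here; ARKit's own camera axes
# differ and would need converting first.
Point3 = Tuple[float, float, float]
Box = Tuple[float, float, float, float]  # (u_min, v_min, u_max, v_max) in pixels


def project(p: Point3, fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a camera-frame point to pixel coordinates."""
    x, y, z = p
    return fx * x / z + cx, fy * y / z + cy


def locate_object(points: Iterable[Point3], box: Box,
                  fx: float, fy: float, cx: float, cy: float) -> Optional[Point3]:
    """Estimate a detected object's 3D position as the per-axis median of the
    point-cloud points that project inside its 2D bounding box."""
    u0, v0, u1, v1 = box
    inside = []
    for p in points:
        if p[2] <= 0:
            continue  # behind the camera: cannot be projected
        u, v = project(p, fx, fy, cx, cy)
        if u0 <= u <= u1 and v0 <= v <= v1:
            inside.append(p)
    if not inside:
        return None
    xs, ys, zs = zip(*inside)
    return median(xs), median(ys), median(zs)


def distance_and_bearing(p: Point3) -> Tuple[float, float]:
    """Straight-line distance and horizontal bearing in degrees
    (0 = straight ahead, positive = to the right)."""
    x, y, z = p
    return math.sqrt(x * x + y * y + z * z), math.degrees(math.atan2(x, z))
```

The median is used instead of the mean so that background points falling inside the box pull the estimate less. A distance and bearing like these are the kind of quantities an instruction generator could turn into spoken directions.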
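For item 6, local contextual labeling and the automated search over user-defined parameters can be illustrated with a short sketch. It assumes the parameters are an area threshold that decides which labels count as small and a scale factor for enlarging them; the abstract does not name the actual parameters or the search strategy, so `enlarge_small_box`, `search_params`, and the exhaustive grid search are hypothetical stand-ins.

```python
from itertools import product
from typing import Callable, Iterable, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def enlarge_small_box(box: Box, img_w: int, img_h: int,
                      area_threshold: float = 32 * 32, scale: float = 2.0) -> Box:
    """If the box area is below area_threshold, grow its width and height by
    `scale` about the box center so the label also covers nearby context,
    then clip to the image. Larger boxes are returned unchanged."""
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    if w * h >= area_threshold:
        return box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    half_w, half_h = w * scale / 2, h * scale / 2
    return (max(0.0, cx - half_w), max(0.0, cy - half_h),
            min(float(img_w), cx + half_w), min(float(img_h), cy + half_h))


def search_params(evaluate: Callable[[float, float], float],
                  thresholds: Iterable[float],
                  scales: Iterable[float]) -> Tuple[float, float]:
    """Exhaustive search over (area_threshold, scale) pairs. `evaluate` is a
    hypothetical callback that would relabel the data, train the detector,
    and return a validation score (higher is better)."""
    return max(product(thresholds, scales), key=lambda ts: evaluate(*ts))
```

Because each call to `evaluate` would involve a full training run, a grid like this is only practical for a handful of candidate values per parameter.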
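Item 7 recovers absolute depth by combining Absolute-ROMP's focal-normalized root joint estimate with the focal length predicted by TransFocal. Assuming the network predicts the root depth divided by the focal length, d = Z / f, the absolute root depth is Z = f * d, and the pinhole model gives X = (u - c_x) * Z / f and Y = (v - c_y) * Z / f for the root pixel (u, v). The sketch applies these relations and then adds the root-relative joint offsets; the output format, the image-center principal point, and the function name are assumptions, not details from the paper.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def absolute_joints(root_uv: Tuple[float, float], depth_over_f: float, focal_px: float,
                    image_size: Tuple[int, int], rel_joints: Sequence[Vec3]) -> List[Vec3]:
    """Lift root-relative joints to absolute camera coordinates.

    Assumptions (not taken from the paper): the network outputs the root
    joint's pixel position and its depth divided by the focal length, the
    focal length (in pixels) comes from a separate estimator, and the
    principal point is the image center."""
    u, v = root_uv
    w, h = image_size
    cx, cy = w / 2, h / 2
    z = depth_over_f * focal_px  # undo the focal-length normalization
    root = ((u - cx) * z / focal_px, (v - cy) * z / focal_px, z)  # pinhole back-projection
    return [(root[0] + dx, root[1] + dy, root[2] + dz) for dx, dy, dz in rel_joints]
```

Keeping depth normalized by focal length lets the pose network stay agnostic to the camera; the separately predicted focal length then restores metric scale for every joint at once.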
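Two measurements in item 8 that could feed the 360-degree diagram are sketched below: the share of vegetation pixels in a segmentation mask, and the conversion of a detected bench's horizontal pixel position in an equirectangular panorama into an azimuth. The class ID, the heading convention, and the function names are assumptions; the abstract does not describe how the diagram is computed, and depth from Monodepth2 is left out here.

```python
from typing import Sequence


def vegetation_ratio(mask: Sequence[Sequence[int]], vegetation_id: int) -> float:
    """Fraction of pixels whose segmentation class is vegetation."""
    total = sum(len(row) for row in mask)
    veg = sum(1 for row in mask for label in row if label == vegetation_id)
    return veg / total if total else 0.0


def azimuth_deg(x_center: float, pano_width: int, heading_deg: float = 0.0) -> float:
    """Map a detection's horizontal pixel position in an equirectangular
    360-degree panorama to an azimuth in [0, 360).

    Assumes the panorama's center column faces heading_deg and that the
    angle increases clockwise from left to right across the image."""
    return (heading_deg - 180.0 + 360.0 * x_center / pano_width) % 360.0
```

Applying `azimuth_deg` to the center of each bench bounding box places the detections around the viewpoint, and `vegetation_ratio` computed per column slice of the panorama would give vegetation coverage by direction.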