Existing building recognition methods, exemplified by BRAILS, use supervised learning to extract information from satellite and street-view images for classification and segmentation. However, each task module requires human-annotated data, limiting scalability and robustness to regional variations and annotation imbalances. In response, we propose a new zero-shot workflow for building attribute extraction that uses large-scale vision and language models to mitigate the reliance on external annotations. The proposed workflow contains two key components: image-level captioning and segment-level captioning of building images, based on vocabularies pertinent to structural and civil engineering. Both components generate descriptive captions by computing feature representations of the image and the vocabulary terms and then performing a semantic match between the visual and textual representations. Our framework thus offers a promising avenue for AI-driven captioning for building attribute extraction in the structural and civil engineering domains, reducing reliance on human annotations while bolstering performance and adaptability.
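The matching step described above follows the CLIP pattern: embed the image and each vocabulary phrase, then select the phrase with the highest image-text similarity. The sketch below illustrates this, assuming the public `openai/clip-vit-base-patch32` checkpoint via Hugging Face Transformers; the vocabulary shown is an illustrative stand-in, not the paper's actual term list.

```python
# Minimal sketch of CLIP-style zero-shot matching for building attributes.
# The checkpoint and vocabulary are illustrative assumptions, not the
# paper's actual configuration.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical structural/civil-engineering vocabulary.
vocab = [
    "a wood-frame building",
    "a reinforced-concrete building",
    "a masonry building",
    "a soft-story building with an open ground floor",
]

image = Image.open("street_view.jpg")  # hypothetical input image
inputs = processor(text=vocab, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    out = model(**inputs)

# Image-text similarity logits; the best-matching phrase becomes the caption.
probs = out.logits_per_image.softmax(dim=-1)[0]
best = vocab[int(probs.argmax())]
print(f"caption: {best} (p={probs.max().item():.2f})")
```

The same comparison can be repeated on building segments rather than whole images to obtain the segment-level captions.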
Instance segmentation of soft‐story buildings from street‐view images with semiautomatic annotation
In high seismic risk regions, it is important for city managers and decision makers to create programs that mitigate building risk. For large cities and regions, a mitigation program relies on accurate information about the building stock, that is, a database of all buildings in the area and their potential structural defects that make them vulnerable to strong ground shaking. Structural defects and vulnerabilities can manifest in a building's appearance. One such example is the soft-story building, whose vertical irregularity is often observable from the facade. This structural type can suffer severe damage or even collapse during moderate or severe earthquakes. It is therefore critical to screen large building stocks to find these buildings and retrofit them. However, screening for soft-story structures with conventional methods is usually time-consuming. In our previous study, we used full-image classification to screen them out from street view images; however, full-image classification has difficulty locating buildings within an image, which leads to unreliable predictions. In this paper, we develop an automated pipeline that segments street view images to identify soft-story buildings. Because annotated data for this purpose is scarce, we compiled a dataset of street view images and present a strategy for annotating them semi-automatically. The annotated dataset is then used to train an instance segmentation model that can detect soft-story buildings in unseen images.
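As a concrete illustration of the final training stage, the sketch below fine-tunes a torchvision Mask R-CNN on the semi-automatically annotated dataset; the backbone, heads, and optimizer settings are assumptions for illustration, not the paper's reported configuration.

```python
# Hedged sketch: fine-tuning a torchvision Mask R-CNN on the semi-
# automatically annotated street-view set. The backbone, heads, and
# optimizer settings are illustrative assumptions.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

NUM_CLASSES = 2  # background + soft-story building

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
# Swap in box and mask heads sized for our two classes.
in_feats = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_feats, NUM_CLASSES)
in_ch = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_ch, 256, NUM_CLASSES)

optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)

def train_one_epoch(loader):
    model.train()
    for images, targets in loader:  # lists of image tensors / annotation dicts
        loss = sum(model(images, targets).values())  # sum of detection losses
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```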
- Award ID(s): 2131111
- PAR ID: 10472357
- Publisher / Repository: Wiley
- Date Published:
- Journal Name: Earthquake Engineering & Structural Dynamics
- Volume: 52
- Issue: 8
- ISSN: 0098-8847
- Page Range / eLocation ID: 2520 to 2532
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Image-based localization is widely used in autonomous driving, robotics, and augmented reality. It is carried out by matching a query image, taken from a cell phone or vehicle dashcam, against a large set of geo-tagged reference images such as satellite/aerial images or Google Street View. The problem remains challenging because the query images and the large-scale reference datasets are inconsistent under varying light and weather conditions. To tackle this issue, this work proposes a novel view synthesis framework equipped with deep generative models, which can merge the unique features of an outdated reference dataset with features from images containing seasonal changes. Our design features a unique scheme that ensures the synthesized images contain the important features from both reference and patch images, covering seasonal features and minimizing the gap for image-based localization tasks. The performance evaluation shows that the proposed framework can synthesize views under various weather and lighting conditions.
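For context, the retrieval step such frameworks build on can be sketched as nearest-neighbor search over global image descriptors; in the proposed framework, the generative view synthesis would re-render the reference images before this step. The backbone and descriptor choice below are assumptions, not the paper's design.

```python
# Hedged sketch of the retrieval backbone for image-based localization:
# embed query and geo-tagged references with a shared CNN and return the
# geotag of the nearest reference. ResNet-50 global features are an
# assumed descriptor.
import torch
import torch.nn.functional as F
import torchvision.models as models

backbone = models.resnet50(weights="DEFAULT")
backbone.fc = torch.nn.Identity()  # expose the 2048-d pooled feature
backbone.eval()

@torch.no_grad()
def embed(batch):  # batch: (N, 3, H, W) normalized images
    return F.normalize(backbone(batch), dim=1)

@torch.no_grad()
def localize(query, references, geotags):
    """Return the geotag of the reference most similar to the query."""
    q = embed(query.unsqueeze(0))      # (1, 2048)
    r = embed(references)              # (N, 2048)
    scores = (q @ r.T).squeeze(0)      # cosine similarities
    return geotags[int(scores.argmax())]
```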
- Street view imagery databases such as Google Street View, Mapillary, and Karta View provide great spatial and temporal coverage for many cities globally. Those data, when coupled with appropriate computer vision algorithms, can provide an effective means to analyse aspects of the urban environment at scale. As an effort to enhance current practices in urban flood risk assessment, this project investigates the potential use of street view imagery data to identify building features that indicate a building's vulnerability to flooding (e.g., basements and semi-basements). In particular, this paper discusses (1) building features indicating the presence of basement structures, (2) available imagery data sources capturing those features, and (3) computer vision algorithms capable of automatically detecting the features of interest. The paper also reviews existing methods for reconstructing geometric representations of the extracted features from images and potential approaches to account for data quality issues. Preliminary experiments confirmed the usability of freely available Mapillary images for detecting basement railings, one example type of basement feature, and for geolocating the detected features.
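A minimal sketch of the detection step is shown below, assuming a detector fine-tuned on annotated railing examples; the architecture and the `basement_railing_detector.pth` checkpoint are hypothetical, not artifacts of the project.

```python
# Hedged sketch of the detection step. The architecture choice and the
# "basement_railing_detector.pth" checkpoint are hypothetical; the paper
# reports feasibility with Mapillary imagery, not this exact setup.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=2)
model.load_state_dict(torch.load("basement_railing_detector.pth"))
model.eval()

@torch.no_grad()
def detect_railings(path, threshold=0.7):
    image = to_tensor(Image.open(path).convert("RGB"))
    pred = model([image])[0]
    keep = pred["scores"] > threshold
    # Each surviving box is a candidate basement railing; pairing it with
    # the image's geotag geolocates the feature.
    return pred["boxes"][keep], pred["scores"][keep]
```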
- This paper presents a unified framework that learns to quantify perceptual attributes (e.g., safety, attractiveness) of physical urban environments from crowd-sourced street-view photos without human annotations. The effort is twofold. First, we collect a large-scale urban image dataset covering multiple major cities in the U.S.A., with multiple street-view photos for every place. Instead of using subjective annotations as in previous works, which are neither accurate nor consistent, we collect for every place an objective safety score derived from the government's crime event records. Second, we observe that the place-centric perception task is by nature a multi-instance regression problem, since labels are available only for places (bags) rather than for images or image regions (instances). We therefore introduce a deep convolutional neural network (CNN) to parameterize the instance-level scoring function, and develop an EM algorithm that alternates between estimating the primary instances (images or image regions) that determine the safety scores and training the proposed network. Our method can localize the interesting images and image regions for each place. We evaluate the proposed method on a newly created dataset and a public dataset. Comparative results show that our method clearly outperforms alternative perception methods and, more importantly, can generate region-level safety scores that facilitate interpretation of the perception process.
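The alternation described above can be sketched as follows: an E-step selects the primary instance in each bag, and an M-step regresses its score toward the bag-level safety label. The toy scoring network and loss below are simplified stand-ins for the paper's CNN and full EM formulation.

```python
# Hedged sketch of the multi-instance regression loop: labels exist only
# per place (bag), so an E-step picks the highest-scoring image (primary
# instance) and an M-step regresses its score toward the bag label.
import torch
import torch.nn as nn

scorer = nn.Sequential(  # instance-level scoring function, stand-in for the CNN
    nn.Flatten(), nn.Linear(3 * 64 * 64, 128), nn.ReLU(), nn.Linear(128, 1)
)
optimizer = torch.optim.Adam(scorer.parameters(), lr=1e-3)
mse = nn.MSELoss()

def em_epoch(bags, labels):
    """bags: list of (n_i, 3, 64, 64) instance tensors; labels: (B,) scores."""
    for instances, y in zip(bags, labels):
        with torch.no_grad():                      # E-step: find primary instance
            idx = int(scorer(instances).argmax())
        pred = scorer(instances[idx:idx + 1]).squeeze()  # M-step: fit its score
        loss = mse(pred, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```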
- Image data remains an important tool for post-event building assessment and documentation. After each natural hazard event, teams of engineers make significant efforts to visit the affected regions and collect useful image data. In general, a global positioning system (GPS) can provide useful spatial information for localizing image data. However, it is challenging to collect such information when images are captured in places where GPS signals are weak or interrupted, such as the indoor spaces of buildings. The inability to document the images' locations hinders the analysis, organization, and documentation of these images, as they lack sufficient spatial context. In this work, we develop a methodology to localize images and link them to locations on a structural drawing. A stream of images can readily be gathered along the path taken through a building using a compact camera. These images are used to compute a relative location for each image in a 3D point cloud model reconstructed with a visual odometry algorithm. The images may also be used to create local 3D textured models of building components of interest using a structure-from-motion algorithm. A parallel set of images collected for building assessment is linked to the image stream using time information. By projecting the point cloud model onto the structural drawing, the images can be overlaid onto the drawing, providing the context needed to make use of them. Additionally, components or damage of interest captured in these images can be reconstructed in 3D, enabling detailed assessments with sufficient geospatial context. The technique is demonstrated by emulating post-event building assessment and data collection in a real building.
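The time-based linking step can be sketched as a nearest-timestamp lookup: each assessment photo inherits the pose of the closest visual-odometry frame. The data structures below are illustrative; the projection of the point cloud onto the structural drawing is omitted.

```python
# Hedged sketch of time-based linking: assign each assessment photo the
# pose of the visual-odometry frame with the nearest timestamp.
from bisect import bisect_left

def link_by_time(assessment_times, stream_times, stream_poses):
    """stream_times must be sorted ascending; returns one pose per photo."""
    linked = []
    for t in assessment_times:
        i = bisect_left(stream_times, t)
        # Consider the neighbors on both sides of the insertion point.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(stream_times)]
        best = min(candidates, key=lambda j: abs(stream_times[j] - t))
        linked.append(stream_poses[best])
    return linked
```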