Abstract: In high seismic risk regions, it is important for city managers and decision makers to create programs that mitigate seismic risk for buildings. For large cities and regions, a mitigation program relies on accurate information about the building stock, that is, a database of all buildings in the area and the potential structural defects that make them vulnerable to strong ground shaking. Structural defects and vulnerabilities can manifest in a building's appearance. One example is the soft-story building, whose vertical irregularity is often visible from the facade. This structural type can suffer severe damage or even collapse during moderate or severe earthquakes. It is therefore critical to screen large building stocks to find and retrofit these buildings. However, screening for soft-story structures with conventional methods is time-consuming. In our previous study, we used full-image classification to screen them out from street view images; however, full-image classification has difficulty locating buildings within an image, which leads to unreliable predictions. In this paper, we develop an automated pipeline that segments street view images to identify soft-story buildings. Because annotated data for this task is scarce, we compiled a dataset of street view images and present a strategy for annotating these images semi-automatically. The annotated dataset is then used to train an instance segmentation model that detects all soft-story buildings in unseen images.
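The detection step of such a pipeline could be prototyped with an off-the-shelf instance segmentation model. The sketch below is a minimal illustration, assuming a torchvision Mask R-CNN fine-tuned on the annotated street-view dataset; the checkpoint path, class index, and score threshold are placeholders, not the paper's actual configuration.

```python
# Minimal sketch: filter soft-story detections from a street-view image.
# Assumes a Mask R-CNN fine-tuned on the annotated dataset (hypothetical
# checkpoint path and class index -- not the authors' released model).
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

SOFT_STORY_CLASS = 1   # assumed label index for "soft-story building"
SCORE_THRESHOLD = 0.7  # assumed confidence cutoff

model = torchvision.models.detection.maskrcnn_resnet50_fpn(num_classes=2)
model.load_state_dict(torch.load("soft_story_maskrcnn.pt"))  # hypothetical weights
model.eval()

def detect_soft_story(image_path: str):
    """Return masks and scores for soft-story instances in one image."""
    image = to_tensor(Image.open(image_path).convert("RGB"))
    with torch.no_grad():
        output = model([image])[0]
    keep = (output["labels"] == SOFT_STORY_CLASS) & \
           (output["scores"] >= SCORE_THRESHOLD)
    return output["masks"][keep], output["scores"][keep]

masks, scores = detect_soft_story("street_view_000123.jpg")
print(f"{len(scores)} soft-story building(s) detected")
```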
Place-centric Visual Urban Perception with Deep Multi-instance Regression
This paper presents a unified framework for learning to quantify perceptual attributes (e.g., safety, attractiveness) of physical urban environments from crowd-sourced street-view photos without human annotations. The contributions of this work are twofold. First, we collect a large-scale urban image dataset covering multiple major cities in the U.S.A., with multiple street-view photos for every place. Instead of relying on subjective annotations as in previous work, which are neither accurate nor consistent, we collect for every place a safety score derived from the government's crime event records as an objective safety indicator. Second, we observe that place-centric perception is by nature a multi-instance regression problem, since labels are available only for places (bags) rather than for images or image regions (instances). We therefore introduce a deep convolutional neural network (CNN) to parameterize the instance-level scoring function and develop an EM algorithm that alternately estimates the primary instances (images or image regions) affecting the safety scores and trains the proposed network. Our method is capable of localizing the relevant images and image regions for each place. We evaluate the proposed method on a newly created dataset and a public dataset. Comparative results show that our method clearly outperforms alternative perception methods and, more importantly, generates region-level safety scores that facilitate interpretation of the perception process.
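The alternating estimation described above can be illustrated with a compact EM-style loop. The sketch below is a simplified stand-in, not the paper's CNN: it uses a linear scoring function over precomputed instance features, picks the highest-scoring instance per bag as the primary instance in the E-step, and regresses its score toward the bag label in the M-step. All names and the toy data are illustrative.

```python
# Simplified EM loop for multi-instance regression (illustrative only):
# a linear scorer stands in for the paper's CNN scoring function.
import numpy as np

def em_mi_regression(bags, labels, dim, iters=50, lr=0.01):
    """bags: list of (n_i, dim) feature arrays, one per place (bag).
    labels: (num_bags,) objective safety scores."""
    w = np.zeros(dim)
    for _ in range(iters):
        # E-step: pick the primary instance in each bag (highest current score).
        primaries = np.stack([bag[np.argmax(bag @ w)] for bag in bags])
        # M-step: least-squares gradient step on the primary instances.
        residual = primaries @ w - labels
        w -= lr * primaries.T @ residual / len(bags)
    return w

rng = np.random.default_rng(0)
bags = [rng.normal(size=(5, 8)) for _ in range(100)]  # toy feature bags
labels = rng.normal(size=100)                         # toy safety scores
w = em_mi_regression(bags, labels, dim=8)
print("learned weights:", w.round(2))
```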
- Award ID(s): 1657600
- PAR ID: 10056962
- Journal Name: ACM Conference on Multimedia
- Sponsoring Org: National Science Foundation
More Like this
-
Many algorithms for virtual tree generation exist, but the visual realism of the resulting 3D models is unknown. This problem is usually addressed by limited user studies or by side-by-side visual comparison. We introduce an automated system for assessing the realism of tree models based on how they are perceived. We conducted a user study in which 4,000 participants compared over one million pairs of images to collect subjective perceptual scores for a large dataset of virtual trees. The scores were used to train two neural-network-based predictors. The first, a view-independent predictor ICTreeF, uses the tree model's geometric features, which are easy to extract from any model. The second, ICTreeI, estimates the perceived visual realism of a tree from its image. Moreover, to provide insight into the problem, we deduce intrinsic attributes and evaluate which features make trees look like real trees. In particular, we show that branching angles, branch lengths, and widths are critical for perceived realism. We also provide three datasets: carefully curated 3D tree geometries and tree skeletons with their perceptual scores, multiple views of the tree geometries with their scores, and a large dataset of images with scores suitable for training deep neural networks.
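Turning pairwise comparisons like those above into per-model perceptual scores is commonly done with a Bradley-Terry-style model; the sketch below shows that generic approach on toy data and is not claimed to be the scoring procedure the ICTree authors used.

```python
# Generic Bradley-Terry fit: recover per-item realism scores from
# pairwise "which tree looks more real?" votes (illustrative, toy data).
import numpy as np

def bradley_terry(num_items, comparisons, iters=200):
    """comparisons: list of (winner, loser) index pairs."""
    scores = np.ones(num_items)
    for _ in range(iters):
        wins = np.zeros(num_items)
        denom = np.zeros(num_items)
        for w, l in comparisons:
            wins[w] += 1
            p = 1.0 / (scores[w] + scores[l])
            denom[w] += p
            denom[l] += p
        scores = wins / np.maximum(denom, 1e-12)
        scores /= scores.sum()  # fix the overall scale
    return scores

votes = [(0, 1), (0, 2), (1, 2), (0, 1)]  # toy votes among 3 tree models
print(bradley_terry(3, votes).round(3))
```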
-
Existing building recognition methods, exemplified by BRAILS, use supervised learning to extract information from satellite and street-view images for classification and segmentation. However, each task module requires human-annotated data, which hinders scalability and robustness to regional variation and annotation imbalance. In response, we propose a new zero-shot workflow for building attribute extraction that uses large-scale vision and language models to mitigate reliance on external annotations. The proposed workflow contains two key components: image-level captioning and segment-level captioning of building images, based on vocabularies pertinent to structural and civil engineering. Both components generate descriptive captions by computing feature representations of the image and the vocabularies and performing a semantic match between the visual and textual representations. Consequently, our framework offers a promising avenue for AI-driven captioning for building attribute extraction in the structural and civil engineering domains, ultimately reducing reliance on human annotations while bolstering performance and adaptability.
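The semantic match between image and vocabulary features described above can be prototyped with an open vision-language model such as CLIP. The sketch below uses the Hugging Face `transformers` CLIP API with a hypothetical engineering vocabulary; it illustrates the general matching idea, not the proposed workflow's exact implementation.

```python
# Zero-shot attribute matching: score a building photo against a
# structural-engineering vocabulary with CLIP (illustrative sketch).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

vocabulary = [  # hypothetical vocabulary entries
    "a wood-frame house",
    "a masonry building",
    "a soft-story building with an open ground floor",
]
image = Image.open("building.jpg")  # placeholder image path
inputs = processor(text=vocabulary, images=image,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)
for term, p in zip(vocabulary, probs[0].tolist()):
    print(f"{term}: {p:.2f}")
```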
-
Instance detection (InsDet) is a long-standing problem in robotics and computer vision that aims to detect object instances (predefined by a few visual examples) in a cluttered scene. Despite its practical significance, its advancement is overshadowed by object detection, which aims to detect objects belonging to predefined classes. One major reason is that current InsDet datasets are too small by today's standards. For example, the popular InsDet dataset GMU (published in 2016) has only 23 instances, far fewer than COCO (80 classes), a well-known object detection dataset published in 2014. We are therefore motivated to introduce a new InsDet dataset and protocol. First, we define a realistic setup for InsDet: training data consists of multi-view instance captures along with diverse scene images, allowing training images to be synthesized by pasting instance images onto scenes with free box annotations. Second, we release a real-world database that contains multi-view captures of 100 object instances and high-resolution (6k×8k) testing images. Third, we extensively study baseline methods for InsDet on our dataset, analyze their performance, and suggest future work. Somewhat surprisingly, combining an off-the-shelf class-agnostic segmentation model (Segment Anything Model, SAM) with the self-supervised feature representation DINOv2 performs best, achieving >10 AP better than end-to-end trained InsDet models that repurpose object detectors (e.g., Faster R-CNN and RetinaNet).
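The strong SAM + DINOv2 baseline mentioned above can be sketched as: propose class-agnostic segments with SAM, embed each cropped proposal with DINOv2, and match against embeddings of the instance exemplars by cosine similarity. The snippet below outlines that pipeline under assumed details; the checkpoint path, file names, and similarity threshold are placeholders, not the paper's tuned setup.

```python
# Sketch of the SAM + DINOv2 InsDet baseline: class-agnostic proposals
# from SAM, DINOv2 embeddings, cosine matching against exemplar crops.
import numpy as np
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import transforms
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")  # placeholder path
proposer = SamAutomaticMaskGenerator(sam)
dino = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").eval()

prep = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

def embed(pil_img):
    """L2-normalized DINOv2 embedding of one image crop."""
    with torch.no_grad():
        return F.normalize(dino(prep(pil_img).unsqueeze(0)), dim=-1)

scene = Image.open("scene.jpg").convert("RGB")
exemplar = embed(Image.open("instance_view_01.jpg").convert("RGB"))

for m in proposer.generate(np.array(scene)):
    x, y, w, h = m["bbox"]                   # XYWH proposal box from SAM
    crop = scene.crop((x, y, x + w, y + h))
    sim = (embed(crop) @ exemplar.T).item()  # cosine similarity
    if sim > 0.6:                            # assumed match threshold
        print(f"match at {m['bbox']} (sim={sim:.2f})")
```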
-
Image data plays a pivotal role in the current data-driven era, particularly in applications such as computer vision, object recognition, and facial identification. Google Maps® stands out as a widely used platform that relies heavily on street-view images. To fulfill the pressing need for an effective, distributed mechanism for image data collection, we present a framework that uses smart contract technology and open-source robots to gather street-view image sequences. The proposed framework also includes a protocol for maintaining these sequences on a private blockchain capable of retaining different versions of street views while ensuring the integrity of the collected data. With this framework, Google Maps® data can be securely collected, stored, and published on a private blockchain. Through tests with actual robots, we demonstrate the feasibility of the framework and its capability to seamlessly upload privately maintained blockchain image sequences to Google Maps® using the Google Street View® Publish API.
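The integrity guarantee for collected image sequences can be illustrated with a simple hash chain, where each record commits to the image's digest and the previous record. This is a generic sketch of blockchain-style integrity under assumed record fields, not the framework's actual smart-contract protocol.

```python
# Generic hash-chain sketch: each record commits to an image's SHA-256
# digest and the previous record, so any tampering breaks the chain.
# Illustrative only -- not the framework's smart-contract protocol.
import hashlib, json, time

def add_block(chain, image_bytes, location):
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    record = {
        "prev": prev_hash,
        "image_sha256": hashlib.sha256(image_bytes).hexdigest(),
        "location": location,      # assumed (lat, lon) metadata
        "timestamp": time.time(),
    }
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()).hexdigest()
    chain.append(record)
    return chain

def verify(chain):
    """Recompute every hash and check each link to the previous block."""
    for i, block in enumerate(chain):
        body = {k: v for k, v in block.items() if k != "hash"}
        expect = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        if block["hash"] != expect:
            return False
        if i and block["prev"] != chain[i - 1]["hash"]:
            return False
    return True

chain = []
add_block(chain, b"jpeg bytes of frame 1", [40.0, -75.0])
add_block(chain, b"jpeg bytes of frame 2", [40.0, -75.1])
print("chain valid:", verify(chain))
```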