Title: End-to-end Deep Learning Methods for Automated Damage Detection in Extreme Events at Various Scales
Robust Mask R-CNN (Mask Regional Convolutional Neural Network) methods are proposed and tested for the automatic detection of cracks on structures or their components that may be damaged during extreme events, such as earthquakes. We curated a new dataset with 2,021 labeled images for training and validation, aiming to develop end-to-end deep neural networks for crack detection in the field. With data augmentation and parameter fine-tuning, a Path Aggregation Network (PANet) with spatial attention mechanisms and a High-Resolution Network (HRNet) are introduced into the Mask R-CNNs. Tests on three public datasets with low- or high-resolution images demonstrate that the proposed methods achieve a significant improvement over alternative networks, so the proposed method may be sufficient for crack detection at a variety of scales in real applications.
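As a rough illustration of the end-to-end setup (not the paper's exact PANet/HRNet variants), a Mask R-CNN crack detector can be assembled from torchvision's stock implementation; the ResNet-50 FPN backbone and head sizes here are assumptions for the sketch only:

import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

def build_crack_detector(num_classes=2):  # assumed labels: 0 = background, 1 = crack
    # Stock ResNet-50 FPN backbone; the paper's variants would instead use
    # PANet with spatial attention or an HRNet backbone here.
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
    # Replace the box head for the crack class count.
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    # Replace the mask head likewise.
    in_channels = model.roi_heads.mask_predictor.conv5_mask.in_channels
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_channels, 256, num_classes)
    return model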
Award ID(s):
2036193
NSF-PAR ID:
10352789
Author(s) / Creator(s):
Date Published:
Journal Name:
25th International Conference on Pattern Recognition (ICPR)
Page Range / eLocation ID:
6640 to 6647
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In this paper, we develop and implement end-to-end deep learning approaches to automatically detect two important types of structural failure, cracks and spalling, in buildings and bridges during extreme events such as major earthquakes. A total of 2,229 images were annotated and used to train and validate three newly developed Mask Regional Convolutional Neural Networks (Mask R-CNNs). In addition, three sets of public images from different disasters were used to test the accuracy of these models. For detecting and marking these two types of structural failure, one of the proposed methods achieves accuracies of 67.6% and 81.1%, respectively, on low- and high-resolution images collected from field investigations. The results demonstrate that the proposed end-to-end method is feasible for automatically locating and segmenting damage in 2D images, which can assist human experts in cases of disasters. A minimal training-step sketch for this two-class setting follows below.
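The sketch below shows one training step for the two damage classes reported above (crack and spalling, plus background); the optimizer settings are assumptions, not the paper's hyperparameters:

import torch
import torchvision

# Assumed labels: 0 = background, 1 = crack, 2 = spalling.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(num_classes=3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)

def train_step(images, targets):
    # images: list of [3, H, W] float tensors; targets: per-image dicts
    # with "boxes", "labels", and "masks" from the annotated dataset.
    model.train()
    losses = model(images, targets)   # dict of RPN, box, and mask losses
    total = sum(losses.values())
    optimizer.zero_grad()
    total.backward()
    optimizer.step()
    return float(total)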
  2. Cracks in civil infrastructure, including bridges, dams, roads, and skyscrapers, can reduce local stiffness and cause material discontinuities, so that structures lose their designed functions and threaten public safety. This inevitable process signifies urgent maintenance issues, and early detection enables preventive measures against damage and possible failure. With the increasing size of image data, machine- and deep-learning-based methods have become an important branch of crack detection from images. This study builds an automatic crack detector using the state-of-the-art deep learning technique known as the Mask Regional Convolutional Neural Network (Mask R-CNN). Mask R-CNN is a recently proposed algorithm not only for object detection and localization but also for instance segmentation of natural images. The built crack detector is found to perform highly effective and efficient automatic segmentation on a wide range of crack images. In addition, the proposed detector works on videos as well, indicating that a Mask R-CNN-based detector provides a robust and feasible means of detecting existing cracks and their shapes in real time on-site; a sketch of such a per-frame loop follows below.
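The real-time video claim amounts to running per-frame inference; a hypothetical OpenCV loop (the file name and both 0.5 thresholds are assumptions) might look like:

import cv2
import torch

model.eval()  # a trained Mask R-CNN, e.g. built as in the sketches above
cap = cv2.VideoCapture("site_inspection.mp4")  # hypothetical video file
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # OpenCV frames are BGR uint8; convert to a normalized RGB tensor.
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    tensor = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        pred = model([tensor])[0]
    # Keep confident detections and paint their mask pixels red.
    for mask in pred["masks"][pred["scores"] > 0.5]:
        frame[(mask[0] > 0.5).numpy()] = (0, 0, 255)
cap.release()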
  3. Existing approaches for autonomous control of pan-tilt-zoom (PTZ) cameras use multiple stages where object detection and localization are performed separately from the control of the PTZ mechanisms. These approaches require manual labels and suffer from performance bottlenecks due to error propagation across the multi-stage flow of information. The large size of object detection neural networks also makes prior solutions infeasible for real-time deployment on resource-constrained devices. We present an end-to-end deep reinforcement learning (RL) solution called Eagle to train a neural network policy that directly takes images as input to control the PTZ camera. Training reinforcement learning policies is cumbersome in the real world due to labeling effort, runtime environment stochasticity, and fragile experimental setups. We introduce a photo-realistic simulation framework for training and evaluation of PTZ camera control policies. Eagle achieves superior camera control performance by maintaining the object of interest close to the center of captured images at high resolution and has up to 17% longer tracking duration than the state-of-the-art. Eagle policies are lightweight (90× fewer parameters than YOLOv5s) and can run on embedded camera platforms such as the Raspberry Pi (33 FPS) and Jetson Nano (38 FPS), facilitating real-time PTZ tracking in resource-constrained environments. With domain randomization, Eagle policies trained in our simulator can be transferred directly to real-world scenarios. A schematic of such a policy network appears below.
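For flavor, a compact image-to-action policy in the spirit of Eagle might look as follows; the layer sizes and the three-action output (pan, tilt, zoom) are assumptions, not the paper's architecture:

import torch
import torch.nn as nn

class PTZPolicy(nn.Module):
    def __init__(self, n_actions=3):  # assumed actions: pan, tilt, zoom deltas
        super().__init__()
        # A few strided convolutions keep the parameter count small
        # enough for embedded boards.
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, n_actions)

    def forward(self, frame):  # frame: [B, 3, H, W] image batch
        # Bounded continuous actions in [-1, 1].
        return torch.tanh(self.head(self.features(frame)))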
  4. State-of-the-art deep learning technology has been successfully applied to relatively small selected areas of very high spatial resolution (0.15 and 0.25 m) optical aerial imagery acquired by a fixed-wing aircraft to automatically characterize ice-wedge polygons (IWPs) in the Arctic tundra. However, any mapping of IWPs at regional to continental scales requires images acquired on different sensor platforms (particularly satellite) and a refined understanding of the performance stability of the method across sensor platforms through reliable evaluation assessments. In this study, we examined the transferability of a deep learning Mask Region-Based Convolutional Neural Network (R-CNN) model for mapping IWPs in satellite remote sensing imagery (~0.5 m) covering 272 km² and unmanned aerial vehicle (UAV) imagery (0.02 m) covering 0.32 km². Multi-spectral images were obtained from the WorldView-2 satellite sensor (pan-sharpened to ~0.5 m) and from a 20 MP CMOS camera onboard a UAV, respectively. The training dataset included 25,489 and 6,022 manually delineated IWPs from satellite and fixed-wing aircraft aerial imagery near the Arctic Coastal Plain, northern Alaska. Quantitative assessments showed that individual IWPs were correctly detected at up to 72% and 70%, and delineated at up to 73% and 68%, F1 score accuracy levels for satellite and UAV images, respectively (see the worked F1 example below). Expert-based qualitative assessments showed that IWPs were correctly detected at good (40–60%) and excellent (80–100%) accuracy levels for satellite and UAV images, respectively, and delineated at the excellent (80–100%) level for both. We found that (1) regardless of spatial resolution and spectral bands, the deep learning Mask R-CNN model effectively mapped IWPs in both satellite and UAV images; (2) the model achieved better detection accuracy with finer image resolution, such as UAV imagery, yet better delineation accuracy with coarser image resolution, such as satellite imagery; (3) adding training data whose resolution differs from that of the application imagery does not necessarily improve the performance of the Mask R-CNN in IWP mapping; and (4) overall, the model underestimates the total number of IWPs, particularly disjoint/incomplete IWPs.
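The detection and delineation percentages above are F1 scores; as a reminder, F1 is the harmonic mean of precision and recall over matched polygons. The counts in the example are purely illustrative, not the study's data:

def f1_score(tp, fp, fn):
    # tp: correctly detected IWPs; fp: spurious detections; fn: missed IWPs.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Illustrative only: balanced errors yielding the 72% satellite detection F1.
print(f1_score(tp=720, fp=280, fn=280))  # 0.72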
  5. Cloud detection is an indispensable pre-processing step in remote sensing image analysis workflows. Most traditional rule-based and machine-learning-based algorithms utilize low-level features of the clouds and classify individual cloud pixels based on their spectral signatures. Cloud detection using such approaches can be challenging due to a multitude of factors, including harsh lighting conditions, the presence of thin clouds, the context of surrounding pixels, and complex spatial patterns. In recent studies, deep convolutional neural networks (CNNs) have shown outstanding results in the computer vision domain, as they better capture the texture, shape, and context of images. In this study, we propose a deep learning CNN approach to detect cloud pixels in medium-resolution satellite imagery. The proposed CNN accounts for both low-level features, such as color and texture information, and high-level features extracted from successive convolutions of the input image. We prepared a cloud-pixel dataset of approximately 7,273 randomly sampled 320 × 320-pixel image patches taken from a total of 121 Landsat-8 (30 m) and Sentinel-2 (20 m) image scenes, which come with cloud masks. From the available data channels, only the blue, green, red, and NIR bands are fed into the model. The CNN model was trained on 5,300 image patches and validated on 1,973 independent patches. As the final output from our model, we extract a binary mask of cloud and non-cloud pixels (a schematic of such a network follows below). The results are benchmarked against established cloud detection methods using standard accuracy metrics.
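Since the exact architecture is not given here, the following is only a schematic of the described setup: a small encoder-decoder taking the four retained bands (blue, green, red, NIR) of a 320 × 320 patch and emitting a per-pixel cloud mask:

import torch
import torch.nn as nn

class CloudNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),   # low-level color/texture
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),  # higher-level context
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),                          # per-pixel cloud logit
        )

    def forward(self, x):  # x: [B, 4, 320, 320] normalized band patch
        return self.decoder(self.encoder(x))

patch = torch.rand(1, 4, 320, 320)                    # one dummy image patch
cloud_mask = torch.sigmoid(CloudNet()(patch)) > 0.5   # binary cloud / non-cloud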