Title: "Towards Real-time House Detection in Aerial Imagery Using Faster Region-based Convolutional Neural Network"
In the past few years, automatic building detection in aerial images has become an emerging field in computer vision. Detecting specific types of houses provides information for urbanization analysis, change detection, and urban monitoring, all of which play increasingly important roles in modern city planning and natural hazard preparedness. In this paper, we demonstrate the effectiveness of detecting various types of houses in aerial imagery using the Faster Region-based Convolutional Neural Network (Faster R-CNN). After formulating the dataset and extracting bounding-box information, a pre-trained ResNet50 is used to obtain the feature maps. The fully convolutional Region Proposal Network (RPN) first predicts the bounds and objectness scores of objects (in this case, houses) from the feature maps. Then, the Region of Interest (RoI) pooling layer extracts the regions of interest to detect the objects present in the images. To the best of our knowledge, this is the first attempt at detecting houses using Faster R-CNN that has achieved satisfactory results. This experiment opens a new path to conduct and extend this work not only in the civil and environmental domains but also in other applied science disciplines.
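The RoI pooling step described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function name `roi_max_pool`, the toy 4x4 feature map, and the assumption that the RoI is already given in feature-map cell coordinates (and lies inside the map) are all hypothetical choices made for illustration. A real Faster R-CNN also rescales image-space proposals to the feature map and handles many channels at once.

```python
from typing import List, Sequence, Tuple


def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def roi_max_pool(
    feature_map: Sequence[Sequence[float]],
    roi: Tuple[int, int, int, int],
    output_size: Tuple[int, int] = (2, 2),
) -> List[List[float]]:
    """Max-pool one RoI of a single-channel feature map to a fixed grid.

    roi is (x0, y0, x1, y1) in feature-map cells, with x1 and y1 exclusive.
    Assumption: the RoI lies fully inside the feature map.
    """
    x0, y0, x1, y1 = roi
    out_h, out_w = output_size
    roi_h, roi_w = y1 - y0, x1 - x0
    if roi_h < 1 or roi_w < 1:
        raise ValueError("RoI must cover at least one feature-map cell")

    pooled = []
    for i in range(out_h):
        # Row range of bin i: floor for the start, ceil for the end,
        # so every bin contains at least one cell.
        r0 = y0 + (i * roi_h) // out_h
        r1 = y0 + _ceil_div((i + 1) * roi_h, out_h)
        row = []
        for j in range(out_w):
            c0 = x0 + (j * roi_w) // out_w
            c1 = x0 + _ceil_div((j + 1) * roi_w, out_w)
            # Keep only the strongest activation inside the bin.
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# Toy single-channel feature map (hypothetical values).
fmap = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

print(roi_max_pool(fmap, (0, 0, 4, 4)))  # [[6, 8], [14, 16]]
print(roi_max_pool(fmap, (1, 1, 4, 4)))  # [[11, 12], [15, 16]]
```

Whatever the size of the proposed region, the output grid has the same shape, which is what lets a fixed-size classification head score proposals of different sizes.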
Award ID(s):
2018611
PAR ID:
10517985
Publisher / Repository:
IPSI, Belgrade
Journal Name:
IPSI Transactions on Internet Research
Volume:
19
Issue:
02
ISSN:
1820-4503
Page Range / eLocation ID:
46 to 54
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The presence of fog in the background can prevent small and distant objects from being detected, let alone tracked. Under safety-critical conditions, multi-object tracking models require faster tracking speed while maintaining high object-tracking accuracy. The original DeepSORT algorithm used YOLOv4 for the detection phase and a simple neural network for the deep appearance descriptor. Consequently, the generated feature map loses relevant details about the track being matched with a given detection in fog. Targets with a high degree of appearance similarity in the detection frame are more likely to be mismatched, resulting in identity switches or track failures in heavy fog. We propose an improved multi-object tracking model based on the DeepSORT algorithm to improve tracking accuracy and speed under foggy weather conditions. First, we employed our camera-radar fusion network (CR-YOLOnet) in the detection phase for faster and more accurate object detection. Second, we proposed an appearance feature network to replace the basic convolutional neural network. We incorporated GhostNet in place of the traditional convolutional layers to generate more features and reduce computational complexity and cost. We adopted a segmentation module and fed in the semantic labels of the corresponding input frame to add rich semantic information to the low-level appearance feature maps. Our proposed method outperformed YOLOv5 + DeepSORT with a 35.15% increase in multi-object tracking accuracy, a 32.65% increase in multi-object tracking precision, a 37.56% increase in speed, and a 46.81% decrease in identity switches.
  2. Detecting objects in aerial images is challenging for at least two reasons: (1) target objects such as pedestrians are very small in pixels, making them hard to distinguish from the surrounding background; and (2) targets are generally sparsely and non-uniformly distributed, making detection very inefficient. In this paper, we address both issues, inspired by the observation that these targets are often clustered. In particular, we propose a Clustered Detection (ClusDet) network that unifies object clustering and detection in an end-to-end framework. The key components of ClusDet include a cluster proposal sub-network (CPNet), a scale estimation sub-network (ScaleNet), and a dedicated detection network (DetecNet). Given an input image, CPNet produces object cluster regions and ScaleNet estimates object scales for these regions. Then, each scale-normalized cluster region and its features are fed into DetecNet for object detection. ClusDet has several advantages over previous solutions: (1) it greatly reduces the number of chips for final object detection and hence achieves high running-time efficiency; (2) the cluster-based scale estimation is more accurate than previously used single-object-based ones, and hence effectively improves the detection of small objects; and (3) the final DetecNet is dedicated to clustered regions and implicitly models prior context information so as to boost detection accuracy. The proposed method is tested on three popular aerial image datasets: VisDrone, UAVDT, and DOTA. In all experiments, ClusDet achieves promising performance in comparison with state-of-the-art detectors.
  3. Ear recognition has advantages in identifying non-cooperative individuals in unconstrained environments. Ear detection is a major step within the ear recognition algorithmic process. While conventional approaches to ear detection have been used in the past, Faster Region-based Convolutional Neural Network (Faster R-CNN) based detection methods have recently achieved superior detection performance in various benchmark studies, including those on face detection. In this work, we propose an ear detection system that uses Faster R-CNN. The training of the system is performed in two stages. First, an AlexNet model is trained to classify ear vs. non-ear segments. Second, the Region Proposal Network (RPN) and the AlexNet, unified by sharing convolutional features, are trained for ear detection. The proposed system operates in real time and achieves a 98% detection rate on a test set composed of data from different ear datasets. In addition, the system's ear detection performance remains high even when the test images come from uncontrolled settings with wide variation in image quality, illumination, and ear occlusion.
  4. Object detection in high-resolution aerial images is a challenging task because of (1) the large variation in object size and (2) the non-uniform distribution of objects. A common solution is to divide the large aerial image into small (uniform) crops and then apply object detection to each crop. In this paper, we investigate the image cropping strategy to address these challenges. Specifically, we propose a Density-Map guided object detection Network (DMNet), which is inspired by the observation that the object density map of an image shows how objects are distributed in terms of the pixel intensity of the map. Because pixel intensity varies, it can indicate whether a region contains objects, which in turn provides statistical guidance for cropping images. DMNet has three key components: a density map generation module, an image cropping module, and an object detector. DMNet generates a density map and learns scale information based on density intensities to form cropping regions. Extensive experiments show that DMNet achieves state-of-the-art performance on two popular aerial image datasets, namely VisDrone and UAVDT.
  5. Detecting small objects (e.g., manhole covers, license plates, and roadside milestones) in urban images is a long-standing challenge, mainly due to the small scale of the objects and background clutter. Although convolutional neural network (CNN)-based methods have made significant progress and achieved impressive results in generic object detection, the problem of small object detection remains unsolved. To address this challenge, in this study we developed an end-to-end network architecture that has three significant characteristics compared to previous works. First, we designed a backbone network module, namely the Reduced Downsampling Network (RD-Net), to extract informative feature representations with high spatial resolution and preserve local information for small objects. Second, we introduced an Adjustable Sample Selection (ADSS) module, which removes the need for Intersection-over-Union (IoU) threshold hyperparameters and defines positive and negative training samples based on statistical characteristics between generated anchors and ground reference bounding boxes. Third, we incorporated the generalized Intersection-over-Union (GIoU) loss for bounding box regression, which efficiently bridges the gap between distance-based optimization loss and area-based evaluation metrics. We demonstrated the effectiveness of our method by performing extensive experiments on the public Urban Element Detection (UED) dataset acquired by Mobile Mapping Systems (MMS). The Average Precision (AP) of the proposed method was 81.71%, representing an improvement of 1.2% compared with the popular detection framework Faster R-CNN.
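The IoU and generalized IoU (GIoU) quantities mentioned in item 5 can be sketched as follows. This is a minimal illustration of the standard GIoU definition, not code from that work: the function names `iou_and_giou` and `giou_loss` and the `(x0, y0, x1, y1)` corner format for axis-aligned boxes are assumptions made here.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), with x0 < x1 and y0 < y1


def iou_and_giou(a: Box, b: Box) -> Tuple[float, float]:
    """Return (IoU, GIoU) for two axis-aligned boxes."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b

    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)

    # Intersection rectangle; width or height clamps to 0 when the boxes are disjoint.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h

    union = area_a + area_b - inter
    if union <= 0:
        raise ValueError("boxes must have positive area")
    iou = inter / union

    # Smallest axis-aligned box enclosing both inputs.
    hull = (max(ax1, bx1) - min(ax0, bx0)) * (max(ay1, by1) - min(ay0, by0))

    # GIoU subtracts the fraction of the enclosing box not covered by the union.
    giou = iou - (hull - union) / hull
    return iou, giou


def giou_loss(pred: Box, target: Box) -> float:
    """GIoU loss: 0 for a perfect match, approaching 2 for distant boxes."""
    return 1.0 - iou_and_giou(pred, target)[1]


# Identical boxes: loss is 0.
print(giou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # 0.0

# Disjoint boxes: IoU is 0, but GIoU still reflects how far apart they are.
print(iou_and_giou((0, 0, 1, 1), (2, 0, 3, 1)))
```

Unlike plain IoU, which is zero for every pair of non-overlapping boxes, GIoU keeps decreasing as the boxes move apart, so a loss built on it still gives a useful signal when a predicted box misses its target entirely.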