skip to main content


Title: Fully Autonomous UAV-Based Action Recognition System Using Aerial Imagery
Human action recognition is an important topic in artificial intelligence with a wide range of applications including surveillance systems, search-and-rescue operations, human-computer interaction, etc. However, most of the current action recognition systems utilize videos captured by stationary cameras. Another emerging technology is the use of unmanned ground and aerial vehicles (UAV/UGV) for different tasks such as transportation, traffic control, border patrolling, wild-life monitoring, etc. This technology has become more popular in recent years due to its affordability, high maneuverability, and limited human interventions. However, there does not exist an efficient action recognition algorithm for UAV-based monitoring platforms. This paper considers UAV-based video action recognition by addressing the key issues of aerial imaging systems such as camera motion and vibration, low resolution, and tiny human size. In particular, we propose an automated deep learning-based action recognition system which includes the three stages of video stabilization using the SURF feature selection and Lucas-Kanade method, human action area detection using faster region-based convolutional neural networks (R-CNN), and action recognition. We propose a novel structure that extends and modifies the InceptionResNet-v2 architecture by combining a 3D CNN architecture and a residual network for action recognition. We achieve an average accuracy of 85.83% for the entire-video-level recognition when applying our algorithm to the popular UCF-ARG aerial imaging dataset. This accuracy significantly improves upon the state-of-the-art accuracy by a margin of 17%.  more » « less
Award ID(s):
1755984
PAR ID:
10216527
Author(s) / Creator(s):
;
Date Published:
Journal Name:
International Symposium on Visual Computing
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. IEEE (Ed.)
    Over past few years, unmanned aircraft vehicles (UAVs) have been becoming more and more popular for various purposes such as surveillance, automated industry, robotics, vehicle guidance, traffic monitoring and control system. It is very important to have multiple methods of UAVs controlling to fit in UAVs usages. The goal of this work was to develop a new technique to control an UAV by using different hand gestures. To achieve this, a hand keypoint detection algorithm was used to detect 21 keypoints in the hand. Then this keypoints were used as the input to an intelligent system based on Convolutional Neural Networks (CNN) that was able to classify the hand gestures. To capture the hand gestures, the video camera of the UAV was used. A database containing 2400 hand images was created and used to train the CNN. The database contained 8 different hand gestures that were selected to send specific motion commands to the UAV. The accuracy of the CNN to classify the hand gestures was 93%. To test the capabilities of our intelligent control system, a small UAV, the DJI Ryze Tello drone, was used. The experimental results demonstrated that the DJI Tello drone was able to be successfully controlled by hand gestures in real time. 
    more » « less
  2. Long-range target detection in thermal infrared imagery is a challenging research problem due to the low resolution and limited detail captured by thermal sensors. The limited size and variability in thermal image datasets for small target detection is also a major constraint for the development of accurate and robust detection algorithms. To address both the sensor and data constraints, we propose a novel convolutional neural network (CNN) feature extraction architecture designed for small object detection in data-limited settings. More specifically, we focus on long-range ground-based thermal vehicle detection, but also show the effectiveness of the proposed algorithm on drone and satellite aerial imagery. The design of the proposed architecture is inspired by an analysis of popular object detectors as well as custom-designed networks. We find that restricted receptive fields (rather than more globalized features, as is the trend), along with less down sampling of feature maps and attenuated processing of fine-grained features, lead to greatly improved detection rates while mitigating the model’s capacity to overfit on small or poorly varied datasets. Our approach achieves state-of-the-art results on the Defense Systems Information Analysis Center (DSIAC) automated target recognition (ATR) and the Tiny Object Detection in Aerial Images (AI-TOD) datasets. 
    more » « less
  3. Human-Robot Collaboration (HRC), which enables a workspace where human and robot can dynamically and safely collaborate for improved operational efficiency, has been identified as a key element in smart manu­facturing. Human action recognition plays a key role in the realization ofHRC, as it helps identify current human action and provides the basis for future action prediction and robot planning. While Deep Learning (DL) has demonstrated great potential in advancing human action recognition, effectively leveraging the temporal in­formation of human motions to improve the accuracy and robustness of action recognition has remained as a challenge. Furthermore, it is often difficult to obtain a large volume of data for DL network training and opti­mization, due to operational constraints in a realistic manufacturing setting. This paper presents an integrated method to address these two challenges, based on the optical flow and convolutional neural network (CNN)­based transfer learning. Specifically, optical flow images, which encode the temporal information of human motion, are extracted and serve as the input to a two-stream CNN structure for simultaneous parsing of spatial­-temporal information of human motion. Subsequently, transfer learning is investigated to transfer the feature extraction capability of a pre-trained CNN to manufacturing scenarios. Evaluation using engine block assembly confirmed the effectiveness of the developed method. 
    more » « less
  4. Measuring the inclination angles of utility poles of the electric power distribution lines is critical to maintain power distribution systems and minimize power outages, because the poles are very vulnerable to natural disasters. However, traditional human-based pole inspection methods are very costly and require heavy workloads. In this paper, we propose a novel pole monitoring system to measure the inclination angle of utility poles from images captured by unmanned aerial vehicle (UAV) automatically. A state-of-the-art deep learning neural network is used to detect and segment utility poles from UAV street view images, and computer vision techniques are used to calculate the inclination angles based on the segmented poles. The proposed method was evaluated using 64 images with 84 utility poles taken in different weather conditions. The pole segmentation accuracy is 93.74% and the average inclination angle error is 0.59 degrees, which demonstrate the efficiency of the proposed utility pole monitoring system. 
    more » « less
  5. State-of-the-art deep learning technology has been successfully applied to relatively small selected areas of very high spatial resolution (0.15 and 0.25 m) optical aerial imagery acquired by a fixed-wing aircraft to automatically characterize ice-wedge polygons (IWPs) in the Arctic tundra. However, any mapping of IWPs at regional to continental scales requires images acquired on different sensor platforms (particularly satellite) and a refined understanding of the performance stability of the method across sensor platforms through reliable evaluation assessments. In this study, we examined the transferability of a deep learning Mask Region-Based Convolutional Neural Network (R-CNN) model for mapping IWPs in satellite remote sensing imagery (~0.5 m) covering 272 km2 and unmanned aerial vehicle (UAV) (0.02 m) imagery covering 0.32 km2. Multi-spectral images were obtained from the WorldView-2 satellite sensor and pan-sharpened to ~0.5 m, and a 20 mp CMOS sensor camera onboard a UAV, respectively. The training dataset included 25,489 and 6022 manually delineated IWPs from satellite and fixed-wing aircraft aerial imagery near the Arctic Coastal Plain, northern Alaska. Quantitative assessments showed that individual IWPs were correctly detected at up to 72% and 70%, and delineated at up to 73% and 68% F1 score accuracy levels for satellite and UAV images, respectively. Expert-based qualitative assessments showed that IWPs were correctly detected at good (40–60%) and excellent (80–100%) accuracy levels for satellite and UAV images, respectively, and delineated at excellent (80–100%) level for both images. We found that (1) regardless of spatial resolution and spectral bands, the deep learning Mask R-CNN model effectively mapped IWPs in both remote sensing satellite and UAV images; (2) the model achieved a better accuracy in detection with finer image resolution, such as UAV imagery, yet a better accuracy in delineation with coarser image resolution, such as satellite imagery; (3) increasing the number of training data with different resolutions between the training and actual application imagery does not necessarily result in better performance of the Mask R-CNN in IWPs mapping; (4) and overall, the model underestimates the total number of IWPs particularly in terms of disjoint/incomplete IWPs. 
    more » « less