Title: Ear Detection in the Wild Using Faster R-CNN Deep Learning
Ear recognition offers advantages for identifying non-cooperative individuals in unconstrained environments, and ear detection is a major step in the ear recognition pipeline. While conventional approaches to ear detection have been used in the past, detection methods based on the Faster Region-based Convolutional Neural Network (Faster R-CNN) have recently achieved superior performance in various benchmark studies, including those on face detection. In this work, we propose an ear detection system that uses Faster R-CNN. The system is trained in two stages: first, an AlexNet model is trained to classify ear vs. non-ear segments; second, a Region Proposal Network (RPN), unified with the AlexNet so that the two share convolutional features, is trained for ear detection. The proposed system operates in real time and achieves a 98% detection rate on a test set composed of data from several ear datasets. In addition, the system's detection performance remains high even when the test images come from uncontrolled settings that vary widely in image quality, illumination, and ear occlusion.
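To make the two-stage training concrete, the following is a minimal sketch in PyTorch/torchvision, not the authors' implementation: the anchor sizes, channel count, and training details are assumptions, and only the overall structure (an AlexNet classifier whose convolutional features are reused as the backbone shared with the RPN) follows the abstract.

```python
import torch
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
from torchvision.ops import MultiScaleRoIAlign

# Stage 1 (sketch): fine-tune AlexNet as an ear vs. non-ear classifier.
alexnet = torchvision.models.alexnet(weights="IMAGENET1K_V1")
alexnet.classifier[6] = torch.nn.Linear(4096, 2)  # ear / non-ear
# ... train on cropped ear and non-ear segments ...

# Stage 2 (sketch): reuse the convolutional features as the backbone
# shared between the RPN and the detection head.
backbone = alexnet.features
backbone.out_channels = 256  # AlexNet's last conv layer emits 256 maps

detector = FasterRCNN(
    backbone,
    num_classes=2,  # background + ear
    rpn_anchor_generator=AnchorGenerator(
        sizes=((32, 64, 128),), aspect_ratios=((0.5, 1.0, 2.0),)
    ),
    box_roi_pool=MultiScaleRoIAlign(
        featmap_names=["0"], output_size=7, sampling_ratio=2
    ),
)
```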
Award ID(s):
1650474
PAR ID:
10091249
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)
Page Range / eLocation ID:
1124 to 1130
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Agaian, Sos S.; DelMarco, Stephen P.; Asari, Vijayan K. (Ed.)
    Iris recognition is a widely used biometric technology with high accuracy and reliability in well-controlled environments. However, recognition accuracy can degrade significantly in non-ideal scenarios, such as off-angle iris images. To address these challenges, deep learning frameworks have been proposed to identify subjects from their off-angle iris images. Traditional CNN-based iris recognition systems train a single deep network using multiple off-angle iris images of the same subject to extract gaze-invariant features, then test incoming off-angle images with this single network to classify them into the correct subject class. In another approach, multiple shallow networks are trained, one per gaze angle, so that each becomes an expert for a specific gaze angle; when testing an off-angle iris image, we first estimate the gaze angle and feed the probe image to its corresponding network for recognition. In this paper, we present an analysis of the performance of both single-model and multi-model deep learning frameworks for identifying subjects from their off-angle iris images. Specifically, we compare the performance of a single AlexNet with multiple SqueezeNet models; SqueezeNet is a variation of AlexNet that uses 50x fewer parameters and is optimized for devices with limited computational resources. Our experiments are conducted on an off-angle iris dataset consisting of 100 subjects captured at 10-degree intervals from -50 to +50 degrees. The results indicate that angles farther from the trained angles yield lower accuracy than angles closer to them. Our findings suggest that the use of SqueezeNet, which requires fewer parameters than AlexNet, can enable iris recognition on devices with limited computational resources while maintaining accuracy. Overall, the results of this study can contribute to the development of more robust iris recognition systems that perform well in non-ideal scenarios.
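As a concrete illustration of the multi-model routing described above, here is a minimal PyTorch/torchvision sketch; the angle grid matches the dataset description, but the expert models, routing rule, and function names are assumptions rather than the paper's code.

```python
import torch
import torchvision

# Assumed setup: one SqueezeNet expert per trained gaze angle,
# matching the dataset's 10-degree grid from -50 to +50 degrees.
TRAINED_ANGLES = list(range(-50, 51, 10))

experts = {
    angle: torchvision.models.squeezenet1_1(num_classes=100)  # 100 subjects
    for angle in TRAINED_ANGLES
}

def identify(probe: torch.Tensor, estimated_angle: float) -> int:
    """Route the probe image to the expert for the nearest trained angle."""
    nearest = min(TRAINED_ANGLES, key=lambda a: abs(a - estimated_angle))
    model = experts[nearest]
    model.eval()
    with torch.no_grad():
        logits = model(probe.unsqueeze(0))  # probe: (3, H, W) tensor
    return int(logits.argmax(dim=1))
```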
  2. Iris biometric systems offer non-contact authentication, which is particularly advantageous in controlled environments such as security checkpoints. However, challenges arise in less controlled scenarios, such as standoff biometrics, where captured images are mostly non-ideal, including off-angle captures. This paper addresses the need for iris recognition models that adapt to various gaze angles by proposing a blink detection algorithm as an additional feature. The study explores different blink detection methods involving logistic regression, random forest, and deep learning models. In the first methodology, logistic regression and a random forest model were used to classify eye images into four blink classes. The second methodology involved labeling the eye-openness percentage; the ground-truth eye blink was computed from facial landmarks detected by the MediaPipe model. For the deep learning approach, we used a pre-trained Convolutional Neural Network (CNN) model with its output layer replaced by a regression layer. Results show improved precision and recall when incorporating height and width features into the regression model. The AlexNet model achieves superior performance, reaching 90% accuracy at a 10% error threshold. This research contributes valuable insights for developing robust iris recognition models adaptable to diverse gaze angles.
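A hedged sketch of the regression variant described above, assuming PyTorch: a pre-trained AlexNet with its 1000-way classification layer swapped for a single regression unit predicting eye-openness percentage. The loss, optimizer, and helper name are assumptions; only the output-layer replacement follows the abstract.

```python
import torch
import torchvision

# Pre-trained CNN with the classification head replaced by regression.
model = torchvision.models.alexnet(weights="IMAGENET1K_V1")
model.classifier[6] = torch.nn.Linear(4096, 1)  # predicted openness (%)

criterion = torch.nn.MSELoss()                             # assumed loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # assumed optimizer

def train_step(images: torch.Tensor, openness: torch.Tensor) -> float:
    """One step; `openness` holds MediaPipe-derived ground-truth labels."""
    optimizer.zero_grad()
    pred = model(images).squeeze(1)
    loss = criterion(pred, openness)
    loss.backward()
    optimizer.step()
    return loss.item()
```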
  3. This paper first proposes a method of formulating model interpretability in visual understanding tasks based on the idea of unfolding latent structures. It then presents a case study in object detection using popular two-stage Region-based Convolutional Neural Network (R-CNN) detection systems. The proposed method focuses on weakly-supervised extractive rationale generation, that is, learning to unfold latent discriminative part configurations of object instances automatically and simultaneously during detection, without any supervision for the part configurations. It utilizes a top-down hierarchical and compositional grammar model embedded in a directed acyclic AND-OR Graph (AOG) to explore and unfold the space of latent part configurations of regions of interest (RoIs). It presents an AOGParsing operator that seamlessly integrates with the RoIPooling/RoIAlign operators widely used in R-CNN and is trained end-to-end. In object detection, a bounding box is interpreted by the best parse tree derived from the AOG on the fly, which serves as the qualitatively extractive rationale for the detection. In experiments, Faster R-CNN is used to test the proposed method on the PASCAL VOC 2007 and COCO 2017 object detection datasets. The experimental results show that the proposed method can compute promising latent structures without hurting detection performance.
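To make the parse-tree idea more tangible, below is a minimal data-structure sketch, not the paper's implementation: in the standard AND-OR Graph formulation, AND-nodes compose the scores of their child parts and OR-nodes select the best-scoring alternative, so the best parse tree for an RoI can be read out after scoring. The node types and scoring rule here are assumptions based on that general formulation.

```python
from dataclasses import dataclass, field

@dataclass
class AOGNode:
    kind: str                       # "AND" (composition), "OR" (switch), "TERMINAL"
    score: float = 0.0
    children: list["AOGNode"] = field(default_factory=list)

def parse(node: AOGNode) -> float:
    """Compute the best parse score; OR-nodes keep only the argmax child."""
    if node.kind == "TERMINAL":
        return node.score
    child_scores = [parse(c) for c in node.children]
    if node.kind == "AND":
        node.score = sum(child_scores)          # compose all parts
    else:  # OR: pick the best latent configuration
        best = max(range(len(child_scores)), key=child_scores.__getitem__)
        node.children = [node.children[best]]   # unfold the chosen branch
        node.score = child_scores[best]
    return node.score
```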
  4. In the past few years, automatic building detection in aerial images has become an active field in computer vision. Detecting specific types of houses provides information for urbanization studies, change detection, and urban monitoring, which play increasingly important roles in modern city planning and natural hazard preparedness. In this paper, we demonstrate the effectiveness of detecting various types of houses in aerial imagery using the Faster Region-based Convolutional Neural Network (Faster R-CNN). After formulating the dataset and extracting bounding-box information, a pre-trained ResNet50 is used to obtain the feature maps. The fully convolutional Region Proposal Network (RPN) first predicts the bounds and objectness scores of objects (in this case, houses) from the feature maps. Then, the Region of Interest (RoI) pooling layer extracts the proposed regions to detect the objects present in the images. To the best of our knowledge, this is the first attempt at detecting houses using Faster R-CNN, and it has achieved satisfactory results. This experiment opens a new path to conduct and extend such work not only in the civil and environmental domain but also in other applied science disciplines.
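The described pipeline (ResNet50 features, RPN proposals, RoI pooling) closely mirrors torchvision's stock Faster R-CNN, so a minimal fine-tuning sketch looks like the following; the number of house categories and the weights flag are assumptions, and the paper's own data formulation is not reproduced here.

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

NUM_HOUSE_TYPES = 3  # assumed number of house categories

# Off-the-shelf Faster R-CNN with a ResNet-50 backbone; the built-in RPN
# predicts box bounds and objectness scores from the backbone's feature
# maps, and RoI pooling crops each proposal for classification.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Replace the box predictor so the detection head outputs house classes.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(
    in_features, NUM_HOUSE_TYPES + 1  # +1 for the background class
)
```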
  5. Abstract Giant star-forming clumps (GSFCs) are areas of intensive star formation that are commonly observed in high-redshift (z ≳ 1) galaxies, but their formation and role in galaxy evolution remain unclear. Observations of low-redshift clumpy galaxy analogues are rare, but the availability of wide-field galaxy survey data makes the detection of large clumpy galaxy samples much more feasible. Deep Learning (DL), and in particular Convolutional Neural Networks (CNNs), has been successfully applied to image classification tasks in astrophysical data analysis. However, one application of DL that remains relatively unexplored is automatically identifying and localizing specific objects or features in astrophysical imaging data. In this paper, we demonstrate the use of DL-based object detection models to localize GSFCs in astrophysical imaging data. We apply the Faster Region-based Convolutional Neural Network object detection framework (FRCNN) to identify GSFCs in low-redshift (z ≲ 0.3) galaxies. Unlike other studies, we train different FRCNN models on observational data collected by the Sloan Digital Sky Survey and labelled by volunteers from the citizen science project ‘Galaxy Zoo: Clump Scout’. The FRCNN model relies on a CNN component as a ‘backbone’ feature extractor. We show that CNNs that have been pre-trained for image classification using astrophysical images outperform those pre-trained on terrestrial images. In particular, we compare a domain-specific CNN, ‘Zoobot’, with a generic classification backbone and find that Zoobot achieves higher detection performance. Our final model is capable of producing GSFC detections with a completeness and purity of ≥0.8 while being trained on only ∼5000 galaxy images.
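For reference, completeness and purity here correspond to recall and precision over matched detections; a minimal sketch of the computation follows, with counts that are purely illustrative.

```python
def completeness_purity(n_true: int, n_detected: int, n_matched: int):
    """Completeness = fraction of true clumps recovered (recall);
    purity = fraction of detections that are real clumps (precision)."""
    completeness = n_matched / n_true
    purity = n_matched / n_detected
    return completeness, purity

# e.g. 90 of 100 labelled clumps matched across 105 detections:
print(completeness_purity(100, 105, 90))  # -> (0.9, 0.857...)
```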