Title: Contextual Attention for Hand Detection in the Wild
We present Hand-CNN, a novel convolutional network architecture for detecting hand masks and predicting hand orientations in unconstrained images. Hand-CNN extends MaskRCNN with a novel attention mechanism to incorporate contextual cues in the detection process. This attention mechanism can be implemented as an efficient network module that captures non-local dependencies between features. This network module can be inserted at different stages of an object detection network, and the entire detector can be trained end-to-end. We also introduce a large-scale annotated hand dataset containing hands in unconstrained images for training and evaluation. We show that Hand-CNN outperforms existing methods on several datasets, including our hand detection benchmark and the publicly available PASCAL VOC human layout challenge. We also conduct ablation studies on hand detection to show the effectiveness of the proposed contextual attention module.
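As a rough illustration of what "capturing non-local dependencies between features" can mean, the sketch below computes a generic self-attention update over a set of feature vectors, one per spatial position. It is not the Hand-CNN module: the abstract does not specify that design, and this version assumes no learned projections, scaled dot-product similarity, and a residual connection. The function name `non_local_attention` is hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def non_local_attention(features):
    """Add to each feature a similarity-weighted average of all features.

    features: list of N vectors, one per spatial position (assumed layout).
    Returns a new list of N vectors of the same dimension.
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    updated = []
    for query in features:
        # Affinity of this position to every position, including itself.
        weights = softmax([dot(query, key) / scale for key in features])
        # Context vector: weighted sum over all positions (the non-local step).
        context = [
            sum(w * f[d] for w, f in zip(weights, features))
            for d in range(dim)
        ]
        # Residual connection keeps the original local feature.
        updated.append([q + c for q, c in zip(query, context)])
    return updated
```

Because the update reads from every position, a feature at one location can be influenced by distant context. A trainable version would add learned query, key, and value projections.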
Award ID(s): 1650499
PAR ID: 10137864
Author(s) / Creator(s):
Date Published:
Journal Name: IEEE International Conference on Computer Vision workshops
ISSN: 2473-9936
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Structural accuracy of segmentation is important for fine-scale structures in biomedical images. We propose a novel Topological-Attention ConvLSTM Network (TACLNet) for 3D anisotropic image segmentation with high structural accuracy. We adopt ConvLSTM to leverage contextual information from adjacent slices while achieving high efficiency. We propose a Spatial Topological-Attention (STA) module to effectively transfer topologically critical information across slices. Furthermore, we propose an Iterative Topological-Attention (ITA) module that provides a more stable topologically critical map for segmentation. Quantitative and qualitative results show that our proposed method outperforms various baselines in terms of topology-aware evaluation metrics. 
  2. Robust Mask R-CNN (Mask Regional Convolutional Neural Network) methods are proposed and tested for automatic detection of cracks on structures or their components that may be damaged during extreme events, such as earthquakes. We curated a new dataset with 2,021 labeled images for training and validation and aimed to find end-to-end deep neural networks for crack detection in the field. With data augmentation and parameter fine-tuning, Path Aggregation Network (PANet) with spatial attention mechanisms and High-Resolution Network (HRNet) are introduced into Mask R-CNNs. Tests on three public datasets with low- or high-resolution images demonstrate that the proposed methods achieve a substantial improvement over alternative networks, so they may be sufficient for crack detection at a variety of scales in real applications.
  3. Detecting small objects (e.g., manhole covers, license plates, and roadside milestones) in urban images is a long-standing challenge, mainly due to the small scale of the objects and background clutter. Although convolutional neural network (CNN)-based methods have made significant progress and achieved impressive results in generic object detection, the problem of small object detection remains unsolved. To address this challenge, in this study we developed an end-to-end network architecture that has three significant characteristics compared to previous work. First, we designed a backbone network module, namely Reduced Downsampling Network (RD-Net), to extract informative feature representations with high spatial resolutions and preserve local information for small objects. Second, we introduced an Adjustable Sample Selection (ADSS) module which frees the Intersection-over-Union (IoU) threshold hyperparameters and defines positive and negative training samples based on statistical characteristics between generated anchors and ground reference bounding boxes. Third, we incorporated the generalized Intersection-over-Union (GIoU) loss for bounding box regression, which efficiently bridges the gap between distance-based optimization loss and area-based evaluation metrics. We demonstrated the effectiveness of our method by performing extensive experiments on the public Urban Element Detection (UED) dataset acquired by Mobile Mapping Systems (MMS). The Average Precision (AP) of the proposed method was 81.71%, representing an improvement of 1.2% compared with the popular detection framework Faster R-CNN.
  4. Although face recognition (FR) has achieved great success in recent years, it is still challenging to accurately recognize faces in low-quality images due to the obscured facial details. Nevertheless, it is often feasible to make predictions about specific soft biometric (SB) attributes, such as gender, age, and baldness, even when dealing with low-quality images. In this paper, we propose a novel multi-branch neural network that leverages SB attribute information to boost the performance of FR. To this end, we propose a cross-attribute-guided transformer fusion (CATF) module that effectively captures the long-range dependencies and relationships between FR and SB feature representations. The synergy created by the reciprocal flow of information in the dual cross-attention operations of the proposed CATF module enhances the performance of FR. Furthermore, we introduce a novel self-attention distillation framework that effectively highlights crucial facial regions, such as landmarks, by aligning low-quality images with their high-quality counterparts in the feature space. The proposed self-attention distillation regularizes our network to learn a unified quality-invariant feature representation in unconstrained environments. We conduct extensive experiments on various real-world FR benchmarks varying in quality. Experimental results demonstrate the superiority of our FR method compared to state-of-the-art FR studies.
  5. Accurate segmentation and parameterization of the iris in eye images still remain a significant challenge for achieving robust iris recognition, especially in off‐angle images captured in less constrained environments. While deep learning techniques (i.e. segmentation‐based convolutional neural networks (CNNs)) are increasingly being used to address this problem, there is a significant lack of information about the mechanism of the related distortions affecting the performance of these networks, and no comprehensive recognition framework is dedicated, in particular, to off‐angle iris recognition using such modules. In this work, the general effect of different gaze angles on ocular biometrics is discussed, and the findings are then related to the CNN‐based off‐angle iris segmentation results and the subsequent recognition performance. An improvement scheme is also introduced to compensate for some segmentation degradations caused by the off‐angle distortions, and a new gaze‐angle estimation and parameterization module is further proposed to estimate and re‐project (correct) the off‐angle iris images back to frontal view. Building on these, several approaches (pipelines) are formulated to configure an end‐to‐end framework for the CNN‐based off‐angle iris segmentation and recognition. Within the framework of these approaches, a series of experiments is carried out to determine whether (i) improving the segmentation outputs and/or correcting the output iris images before or after the segmentation can compensate for some off‐angle distortions, (ii) a CNN trained on frontal eye images is capable of detecting and extracting the learnt features on the corrected images, or (iii) the generalisation capability of the network can be improved by training it on iris images of different gaze angles. Finally, the recognition performance of the selected approach is compared against some state‐of‐the‐art off‐angle iris recognition algorithms.
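Item 3 in the list above mentions the generalized Intersection-over-Union (GIoU) loss for bounding box regression. A minimal sketch of that loss for two axis-aligned boxes follows, using the standard definition GIoU = IoU - (|C| - |A ∪ B|) / |C|, where C is the smallest box enclosing both. The function name `giou_loss` and the `(x1, y1, x2, y2)` box layout are assumptions for illustration and are not taken from that paper's code.

```python
def giou_loss(box_a, box_b):
    """Return 1 - GIoU for two boxes given as (x1, y1, x2, y2).

    Assumes x1 < x2 and y1 < y2 for both boxes, so areas are positive.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection rectangle; width or height clamps to 0 when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = area_a + area_b - inter
    iou = inter / union

    # Smallest axis-aligned box enclosing both inputs.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose = enclose_w * enclose_h

    giou = iou - (enclose - union) / enclose
    return 1.0 - giou
```

Unlike plain IoU, the loss keeps changing as disjoint boxes move farther apart, because the enclosing-box term keeps growing. That is why it can serve as a regression signal even when the boxes do not overlap.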