

Title: A Survey on Skeleton-Based Activity Recognition using Graph Convolutional Networks (GCN)
Skeleton-based activity recognition is an active research topic in computer vision. In recent years, deep learning methods have been used in this area, including Recurrent Neural Network (RNN)-based, Convolutional Neural Network (CNN)-based, and Graph Convolutional Network (GCN)-based approaches. This paper provides a survey of recent work on GCN-based approaches applied to skeleton-based activity recognition. We first introduce the conventional implementation of a GCN. Then methods that address the limitations of conventional GCNs are presented.
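As a rough illustration of what a conventional GCN layer computes on a skeleton, the sketch below applies the widely used normalized-adjacency propagation rule, ReLU(D^{-1/2} (A + I) D^{-1/2} X W), to a toy joint graph. This is an assumption about what "conventional" means here, not code from the surveyed paper; the 5-joint skeleton, the feature sizes, and the function names `normalized_adjacency` and `gcn_layer` are all hypothetical.

```python
import numpy as np


def normalized_adjacency(edges, num_joints):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected skeleton graph."""
    a = np.eye(num_joints)  # identity adds a self-loop for every joint
    for i, j in edges:
        a[i, j] = 1.0
        a[j, i] = 1.0
    deg = a.sum(axis=1)  # degree of each joint, self-loop included
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Scale rows and columns by D^{-1/2} via broadcasting.
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_hat, x, w):
    """One graph convolution: aggregate neighbor features, project, apply ReLU."""
    return np.maximum(a_hat @ x @ w, 0.0)


# Hypothetical toy skeleton: joint 1 is a hub connected to joints 0, 2, 3, 4.
edges = [(0, 1), (1, 2), (1, 3), (1, 4)]
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))  # one 3D coordinate per joint (illustrative data)
w = rng.normal(size=(3, 8))  # weight matrix, random here instead of learned

a_hat = normalized_adjacency(edges, num_joints=5)
h = gcn_layer(a_hat, x, w)  # per-joint output features, shape (5, 8)
```

Note that the same weight matrix `w` is shared by every joint and the graph is fixed to the skeleton's bones; these are the kinds of constraints that later GCN variants relax.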
Award ID(s):
1831969
PAR ID:
10356217
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
The 12th Int’l Symposium on Image and Signal Processing and Analysis (ISPA)
Page Range / eLocation ID:
177 to 182
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The graph convolutional network (GCN) has recently achieved promising performance in 3D human pose estimation (HPE) by modeling the relationships among body parts. However, most prior GCN approaches suffer from two main drawbacks. First, they share a feature transformation for each node within a graph convolution layer. This prevents them from learning different relations between different body joints. Second, the graph is usually defined according to the human skeleton and is suboptimal because human activities often exhibit motion patterns beyond the natural connections of body joints. To address these limitations, we introduce a novel Modulated GCN for 3D HPE. It consists of two main components: weight modulation and affinity modulation. Weight modulation learns different modulation vectors for different nodes so that the feature transformations of different nodes are disentangled while retaining a small model size. Affinity modulation adjusts the graph structure in a GCN so that it can model additional edges beyond the human skeleton. We investigate several affinity modulation methods as well as the impact of regularizations. A rigorous ablation study indicates that both types of modulation improve performance with negligible overhead. Compared with state-of-the-art GCNs for 3D HPE, our approach either significantly reduces the estimation errors, e.g., by around 10%, while retaining a small model size, or drastically reduces the model size, e.g., from 4.22M to 0.29M (a 14.5× reduction), while achieving comparable performance. Results on two benchmarks show that our Modulated GCN outperforms some recent state-of-the-art methods. Our code is available at https://github.com/ZhimingZo/Modulated-GCN.
  2. To improve computer-based recognition from video of isolated signs from American Sign Language (ASL), we propose a new skeleton-based method that involves explicit detection of the start and end frames of signs, trained on the ASLLVD dataset; it uses linguistically relevant parameters based on the skeleton input. Our method employs a bidirectional learning approach within a Graph Convolutional Network (GCN) framework. We apply this method to the WLASL dataset, but with corrections to the gloss labeling to ensure consistency in the labels assigned to different signs; it is important to have a 1-1 correspondence between signs and text-based gloss labels. We achieve a success rate of 77.43% for top-1 and 94.54% for top-5 using this modified WLASL dataset. Our method, which does not require multi-modal data input, outperforms other state-of-the-art approaches on the same modified WLASL dataset, demonstrating the importance of both attention to the start and end frames of signs and the use of bidirectional data streams in the GCNs for isolated sign recognition. 
  4. Graph convolutional neural network architectures combine feature extraction and convolutional layers for hyperspectral image classification. An adaptive neighborhood aggregation method, based on statistical variance and integrating the spatial information with the spectral signature of the pixels, is proposed for improving graph convolutional network classification of hyperspectral images. The spatial-spectral information is integrated into the adjacency matrix and processed by a single-layer graph convolutional network. The algorithm employs an adaptive neighborhood selection criterion conditioned on the class to which a pixel belongs. Compared to fixed window-based feature extraction, this method proves effective in capturing the spectral and spatial features with variable pixel neighborhood sizes. The experimental results from the Indian Pines, Houston University, and Botswana Hyperion hyperspectral image datasets show that the proposed AN-GCN can significantly improve classification accuracy. For example, the overall accuracy for Houston University data increases from 81.71% (MiniGCN) to 97.88% (AN-GCN). Furthermore, the AN-GCN can classify hyperspectral images of rice seeds exposed to high day and night temperatures, proving its efficacy in discriminating the seeds under increased ambient temperature treatments.

     
  5. Network embedding has been an effective tool to analyze heterogeneous networks (HNs) by representing nodes in a low-dimensional space. Although many recent methods have been proposed for representation learning of HNs, there is still much room for improvement. Random-walk-based methods are currently popular for learning network embeddings; however, they are random and limited by the length of sampled walks, and have difficulty capturing network structural information. Some recent studies have proposed using meta paths to express the sample relationship in HNs. Another popular graph learning model, the graph convolutional network (GCN), is known to be capable of better exploitation of network topology, but the current design of GCN is intended for homogeneous networks. This paper proposes a novel combination of meta-graph and graph convolution, the meta-graph based graph convolutional networks (MGCN). To fully capture the complex long semantic information, MGCN utilizes different meta-graphs in HNs. As different meta-graphs express different semantic relationships, MGCN learns the weights of different meta-graphs to make up for the loss of semantics when applying GCN. In addition, we improve the current convolution design by adding node self-significance. To validate our model in learning feature representation, we present comprehensive experiments on four real-world datasets and two representation tasks: classification and link prediction. WMGCN's representations can improve accuracy scores by up to around 10% in comparison to other popular representation learning models. Moreover, WMGCN's feature learning outperforms other popular baselines. The experimental results clearly show our model is superior to other state-of-the-art representation learning algorithms.