skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Towards a Geometric Understanding of Spatiotemporal Graph Convolution Networks
Spatiotemporal graph convolutional networks (STGCNs) have emerged as a desirable model for skeleton -based human action recognition. Despite achieving state-of-the-art performance, there is a limited understanding of the representations learned by these models, which hinders their application in critical and real-world settings. While layerwise analysis of CNN models has been studied in the literature, to the best of our knowledge, there exists no study on the layerwise explainability of the embeddings learned on spatiotemporal data using STGCNs. In this paper, we first propose to use a local Dataset Graph (DS-Graph) obtained from the feature representation of input data at each layer to develop an understanding of the layer-wise embedding geometry of the STGCN. To do so, we develop a window-based dynamic time warping (DTW) method to compute the distance between data sequences with varying temporal lengths. To validate our findings, we have developed a layer-specific Spatiotemporal Graph Gradient-weighted Class Activation Mapping (L-STG-GradCAM) technique tailored for spatiotemporal data. This approach enables us to visually analyze and interpret each layer within the STGCN network. We characterize the functions learned by each layer of the STGCN using the label smoothness of the representation and visualize them using our L-STG-GradCAM approach. Our proposed method is generic and can yield valuable insights for STGCN architectures in different applications. However, this paper focuses on the human activity recognition task as a representative application. Our experiments show that STGCN models learn representations that capture general human motion in their initial layers while discriminating different actions only in later layers. This justifies experimental observations showing that fine-tuning deeper layers works well for transfer between related tasks. We provide experimental evidence for different human activity datasets and advanced spatiotemporal graph networks to validate that the proposed method is general enough to analyze any STGCN model and can be useful for drawing insight into networks in various scenarios. We also show that noise at the input has a limited effect on label smoothness, which can help justify the robustness of STGCNs to noise.  more » « less
Award ID(s):
2009032
PAR ID:
10555788
Author(s) / Creator(s):
; ;
Publisher / Repository:
IEEE
Date Published:
Journal Name:
IEEE Open Journal of Signal Processing
Volume:
5
ISSN:
2644-1322
Page Range / eLocation ID:
1023 to 1030
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Deep neural networks are able to memorize noisy labels easily with a softmax cross entropy (CE) loss. Previous studies attempted to address this issue focus on incorporating a noise-robust loss function to the CE loss. However, the memorization issue is alleviated but still remains due to the non-robust CE loss. To address this issue, we focus on learning robust contrastive representations of data on which the classifier is hard to memorize the label noise under the CE loss. We propose a novel contrastive regularization function to learn such representations over noisy data where the label noise does not dominate the representation learning. By theoretically investigating the representations induced by the proposed regularization function, we reveal that the learned representations keep information related to true labels and discard information related to corrupted labels from images. Moreover, our theoretical results also indicate that the learned representations are robust to the label noise. Experiments on benchmark datasets demonstrate that the efficacy of our method. 
    more » « less
  2. null (Ed.)
    Multiplex networks are complex graph structures in which a set of entities are connected to each other via multiple types of relations, each relation representing a distinct layer. Such graphs are used to investigate many complex biological, social, and technological systems. In this work, we present a novel semi-supervised approach for structure-aware representation learning on multiplex networks. Our approach relies on maximizing the mutual information between local node-wise patch representations and label correlated structure-aware global graph representations to model the nodes and cluster structures jointly. Specifically, it leverages a novel cluster-aware, node-contextualized global graph summary generation strategy for effective joint-modeling of node and cluster representations across the layers of a multiplex network. Empirically, we demonstrate that the proposed architecture outperforms state-of-the-art methods in a range of tasks: classification, clustering, visualization, and similarity search on seven real-world multiplex networks for various experiment settings. 
    more » « less
  3. null (Ed.)
    The key role of emotions in human life is undeniable. The question of whether there exists a brain pattern associated with a specific emotion is the theme of many affective neuroscience studies. In this work, we bring to bear graph signal processing (GSP) techniques to tackle the problem of automatic emotion recognition using brain signals. GSP is an extension of classical signal processing methods to complex networks where there exists an inherent relation graph. With the help of GSP, we propose a new framework for learning class-specific discriminative graphs. To that end, firstly we assume for each class of observations there exists a latent underlying graph representation. Secondly, we consider the observations are smooth on their corresponding class-specific sough graph while they are non-smooth on other classes’ graphs. The learned class-specific graph-based representations can act as sub-dictionaries and be utilized for the task of emotion classification. Applying the proposed method on an electroencephalogram (EEG) emotion recognition dataset indicates the superiority of our framework over other state-of-the-art methods. 
    more » « less
  4. In this paper, we propose a convolutional layer inspired by optical flow algorithms to learn motion representations. Our representation flow layer is a fully-differentiable layer designed to capture the `flow' of any representation channel within a convolutional neural network for action recognition. Its parameters for iterative flow optimization are learned in an end-to-end fashion together with the other CNN model parameters, maximizing the action recognition performance. Furthermore, we newly introduce the concept of learning `flow of flow' representations by stacking multiple representation flow layers. We conducted extensive experimental evaluations, confirming its advantages over previous recognition models using traditional optical flows in both computational speed and performance. The code is publicly available. 
    more » « less
  5. Multi-label image recognition has been an indispensable fundamental component for many real computer vision applications. However, a severe threat of privacy leakage in multi-label image recognition has been overlooked by existing studies. To fill this gap, two privacy-preserving models, Privacy-Preserving Multi-label Graph Convolutional Networks (P2-ML-GCN) and Robust P2-ML-GCN (RP2-ML-GCN), are developed in this article, where differential privacy mechanism is implemented on the model’s outputs so as to defend black-box attack and avoid large aggregated noise simultaneously. In particular, a regularization term is exploited in the loss function of RP2-ML-GCN to increase the model prediction accuracy and robustness. After that, a proper differential privacy mechanism is designed with the intention of decreasing the bias of loss function in P2-ML-GCN and increasing prediction accuracy. Besides, we analyze that a bounded global sensitivity can mitigate excessive noise’s side effect and obtain a performance improvement for multi-label image recognition in our models. Theoretical proof shows that our two models can guarantee differential privacy for model’s outputs, weights and input features while preserving model robustness. Finally, comprehensive experiments are conducted to validate the advantages of our proposed models, including the implementation of differential privacy on model’s outputs, the incorporation of regularization term into loss function, and the adoption of bounded global sensitivity for multi-label image recognition. 
    more » « less