skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Award ID contains: 1840080

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Transformer-based human skeleton action recognition has been developed for years. However, the complexity and high parameter count demands of these models hinder their practical applications, especially in resource-constrained environments. In this work, we propose FreqMixForemrV2, which was built upon the Frequency-aware Mixed Transformer (FreqMixFormer) for identifying subtle and discriminative actions with pioneered frequency-domain analysis. We design a lightweight architecture that maintains robust performance while significantly reducing the model complexity. This is achieved through a redesigned frequency operator that optimizes high-frequency and low-frequency parameter adjustments, and a simplified frequency-aware attention module. These improvements result in a substantial reduction in model parameters, enabling efficient deployment with only a minimal sacrifice in accuracy. Comprehensive evaluations of standard datasets (NTU RGB+D, NTU RGB+D 120, and NW-UCLA datasets) demonstrate that the proposed model achieves a superior balance between efficiency and accuracy, outperforming state-of-the-art methods with only 60% of the parameters. 
    more » « less
    Free, publicly-accessible full text available January 30, 2026
  2. Recently, transformers have demonstrated great potential for modeling long-term dependencies from skeleton sequences and thereby gained ever-increasing attention in skeleton action recognition. However, the existing transformer-based approaches heavily rely on the naive attention mechanism for capturing the spatiotemporal features, which falls short in learning discriminative representations that exhibit similar motion patterns. To address this challenge, we introduce the Frequency-aware Mixed Transformer (FreqMixFormer), specifically designed for recognizing similar skeletal actions with subtle discriminative motions. First, we introduce a frequency-aware attention module to unweave skeleton frequency representations by embedding joint features into frequency attention maps, aiming to distinguish the discriminative movements based on their frequency coefficients. Subsequently, we develop a mixed transformer architecture to incorporate spatial features with frequency features to model the comprehensive frequency-spatial patterns. Additionally, a temporal transformer is proposed to extract the global correlations across frames. Extensive experiments show that FreqMiXFormer outperforms SOTA on 3 popular skeleton action recognition datasets, including NTU RGB+D, NTU RGB+D120, and NW-UCLA datasets. Our project is publicly available at: https://github.com/wenhanwu95/FreqMixFormer. 
    more » « less
  3. As the metaverse grows with the advances of new technologies, a number of researchers have raised the concern on the privacy of motion data in virtual reality (VR). It is becoming clear that motion data can reveal essential information of people, such as user identification. However, the fundamental problems about what types of motion data, how to process, and on what ranges of VR applications are still underexplored. This work summarizes the work of motion data privacy on these aspects from both the fields of VR and data privacy. Our results demonstrate that researchers from both fields have recognized the importance of the problem, while there are differences due to the focused problems. A variety of VR studies have been used for user identification, and the results are affected by the application types and ranges of involved actions. We also review the biometrics work from related fields including the behaviors of keystrokes and waist as well as data of skeleton, face and fingerprint. At the end, we discuss our findings and suggest future work to protect the privacy of motion data. 
    more » « less
  4. In recent years, remarkable results have been achieved in self-supervised action recognition using skeleton sequences with contrastive learning. It has been observed that the semantic distinction of human action features is often represented by local body parts, such as legs or hands, which are advantageous for skeleton-based action recognition. This paper proposes an attention-based contrastive learning framework for skeleton representation learning, called SkeAttnCLR, which integrates local similarity and global features for skeleton-based action representations. To achieve this, a multi-head attention mask module is employed to learn the soft attention mask features from the skeletons, suppressing non-salient local features while accentuating local salient features, thereby bringing similar local features closer in the feature space. Additionally, ample contrastive pairs are generated by expanding contrastive pairs based on salient and non-salient features with global features, which guide the network to learn the semantic representations of the entire skeleton. Therefore, with the attention mask mechanism, SkeAttnCLR learns local features under different data augmentation views. The experiment results demonstrate that the inclusion of local feature similarity significantly enhances skeleton-based action representation. Our proposed SkeAttnCLR outperforms state-of-the-art methods on NTURGB+D, NTU120-RGB+D, and PKU-MMD datasets. The code and settings are available at this repository: https://github.com/GitHubOfHyl97/SkeAttnCLR. 
    more » « less
  5. Self-supervised skeleton-based action recognition has attracted more attention in recent years. By utilizing the unlabeled data, more generalizable features can be learned to alleviate the overfitting problem and reduce the demand for massive labeled training data. Inspired by the MAE [1], we propose a spatial-temporal masked autoencoder framework for self-supervised 3D skeleton-based action recognition (SkeletonMAE). Following MAE's masking and reconstruction pipeline, we utilize a skeleton-based encoder-decoder transformer architecture to reconstruct the masked skeleton sequences. A novel masking strategy, named Spatial-Temporal Masking, is introduced in terms of both joint-level and frame-level for the skeleton sequence. This pre-training strategy makes the encoder output generalizable skeleton features with spatial and temporal dependencies. Given the unmasked skeleton sequence, the encoder is fine-tuned for the action recognition task. Extensive ex- periments show that our SkeletonMAE achieves remarkable performance and outperforms the state-of-the-art methods on both NTU RGB+D 60 and NTU RGB+D 120 datasets. 
    more » « less
  6. With the emergence and fast development of cloud computing and outsourced services, more and more companies start to use managed security service providers (MSSP) as their security service team. This approach can save the budget on maintaining its own security teams and depend on professional security persons to protect the company infrastructures and intellectual property. However, this approach also gives the MSSP opportunities to honor only a part of the security service level agreement. To pre- vent this from happening, researchers propose to use outsourced network testing to verify the execution of the security policies. During this procedure, the end customer has to design network testing traffic and provide it to the testers. Since the testing traffic is designed based on the security rules and selectors, external testers could derive the customer network security setup, and conduct subsequent attacks based on the learned knowledge. To protect the network security configuration secrecy in outsourced testing, in this paper we propose different methods to hide the accurate information. For Regex-based security selectors, we propose to introduce fake testing traffic to confuse the testers. For exact match and range based selectors, we propose to use NAT VM to hide the accurate information. We conduct simulation to show the protection effectiveness under different scenarios. We also discuss the advantages of our approaches and the potential challenges. 
    more » « less
  7. Skeleton-based motion capture and visualization is an important computer vision task, especially in the virtual reality (VR) envi- ronment. It has grown increasingly popular due to the ease of gathering skeleton data and the high demand of virtual socializa- tion. The captured skeleton data seems anonymous but can still be used to extract personal identifiable information (PII). This can lead to an unintended privacy leakage inside a VR meta-verse. We propose a novel linkage attack on skeleton-based motion visual- ization. It detects if a target and a reference skeleton are the same individual. The proposed model, called Linkage Attack Neural Net- work (LAN), is based on the principles of a Siamese Network. It incorporates deep neural networks to embed the relevant PII then uses a classifier to match the reference and target skeletons. We also employ classical and deep motion retargeting (MR) to cast the target skeleton onto a dummy skeleton such that the motion sequence is anonymized for privacy protection. Our evaluation shows that the effectiveness of LAN in the linkage attack and the effectiveness of MR in anonymization. The source code is available at https://github.com/Thomasc33/Linkage-Attack 
    more » « less