skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.

Attention:

The DOI auto-population feature in the Public Access Repository (PAR) will be unavailable from 4:00 PM ET on Tuesday, July 8 until 4:00 PM ET on Wednesday, July 9 due to scheduled maintenance. We apologize for the inconvenience caused.


Search for: All records

Creators/Authors contains: "Hasan, Zahid"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Liang, Xuefeng (Ed.)
    Deep learning has achieved state-of-the-art video action recognition (VAR) performance by comprehending action-related features from raw video. However, these models often learn to jointly encode auxiliary view (viewpoints and sensor properties) information with primary action features, leading to performance degradation under novel views and security concerns by revealing sensor types and locations. Here, we systematically study these shortcomings of VAR models and develop a novel approach, VIVAR, to learn view-invariant spatiotemporal action features removing view information. In particular, we leverage contrastive learning to separate actions and jointly optimize adversarial loss that aligns view distributions to remove auxiliary view information in the deep embedding space using the unlabeled synchronous multiview (MV) video to learn view-invariant VAR system. We evaluate VIVAR using our in-house large-scale time synchronous MV video dataset containing 10 actions with three angular viewpoints and sensors in diverse environments. VIVAR successfully captures view-invariant action features, improves inter and intra-action clusters’ quality, and outperforms SoTA models consistently with 8% more accuracy. We additionally perform extensive studies with our datasets, model architectures, multiple contrastive learning, and view distribution alignments to provide VIVAR insights. We open-source our code and dataset to facilitate further research in view-invariant systems. 
    more » « less
    Free, publicly-accessible full text available March 10, 2026
  2. We explore the effect of auxiliary labels in improving the classification accuracy of wearable sensor-based human activity recognition (HAR) systems, which are primarily trained with the supervision of the activity labels (e.g. running, walking, jumping). Supplemental meta-data are often available during the data collection process such as body positions of the wearable sensors, subjects' demographic information (e.g. gender, age), and the type of wearable used (e.g. smartphone, smart-watch). This information, while not directly related to the activity classification task, can nonetheless provide auxiliary supervision and has the potential to significantly improve the HAR accuracy by providing extra guidance on how to handle the introduced sample heterogeneity from the change in domains (i.e positions, persons, or sensors), especially in the presence of limited activity labels. However, integrating such meta-data information in the classification pipeline is non-trivial - (i) the complex interaction between the activity and domain label space is hard to capture with a simple multi-task and/or adversarial learning setup, (ii) meta-data and activity labels might not be simultaneously available for all collected samples. To address these issues, we propose a novel framework Conditional Domain Embeddings (CoDEm). From the available unlabeled raw samples and their domain meta-data, we first learn a set of domain embeddings using a contrastive learning methodology to handle inter-domain variability and inter-domain similarity. To classify the activities, CoDEm then learns the label embeddings in a contrastive fashion, conditioned on domain embeddings with a novel attention mechanism, enforcing the model to learn the complex domain-activity relationships. We extensively evaluate CoDEm in three benchmark datasets against a number of multi-task and adversarial learning baselines and achieve state-of-the-art performance in each avenue. 
    more » « less