NSF PAR Search | NSF Public Access Repository

AutoMisty: A Multi-Agent LLM Framework for Automated Code Generation in the Misty Social Robot

https://doi.org/10.1109/IROS60139.2025.11247695

Wang, Xiao; Dong, Lu; Rangasrinivasan, Sahana; Nwogu, Ifeoma; Setlur, Srirangaraj; Govindaraju, Venugopal (October 2025, IEEE)

Full Text Available
FUSE-MOS: Fusion of Speech Embeddings for MOS Prediction with Uncertainty Quantification

https://doi.org/10.21437/Interspeech.2025-2532

Hoq, Enjamamul; Gupta, Nikhil; Omondi, Danielle; Nwogu, Ifeoma (August 2025, ISCA)

Full Text Available
Exploring the Differences between Deaf and Hearing Infant Cries

https://doi.org/10.1109/ICASSP49660.2025.10888707

Hoq, Enjamamul; Nwogu, Ifeoma (April 2025, IEEE)

Full Text Available
Cross-Attention Based Influence Model for Manual and Nonmanual Sign Language Analysis

Chaudhary, Lipisha; Xu, Fei; Nwogu, Ifeoma (December 2024, Springer Nature)

Full Text Available
Towards Open Domain Text-Driven Synthesis of Multi-person Motions

https://doi.org/10.1007/978-3-031-73650-6_5

Shan, Mengyi; Dong, Lu; Han, Yutao; Yao, Yuan; Liu, Tao; Nwogu, Ifeoma; Qi, Guo-Jun; Hill, Mitch (November 2024, ECCV, Springer Nature Switzerland)

Full Text Available
Towards Open Domain Text-Driven Synthesis of Multi-person Motions

Shan, Mengyi; Dong, Lu; Han, Yutao; Yao, Yuan; Liu, Tao; Nwogu, Ifeoma; Qi, Guo-Jun; Hill, Mitch (October 2024, Springer Science+Business Media)

This work aims to generate natural and diverse group motions of multiple humans from textual descriptions. While single-person text-to-motion generation is extensively studied, it remains challenging to synthesize motions for more than one or two subjects from in-the-wild prompts, mainly due to the lack of available datasets. In this work, we curate human pose and motion datasets by estimating pose information from large-scale image and video datasets. Our models use a transformer-based diffusion framework that accommodates multiple datasets with any number of subjects or frames. Experiments explore both generation of multi-person static poses and generation of multi-person motion sequences. To our knowledge, our method is the first to generate multi-subject motion sequences with high diversity and fidelity from a large variety of textual prompts.
Full Text Available
Ig3D: Integrating 3D Face Representations in Facial Expression Inference

Dong, Lu; Wang, Xiao; Setlur, Srirangaraj; Govindaraju, Venu; Nwogu, Ifeoma (August 2024, Springer Science+Business Media)

Reconstructing 3D faces with facial geometry from single images has allowed for major advances in animation, generative models, and virtual reality. However, this ability to represent faces with their 3D features is not as fully explored by the facial expression inference (FEI) community. This study therefore aims to investigate the impacts of integrating such 3D representations into the FEI task, specifically for facial expression classification and face-based valence-arousal (VA) estimation. To accomplish this, we first assess the performance of two 3D face representations (both based on the 3D morphable model, FLAME) for the FEI tasks. We further explore two fusion architectures, intermediate fusion and late fusion, for integrating the 3D face representations with existing 2D inference frameworks. To evaluate our proposed architecture, we extract the corresponding 3D representations and perform extensive tests on the AffectNet and RAF-DB datasets. Our experimental results demonstrate that our proposed method outperforms the state of the art on the AffectNet VA estimation and RAF-DB classification tasks. Moreover, our method can act as a complement to other existing methods to boost performance in many emotion inference tasks.
Full Text Available
A Comparative Study of Video-Based Human Representations for American Sign Language Alphabet Generation

https://doi.org/10.1109/FG59268.2024.10582020

Xu, Fei; Chaudhary, Lipisha; Dong, Lu; Setlur, Srirangaraj; Govindaraju, Venu; Nwogu, Ifeoma (May 2024, IEEE)

Sign language is a complex visual language, and automatic interpretations of sign language can facilitate communication involving deaf individuals. As one of the essential components of sign language, fingerspelling connects the natural spoken languages to the sign language and expands the scale of sign language vocabulary. In practice, it is challenging to analyze fingerspelling alphabets due to their signing speed and small motion range. The usage of synthetic data has the potential of further improving fingerspelling alphabets analysis at scale. In this paper, we evaluate how different video-based human representations perform in a framework for Alphabet Generation for American Sign Language (ASL). We tested three mainstream video-based human representations: two-stream inflated 3D ConvNet, 3D landmarks of body joints, and rotation matrices of body joints. We also evaluated the effect of different skeleton graphs and selected body joints. The generation process of ASL fingerspelling used a transformer-based Conditional Variational Autoencoder. To train the model, we collected ASL alphabet signing videos from 17 signers with dynamic alphabet signing. The generated alphabets were evaluated using automatic metrics of quality such as FID, and we also considered supervised metrics by recognizing the generated entries using Spatio-Temporal Graph Convolutional Networks. Our experiments show that using the rotation matrices of the upper body joints and the signing hand gives the best results for the generation of ASL alphabet signing. Going forward, our goal is to produce articulated fingerspelling words by combining individual alphabets learned in this work.
Full Text Available
SignAvatar: Sign Language 3D Motion Reconstruction and Generation

https://doi.org/10.1109/FG59268.2024.10581934

Dong, Lu; Chaudhary, Lipisha; Xu, Fei; Wang, Xiao; Lary, Mason; Nwogu, Ifeoma (May 2024, IEEE)

Achieving expressive 3D motion reconstruction and automatic generation for isolated sign words can be challenging, due to the lack of real-world 3D sign-word data, the complex nuances of signing motions, and the cross-modal understanding of sign language semantics. To address these challenges, we introduce SignAvatar, a framework capable of both word-level sign language reconstruction and generation. SignAvatar employs a transformer-based conditional variational autoencoder architecture, effectively establishing relationships across different semantic modalities. Additionally, this approach incorporates a curriculum learning strategy to enhance the model's robustness and generalization, resulting in more realistic motions. Furthermore, we contribute the ASL3DWord dataset, composed of 3D joint rotation data for the body, hands, and face, for unique sign words. We demonstrate the effectiveness of SignAvatar through extensive experiments, showcasing its superior reconstruction and automatic generation capabilities. The code and dataset are available on the project page.
Full Text Available
Dataset Infant Anonymization with Pose and Emotion Retention

https://doi.org/10.1109/FG59268.2024.10581938

Lary, Mason; Klawonn, Matthew; Messinger, Daniel; Nwogu, Ifeoma (May 2024, IEEE)

We demonstrate a procedure for the anonymization of infant subjects in videos such that salient behavioral information is retained. This method also creates a new identity that is consistent temporally across video frames. We present an overview of this anonymization process, which involves moving through the latent space of a generative model with an infant-specific latent space traversal technique. We apply the technique to videos of infants, a historically difficult source of data, and make comparisons to other state-of-the-art anonymization systems. Metrics demonstrate an improved ability to retain emotional content of videos during the anonymization process, even during extreme emotions or poses, while maintaining a consistent identity throughout.
Full Text Available
