NSF PAR Search | NSF Public Access Repository

From Scribbles to Text: A Novel Transformer-Based Recognition Model for Child Handwriting

https://doi.org/10.1007/978-3-032-04614-7_7

Rangasrinivasan, Sahana; M S, Sumi Suresh; Setlur, Srirangaraj; Jayaraman, Bharat; Govindaraju, Venu (September 2025, Springer Nature Switzerland)

Free, publicly-accessible full text available September 13, 2026
AI-Enhanced Child Handwriting Analysis: A Framework for the Early Screening of Dyslexia and Dysgraphia

https://doi.org/10.1007/s42979-025-03927-0

Rangasrinivasan, Sahana; Sumi Suresh, M S; Olszewski, Abbie; Setlur, Srirangaraj; Jayaraman, Bharat; Govindaraju, Venu (April 2025, SN Computer Science)

Dyslexia and dysgraphia are two specific learning disabilities (SLDs) that are prevalent among children. To minimize the negative impact these SLDs have on a child’s academic and social-emotional development, it is crucial to identify dyslexia and dysgraphia at an early age, enabling timely and effective intervention. The first step in this process is screening, which helps determine if a child requires further instruction or a more in-depth assessment. Current screening tools are expensive, require additional administration time beyond regular classroom activities, and are designed to screen exclusively for one condition, not for both dyslexia and dysgraphia, which often share some common behavioral characteristics. Most dyslexia screeners focus on speech and oral tasks and exclude writing activities. However, analyzing children’s writing samples for behavioral signs of dyslexia and dysgraphia can offer valuable insights into the screening process, which can be time-consuming. As a solution, we propose a co-designed framework for building artificial intelligence (AI) tools that could boost the efficiency of screening and aid practitioners such as speech-language pathologists (SLPs), occupational therapists, general educators, and special educators by simplifying their tasks. This paper reviews current screening methods employed by practitioners, the use of AI-based systems in identifying dyslexia and dysgraphia, and the handwriting datasets available to train such systems. The paper also outlines a framework for developing an AI-integrated screening tool that can identify writing-based behavioral indicators of dyslexia and dysgraphia in children’s handwriting. This framework can be used in conjunction with current screening tools like the Dysgraphia and Dyslexia Behavioral Indicator Checklist (DDBIC).
The paper also proposes a methodology for collecting children’s offline and online handwriting samples to build a valuable dataset for developing AI solutions. The proposed framework and data collection methodology are co-designed with SLPs, occupational therapists (OTs), special educators, and general educators to ensure the tool can provide explainable, actionable information that would be invaluable in a practical setting.
ProxyFusion: Face Feature Aggregation Through Sparse Experts

Jawade, Bhavin; Stone, Alexander; Mohan, Deen Dayal; Wang, Xiao; Setlur, Srirangaraj; Govindaraju, Venu (December 2024, Advances in Neural Information Processing Systems 37 (NeurIPS 2024))

Full Text Available
Ig3D: Integrating 3D Face Representations in Facial Expression Inference

https://doi.org/10.1007/978-3-031-91581-9_29

Dong, Lu; Wang, Xiao; Setlur, Srirangaraj; Govindaraju, Venu; Nwogu, Ifeoma (January 2025, Springer Nature Switzerland)

Full Text Available
Ig3D: Integrating 3D Face Representations in Facial Expression Inference

Dong, Lu; Wang, Xiao; Setlur, Srirangaraj; Govindaraju, Venu; Nwogu, Ifeoma (August 2024, Springer Science+Business Media)

Reconstructing 3D faces with facial geometry from single images has allowed for major advances in animation, generative models, and virtual reality. However, this ability to represent faces with their 3D features is not as fully explored by the facial expression inference (FEI) community. This study therefore aims to investigate the impacts of integrating such 3D representations into the FEI task, specifically for facial expression classification and face-based valence-arousal (VA) estimation. To accomplish this, we first assess the performance of two 3D face representations (both based on the 3D morphable model, FLAME) for the FEI tasks. We further explore two fusion architectures, intermediate fusion and late fusion, for integrating the 3D face representations with existing 2D inference frameworks. To evaluate our proposed architecture, we extract the corresponding 3D representations and perform extensive tests on the AffectNet and RAF-DB datasets. Our experimental results demonstrate that our proposed method outperforms the state of the art on the AffectNet VA estimation and RAF-DB classification tasks. Moreover, our method can act as a complement to other existing methods to boost performance in many emotion inference tasks.
Full Text Available
A Comparative Study of Video-Based Human Representations for American Sign Language Alphabet Generation

https://doi.org/10.1109/FG59268.2024.10582020

Xu, Fei; Chaudhary, Lipisha; Dong, Lu; Setlur, Srirangaraj; Govindaraju, Venu; Nwogu, Ifeoma (May 2024, IEEE)

Sign language is a complex visual language, and automatic interpretations of sign language can facilitate communication involving deaf individuals. As one of the essential components of sign language, fingerspelling connects the natural spoken languages to the sign language and expands the scale of sign language vocabulary. In practice, it is challenging to analyze fingerspelling alphabets due to their signing speed and small motion range. Synthetic data has the potential to further improve fingerspelling alphabet analysis at scale. In this paper, we evaluate how different video-based human representations perform in a framework for Alphabet Generation for American Sign Language (ASL). We tested three mainstream video-based human representations: two-stream inflated 3D ConvNet, 3D landmarks of body joints, and rotation matrices of body joints. We also evaluated the effect of different skeleton graphs and selected body joints. The generation process of ASL fingerspelling used a transformer-based Conditional Variational Autoencoder. To train the model, we collected ASL alphabet signing videos from 17 signers with dynamic alphabet signing. The generated alphabets were evaluated using automatic metrics of quality such as FID, and we also considered supervised metrics by recognizing the generated entries using Spatio-Temporal Graph Convolutional Networks. Our experiments show that using the rotation matrices of the upper body joints and the signing hand gives the best results for the generation of ASL alphabet signing. Going forward, our goal is to produce articulated fingerspelling words by combining individual alphabets learned in this work.
Full Text Available
Fine-Grained Engine Fault Sound Event Detection Using Multimodal Signals

https://doi.org/10.1109/ICASSP48485.2024.10448485

Fedorishin, Dennis; Forte, Livio; Schneider, Philip; Setlur, Srirangaraj; Govindaraju, Venu (April 2024, IEEE)

Full Text Available
AI-Driven Support for People with Speech & Language Difficulties

https://doi.org/10.1145/3613905.3643984

Dangol, Aayushi; Huang, Yun; Setlur, Srirangaraj; Smolansky, Adele; Subramonyam, Hariharan; Suh, Hyewon; Xiong, Jinjun; Kientz, Julie A (May 2024, ACM)

Full Text Available
Exploring racial and gender disparities in voice biometrics

https://doi.org/10.1038/s41598-022-06673-y

Chen, Xingyu; Li, Zhengxiong; Setlur, Srirangaraj; Xu, Wenyao (December 2022, Scientific Reports)

Systemic inequity in biometrics systems based on racial and gender disparities has received a lot of attention recently. These disparities have been explored in existing biometrics systems such as facial biometrics (identifying individuals based on facial attributes). However, such ethical issues remain largely unexplored in voice biometric systems that are very popular and extensively used globally. Using a corpus of non-speech voice records featuring a diverse group of 300 speakers by race (75 each from White, Black, Asian, and Latinx subgroups) and gender (150 each from female and male subgroups), we find that racial subgroups have similar voice characteristics while gender subgroups have significantly different voice characteristics. Moreover, analyzing the performance of one commercial product and five research products reveals non-negligible racial and gender disparities in speaker identification accuracy. The average accuracy for Latinxs can be 12% lower than Whites (p < 0.05, 95% CI 1.58%, 14.15%) and can be significantly higher for female speakers than males (3.67% higher, p < 0.05, 95% CI 1.23%, 11.57%). We further discover that racial disparities primarily result from the neural network-based feature extraction within the voice biometric product, while gender disparities result primarily from both inherent differences in voice characteristics and neural network-based feature extraction. Finally, we point out strategies (e.g., feature extraction optimization) to incorporate fairness and inclusive consideration in biometrics technology.
Full Text Available
Hear The Flow: Optical Flow-Based Self-Supervised Visual Sound Source Localization

https://doi.org/10.1109/WACV56688.2023.00231

Fedorishin, Dennis; Dayal Mohan, Deen; Jawade, Bhavin; Setlur, Srirangaraj; Govindaraju, Venu (January 2023, IEEE)

Full Text Available
