Search results: All records where Creators/Authors contains "Nwogu, Ifeoma"

  1. Meta-analyses have not shown emotions to be significant predictors of deception. Critics of this conclusion argued that individuals must be engaged with each other in higher-stakes situations for such emotions to manifest, and that these emotions must be evaluated in their verbal context (Frank and Svetieva in J Appl Res Memory Cognit 1:131–133, 10.1016/j.jarmac.2012.04.006, 2012). This study examined behavioral synchrony as a marker of engagement in higher-stakes truthful and deceptive interactions, and then compared facial expressions of fear, contempt, disgust, anger, and sadness that were not consistent with the verbal content. Forty-eight pairs of participants were randomly assigned to the roles of interviewer and interviewee; the interviewee was assigned to steal either a watch or a ring, to lie about the stolen item, and to tell the truth about the other, under higher-stakes conditions of up to $30 in rewards for successful deception versus $0 plus having to write a 15-min essay for unsuccessful deception. The interviews were coded for expression of emotions using EMFACS (Friesen and Ekman in EMFACS-7: emotional facial action coding system, 1984). Synchrony was demonstrated by the pairs of participants expressing overlapping instances of happiness (AU6 + 12). A 3 (low, moderate, high synchrony) × 2 (truth, lie) mixed-design ANOVA found that negative facial expressions of emotion were a significant predictor of deception, but only when they were not consistent with the verbal content, in the moderate and high synchrony conditions. This finding is consistent with data and theorizing showing that with higher stakes, or with higher engagement, emotions can be a predictor of deception. (A minimal sketch of a mixed-design ANOVA of this form follows below.)
    Free, publicly-accessible full text available December 16, 2024
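To make the analysis design concrete, here is a minimal, hypothetical sketch of a 3 (synchrony group) × 2 (veracity) mixed-design ANOVA using the pingouin library. The data frame, column names, and values are invented for illustration; this is not the study's data or analysis code.

```python
# Minimal sketch (not the study's code): a 3 (synchrony) x 2 (veracity)
# mixed-design ANOVA on a hypothetical long-format data frame.
import pandas as pd
import pingouin as pg

# Hypothetical data: one row per interviewee per veracity condition.
df = pd.DataFrame({
    "subject":   [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    "synchrony": ["low", "low", "moderate", "moderate", "high", "high",
                  "low", "low", "moderate", "moderate", "high", "high"],
    "veracity":  ["truth", "lie"] * 6,
    "neg_expr":  [0.1, 0.2, 0.1, 0.6, 0.2, 0.8, 0.2, 0.1, 0.3, 0.5, 0.1, 0.7],
})

aov = pg.mixed_anova(
    data=df,
    dv="neg_expr",        # rate of negative expressions inconsistent with speech
    within="veracity",    # truth vs. lie (repeated measure)
    subject="subject",
    between="synchrony",  # low / moderate / high synchrony group
)
print(aov)
```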
  2. Language-guided human motion synthesis has been a challenging task due to the inherent complexity and diversity of human behaviors. Previous methods face limitations in generalizing to novel actions, often resulting in unrealistic or incoherent motion sequences. In this paper, we propose ATOM (ATomic mOtion Modeling) to mitigate this problem by decomposing actions into atomic actions and employing a curriculum learning strategy to learn atomic action composition. First, we disentangle complex human motions into a set of atomic actions during learning, and then assemble novel actions from the learned atomic actions, which offers better adaptability to new actions. Moreover, we introduce a curriculum learning training strategy that leverages masked motion modeling with a gradually increasing mask ratio, and thus facilitates atomic action assembly. This approach mitigates the overfitting problem commonly encountered in previous methods while encouraging the model to learn better motion representations. We demonstrate the effectiveness of ATOM through extensive experiments, including text-to-motion and action-to-motion synthesis tasks, and further illustrate its superiority in synthesizing plausible and coherent text-guided human motion sequences. (A minimal sketch of the increasing-mask-ratio curriculum follows below.)
    Free, publicly-accessible full text available October 26, 2024
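The curriculum idea can be pictured with a small sketch. The scheduler and masking function below are one reasonable way to raise the mask ratio over training; they are assumptions for illustration, not ATOM's implementation, and all names and sizes are invented.

```python
# Minimal sketch (assumptions, not the paper's code): a curriculum that
# gradually raises the mask ratio used for masked motion modeling.
import torch

def mask_ratio_schedule(epoch: int, num_epochs: int,
                        start: float = 0.15, end: float = 0.75) -> float:
    """Linearly increase the fraction of masked motion frames over training."""
    t = min(max(epoch / max(num_epochs - 1, 1), 0.0), 1.0)
    return start + t * (end - start)

def mask_motion(motion: torch.Tensor, ratio: float):
    """Randomly mask whole frames of a (batch, frames, joints*dims) motion tensor."""
    b, t, d = motion.shape
    mask = torch.rand(b, t) < ratio   # True where a frame is hidden
    masked = motion.clone()
    masked[mask] = 0.0                # zero out masked frames
    return masked, mask

# Toy usage: at epoch 10 of 50, mask roughly a quarter of the frames.
motion = torch.randn(4, 60, 22 * 3)
ratio = mask_ratio_schedule(epoch=10, num_epochs=50)
masked_motion, frame_mask = mask_motion(motion, ratio)
print(f"mask ratio={ratio:.2f}, fraction of frames masked={frame_mask.float().mean():.2f}")
```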
  3. As many as three million school-age children between the ages of 5 and 14 years live with severe to profound hearing loss in Nigeria. Many of these Deaf or Hard of Hearing (DHH) children developed their hearing loss later in life, non-congenitally, hence their parents are hearing. While their teachers in the Deaf schools they attend can often communicate effectively with them in dialects of American Sign Language (ASL), the unofficial sign lingua franca in Nigeria, communication at home with other family members is challenging and sometimes non-existent. This results in adverse social consequences for the students, including stigmatization. With the recent successes of AI in natural language understanding, the goal of automated sign language understanding is becoming more realistic using neural deep learning technologies. To this end, the proposed project aims at co-designing and developing an AI-driven two-way sign language interpretation tool that can be deployed in homes to improve language accessibility and communication between the DHH students and other family members. This ensures inclusive and equitable social interactions and can promote lifelong learning opportunities for them outside of the school environment.
    Free, publicly-accessible full text available August 1, 2024
  4. As wearable devices become more popular, egocentric information recorded with these devices can be used to better understand the behaviors of the wearer and of the people the wearer is interacting with. Data obtained from such devices, such as voice, head movement, and galvanic skin responses (GSR) that measure arousal levels, can provide a window into the underlying affect of both the wearer and his/her conversant. In this study, we examine the characteristics of two types of dyadic conversations: in one case, the interlocutors discuss a topic on which they agree, while in the other, they discuss a topic on which they disagree, even if they are friends. The topics are mostly politically motivated. The egocentric information is collected using a pair of wearable smart glasses for video data and a smart wristband for physiological data, including GSR. From this data, various features are extracted, including the facial expressions of the conversant and the 3D motion of the wearer's camera within the environment, termed egomotion. The goal of this work is to investigate whether the nature of a discussion can be better determined by evaluating the behavior of an individual in the conversation or by evaluating the pairing/coupling of the behaviors of the two people in the conversation. The pairing is accomplished using a modified formulation of the dynamic time warping (DTW) algorithm. A random forest classifier is implemented to evaluate the nature of the interaction (agreement versus disagreement) using individualistic and paired features separately. The study found that, given the limited data used in this work, individual behaviors were slightly more indicative of the type of discussion (85.43% accuracy) than the paired behaviors (83.33% accuracy). (A minimal sketch of the DTW-coupling plus random-forest recipe follows below.)
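As a rough illustration of the pairing idea, the sketch below computes a plain (unmodified) DTW distance between two interlocutors' signals and feeds such paired distances to a random forest. The features, data, and labels are synthetic placeholders, not the study's pipeline or its modified DTW formulation.

```python
# Minimal sketch (hypothetical features, not the study's pipeline): couple two
# interlocutors' behavioral time series with a plain DTW distance, then
# classify agreement vs. disagreement with a random forest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def dtw_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Classic dynamic-programming DTW between two 1-D behavioral signals."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

rng = np.random.default_rng(0)

def paired_features():
    """Hypothetical per-conversation features: DTW distances between the two
    speakers' traces (e.g. smile intensity, head motion, GSR)."""
    a, b = rng.standard_normal((2, 100))
    return [dtw_distance(a, b),                        # e.g. smile-intensity coupling
            dtw_distance(a + 0.5, b),                  # e.g. head-motion coupling
            dtw_distance(np.cumsum(a), np.cumsum(b))]  # e.g. GSR coupling

X = np.array([paired_features() for _ in range(40)])
y = rng.integers(0, 2, size=40)   # placeholder labels: 0 = agreement, 1 = disagreement
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```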
  5. A true interpreting agent not only understands sign language and translates it to text, but also understands text and translates it to signs. Much of the AI work in sign language translation to date has focused mainly on translating from signs to text. Towards the latter goal (text to sign), we propose a text-to-sign translation model, SignNet, which exploits the notion of similarity (and dissimilarity) of visual signs in translating. The module presented here is only one part of a dual-learning, two-task process involving text-to-sign (T2S) as well as sign-to-text (S2T). We currently implement SignNet as a single-channel architecture so that the output of the T2S task can be fed into S2T in a continuous dual-learning framework; by single channel, we refer to a single modality, the body pose joints. In this work, we present SignNet, a T2S model trained with a novel metric embedding learning process that preserves the distances between sign embeddings relative to their dissimilarity. We also describe how to choose positive and negative examples of signs for similarity testing. From our analysis, we observe that the metric embedding learning-based model performs significantly better than models with traditional losses when evaluated using BLEU scores. In the gloss-to-pose task, SignNet performed as well as its state-of-the-art (SoTA) counterparts, and it outperformed them in the text-to-pose task, showing noteworthy enhancements in BLEU-1 to BLEU-4 scores (BLEU-1: 31 → 39, ≈26% improvement; BLEU-4: 10.43 → 11.84, ≈14% improvement) when tested on the popular RWTH PHOENIX-Weather-2014T benchmark dataset. (A minimal sketch of metric embedding learning with a triplet loss follows below.)
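A generic picture of metric embedding learning of this kind is sketched below using a standard triplet margin loss over toy pose features. The encoder, feature sizes, and triplet construction are assumptions for illustration, not SignNet's actual model or training recipe.

```python
# Minimal sketch (a generic recipe, not SignNet itself): learn sign embeddings
# so that similar signs stay close and dissimilar signs stay far apart,
# using a standard triplet margin loss over pose-sequence features.
import torch
import torch.nn as nn

embed = nn.Sequential(                       # toy pose encoder applied per frame
    nn.Linear(2 * 50, 128), nn.ReLU(), nn.Linear(128, 64)
)
triplet = nn.TripletMarginLoss(margin=1.0)

def encode(pose_seq: torch.Tensor) -> torch.Tensor:
    """pose_seq: (batch, frames, 2*keypoints) -> (batch, 64) mean-pooled embedding."""
    return embed(pose_seq).mean(dim=1)

# Hypothetical triplets: anchor and positive are visually similar signs,
# the negative is a dissimilar sign (the actual selection strategy is the paper's own).
anchor   = torch.randn(8, 30, 2 * 50)
positive = anchor + 0.05 * torch.randn_like(anchor)
negative = torch.randn(8, 30, 2 * 50)

loss = triplet(encode(anchor), encode(positive), encode(negative))
loss.backward()
print("triplet loss:", loss.item())
```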
  6. The role of a sign interpreting agent is to bridge the communication gap between the hearing-only and Deaf or Hard of Hearing communities by translating both from sign language to text and from text to sign language. Until now, much of the AI work in automated sign language processing has focused primarily on sign-language-to-text translation, which puts the advantage mainly on the side of hearing individuals. In this work, we describe advances in sign language processing based on transformer networks. Specifically, we introduce SignNet II, a sign language processing architecture and a promising step towards facilitating two-way sign language communication. It is composed of sign-to-text and text-to-sign networks jointly trained using a dual learning mechanism. Furthermore, by exploiting the notion of sign similarity, a metric embedding learning process is introduced to enhance the text-to-sign translation performance. Using a bank of multi-feature transformers, we analyzed several input feature representations and found that keypoint-based pose features consistently performed well, irrespective of the quality of the input videos. We demonstrated that the two jointly trained networks outperformed their singly trained counterparts, showing noteworthy enhancements in BLEU-1 to BLEU-4 scores when tested on the largest available German Sign Language (GSL) benchmark dataset. (A minimal sketch of a dual-learning training step follows below.)
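The dual learning mechanism can be illustrated schematically: two small stand-in networks are trained jointly, with the text-to-sign output fed back into sign-to-text. Everything below (module shapes, losses, feature sizes) is a toy assumption, not SignNet II.

```python
# Minimal sketch (toy stand-ins, not SignNet II): jointly train a sign-to-text
# (S2T) and a text-to-sign (T2S) network, feeding the T2S output back into S2T
# so the two tasks supervise each other (a simple dual-learning step).
import torch
import torch.nn as nn

POSE_DIM, TEXT_DIM = 2 * 50, 300   # assumed feature sizes
s2t = nn.Sequential(nn.Linear(POSE_DIM, 256), nn.ReLU(), nn.Linear(256, TEXT_DIM))
t2s = nn.Sequential(nn.Linear(TEXT_DIM, 256), nn.ReLU(), nn.Linear(256, POSE_DIM))
opt = torch.optim.Adam(list(s2t.parameters()) + list(t2s.parameters()), lr=1e-4)

def dual_step(pose: torch.Tensor, text: torch.Tensor) -> float:
    """One joint update: direct losses for both directions plus a cycle loss."""
    opt.zero_grad()
    loss_s2t = nn.functional.mse_loss(s2t(pose), text)         # sign -> text
    loss_t2s = nn.functional.mse_loss(t2s(text), pose)         # text -> sign
    loss_cycle = nn.functional.mse_loss(s2t(t2s(text)), text)  # text -> sign -> text
    loss = loss_s2t + loss_t2s + loss_cycle
    loss.backward()
    opt.step()
    return loss.item()

# Toy usage on random frame-level pose features and text embeddings.
pose = torch.randn(16, POSE_DIM)
text = torch.randn(16, TEXT_DIM)
print("joint loss:", dual_step(pose, text))
```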
  7. While several methods for predicting uncertainty in deep networks have recently been proposed, they do not always readily translate to large and complex datasets without significant overhead. In this paper we utilize a special instance of Mixture Density Networks (MDNs) to produce an elegant and compact approach to quantifying uncertainty in regression problems. When applied to standard regression benchmark datasets, we show an improvement in predictive log-likelihood and root-mean-square error compared to existing state-of-the-art methods. We demonstrate the efficacy and practical usefulness of the method for (i) predicting future stock prices from stochastic, highly volatile time-series data; (ii) anomaly detection in real-life, highly complex video segments; and (iii) age estimation and data cleansing on the challenging IMDb-Wiki dataset of half a million face images. (A minimal sketch of a mixture density network trained by negative log-likelihood follows below.)
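For readers unfamiliar with mixture density networks, the sketch below shows a generic MDN trained by negative log-likelihood on a toy regression problem. It illustrates the general technique only, not the special MDN instance proposed in the paper.

```python
# Minimal sketch (a generic MDN, not the paper's exact variant): a network that
# outputs a Gaussian mixture per input, trained by negative log-likelihood, so
# the predictive spread doubles as an uncertainty estimate.
import torch
import torch.nn as nn

class MDN(nn.Module):
    def __init__(self, in_dim: int = 1, hidden: int = 64, k: int = 5):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh())
        self.pi = nn.Linear(hidden, k)         # mixture weight logits
        self.mu = nn.Linear(hidden, k)         # component means
        self.log_sigma = nn.Linear(hidden, k)  # component log standard deviations

    def forward(self, x):
        h = self.body(x)
        return self.pi(h), self.mu(h), self.log_sigma(h)

def mdn_nll(pi_logits, mu, log_sigma, y):
    """Negative log-likelihood of y under the predicted Gaussian mixture."""
    log_pi = torch.log_softmax(pi_logits, dim=-1)
    comp = torch.distributions.Normal(mu, log_sigma.exp())
    return -torch.logsumexp(log_pi + comp.log_prob(y), dim=-1).mean()

# Toy regression problem: a noisy sine curve.
x = torch.linspace(-3, 3, 256).unsqueeze(1)
y = torch.sin(x) + 0.2 * torch.randn_like(x)

model = MDN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = mdn_nll(*model(x), y)
    loss.backward()
    opt.step()
print("final NLL:", loss.item())
```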
  8. We investigate the behaviors that compressed convolutional models exhibit in two key areas of AI trust: (i) the ability of a model to be explained and (ii) its robustness to adversarial attacks. While compression is known to shrink model size and decrease inference time, other properties of compression are not as well studied. We employ several compression methods on benchmark datasets, including ImageNet, to study how compression affects the convolutional aspects of an image model. We investigate explainability by studying how well compressed convolutional models can extract visual features with t-SNE, as well as by visualizing the localization ability of our models with class activation maps. We show that even in significantly compressed models, vital explainability is preserved and even enhanced. We also find that, when applying the Carlini & Wagner attack algorithm to our compressed models, robustness is maintained, and some forms of compression make attacks more difficult or time-consuming. (A minimal sketch of pruning a small CNN and inspecting its features with t-SNE follows below.)
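A small, generic sketch of one such experiment is given below: magnitude-prune a toy CNN with torch.nn.utils.prune and project its penultimate features with t-SNE. The model, data, and pruning amount are placeholders, not the paper's compression methods or benchmarks.

```python
# Minimal sketch (a generic recipe, not the paper's pipeline): prune a small
# CNN, then project its penultimate features with t-SNE to eyeball whether the
# compressed model still separates inputs.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune
from sklearn.manifold import TSNE

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(16, 10),
)

# Magnitude-prune 50% of each conv/linear layer's weights (one form of compression).
for module in model.modules():
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        prune.l1_unstructured(module, name="weight", amount=0.5)

# Extract penultimate features (before the classifier) for toy images.
images = torch.randn(64, 3, 32, 32)
with torch.no_grad():
    feats = model[:4](images)   # conv -> relu -> pool -> flatten
emb = TSNE(n_components=2, perplexity=10, init="random").fit_transform(feats.numpy())
print("t-SNE embedding shape:", emb.shape)
```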
  9. This work is motivated by the need to automate the analysis of parent-infant interactions to better understand any potential behavioral patterns useful for the early diagnosis of autism spectrum disorder (ASD). It presents an approach for synthesizing the facial expression exchanges that occur during parent-infant interactions. This is accomplished with a novel approach that uses landmarks when synthesizing changing facial expressions. The proposed model consists of two components: (i) a landmark converter that receives a set of facial landmarks and the target emotion as input and outputs a set of new landmarks transformed to match the emotion, and (ii) an image converter that takes an input image, a target landmark set, and a target emotion and outputs a face transformed to match the input emotion. The inclusion of landmarks in the generation process proves useful for generating baby facial expressions, since babies have somewhat different facial musculature and facial dynamics than adults. The paper presents a realistic-looking matrix of changing facial expressions sampled from a 2-D emotion continuum (valence and arousal) and displays successfully transferred facial expressions from real-life mother-infant dyads to novel ones. (A minimal sketch of the two-stage landmark-then-image conversion follows below.)
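The two-component design can be pictured with toy modules: a landmark converter followed by an image converter conditioned on the new landmarks and the target emotion. All architectures, sizes, and the simple landmark rasterization below are invented for illustration and do not reproduce the paper's model.

```python
# Minimal sketch (toy modules, not the paper's model) of the two-stage idea:
# (i) move facial landmarks toward a target emotion, then (ii) transform the
# image conditioned on the original image, the new landmarks, and the emotion.
import torch
import torch.nn as nn

N_LANDMARKS, N_EMOTIONS, IMG = 68, 7, 64   # assumed sizes

landmark_converter = nn.Sequential(        # (landmarks + emotion) -> new landmarks
    nn.Linear(N_LANDMARKS * 2 + N_EMOTIONS, 256), nn.ReLU(),
    nn.Linear(256, N_LANDMARKS * 2),
)
image_converter = nn.Sequential(           # (image + landmark/emotion maps) -> image
    nn.Conv2d(3 + 2, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),
)

def synthesize(image, landmarks, emotion_onehot):
    """image: (B,3,H,W); landmarks: (B,68,2) in [0,1]; emotion_onehot: (B,7)."""
    b = image.shape[0]
    # Stage 1: predict emotion-consistent landmarks.
    lm_in = torch.cat([landmarks.flatten(1), emotion_onehot], dim=1)
    new_lm = landmark_converter(lm_in).view(b, N_LANDMARKS, 2)
    # Stage 2: rasterize landmarks and emotion into extra channels, then convert the image.
    lm_map = torch.zeros(b, 1, IMG, IMG)
    idx = (new_lm.clamp(0, 0.999) * IMG).long()
    for i in range(b):
        lm_map[i, 0, idx[i, :, 1], idx[i, :, 0]] = 1.0
    emo_map = emotion_onehot.argmax(1).float().view(b, 1, 1, 1).expand(b, 1, IMG, IMG) / N_EMOTIONS
    return image_converter(torch.cat([image, lm_map, emo_map], dim=1))

out = synthesize(torch.rand(2, 3, IMG, IMG), torch.rand(2, N_LANDMARKS, 2),
                 torch.eye(N_EMOTIONS)[[0, 3]])
print("synthesized image batch:", out.shape)
```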
  10. While a significant amount of work has been done on the commonly used, tightly constrained, weather-based German Sign Language (GSL) dataset, little has been done for continuous sign language translation (SLT) in more realistic settings, including American Sign Language (ASL) translation. Also, while CNN-based features have consistently been shown to work well on the GSL dataset, it is not clear whether such features will work as well in more realistic settings with more heterogeneous signers and non-uniform backgrounds. To this end, in this work, we introduce a new, realistic phrase-level ASL dataset (ASLing) and explore the role of different types of visual features (CNN embeddings, human body keypoints, and optical flow vectors) in translating it to spoken American English. We propose a novel Transformer-based visual feature learning method for ASL translation. We demonstrate the explainability of our proposed learning method by visualizing activation weights under various input conditions and find that body keypoints are consistently the most reliable set of input features. Using our model, we successfully transfer-learn from the larger GSL dataset to ASLing, resulting in significant BLEU score improvements. In summary, this work goes a long way toward bringing together the AI resources required for automated ASL translation in unconstrained environments. (A minimal sketch of a keypoint-based Transformer with a GSL-to-ASLing warm start follows below.)
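The sketch below shows a generic keypoint-sequence Transformer plus a commented-out warm-start step standing in for GSL-to-ASLing transfer learning. The architecture, vocabulary size, keypoint dimensionality, and checkpoint path are assumptions, not the paper's model.

```python
# Minimal sketch (generic stand-in, not the paper's architecture): a small
# Transformer that maps body-keypoint sequences to token logits, plus a
# transfer-learning step that warm-starts from GSL-trained weights.
import torch
import torch.nn as nn

class KeypointTranslator(nn.Module):
    def __init__(self, keypoint_dim: int = 2 * 55, vocab: int = 1000, d_model: int = 256):
        super().__init__()
        self.embed = nn.Linear(keypoint_dim, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.out = nn.Linear(d_model, vocab)   # per-frame token logits (toy decoding)

    def forward(self, keypoints: torch.Tensor) -> torch.Tensor:
        return self.out(self.encoder(self.embed(keypoints)))

model = KeypointTranslator()

# Hypothetical transfer learning: warm-start from GSL weights, then fine-tune on ASLing.
# gsl_state = torch.load("gsl_pretrained.pt")      # assumed checkpoint path
# model.load_state_dict(gsl_state, strict=False)   # reuse every parameter that matches

logits = model(torch.randn(2, 120, 2 * 55))        # (batch, frames, keypoint dims)
print("output logits:", logits.shape)              # (2, 120, vocab)
```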