- NSF-PAR ID:
- 10088106
- Date Published:
- Journal Name:
- The 13th IEEE International Conference on Automatic Face and Gesture Recognition
- Page Range / eLocation ID:
- 195 to 202
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Displaying emotional states is an important part of nonverbal communication that can facilitate successful interactions. Facial expressions have been studied for their emotional expression, but this work looks at the capacity of body movements to convey different emotions. This work first generates a large set of nonverbal behaviors with a variety of torso and arm properties on a humanoid robot, Quori. Participants in a user study evaluated how much each movement displayed each of eight different emotions. Results indicate that specific movement properties are associated with particular emotions; such as leaning backward and arms held high displaying surprise and leaning forward displaying sadness. Understanding the emotions associated with certain movements can allow for the design of more appropriate behaviors during interactions with humans and could improve people’s perception of the robot.more » « less
-
Abstract This study focuses on the individual and joint contributions of two nonverbal channels (i.e., face and upper body) in avatar mediated-virtual environments. 140 dyads were randomly assigned to communicate with each other via platforms that differentially activated or deactivated facial and bodily nonverbal cues. The availability of facial expressions had a positive effect on interpersonal outcomes. More specifically, dyads that were able to see their partner’s facial movements mapped onto their avatars liked each other more, formed more accurate impressions about their partners, and described their interaction experiences more positively compared to those unable to see facial movements. However, the latter was only true when their partner’s bodily gestures were also available and not when only facial movements were available. Dyads showed greater nonverbal synchrony when they could see their partner’s bodily and facial movements. This study also employed machine learning to explore whether nonverbal cues could predict interpersonal attraction. These classifiers predicted high and low interpersonal attraction at an accuracy rate of 65%. These findings highlight the relative significance of facial cues compared to bodily cues on interpersonal outcomes in virtual environments and lend insight into the potential of automatically tracked nonverbal cues to predict interpersonal attitudes.
-
This research work explores different machine learning techniques for recognizing the existence of rapport between two people engaged in a conversation, based on their facial expressions. First using artificially generated pairs of correlated data signals, a coupled gated recurrent unit (cGRU) neural network is developed to measure the extent of similarity between the temporal evolution of pairs of time-series signals. By pre-selecting their covariance values (between 0.1 and 1.0), pairs of coupled sequences are generated. Using the developed cGRU architecture, this covariance between the signals is successfully recovered. Using this and various other coupled architectures, tests for rapport (measured by the extent of mirroring and mimicking of behaviors) are conducted on real-life datasets. On fifty-nine (N = 59) pairs of interactants in an interview setting, a transformer based coupled architecture performs the best in determining the existence of rapport. To test for generalization, the models were applied on never-been-seen data collected 14 years prior, also to predict the existence of rapport. The coupled transformer model again performed the best for this transfer learning task, determining which pairs of interactants had rapport and which did not. The experiments and results demonstrate the advantages of coupled architectures for predicting an interactional process such as rapport, even in the presence of limited data.more » « less
-
Digital Behavioral Phenotyping Detects Atypical Pattern of Facial Expression in Toddlers with Autism
Commonly used screening tools for autism spectrum disorder (ASD) generally rely on subjective caregiver questionnaires. While behavioral observation is more objective, it is also expensive, time‐consuming, and requires significant expertise to perform. As such, there remains a critical need to develop feasible, scalable, and reliable tools that can characterize ASD risk behaviors. This study assessed the utility of a tablet‐based behavioral assessment for eliciting and detecting one type of risk behavior, namely, patterns of facial expression, in 104 toddlers (ASD
N = 22) and evaluated whether such patterns differentiated toddlers with and without ASD. The assessment consisted of the child sitting on his/her caregiver's lap and watching brief movies shown on a smart tablet while the embedded camera recorded the child's facial expressions. Computer vision analysis (CVA) automatically detected and tracked facial landmarks, which were used to estimate head position and facial expressions (Positive, Neutral, All Other). Using CVA, specific points throughout the movies were identified that reliably differentiate between children with and without ASD based on their patterns of facial movement and expressions (area under the curves for individual movies ranging from 0.62 to 0.73). During these instances, children with ASD more frequently displayed Neutral expressions compared to children without ASD, who had more All Other expressions. The frequency of All Other expressions was driven by non‐ASD children more often displaying raised eyebrows and an open mouth, characteristic of engagement/interest. Preliminary results suggest computational coding of facial movements and expressions via a tablet‐based assessment can detect differences in affective expression, one of the early, core features of ASD.Lay Summary This study tested the use of a tablet in the behavioral assessment of young children with autism. Children watched a series of developmentally appropriate movies and their facial expressions were recorded using the camera embedded in the tablet. Results suggest that computational assessments of facial expressions may be useful in early detection of symptoms of autism.
-
Raynal, Ann M. ; Ranney, Kenneth I. (Ed.)Most research in technologies for the Deaf community have focused on translation using either video or wearable devices. Sensor-augmented gloves have been reported to yield higher gesture recognition rates than camera-based systems; however, they cannot capture information expressed through head and body movement. Gloves are also intrusive and inhibit users in their pursuit of normal daily life, while cameras can raise concerns over privacy and are ineffective in the dark. In contrast, RF sensors are non-contact, non-invasive and do not reveal private information even if hacked. Although RF sensors are unable to measure facial expressions or hand shapes, which would be required for complete translation, this paper aims to exploit near real-time ASL recognition using RF sensors for the design of smart Deaf spaces. In this way, we hope to enable the Deaf community to benefit from advances in technologies that could generate tangible improvements in their quality of life. More specifically, this paper investigates near real-time implementation of machine learning and deep learning architectures for the purpose of sequential ASL signing recognition. We utilize a 60 GHz RF sensor which transmits a frequency modulation continuous wave (FMWC waveform). RF sensors can acquire a unique source of information that is inaccessible to optical or wearable devices: namely, a visual representation of the kinematic patterns of motion via the micro-Doppler signature. Micro-Doppler refers to frequency modulations that appear about the central Doppler shift, which are caused by rotational or vibrational motions that deviate from principle translational motion. In prior work, we showed that fractal complexity computed from RF data could be used to discriminate signing from daily activities and that RF data could reveal linguistic properties, such as coarticulation. We have also shown that machine learning can be used to discriminate with 99% accuracy the signing of native Deaf ASL users from that of copysigning (or imitation signing) by hearing individuals. Therefore, imitation signing data is not effective for directly training deep models. But, adversarial learning can be used to transform imitation signing to resemble native signing, or, alternatively, physics-aware generative models can be used to synthesize ASL micro-Doppler signatures for training deep neural networks. With such approaches, we have achieved over 90% recognition accuracy of 20 ASL signs. In natural environments, however, near real-time implementations of classification algorithms are required, as well as an ability to process data streams in a continuous and sequential fashion. In this work, we focus on extensions of our prior work towards this aim, and compare the efficacy of various approaches for embedding deep neural networks (DNNs) on platforms such as a Raspberry Pi or Jetson board. We examine methods for optimizing the size and computational complexity of DNNs for embedded micro-Doppler analysis, methods for network compression, and their resulting sequential ASL recognition performance.more » « less