Title: Generative Multimodal Models of Nonverbal Synchrony in Close Relationships
Positive interpersonal relationships require shared understanding along with a sense of rapport. A key facet of rapport is mirroring and convergence of facial expression and body language, known as nonverbal synchrony. We examined nonverbal synchrony in a study of 29 heterosexual romantic couples, in which audio, video, and bracelet accelerometer data were recorded during three conversations. We extracted facial expression, body movement, and acoustic-prosodic features to train neural network models that predicted the nonverbal behaviors of one partner from those of the other. Recurrent models (LSTMs) outperformed feed-forward neural networks and chance baselines. The models learned behaviors encompassing facial responses, speech-related facial movements, and head movement, but they did not capture fleeting or periodic behaviors such as nodding, head turning, and hand gestures. Notably, in a preliminary analysis, clinical measures showed greater association with our model outputs than with correlations of the raw signals. We discuss potential uses of these generative models as a research tool to complement current analytical methods, along with real-world applications (e.g., as a tool in therapy).
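As a rough illustration of the modeling setup, the task can be framed as sequence-to-sequence regression: given one partner's per-frame nonverbal features, predict the other partner's features at each time step. The sketch below is a minimal PyTorch illustration, not the authors' implementation; the feature dimensionality, window length, layer sizes, and training details are assumptions.

```python
# Minimal sketch (feature dimensionality, window length, and layer sizes are
# illustrative assumptions, not the paper's configuration).
import torch
import torch.nn as nn

class PartnerLSTM(nn.Module):
    """Predict partner B's per-frame nonverbal features from partner A's."""
    def __init__(self, n_features=40, hidden_size=128, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size,
                            num_layers=num_layers, batch_first=True)
        self.head = nn.Linear(hidden_size, n_features)

    def forward(self, x_a):
        # x_a: (batch, time, n_features) -- partner A's features
        h, _ = self.lstm(x_a)
        return self.head(h)       # predicted partner B features per frame

model = PartnerLSTM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# One hypothetical training step on a batch of time-aligned feature windows.
x_a = torch.randn(8, 150, 40)     # partner A, 150 frames per window
y_b = torch.randn(8, 150, 40)     # partner B, same frames
optimizer.zero_grad()
loss = loss_fn(model(x_a), y_b)
loss.backward()
optimizer.step()
```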
Award ID(s): 1745442, 1660894
PAR ID: 10088106
Journal Name: The 13th IEEE International Conference on Automatic Face and Gesture Recognition
Page Range / eLocation ID: 195 to 202
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Displaying emotional states is an important part of nonverbal communication that can facilitate successful interactions. Facial expressions have been studied extensively for their role in emotional expression; this work instead examines the capacity of body movements to convey different emotions. The work first generates a large set of nonverbal behaviors with a variety of torso and arm properties on a humanoid robot, Quori. Participants in a user study evaluated how strongly each movement displayed each of eight different emotions. Results indicate that specific movement properties are associated with particular emotions, such as leaning backward with arms held high displaying surprise, and leaning forward displaying sadness. Understanding the emotions associated with certain movements can allow for the design of more appropriate behaviors during interactions with humans and could improve people's perception of the robot.
  2. This study focuses on the individual and joint contributions of two nonverbal channels (i.e., face and upper body) in avatar-mediated virtual environments. A total of 140 dyads were randomly assigned to communicate with each other via platforms that differentially activated or deactivated facial and bodily nonverbal cues. The availability of facial expressions had a positive effect on interpersonal outcomes: dyads that were able to see their partner's facial movements mapped onto their avatars liked each other more, formed more accurate impressions about their partners, and described their interaction experiences more positively than those unable to see facial movements. However, the latter effect held only when the partner's bodily gestures were also available, not when facial movements alone were available. Dyads also showed greater nonverbal synchrony when they could see their partner's bodily and facial movements. The study further employed machine learning to explore whether nonverbal cues could predict interpersonal attraction; the resulting classifiers distinguished high from low interpersonal attraction with 65% accuracy. These findings highlight the relative significance of facial cues compared to bodily cues for interpersonal outcomes in virtual environments and lend insight into the potential of automatically tracked nonverbal cues to predict interpersonal attitudes.
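As a hedged sketch of the classification step in item 2 (the study's actual features and classifier are not specified here; the per-dyad feature summaries, classifier choice, and cross-validation setup below are assumptions), one might summarize each dyad's automatically tracked nonverbal cues as fixed-length statistics and fit a standard classifier:

```python
# Hedged sketch: predict high vs. low interpersonal attraction from per-dyad
# summaries of tracked nonverbal cues. Feature construction and classifier
# choice are illustrative assumptions, not the study's pipeline.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical per-dyad features, e.g. mean smile intensity, gesture rate,
# and facial/bodily synchrony scores (140 dyads x 6 features).
X = rng.normal(size=(140, 6))
y = rng.integers(0, 2, size=140)           # 1 = high attraction, 0 = low

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(clf, X, y, cv=5)  # held-out accuracy per fold
print(scores.mean())
```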
  3. The current paper addresses two methodological problems pertinent to the analysis of observer studies in nonverbal rapport and beyond: (1) the production of standardized stimulus materials that allow for unbiased observer ratings and (2) the objective measurement of nonverbal behaviors to identify the dyadic patterns underlying the observer impressions. We suggest motion capture and character animation as possible solutions to these problems and, as an example, apply the novel methodology to the study of gender and cultural differences in nonverbal rapport. We compared a Western, individualistic culture with an egalitarian gender-role conception (Germany) and a collectivistic culture with more traditional gender-role conceptions (Middle East, Gulf States). Motion capture data were collected for five male and five female dyadic interactions in each culture. Character animations based on the motion capture data served as stimuli in the observation study. Female and male observers from both cultures rated the perceived rapport continuously while watching the one-minute sequences and guessed the gender and cultural background of the dyads after each clip. Results show that masking of gender and culture in the stimuli was successful, as hit rates for both aspects remained at chance level. Further, the results revealed high levels of agreement in the rapport ratings across gender and culture, pointing to universal judgment policies. A 2 × 2 × 2 × 2 ANOVA for gender and culture of stimuli and observers showed that female dyads were rated significantly higher on rapport across the board and that the contrast between female and male dyads was more pronounced in the Arab sample than in the German sample. Nonverbal parameters extracted from the motion capture protocols were submitted to a series of algorithms to identify dyadic activity levels and coordination patterns relevant to the perception of rapport. The results are critically discussed with regard to the role of nonverbal coordination as a constituent of rapport.
  4. This research work explores different machine learning techniques for recognizing the existence of rapport between two people engaged in a conversation, based on their facial expressions. First, using artificially generated pairs of correlated data signals, a coupled gated recurrent unit (cGRU) neural network is developed to measure the extent of similarity between the temporal evolution of pairs of time-series signals. Pairs of coupled sequences are generated with pre-selected covariance values (between 0.1 and 1.0), and the developed cGRU architecture successfully recovers this covariance between the signals. Using this and various other coupled architectures, tests for rapport (measured by the extent of mirroring and mimicking of behaviors) are conducted on real-life datasets. On fifty-nine (N = 59) pairs of interactants in an interview setting, a transformer-based coupled architecture performs best in determining the existence of rapport. To test generalization, the models were also applied to previously unseen data collected 14 years earlier, again to predict the existence of rapport. The coupled transformer model again performed best on this transfer-learning task, determining which pairs of interactants had rapport and which did not. The experiments and results demonstrate the advantages of coupled architectures for predicting an interactional process such as rapport, even in the presence of limited data.
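A simplified sketch of the coupled-recurrent idea in item 4 follows: two GRU encoders, one per signal in a pair, whose summary states are combined to estimate how strongly the sequences co-evolve. This is an illustration trained on synthetic correlated pairs, not the paper's cGRU; the layer sizes, scoring head, and signal-generation scheme are assumptions.

```python
# Simplified sketch of a coupled recurrent model: two GRU encoders whose
# summaries are combined to estimate how strongly paired signals co-evolve.
# Illustrative stand-in, not the paper's cGRU architecture.
import torch
import torch.nn as nn

class CoupledGRU(nn.Module):
    def __init__(self, n_features=10, hidden_size=64):
        super().__init__()
        self.enc_a = nn.GRU(n_features, hidden_size, batch_first=True)
        self.enc_b = nn.GRU(n_features, hidden_size, batch_first=True)
        self.score = nn.Sequential(
            nn.Linear(2 * hidden_size, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x_a, x_b):
        _, h_a = self.enc_a(x_a)               # (1, batch, hidden)
        _, h_b = self.enc_b(x_b)
        joint = torch.cat([h_a[-1], h_b[-1]], dim=-1)
        return self.score(joint).squeeze(-1)   # estimated coupling strength

# Synthetic pair of correlated sequences with a target covariance of 0.6
# (mirrors the paper's use of artificially generated coupled signals).
x_a = torch.randn(4, 200, 10)
x_b = 0.6 * x_a + (1 - 0.6 ** 2) ** 0.5 * torch.randn(4, 200, 10)
target = torch.full((4,), 0.6)

model = CoupledGRU()
loss = nn.MSELoss()(model(x_a, x_b), target)
loss.backward()
```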
  5. Humans convey their intentions through both verbal and nonverbal behaviors during face-to-face communication. Speaker intentions often vary dynamically depending on different nonverbal contexts, such as vocal patterns and facial expressions. As a result, when modeling human language, it is essential to consider not only the literal meaning of the words but also the nonverbal contexts in which these words appear. To better model human language, we first model expressive nonverbal representations by analyzing the fine-grained visual and acoustic patterns that occur during word segments. In addition, we seek to capture the dynamic nature of nonverbal intents by shifting word representations based on the accompanying nonverbal behaviors. To this end, we propose the Recurrent Attended Variation Embedding Network (RAVEN), which models the fine-grained structure of nonverbal subword sequences and dynamically shifts word representations based on nonverbal cues. Our proposed model achieves competitive performance on two publicly available datasets for multimodal sentiment analysis and emotion recognition. We also visualize the shifted word representations in different nonverbal contexts and summarize common patterns regarding multimodal variations of word representations.
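A loose sketch of the word-shifting idea in item 5 follows: the visual and acoustic frames aligned to a word are encoded, a shift vector is computed from them, and a learned gate controls how far the word embedding is displaced. This is an illustrative paraphrase rather than the published RAVEN architecture; the dimensions, encoders, and gating below are assumptions.

```python
# Loose sketch of nonverbal shifting of word representations (illustrative
# dimensions, encoders, and gating; not the published RAVEN architecture).
import torch
import torch.nn as nn

class NonverbalShift(nn.Module):
    def __init__(self, word_dim=300, visual_dim=35, acoustic_dim=74):
        super().__init__()
        self.visual_enc = nn.GRU(visual_dim, word_dim, batch_first=True)
        self.acoustic_enc = nn.GRU(acoustic_dim, word_dim, batch_first=True)
        self.gate = nn.Linear(3 * word_dim, word_dim)

    def forward(self, word_emb, visual_seq, acoustic_seq):
        # word_emb: (batch, word_dim); *_seq: frames aligned to this word
        _, h_v = self.visual_enc(visual_seq)
        _, h_a = self.acoustic_enc(acoustic_seq)
        ctx = torch.cat([word_emb, h_v[-1], h_a[-1]], dim=-1)
        shift = torch.tanh(h_v[-1] + h_a[-1])    # nonverbal shift vector
        g = torch.sigmoid(self.gate(ctx))        # how much to shift this word
        return word_emb + g * shift              # shifted word representation

shifter = NonverbalShift()
word = torch.randn(2, 300)            # e.g., pretrained vectors for two tokens
vis = torch.randn(2, 20, 35)          # 20 video frames per word segment
aco = torch.randn(2, 20, 74)          # 20 acoustic frames per word segment
shifted = shifter(word, vis, aco)     # (2, 300)
```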