

Title: Evaluating Temporal Patterns in Applied Infant Affect Recognition
10th International Conference on Affective Computing and Intelligent Interaction
Agents must monitor their partners' affective states continuously in order to understand and engage in social interactions. However, methods for evaluating affect recognition do not account for changes in classification performance that may occur during occlusions or transitions between affective states. This paper addresses temporal patterns in affect classification performance in the context of an infant-robot interaction, where infants’ affective states contribute to their ability to participate in a therapeutic leg movement activity. To support robustness to facial occlusions in video recordings, we trained infant affect recognition classifiers using both facial and body features. Next, we conducted an in-depth analysis of our best-performing models to evaluate how performance changed over time as the models encountered missing data and changing infant affect. During time windows when features were extracted with high confidence, a unimodal model trained on facial features achieved the same optimal performance as multimodal models trained on both facial and body features. However, multimodal models outperformed unimodal models when evaluated on the entire dataset. Additionally, model performance was weakest when predicting an affective state transition and improved after multiple predictions of the same affective state. These findings emphasize the benefits of incorporating body features in continuous affect recognition for infants. Our work highlights the importance of evaluating variability in model performance both over time and in the presence of missing data when applying affect recognition to social interactions.
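The abstract's evaluation idea, scoring predictions within time windows and separating frames at affective-state transitions from sustained frames, can be sketched roughly as follows. The function names, the non-overlapping window scheme, and the toy labels are illustrative assumptions, not the paper's actual protocol.

```python
# Hypothetical sketch of temporal affect-recognition evaluation.
# Windowing scheme and labels are illustrative, not from the paper.

def windowed_accuracy(y_true, y_pred, window=5):
    """Accuracy within each non-overlapping window of frames."""
    scores = []
    for start in range(0, len(y_true) - window + 1, window):
        w_true = y_true[start:start + window]
        w_pred = y_pred[start:start + window]
        correct = sum(t == p for t, p in zip(w_true, w_pred))
        scores.append(correct / window)
    return scores

def transition_vs_sustained(y_true, y_pred):
    """Split per-frame correctness by whether the true label just changed."""
    trans, sustained = [], []
    for i in range(1, len(y_true)):
        bucket = trans if y_true[i] != y_true[i - 1] else sustained
        bucket.append(y_true[i] == y_pred[i])
    mean = lambda xs: sum(xs) / len(xs) if xs else float("nan")
    return mean(trans), mean(sustained)

# Toy sequence: the classifier lags one frame behind a state transition.
y_true = ["neutral"] * 3 + ["negative"] * 3
y_pred = ["neutral"] * 4 + ["negative"] * 2
```

On this toy sequence, accuracy at the single transition frame is lower than on sustained frames, mirroring the pattern the paper reports.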
Award ID(s):
1706964
NSF-PAR ID:
10354826
Author(s) / Creator(s):
Date Published:
Journal Name:
10th International Conference on Affective Computing and Intelligent Interaction
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like This
  1. Depression is a very common mental health disorder with a devastating social and economic impact. It can be costly and difficult to detect, traditionally requiring a significant number of hours from a trained mental health professional. Recently, machine learning and deep learning models have been trained for depression screening using modalities extracted from videos of clinical interviews conducted by a virtual agent. This complex task is challenging for deep learning models because of the multiple modalities and the limited number of participants in the dataset. To address these challenges, we propose AudiFace, a multimodal deep learning model that inputs temporal facial features, audio, and transcripts to screen for depression. To incorporate all three modalities, AudiFace combines multiple pre-trained transfer learning models and a bidirectional LSTM with self-attention. When compared with state-of-the-art models, AudiFace achieves the highest F1 scores for thirteen of the fifteen different datasets. AudiFace notably improves the depression screening capabilities of general wellbeing questions. Eye gaze proved to be the most valuable of the temporal facial features, in both the unimodal and multimodal models. Our results can be used to determine the best combination of modalities, temporal facial features, and clinical interview questions for future depression screening applications.
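The fusion step this abstract describes, combining embeddings from several modality encoders before a screening head, can be illustrated with a simple attention-weighted pooling. The dimensions, the random stand-in embeddings, and the query-based pooling scheme are assumptions for illustration, not the published AudiFace architecture.

```python
import numpy as np

# Illustrative late-fusion sketch: embeddings from three modality
# encoders (facial, audio, transcript) are pooled with a simple
# attention mechanism. All weights and sizes here are toy stand-ins.

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_fuse(modality_embeddings, query):
    """Weight each modality embedding by its similarity to a query vector."""
    M = np.stack(modality_embeddings)   # (n_modalities, d)
    scores = M @ query                  # one relevance score per modality
    weights = softmax(scores)           # normalized attention weights
    return weights @ M, weights         # fused (d,) vector and weights

rng = np.random.default_rng(0)
face, audio, text = rng.normal(size=(3, 8))   # stand-in encoder outputs
fused, weights = attention_fuse([face, audio, text], query=rng.normal(size=8))
```

A real model would learn the query and feed the fused vector into a classifier; the point here is only the shape of the fusion computation.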
  2. Observing how infants and mothers coordinate their behaviors can highlight meaningful patterns in early communication and infant development. While dyads often differ in the modalities they use to communicate, especially in the first year of life, it remains unclear how to capture coordination across multiple types of behaviors using existing computational models of interpersonal synchrony. This paper explores Dynamic Mode Decomposition with control (DMDc) as a method of integrating multiple signals from each communicating partner into a model of multimodal behavioral coordination. We used an existing video dataset to track the head pose, arm pose, and vocal fundamental frequency of infants and mothers during the Face-to-Face Still-Face (FFSF) procedure, a validated 3-stage interaction paradigm. For each recorded interaction, we fit both unimodal and multimodal DMDc models to the extracted pose data. The resulting dynamic characteristics of the models were analyzed to evaluate trends in individual behaviors and dyadic processes across infant age and stages of the interactions. Results demonstrate that observed trends in interaction dynamics across stages of the FFSF protocol were stronger and more significant when models incorporated both head and arm pose data, rather than a single behavior modality. Model output showed significant trends across age, identifying changes in infant movement and in the relationship between infant and mother behaviors. Models that included mothers’ audio data demonstrated similar results to those evaluated with pose data, confirming that DMDc can leverage different sets of behavioral signals from each interacting partner. Taken together, our results demonstrate the potential of DMDc toward integrating multiple behavioral signals into the measurement of multimodal interpersonal coordination. 
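At its core, DMDc fits a linear dynamics model x_{t+1} ≈ A x_t + B u_t by least squares over recorded snapshots, where x could stack one partner's pose signals and u the other partner's. The sketch below is a minimal version of that fit on synthetic data; it is not the authors' implementation, and real use would add noise handling and rank truncation.

```python
import numpy as np

# Minimal DMDc sketch: recover A and B from trajectory snapshots.
def fit_dmdc(X, U):
    """X: (n, T) state snapshots; U: (m, T-1) control inputs.
    Returns A (n, n), B (n, m) minimizing ||X' - A X - B U||_F."""
    X0, X1 = X[:, :-1], X[:, 1:]
    Omega = np.vstack([X0, U])           # stacked states and controls
    G = X1 @ np.linalg.pinv(Omega)       # least-squares operator (n, n+m)
    n = X.shape[0]
    return G[:, :n], G[:, n:]

# Synthetic check: generate noiseless data from known A, B, then refit.
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])
B_true = np.array([[0.5], [1.0]])
rng = np.random.default_rng(1)
U = rng.normal(size=(1, 50))
X = np.zeros((2, 51))
X[:, 0] = [1.0, -1.0]
for t in range(50):
    X[:, t + 1] = A_true @ X[:, t] + B_true @ U[:, t]
A_hat, B_hat = fit_dmdc(X, U)
```

With noiseless data and enough snapshots the fit recovers A and B exactly; the dynamic characteristics the paper analyzes (e.g., eigenvalues of A) come from this fitted operator.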
  3.
    Abstract: People spontaneously infer other people’s psychology from faces, encompassing inferences of their affective states, cognitive states, and stable traits such as personality. These judgments are known to be often invalid, but nonetheless bias many social decisions. Their importance and ubiquity have made them popular targets for automated prediction using deep convolutional neural networks (DCNNs). Here, we investigated the applicability of this approach: how well does it generalize, and what biases does it introduce? We compared three distinct sets of features (from a face identification DCNN, an object recognition DCNN, and using facial geometry), and tested their prediction across multiple out-of-sample datasets. Across judgments and datasets, features from both pre-trained DCNNs provided better predictions than did facial geometry. However, predictions using object recognition DCNN features were not robust to superficial cues (e.g., color and hair style). Importantly, predictions using face identification DCNN features were not specific: models trained to predict one social judgment (e.g., trustworthiness) also significantly predicted other social judgments (e.g., femininity and criminal), and at an even higher accuracy in some cases than predicting the judgment of interest (e.g., trustworthiness). Models trained to predict affective states (e.g., happy) also significantly predicted judgments of stable traits (e.g., sociable), and vice versa. Our analysis pipeline not only provides a flexible and efficient framework for predicting affective and social judgments from faces but also highlights the dangers of such automated predictions: correlated but unintended judgments can drive the predictions of the intended judgments.
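The "non-specific prediction" danger this abstract describes can be demonstrated with a toy linear model: when two judgments share a latent signal in the features, a model trained on one also predicts the other. The synthetic features, judgment names, and ridge setup below are illustrative stand-ins for DCNN face features and human ratings, not the paper's pipeline.

```python
import numpy as np

# Toy demonstration: correlated judgments leak into each other's models.
def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression weights."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 16))                 # stand-in DCNN features
shared = X @ rng.normal(size=16)               # latent signal both ratings share
y_trust = shared + 0.1 * rng.normal(size=200)  # "trustworthiness" ratings
y_soc = shared + 0.1 * rng.normal(size=200)    # correlated "sociable" ratings

w = ridge_fit(X, y_trust)                      # trained ONLY on trustworthiness
pred = X @ w
r_unintended = np.corrcoef(pred, y_soc)[0, 1]  # still high for "sociable"
```

Because the model can only pick up the shared signal, its predictions correlate strongly with the judgment it was never trained on, which is exactly the specificity failure the study reports.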
  4. The overall goal of our research is to develop a system of intelligent multimodal affective pedagogical agents that are effective for different types of learners (Adamo et al., 2021). While most of the research on pedagogical agents tends to focus on the cognitive aspects of online learning and instruction, this project explores the less-studied role of affective (or emotional) factors. We aim to design believable animated agents that can convey realistic, natural emotions through speech, facial expressions, and body gestures, and that can react to students' detected emotional states with emotional intelligence. Within the context of this goal, the specific objective of the work reported in this paper was to examine the extent to which the agents' facial micro-expressions affect students' perception of the agents' emotions and their naturalness. Micro-expressions are very brief facial expressions that occur when a person either deliberately or unconsciously conceals an emotion being felt (Ekman & Friesen, 1969). Our assumption is that if the animated agents display facial micro-expressions in addition to macro-expressions, they will convey higher expressive richness and naturalness to the viewer, as “the agents can possess two emotional streams, one based on interaction with the viewer and the other based on their own internal state, or situation” (Queiroz et al., 2014, p. 2). The work reported in the paper involved two studies with human subjects. The objectives of the first study were to examine whether people can recognize micro-expressions (in isolation) in animated agents, and whether there are differences in recognition based on the agent's visual style (e.g., stylized versus realistic).
The objectives of the second study were to investigate whether people can recognize the animated agents' micro-expressions when integrated with macro-expressions; the extent to which the presence of micro- + macro-expressions affects the perceived expressivity and naturalness of the animated agents; the extent to which exaggerating the micro-expressions (e.g., increasing the amplitude of the animated facial displacements) affects emotion recognition and perceived agent naturalness and emotional expressivity; and whether there are differences based on the agent's design characteristics. In the first study, 15 participants watched eight micro-expression animations representing four different emotions (happy, sad, fear, surprised). Four animations featured a stylized agent and four a realistic agent. For each animation, subjects were asked to identify the agent's emotion conveyed by the micro-expression. In the second study, 234 participants watched three sets of eight animation clips (24 clips in total, 12 clips per agent). Four animations for each agent featured the character performing macro-expressions only, four featured the character performing macro- + micro-expressions without exaggeration, and four featured the agent performing macro- + micro-expressions with exaggeration. Participants were asked to recognize the true emotion of the agent and rate the emotional expressivity and naturalness of the agent in each clip using a 5-point Likert scale. We have collected all the data and completed the statistical analysis. Findings and discussion, implications for research and practice, and suggestions for future work will be reported in the full paper.
References
Adamo, N., Benes, B., Mayer, R., Lei, X., Meyer, Z., & Lawson, A. (2021). Multimodal Affective Pedagogical Agents for Different Types of Learners. In: Russo, D., Ahram, T., Karwowski, W., Di Bucchianico, G., Taiar, R. (eds) Intelligent Human Systems Integration 2021. IHSI 2021. Advances in Intelligent Systems and Computing, 1322. Springer, Cham. https://doi.org/10.1007/978-3-030-68017-6_33
Ekman, P., & Friesen, W. V. (1969, February). Nonverbal leakage and clues to deception. Psychiatry, 32(1), 88–106. https://doi.org/10.1080/00332747.1969.11023575
Queiroz, R. B., Musse, S. R., & Badler, N. I. (2014). Investigating Macroexpressions and Microexpressions in Computer Graphics Animated Faces. Presence, 23(2), 191–208. http://dx.doi.org/10.1162/

     
  5. Abstract

    Although still-face effects are well studied, little is known about the degree to which the Face-to-Face/Still-Face (FFSF) procedure is associated with the production of intense affective displays. Duchenne smiling expresses more intense positive affect than non-Duchenne smiling, while Duchenne cry-faces express more intense negative affect than non-Duchenne cry-faces. Forty 4-month-old infants and their mothers completed the FFSF, and key affect-indexing facial Action Units (AUs) were coded by expert Facial Action Coding System coders for the first 30 s of each FFSF episode. Computer vision software, automated facial affect recognition (AFAR), identified AUs for the entire 2-min episodes. Expert coding and AFAR produced similar infant and mother Duchenne and non-Duchenne FFSF effects, highlighting the convergent validity of automated measurement. Substantive AFAR analyses indicated that both infant Duchenne and non-Duchenne smiling declined from the Face-to-Face (FF) to the Still-Face (SF) episode, but only Duchenne smiling increased from the SF to the Reunion (RE) episode. Similarly, the magnitude of change in mother Duchenne smiling across the FFSF was 2–4 times greater than that of non-Duchenne smiling. Duchenne expressions appear to be a sensitive index of intense infant and mother affective valence that is accessible to automated measurement and may be a target for future FFSF research.
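The Duchenne distinction rests on a standard FACS rule: a Duchenne smile combines the lip-corner puller (AU12) with the cheek raiser (AU6), whereas a non-Duchenne smile shows AU12 alone. A minimal sketch of that rule over per-frame AU scores follows; the dict-based frame format and the 0.5 threshold are assumptions for illustration, not AFAR's actual output format.

```python
# Hedged sketch of Duchenne vs. non-Duchenne smile classification from
# Action Unit scores. Threshold and input format are illustrative.

def classify_smile(frame_aus, threshold=0.5):
    """frame_aus: mapping of AU name -> occurrence/intensity score."""
    au12 = frame_aus.get("AU12", 0.0) >= threshold  # lip corner puller
    au6 = frame_aus.get("AU6", 0.0) >= threshold    # cheek raiser
    if not au12:
        return "no smile"
    return "Duchenne smile" if au6 else "non-Duchenne smile"
```

Applied frame by frame to automated AU detections, counts of each category per FFSF episode would yield the kind of Duchenne/non-Duchenne trajectories the study analyzes.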

     