The overall goal of our research is to develop a system of intelligent multimodal affective pedagogical agents that are effective for different types of learners (Adamo et al., 2021). While most research on pedagogical agents focuses on the cognitive aspects of online learning and instruction, this project explores the less-studied role of affective (or emotional) factors. We aim to design believable animated agents that can convey realistic, natural emotions through speech, facial expressions, and body gestures and that can react to the students’ detected emotional states with emotional intelligence. Within the context of this goal, the specific objective of the work reported in the paper was to examine the extent to which the agents’ facial micro-expressions affect students’ perception of the agents’ emotions and their naturalness. Micro-expressions are very brief facial expressions that occur when a person either deliberately or unconsciously conceals an emotion being felt (Ekman & Friesen, 1969). Our assumption is that if the animated agents display facial micro-expressions in addition to macro-expressions, they will convey higher expressive richness and naturalness to the viewer, as “the agents can possess two emotional streams, one based on interaction with the viewer and the other based ...
Modeling Aesthetics and Emotions in Visual Content: From Vincent van Gogh to Robotics and Vision
As inborn characteristics, humans possess the ability to judge visual aesthetics, feel the emotions from the environment, and comprehend others’ emotional expressions. Many exciting applications become possible if robots or computers can be empowered with similar capabilities. Modeling aesthetics, evoked emotions, and emotional expressions automatically in unconstrained situations, however, is daunting due to the lack of a full understanding of the relationship between low-level visual content and high-level aesthetics or emotional expressions. With the growing availability of data, it is possible to tackle these problems using machine learning and statistical modeling approaches. In the talk, I provide an overview of our research in the last two decades on data-driven analyses of visual artworks and digital visual content for modeling aesthetics and emotions. First, I discuss our analyses of styles in visual artworks. Art historians have long observed the highly characteristic brushstroke styles of Vincent van Gogh and have relied on discerning these styles for authenticating and dating his works. In our work, we compared van Gogh with his contemporaries by statistically analyzing a massive set of automatically extracted brushstrokes. A novel extraction method is developed by exploiting an integration of edge detection and clustering-based segmentation. Evidence substantiates that van Gogh’s ...
- Journal Name: Proceedings of the Joint Workshop on Aesthetic and Technical Quality Assessment of Multimedia and Media Analytics for Societal Trends, in conjunction with the ACM International Conference on Multimedia
- Sponsoring Org: National Science Foundation
More Like this
In recent years, extensive research has emerged in affective computing on topics like automatic emotion recognition and determining the signals that characterize individual emotions. Much less studied, however, is expressiveness: the extent to which someone shows any feeling or emotion. Expressiveness is related to personality and mental health and plays a crucial role in social interaction. As such, the ability to automatically detect or predict expressiveness can facilitate significant advancements in areas ranging from psychiatric care to artificial social intelligence. Motivated by these potential applications, we present an extension of the BP4D+ data set with human ratings of expressiveness and develop methods for (1) automatically predicting expressiveness from visual data and (2) defining relationships between interpretable visual signals and expressiveness. In addition, we study the emotional context in which expressiveness occurs and hypothesize that different sets of signals are indicative of expressiveness in different contexts (e.g., in response to surprise or in response to pain). Analysis of our statistical models confirms our hypothesis. Consequently, by looking at expressiveness separately in distinct emotional contexts, our predictive models show significant improvements over baselines and achieve comparable results to human performance in terms of correlation with the ground truth.
The enhancement hypothesis suggests that deaf individuals are more vigilant to visual emotional cues than hearing individuals. The present eye-tracking study examined ambient–focal visual attention when encoding affect from dynamically changing emotional facial expressions. Deaf (n = 17) and hearing (n = 17) individuals watched emotional facial expressions that in 10-s animations morphed from a neutral expression to one of happiness, sadness, or anger. The task was to recognize emotion as quickly as possible. Deaf participants tended to be faster than hearing participants in affect recognition, but the groups did not differ in accuracy. In general, happy faces were more accurately and more quickly recognized than faces expressing anger or sadness. Both groups demonstrated longer average fixation duration when recognizing happiness in comparison to anger and sadness. Deaf individuals directed their first fixations less often to the mouth region than the hearing group. During the last stages of emotion recognition, deaf participants exhibited more focal viewing of happy faces than negative faces. This pattern was not observed among hearing individuals. The analysis of visual gaze dynamics, switching between ambient and focal attention, was useful in studying the depth of cognitive processing of emotional information among deaf and hearing individuals.
This paper demonstrates the utility of ambient-focal attention and pupil dilation dynamics to describe visual processing of emotional facial expressions. Pupil dilation and focal eye movements reflect deeper cognitive processing and thus shed more light on the dynamics of emotional expression recognition. Socially anxious individuals (N = 24) and non-anxious controls (N = 24) were asked to recognize emotional facial expressions that gradually morphed from a neutral expression to one of happiness, sadness, or anger in 10-sec animations. Anxious cohorts exhibited more ambient face scanning than their non-anxious counterparts. We observed a positive relationship between focal fixations and pupil dilation, indicating deeper processing of viewed faces, but only by non-anxious participants, and only during the last phase of emotion recognition. Group differences in the dynamics of ambient-focal attention support the hypothesis of vigilance to emotional expression processing by socially anxious individuals. We discuss the results by referring to current literature on cognitive psychopathology.
A Comparative Analysis of Emotion-Detecting AI Systems with Respect to Algorithm Performance and Dataset Diversity
In recent news, organizations have been considering the use of facial and emotion recognition for applications involving youth such as tackling surveillance and security in schools. However, the majority of efforts on facial emotion recognition research have focused on adults. Children, particularly in their early years, have been shown to express emotions quite differently than adults. Thus, before such algorithms are deployed in environments that impact the wellbeing and circumstance of youth, a careful examination should be made on their accuracy with respect to appropriateness for this target demographic. In this work, we utilize several datasets that contain facial expressions of children linked to their emotional state to evaluate eight different commercial emotion classification systems. We compare the ground truth labels provided by the respective datasets to the labels given with the highest confidence by the classification systems and assess the results in terms of matching score (TPR), positive predictive value, and failure to compute rate. Overall results show that the emotion recognition systems displayed subpar performance on the datasets of children's expressions compared to prior work with adult datasets and initial human ratings. We then identify limitations associated with automated recognition of emotions in children and provide suggestions on ...
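The three measures named in the abstract above (matching score or TPR, positive predictive value, and failure to compute rate) can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function name `evaluate_emotion`, the use of `None` to mark an image for which a system returned no label, the toy data, and the choice to count such failures as misses in the TPR denominator are all assumptions made for illustration.

```python
def evaluate_emotion(ground_truth, predictions, target):
    """Score one emotion class for one (hypothetical) classification system.

    ground_truth: list of dataset labels, e.g. "happy".
    predictions:  list of top-confidence labels from the system, with None
                  wherever the system failed to return a result (assumption).
    target:       the emotion class being scored.
    """
    pairs = list(zip(ground_truth, predictions))
    true_pos = sum(1 for g, p in pairs if g == target and p == target)
    actual_pos = sum(1 for g, _ in pairs if g == target)
    predicted_pos = sum(1 for _, p in pairs if p == target)
    failures = sum(1 for _, p in pairs if p is None)

    nan = float("nan")
    return {
        # Matching score (TPR): share of target-class images labeled correctly.
        # Assumption: failed images stay in the denominator and count as misses.
        "tpr": true_pos / actual_pos if actual_pos else nan,
        # Positive predictive value: share of target predictions that are right.
        "ppv": true_pos / predicted_pos if predicted_pos else nan,
        # Failure to compute rate: share of images with no result at all.
        "failure_rate": failures / len(pairs) if pairs else nan,
    }


# Toy data, invented for illustration only.
truth = ["happy", "happy", "sad", "angry", "happy", "sad"]
preds = ["happy", "sad", None, "angry", "happy", "happy"]
scores = evaluate_emotion(truth, preds, "happy")
print(scores)
```

Whether failed images should count against TPR or be excluded before scoring is a design choice; the sketch keeps them in so that a system cannot raise its matching score by declining to answer on hard images.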