
Title: Modeling Aesthetics and Emotions in Visual Content: From Vincent van Gogh to Robotics and Vision
As inborn characteristics, humans possess the ability to judge visual aesthetics, feel emotions evoked by the environment, and comprehend others’ emotional expressions. Many exciting applications become possible if robots or computers can be empowered with similar capabilities. Modeling aesthetics, evoked emotions, and emotional expressions automatically in unconstrained situations, however, is daunting because the relationship between low-level visual content and high-level aesthetics or emotional expressions is not fully understood. With the growing availability of data, it is possible to tackle these problems using machine learning and statistical modeling approaches. In this talk, I provide an overview of our research over the last two decades on data-driven analyses of visual artworks and digital visual content for modeling aesthetics and emotions. First, I discuss our analyses of styles in visual artworks. Art historians have long observed the highly characteristic brushstroke styles of Vincent van Gogh and have relied on discerning these styles to authenticate and date his works. In our work, we compared van Gogh with his contemporaries by statistically analyzing a massive set of automatically extracted brushstrokes. We developed a novel extraction method that integrates edge detection with clustering-based segmentation. The evidence substantiates that van Gogh’s brushstrokes are strongly rhythmic. Next, I describe an effort to model the aesthetic and emotional characteristics of visual content such as photographs. By taking a data-driven approach, using the Internet as the data source, we show that computers can be trained to recognize various characteristics that are highly relevant to aesthetics and emotions. Future computer systems equipped with such capabilities are expected to help millions of users in unimagined ways. Finally, I highlight our research on automated recognition of bodily expression of emotion.
We propose a scalable and reliable crowdsourcing approach for collecting in-the-wild perceived emotion data so that computers can learn to recognize human body language. Comprehensive statistical analysis revealed many interesting insights from the dataset. A system that models emotional expressions based on bodily movements, named ARBEE (Automated Recognition of Bodily Expression of Emotion), has also been developed and evaluated.
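The brushstroke extraction described above, integrating edge detection with clustering-based segmentation, could be sketched roughly as follows. This is a toy illustration of the general idea, not the authors' published method; the gradient-threshold edge detector, the two-cluster intensity model, and all function names and parameters are assumptions made for demonstration.

```python
import numpy as np

def edge_map(img, thresh=0.2):
    """Mark pixels with strong gradient magnitude: a crude stand-in
    for a real edge detector such as Canny."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    if mag.max() == 0:
        return np.zeros(img.shape, dtype=bool)
    return mag > thresh * mag.max()

def two_means_segment(img, iters=20):
    """Cluster pixel intensities into two groups with a tiny 2-means loop,
    standing in for the clustering-based segmentation step."""
    x = img.ravel().astype(float)
    centers = np.array([x.min(), x.max()], dtype=float)
    labels = np.zeros(x.size, dtype=int)
    for _ in range(iters):
        labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = x[labels == k].mean()
    return labels.reshape(img.shape), centers

def stroke_mask(img):
    """Integrate the two cues: keep pixels in the brighter ('paint')
    cluster that do not lie on an edge, approximating stroke interiors."""
    edges = edge_map(img)
    seg, centers = two_means_segment(img)
    paint = int(np.argmax(centers))
    return (seg == paint) & ~edges
```

On a synthetic image containing a bright horizontal stripe, `stroke_mask` keeps the stripe's interior while the edge map suppresses its boundary rows; a real pipeline would additionally group the surviving pixels into individual stroke regions before computing shape statistics such as rhythm or orientation.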
Award ID(s): 1921783
NSF-PAR ID: 10295639
Journal Name: Proceedings of the Joint Workshop on Aesthetic and Technical Quality Assessment of Multimedia and Media Analytics for Societal Trends, in conjunction with the ACM International Conference on Multimedia
Page Range or eLocation-ID: 15-16
Sponsoring Org: National Science Foundation
More Like this
  1. The overall goal of our research is to develop a system of intelligent multimodal affective pedagogical agents that are effective for different types of learners (Adamo et al., 2021). While most of the research on pedagogical agents tends to focus on the cognitive aspects of online learning and instruction, this project explores the less-studied role of affective (or emotional) factors. We aim to design believable animated agents that can convey realistic, natural emotions through speech, facial expressions, and body gestures and that can react to the students’ detected emotional states with emotional intelligence. Within the context of this goal, the specific objective of the work reported in the paper was to examine the extent to which the agents’ facial micro-expressions affect students’ perception of the agents’ emotions and their naturalness. Micro-expressions are very brief facial expressions that occur when a person either deliberately or unconsciously conceals an emotion being felt (Ekman & Friesen, 1969). Our assumption is that if the animated agents display facial micro-expressions in addition to macro-expressions, they will convey higher expressive richness and naturalness to the viewer, as “the agents can possess two emotional streams, one based on interaction with the viewer and the other based on their own internal state, or situation” (Queiroz et al., 2014, p. 2). The work reported in the paper involved two studies with human subjects. The objectives of the first study were to examine whether people can recognize micro-expressions (in isolation) in animated agents, and whether there are differences in recognition based on the agent’s visual style (e.g., stylized versus realistic).
The objectives of the second study were to investigate whether people can recognize the animated agents’ micro-expressions when integrated with macro-expressions; the extent to which the presence of macro- plus micro-expressions affects the perceived expressivity and naturalness of the animated agents; the extent to which exaggerating the micro-expressions (e.g., increasing the amplitude of the animated facial displacements) affects emotion recognition and perceived agent naturalness and emotional expressivity; and whether there are differences based on the agent’s design characteristics. In the first study, 15 participants watched eight micro-expression animations representing four different emotions (happy, sad, fear, surprised). Four animations featured a stylized agent and four a realistic agent. For each animation, subjects were asked to identify the agent’s emotion conveyed by the micro-expression. In the second study, 234 participants watched three sets of eight animation clips (24 clips in total, 12 clips per agent). Four animations for each agent featured the character performing macro-expressions only, four featured the character performing macro- plus micro-expressions without exaggeration, and four featured the agent performing macro- plus micro-expressions with exaggeration. Participants were asked to recognize the true emotion of the agent and rate the emotional expressivity and naturalness of the agent in each clip using a 5-point Likert scale. We have collected all the data and completed the statistical analysis. Findings and discussion, implications for research and practice, and suggestions for future work will be reported in the full paper.

References
Adamo, N., Benes, B., Mayer, R., Lei, X., Meyer, Z., & Lawson, A. (2021). Multimodal affective pedagogical agents for different types of learners. In: Russo, D., Ahram, T., Karwowski, W., Di Bucchianico, G., & Taiar, R. (Eds.), Intelligent Human Systems Integration 2021. IHSI 2021. Advances in Intelligent Systems and Computing, 1322. Springer, Cham. https://doi.org/10.1007/978-3-030-68017-6_33
Ekman, P., & Friesen, W. V. (1969, February). Nonverbal leakage and clues to deception. Psychiatry, 32(1), 88–106. https://doi.org/10.1080/00332747.1969.11023575
Queiroz, R. B., Musse, S. R., & Badler, N. I. (2014). Investigating macroexpressions and microexpressions in computer graphics animated faces. Presence, 23(2), 191–208. http://dx.doi.org/10.1162/

  2. In recent years, extensive research has emerged in affective computing on topics like automatic emotion recognition and determining the signals that characterize individual emotions. Much less studied, however, is expressiveness—the extent to which someone shows any feeling or emotion. Expressiveness is related to personality and mental health and plays a crucial role in social interaction. As such, the ability to automatically detect or predict expressiveness can facilitate significant advancements in areas ranging from psychiatric care to artificial social intelligence. Motivated by these potential applications, we present an extension of the BP4D+ data set [27] with human ratings of expressiveness and develop methods for (1) automatically predicting expressiveness from visual data and (2) defining relationships between interpretable visual signals and expressiveness. In addition, we study the emotional context in which expressiveness occurs and hypothesize that different sets of signals are indicative of expressiveness in different contexts (e.g., in response to surprise or in response to pain). Analysis of our statistical models confirms our hypothesis. Consequently, by looking at expressiveness separately in distinct emotional contexts, our predictive models show significant improvements over baselines and achieve comparable results to human performance in terms of correlation with the ground truth.
  3. Abstract The enhancement hypothesis suggests that deaf individuals are more vigilant to visual emotional cues than hearing individuals. The present eye-tracking study examined ambient–focal visual attention when encoding affect from dynamically changing emotional facial expressions. Deaf (n = 17) and hearing (n = 17) individuals watched emotional facial expressions that in 10-s animations morphed from a neutral expression to one of happiness, sadness, or anger. The task was to recognize emotion as quickly as possible. Deaf participants tended to be faster than hearing participants in affect recognition, but the groups did not differ in accuracy. In general, happy faces were more accurately and more quickly recognized than faces expressing anger or sadness. Both groups demonstrated longer average fixation duration when recognizing happiness in comparison to anger and sadness. Deaf individuals directed their first fixations less often to the mouth region than the hearing group. During the last stages of emotion recognition, deaf participants exhibited more focal viewing of happy faces than negative faces. This pattern was not observed among hearing individuals. The analysis of visual gaze dynamics, switching between ambient and focal attention, was useful in studying the depth of cognitive processing of emotional information among deaf and hearing individuals.
  4. This paper demonstrates the utility of ambient-focal attention and pupil dilation dynamics to describe visual processing of emotional facial expressions. Pupil dilation and focal eye movements reflect deeper cognitive processing and thus shed more light on the dynamics of emotional expression recognition. Socially anxious individuals (N = 24) and non-anxious controls (N = 24) were asked to recognize emotional facial expressions that gradually morphed from a neutral expression to one of happiness, sadness, or anger in 10-sec animations. Anxious cohorts exhibited more ambient face scanning than their non-anxious counterparts. We observed a positive relationship between focal fixations and pupil dilation, indicating deeper processing of viewed faces, but only by non-anxious participants, and only during the last phase of emotion recognition. Group differences in the dynamics of ambient-focal attention support the hypothesis of vigilance to emotional expression processing by socially anxious individuals. We discuss the results by referring to current literature on cognitive psychopathology.
  5. In recent news, organizations have been considering the use of facial and emotion recognition for applications involving youth, such as tackling surveillance and security in schools. However, the majority of efforts on facial emotion recognition research have focused on adults. Children, particularly in their early years, have been shown to express emotions quite differently than adults. Thus, before such algorithms are deployed in environments that impact the wellbeing and circumstance of youth, a careful examination should be made of their accuracy with respect to appropriateness for this target demographic. In this work, we utilize several datasets that contain facial expressions of children linked to their emotional state to evaluate eight different commercial emotion classification systems. We compare the ground truth labels provided by the respective datasets to the labels given with the highest confidence by the classification systems and assess the results in terms of matching score (TPR), positive predictive value, and failure to compute rate. Overall results show that the emotion recognition systems displayed subpar performance on the datasets of children's expressions compared to prior work with adult datasets and initial human ratings. We then identify limitations associated with automated recognition of emotions in children and provide suggestions on directions for enhancing recognition accuracy through data diversification, dataset accountability, and algorithmic regulation.
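The three evaluation measures named above can be made concrete with a short sketch. The definitions here are one plausible reading of the abstract (treating the matching score as per-emotion recall and PPV as precision over the images the system actually labeled); the function name, the use of `None` for failed computations, and the sample labels are invented for illustration.

```python
from collections import Counter

def evaluate_emotion_system(predictions, truths):
    """Score a classifier's outputs against ground-truth emotion labels.
    `None` marks images for which the system failed to return a label.
    Returns (per-emotion TPR dict, positive predictive value,
    failure-to-compute rate)."""
    total = len(predictions)
    # Pairs where the system actually produced a label.
    computed = [(p, t) for p, t in zip(predictions, truths) if p is not None]
    hits, truth_counts = Counter(), Counter()
    for p, t in zip(predictions, truths):
        truth_counts[t] += 1
        if p == t:
            hits[t] += 1
    # Recall per true emotion (matching score).
    tpr = {emo: hits[emo] / n for emo, n in truth_counts.items()}
    # Precision over the outputs the system managed to compute.
    matches = sum(p == t for p, t in computed)
    ppv = matches / len(computed) if computed else 0.0
    failure_rate = (total - len(computed)) / total
    return tpr, ppv, failure_rate
```

Separating the failure-to-compute rate from the accuracy measures matters here: a system that silently skips hard faces (common with children's expressions, per the abstract) can report a flattering PPV while failing on a large fraction of inputs.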