Abstract Faces are salient social stimuli that attract a stereotypical pattern of eye movement. The human amygdala and hippocampus are involved in various aspects of face processing; however, it remains unclear how they encode the content of fixations when viewing faces. To answer this question, we employed single-neuron recordings with simultaneous eye tracking when participants viewed natural face stimuli. We found a class of neurons in the human amygdala and hippocampus that encoded salient facial features such as the eyes and mouth. With a control experiment using non-face stimuli, we further showed that feature selectivity was specific to faces. We also found another population of neurons that differentiated saccades to the eyes vs. the mouth. Population decoding confirmed our results and further revealed the temporal dynamics of face feature coding. Interestingly, we found that the amygdala and hippocampus played different roles in encoding facial features. Lastly, we revealed two functional roles of feature-selective neurons: 1) they encoded the salient region for face recognition, and 2) they were related to perceived social trait judgments. Together, our results link eye movement with neural face processing and provide important mechanistic insights for human face perception.
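As a rough illustration of the population decoding mentioned in the abstract above, the sketch below trains a cross-validated linear classifier to separate eyes-fixation trials from mouth-fixation trials using a trials-by-neurons matrix of firing rates. The data shapes, random stand-in values, and variable names are assumptions for illustration only, not the authors' recording setup or analysis pipeline.

```python
# Minimal population-decoding sketch: classify the fixated facial feature
# (eyes vs. mouth) from per-trial firing rates. Random stand-in data only;
# a real analysis would use the recorded spike counts per neuron and trial.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_trials, n_neurons = 200, 50
firing_rates = rng.poisson(5.0, size=(n_trials, n_neurons)).astype(float)
labels = rng.integers(0, 2, size=n_trials)  # 0 = eyes fixation, 1 = mouth fixation

decoder = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
accuracy = cross_val_score(decoder, firing_rates, labels, cv=5).mean()
print(f"cross-validated decoding accuracy: {accuracy:.2f}")  # ~0.5 for random data
```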
Emotional Empathy and Facial Mimicry of Avatar Faces
We explore the extent to which empathetic reactions are elicited when subjects view 3D motion-capture-driven avatar faces compared with human faces. In a remote study, we captured subjects' facial reactions while they viewed avatar and human faces and collected self-reported feedback on empathy. Avatar faces varied by gender and realism. Results showed no consistent facial mimicry, only slight and sporadic mimicking of facial movements. Participants tended to empathize with avatars when they could adequately identify the stimulus's emotion. As avatar realism increased, subjects' feelings toward the stimuli became more negative.
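Facial mimicry of the kind examined here is often quantified by correlating the stimulus face's movements with the viewer's, for example over facial action unit (AU) intensity time series extracted by a face-analysis tool. The sketch below only illustrates that idea with synthetic signals; the scoring function, lag window, and data are assumptions, not the study's actual analysis.

```python
# Hypothetical mimicry score: peak lagged correlation between stimulus and
# participant action-unit (AU) intensity time series. Not the paper's pipeline.
import numpy as np

def mimicry_score(stimulus_au: np.ndarray, participant_au: np.ndarray, max_lag: int = 15) -> float:
    """Peak Pearson correlation across non-negative lags (viewer reacting
    after the stimulus) between two AU intensity time series."""
    best = -1.0
    for lag in range(max_lag + 1):
        s = stimulus_au[: len(stimulus_au) - lag]
        p = participant_au[lag:]
        if len(s) > 2 and s.std() > 0 and p.std() > 0:
            best = max(best, float(np.corrcoef(s, p)[0, 1]))
    return best

# Synthetic example: the participant weakly echoes the stimulus with a delay.
t = np.linspace(0, 10, 300)
stimulus = np.clip(np.sin(t), 0, None)  # e.g., AU12 (lip corner puller) intensity
participant = 0.3 * np.roll(stimulus, 10) + 0.1 * np.random.default_rng(1).normal(size=t.size)
print(f"mimicry score (peak lagged correlation): {mimicry_score(stimulus, participant):.2f}")
```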
- Award ID(s): 1851591
- PAR ID: 10401586
- Date Published:
- Journal Name: 2022 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW)
- Page Range / eLocation ID: 770 to 771
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Abstract The enhancement hypothesis suggests that deaf individuals are more vigilant to visual emotional cues than hearing individuals. The present eye-tracking study examined ambient–focal visual attention when encoding affect from dynamically changing emotional facial expressions. Deaf (n = 17) and hearing (n = 17) individuals watched emotional facial expressions that in 10-s animations morphed from a neutral expression to one of happiness, sadness, or anger. The task was to recognize emotion as quickly as possible. Deaf participants tended to be faster than hearing participants in affect recognition, but the groups did not differ in accuracy. In general, happy faces were more accurately and more quickly recognized than faces expressing anger or sadness. Both groups demonstrated longer average fixation duration when recognizing happiness in comparison to anger and sadness. Deaf individuals directed their first fixations less often to the mouth region than the hearing group. During the last stages of emotion recognition, deaf participants exhibited more focal viewing of happy faces than negative faces. This pattern was not observed among hearing individuals. The analysis of visual gaze dynamics, switching between ambient and focal attention, was useful in studying the depth of cognitive processing of emotional information among deaf and hearing individuals. (A minimal illustration of an ambient–focal gaze metric appears after this list.)
- People interacting with voice assistants are often frustrated by voice assistants' frequent errors and inability to respond to backchannel cues. We introduce an open-source video dataset of 21 participants' interactions with a voice assistant, and explore the possibility of using this dataset to enable automatic error recognition to inform self-repair. The dataset includes clipped and labeled videos of participants' faces during free-form interactions with the voice assistant from the smart speaker's perspective. To validate our dataset, we emulated a machine learning classifier by asking crowdsourced workers to recognize voice assistant errors from watching soundless video clips of participants' reactions. We found trends suggesting it is possible to determine the voice assistant's performance from a participant's facial reaction alone. This work posits elicited datasets of interactive responses as a key step towards improving error recognition for repair for voice assistants in a wide variety of applications.
- This paper explores avatar identification in creative storytelling applications where users create their own story and environment. We present a study that investigated the effects of avatar facial similarity to the user on the quality of the story product they create. The children told a story using a digital puppet-based storytelling system by interacting with a physical puppet box that was augmented with a real-time video feed of the puppet enactment. We used a facial morphing technique to manipulate avatar facial similarity to the user. The resulting morphed image was applied to each participant's puppet character, thus creating a custom avatar for each child to use in story creation. We hypothesized that the more familiar avatars appeared to participants, the stronger the sense of character identification would be, resulting in higher story quality. The proposed rationale is that visual familiarity may lead participants to draw richer story details from their past real-life experiences. Qualitative analysis of the stories supported our hypothesis. Our results contribute to avatar design in children's creative storytelling applications. (A simplified sketch of the image-blending step behind such facial morphing appears after this list.)
- The expression of human emotion is integral to social interaction, and in virtual reality it is increasingly common to develop virtual avatars that attempt to convey emotions by mimicking these visual and aural cues, i.e., the facial and vocal expressions. However, errors in (or the absence of) facial tracking can result in the rendering of incorrect facial expressions on these virtual avatars. For example, a virtual avatar may speak with a happy or unhappy vocal inflection while their facial expression remains otherwise neutral. In circumstances where there is conflict between the avatar's facial and vocal expressions, it is possible that users will incorrectly interpret the avatar's emotion, which may have unintended consequences in terms of social influence or in terms of the outcome of the interaction. In this paper, we present a human-subjects study (N = 22) aimed at understanding the impact of conflicting facial and vocal emotional expressions. Specifically, we explored three levels of emotional valence (unhappy, neutral, and happy) expressed in both visual (facial) and aural (vocal) forms. We also investigated three levels of head scale (down-scaled, accurate, and up-scaled) to evaluate whether head scale affects user interpretation of the conveyed emotion. We find significant effects of different multimodal expressions on happiness and trust perception, while no significant effect was observed for head scale. Evidence from our results suggests that facial expressions have a stronger impact than vocal expressions. Additionally, as the difference between the two expressions increases, the multimodal expression becomes less predictable. For example, for the happy-looking and happy-sounding multimodal expression, we expect and see high happiness ratings and high trust; however, if one of the two expressions changes, the mismatch makes the expression less predictable. We discuss the relationships, implications, and guidelines for social applications that aim to leverage multimodal social cues.
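The deaf/hearing eye-tracking entry above analyzes switching between ambient and focal attention. One common way to operationalize that distinction is the coefficient K of Krejtz et al. (2016), which contrasts standardized fixation durations with the standardized amplitudes of the following saccades: positive values indicate focal viewing, negative values ambient viewing. The sketch below illustrates that computation on invented fixation data and is not necessarily the exact analysis used in that study.

```python
# Illustrative ambient-focal coefficient K: z-score fixation durations and
# following-saccade amplitudes over the whole session, then average
# (z_duration - z_amplitude) over a subset of fixations of interest.
# Positive K ~ focal viewing (long fixations, short saccades); negative K ~ ambient.
import numpy as np

def coefficient_k(durations_ms, next_saccade_deg, subset_idx):
    d = np.asarray(durations_ms, dtype=float)
    a = np.asarray(next_saccade_deg, dtype=float)
    zd = (d - d.mean()) / d.std()
    za = (a - a.mean()) / a.std()
    k_per_fixation = zd - za
    return float(k_per_fixation[subset_idx].mean())

# Invented session: 8 fixations; the last 4 belong to the final viewing stage.
durations  = [150, 180, 200, 220, 400, 450, 500, 600]   # fixation durations (ms)
amplitudes = [9.0, 8.0, 7.5, 6.5, 2.0, 1.5, 1.2, 1.0]   # following saccade amplitudes (deg)
late_stage = np.arange(4, 8)                             # indices of the late stage
k_late = coefficient_k(durations, amplitudes, late_stage)
print(f"late-stage K = {k_late:.2f} ->", "focal" if k_late > 0 else "ambient")
```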
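The storytelling entry above varies avatar-user facial similarity via facial morphing. Full morphing warps corresponding facial landmarks before blending; the snippet below shows only the simpler cross-dissolve (weighted pixel blend) step with Pillow, using hypothetical file names, and is not that study's actual morphing pipeline.

```python
# Simplified face "morph": a weighted cross-dissolve between two pre-aligned images.
# Real morphing also warps facial landmarks; the file names here are hypothetical.
from PIL import Image

def blend_faces(user_path: str, avatar_path: str, similarity: float) -> Image.Image:
    """similarity in [0, 1]: 0 = pure avatar face, 1 = pure user face."""
    user = Image.open(user_path).convert("RGB")
    avatar = Image.open(avatar_path).convert("RGB").resize(user.size)
    return Image.blend(avatar, user, alpha=similarity)

# e.g., a 40% user / 60% avatar blend for a "somewhat familiar" puppet face:
# blend_faces("user_face.png", "avatar_face.png", similarity=0.4).save("morphed_face.png")
```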