

Title: Encoding of facial features by single neurons in the human amygdala and hippocampus
Abstract

Faces are salient social stimuli that attract a stereotypical pattern of eye movement. The human amygdala and hippocampus are involved in various aspects of face processing; however, it remains unclear how they encode the content of fixations when viewing faces. To answer this question, we employed single-neuron recordings with simultaneous eye tracking when participants viewed natural face stimuli. We found a class of neurons in the human amygdala and hippocampus that encoded salient facial features such as the eyes and mouth. With a control experiment using non-face stimuli, we further showed that feature selectivity was specific to faces. We also found another population of neurons that differentiated saccades to the eyes vs. the mouth. Population decoding confirmed our results and further revealed the temporal dynamics of face feature coding. Interestingly, we found that the amygdala and hippocampus played different roles in encoding facial features. Lastly, we revealed two functional roles of feature-selective neurons: 1) they encoded the salient region for face recognition, and 2) they were related to perceived social trait judgments. Together, our results link eye movement with neural face processing and provide important mechanistic insights for human face perception.
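As a rough illustration of the population-decoding analysis summarized above, the sketch below trains a cross-validated linear decoder to distinguish fixations on the eyes from fixations on the mouth using per-fixation spike counts. The data are simulated, and the window, classifier, and variable names are illustrative assumptions, not the authors' published pipeline.

    # Illustrative population decoding of fixation target (eyes vs. mouth) from
    # per-fixation spike counts. All data are simulated; the spike-count window,
    # classifier, and variable names are assumptions, not the published analysis.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    n_fixations, n_neurons = 400, 100
    labels = rng.integers(0, 2, n_fixations)        # 0 = fixation on eyes, 1 = fixation on mouth

    # Baseline firing plus a small subset of "feature-selective" units that fire
    # more for mouth fixations, mimicking the selectivity described in the abstract.
    rates = np.full((n_fixations, n_neurons), 5.0)
    selective_units = rng.choice(n_neurons, 15, replace=False)
    rates[np.ix_(labels == 1, selective_units)] += 2.0
    spikes = rng.poisson(rates)                     # spike counts in a fixed post-fixation window

    # Cross-validated linear decoder on z-scored counts.
    decoder = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    accuracy = cross_val_score(decoder, spikes, labels, cv=5)
    print(f"decoding accuracy: {accuracy.mean():.2f} +/- {accuracy.std():.2f}")

Sliding the spike-count window relative to fixation onset and repeating the fit would trace out the temporal dynamics of feature coding mentioned in the abstract.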

 
Award ID(s):
2114644
NSF-PAR ID:
10383774
Publisher / Repository:
Nature Publishing Group
Date Published:
Journal Name:
Communications Biology
Volume:
4
Issue:
1
ISSN:
2399-3642
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Visual attention to facial features is an important way that group-living primate species gain knowledge about others. However, where this attention is focused on the face is influenced by contextual and social features, and emerging evidence in Pan species suggests that oxytocin, a hormone involved in forming and maintaining affiliative bonds among members of the same group, influences social attention as measured by eye gaze. Specifically, bonobos tend to focus on conspecifics’ eyes when viewing two-dimensional images, whereas chimpanzees focus more on the edges of the face. Moreover, exogenous oxytocin, which was hypothesized to increase eye contact in both species, instead enhanced this existing difference. We follow up on this to (1) determine the degree to which this Pan pattern generalizes across highly social, cooperative non-ape primates and (2) explore how exogenously administered vs. endogenously released oxytocin affects this behavior. To do so, we tracked gaze direction on a computerized social categorization task using conspecific faces in tufted capuchin monkeys (Sapajus [Cebus] apella) after (1) exogenously administering intranasal oxytocin using a nebulizer or (2) inducing an endogenous increase in oxytocin using fur-rubbing, previously validated to increase oxytocin in capuchins. Overall, we did not find a general tendency in the capuchins to look toward the eyes or mouth, but we found that oxytocin was related to looking behavior toward these regions, albeit not in a straightforward way. Considering frequency of looking per trial, monkeys were more likely to look at the eye region in the fur-rubbing condition than in either the saline or exogenous oxytocin condition. However, in terms of duration of looking during trials in which they did look at the eye region, monkeys spent significantly less time looking at the eyes in both oxytocin conditions than in the saline condition. These results suggest that oxytocin did not necessarily enhance eye looking in capuchins, consistent with the results from Pan species, and that endogenous and exogenous oxytocin may differ in how they shape the allocation of social attention.
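    A minimal sketch of how the two looking measures described above (whether the eye region was looked at on a trial, and for how long when it was) could be summarized per condition; the table layout, column names, and values are hypothetical, not the study's data.

    # Hypothetical per-trial summary of eye-region looking by condition
    # (saline, exogenous oxytocin, fur-rubbing). Values are illustrative only.
    import pandas as pd

    trials = pd.DataFrame({
        "condition": ["saline", "exogenous_OT", "fur_rubbing"] * 4,
        "looked_at_eyes": [1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1],   # binary: any eye look this trial
        "eye_look_ms":    [420, 0, 310, 0, 260, 350, 380, 0, 330, 0, 240, 400],
    })

    # Probability of looking at the eye region per trial, by condition.
    p_look = trials.groupby("condition")["looked_at_eyes"].mean()

    # Mean looking duration, restricted to trials with at least one eye look.
    dur = (trials[trials["looked_at_eyes"] == 1]
           .groupby("condition")["eye_look_ms"].mean())

    print(p_look)
    print(dur)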
  2. Abstract

    A central challenge in face perception research is to understand how neurons encode face identities. This challenge has not been met, largely because of the lack of simultaneous access to the entire face processing neural network and the lack of a comprehensive multifaceted model capable of characterizing a large number of facial features. Here, we addressed this challenge by conducting in silico experiments using a pre-trained face recognition deep neural network (DNN) with a diverse array of stimuli. We identified a subset of DNN units selective to face identities, and these identity-selective units demonstrated generalized discriminability to novel faces. Visualization and manipulation of the network revealed the importance of identity-selective units in face recognition. Importantly, using our monkey and human single-neuron recordings, we directly compared the responses of artificial units and real primate neurons to the same stimuli and found that artificial units shared a similar representation of facial features with primate neurons. We also observed a region-based feature coding mechanism in DNN units, as in human neurons. Together, by directly linking artificial and primate neural systems, our results shed light on how the primate brain performs face recognition tasks.
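    As a rough sketch of how a screen for identity-selective units can be set up (the one-way-ANOVA criterion, the threshold, and the simulated activations below are assumptions, not the exact procedure used in the work described above):

    # Illustrative screen for identity-selective units in one layer of a
    # face-recognition DNN. 'acts' would normally come from a pre-trained
    # network; here it is simulated, and the ANOVA criterion is an assumption.
    import numpy as np
    from scipy.stats import f_oneway

    rng = np.random.default_rng(1)
    n_identities, n_images_per_id, n_units = 20, 10, 512

    # Simulated layer activations: (identities, images per identity, units).
    acts = rng.normal(size=(n_identities, n_images_per_id, n_units))
    acts[:, :, :50] += rng.normal(size=(n_identities, 1, 1)) * 2.0   # units 0-49 carry identity info

    # One-way ANOVA across identities for each unit, on per-image activations.
    selective = []
    for u in range(n_units):
        groups = [acts[i, :, u] for i in range(n_identities)]
        F, p = f_oneway(*groups)
        if p < 0.01:
            selective.append(u)

    print(f"{len(selective)} of {n_units} units pass the identity-selectivity screen")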

     
  3. Abstract

    An important question in human face perception research is whether the neural representation of faces is dynamically modulated by context. In particular, although a plethora of neuroimaging studies has probed the neural representation of faces, few have investigated which low-level structural and textural facial features parametrically drive neural responses to faces and whether the representation of these features is modulated by the task. To answer these questions, we employed two task instructions while participants viewed the same faces. We first identified brain regions that parametrically encoded high-level social traits such as perceived facial trustworthiness and dominance, and we showed that these brain regions were modulated by task instructions. We then employed a data-driven computational face model with parametrically generated faces and identified brain regions that encoded the low-level variation in the faces (shape and skin texture) driving neural responses. We further analyzed the evolution of the neural feature vectors along the visual processing stream and visualized and explained these feature vectors. Together, our results showed a flexible neural representation of faces, for both low-level features and high-level social traits, in the human brain.
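    The parametric analyses described above boil down to regressing a region's per-face response on per-face predictors such as trait ratings or the face model's shape and texture parameters. A minimal illustrative regression on simulated data (not the study's actual GLM specification) might look like this:

    # Illustrative parametric regression: does a region's per-face response track
    # a perceived social-trait rating? Data are simulated; the real study used
    # task-based fMRI GLMs, so this is only a conceptual sketch.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(2)
    n_faces = 150

    trust = rng.normal(size=n_faces)                 # per-face trustworthiness ratings (z-scored)
    dominance = rng.normal(size=n_faces)             # per-face dominance ratings
    X = np.column_stack([trust, dominance])

    # Simulated region response: weakly driven by trustworthiness plus noise.
    y = 0.4 * trust + rng.normal(scale=1.0, size=n_faces)

    model = LinearRegression().fit(X, y)
    print("parametric weights (trust, dominance):", model.coef_)
    print("R^2:", model.score(X, y))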
  4. Abstract

    Humans detect faces efficiently from a young age. Face detection is critical for infants to identify and learn from relevant social stimuli in their environments. Faces with eye contact are an especially salient stimulus, and attention to the eyes in infancy is linked to the emergence of later sociality. Despite the importance of both of these early social skills—attending to faces and attending to the eyes—surprisingly little is known about how they interact. We used eye tracking to explore whether eye contact influences infants' face detection. Longitudinally, we examined 2‐, 4‐, and 6‐month‐olds' (N = 65) visual scanning of complex image arrays with human and animal faces varying in eye contact and head orientation. Across all ages, infants displayed superior detection of faces with eye contact; however, this effect varied as a function of species and head orientation. Infants were more attentive to human than animal faces and were more sensitive to eye and head orientation for human faces compared to animal faces. Unexpectedly, human faces with both averted heads and eyes received the most attention. This pattern may reflect the early emergence of gaze following—the ability to look where another individual looks—which begins to develop around this age. Infants may be especially interested in averted gaze faces, providing early scaffolding for joint attention. This study represents the first investigation to document infants' attention patterns to faces systematically varying in their attentional states. Together, these findings suggest that infants develop early, specialized functional conspecific face detection.

     
  5. The PoseASL dataset consists of color and depth videos collected from ASL signers at the Linguistic and Assistive Technologies Laboratory under the direction of Matt Huenerfauth, as part of a collaborative research project with researchers at the Rochester Institute of Technology, Boston University, and the University of Pennsylvania.

    Access: After becoming an authorized user of Databrary, please contact Matt Huenerfauth if you have difficulty accessing this volume. We have collected a new dataset consisting of color and depth videos of fluent American Sign Language signers performing sequences of ASL signs and sentences. Given interest among sign-recognition and other computer-vision researchers in red-green-blue-depth (RGBD) video, we release this dataset for use by the research community. In addition to the video files, we share depth data files from a Kinect v2 sensor, as well as additional motion-tracking files produced through post-processing of these data.

    Organization of the Dataset: The dataset is organized into sub-folders with codenames such as "P01" or "P16". These codenames refer to the specific human signers who were recorded. Please note that there is no participant P11 or P14; those numbers were accidentally skipped during the process of making appointments to collect video stimuli.

    Task: During the recording session, the participant was met by a member of our research team who was a native ASL signer. No other individuals were present during the data collection session. After signing the informed consent and video release document, participants responded to a demographic questionnaire. The data-collection session then consisted of English word stimuli and cartoon videos. The session began with slides displaying English words and photos of items, and participants were asked to produce the sign for each (PDF included in the materials subfolder). Next, participants viewed three short animated cartoons, which they were asked to recount in ASL:
    - Canary Row, Warner Brothers Merrie Melodies 1950 (the 7-minute video divided into seven parts)
    - Mr. Koumal Flies Like a Bird, Studio Animovaneho Filmu 1969
    - Mr. Koumal Battles his Conscience, Studio Animovaneho Filmu 1971
    The word list and cartoons were selected because they are identical to the stimuli used in the collection of the Nicaraguan Sign Language video corpora; see Senghas, A. (1995). Children’s Contribution to the Birth of Nicaraguan Sign Language. Doctoral dissertation, Department of Brain and Cognitive Sciences, MIT.

    Demographics: All 14 of our participants were fluent ASL signers. As screening, we asked: Did you use ASL at home growing up, or did you attend a school as a very young child where you used ASL? All participants responded affirmatively. A total of 14 DHH participants were recruited on the Rochester Institute of Technology campus: 7 men and 7 women, aged 21 to 35 (median = 23.5). All participants reported that they began using ASL when they were 5 years old or younger, with 8 reporting ASL use since birth and 3 others reporting ASL use since age 18 months.

    Filetypes: *.avi, *_dep.bin: The PoseASL dataset was captured with a Kinect 2.0 RGBD camera. The output of this camera system includes multiple channels: RGB, depth, skeleton joints (25 joints for every video frame), and HD face (1,347 points). The video resolution is 1920 x 1080 pixels for the RGB channel and 512 x 424 pixels for the depth channel. Due to limitations in the filetypes accepted for sharing on Databrary, the binary *_dep.bin files produced directly by the Kinect v2 camera system could not be shared on the Databrary platform. If your research requires the original binary *_dep.bin files, please contact Matt Huenerfauth. *_face.txt, *_HDface.txt, *_skl.txt: To make it easier for future researchers to use this dataset, we have also performed some post-processing of the Kinect data. To extract skeleton coordinates from the RGB videos, we used the OpenPose system, which can detect body, hand, facial, and foot keypoints of multiple people in single images in real time. The output of OpenPose includes estimates of 70 keypoints for the face, including the eyes, eyebrows, nose, mouth, and face contour. The software also estimates 21 keypoints for each of the hands (Simon et al., 2017), including 3 keypoints for each finger, as shown in Figure 2. Additionally, 25 keypoints are estimated for the body pose and feet (Cao et al., 2017; Wei et al., 2016).

    Reporting Bugs or Errors: Please contact Matt Huenerfauth to report any bugs or errors that you identify in the corpus. We appreciate your help in improving the quality of the corpus over time by identifying any errors.

    Acknowledgement: This material is based upon work supported by the National Science Foundation under award 1749376: "Collaborative Research: Multimethod Investigation of Articulatory and Perceptual Constraints on Natural Language Evolution."
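    As a small sketch of how one might walk the dataset as organized above, the snippet below enumerates the participant folders (P01 through P16, with P11 and P14 absent) and looks for the listed post-processed files next to each RGB video; the assumption that those files share the video's base name is ours, not stated in the dataset description.

    # Sketch of enumerating PoseASL participant folders and post-processed files.
    # The local root path and the filename pattern beyond the documented
    # suffixes are assumptions.
    from pathlib import Path

    DATASET_ROOT = Path("PoseASL")                    # hypothetical local path to the downloaded volume
    PARTICIPANTS = [f"P{i:02d}" for i in range(1, 17) if i not in (11, 14)]  # P11 and P14 were skipped
    SUFFIXES = ("_face.txt", "_HDface.txt", "_skl.txt", "_dep.bin")

    for codename in PARTICIPANTS:
        folder = DATASET_ROOT / codename
        if not folder.is_dir():
            continue
        for video in sorted(folder.glob("*.avi")):
            # Assume post-processed files share the RGB video's base name.
            companions = [video.with_name(video.stem + s) for s in SUFFIXES
                          if video.with_name(video.stem + s).exists()]
            print(codename, video.name, [c.name for c in companions])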