Deep convolutional neural networks (DCNNs) trained for face identification can rival and even exceed human-level performance. How the internal face representations in DCNNs relate to human cognitive representations and brain activity, however, is not well understood. Nearly all previous studies focused on static face images shown at rapid display times and ignored the processing of naturalistic, dynamic information. To address this gap, we developed the largest naturalistic dynamic face stimulus set in human neuroimaging research (700+ naturalistic video clips of unfamiliar faces) and used it to compare representational geometries estimated from DCNNs, behavioral responses, and brain responses. We found that DCNN representational geometries were consistent across architectures, cognitive representational geometries were consistent across raters in a behavioral arrangement task, and neural representational geometries in face areas were consistent across brains. Representational geometries in late, fully connected DCNN layers, which are optimized for individuation, were much more weakly correlated with cognitive and neural geometries than were geometries in late-intermediate layers. The late-intermediate face-DCNN layers successfully matched cognitive representational geometries, as measured with a behavioral arrangement task that primarily reflected categorical attributes, and correlated with neural representational geometries in known face-selective topographies. Our study suggests that current DCNNs successfully capture neural and cognitive processes for categorical attributes of faces but capture individuation and dynamic features less accurately.
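The comparison described here is representational similarity analysis (RSA): compute a representational dissimilarity matrix (RDM) for each system and correlate the matrices' unique entries. Below is a minimal Python sketch of that step; the file names, array shapes, and the choice of correlation distance and Spearman comparison are assumptions for illustration, not details taken from the study.

```python
# Minimal RSA sketch: correlate a DCNN layer's representational geometry
# with a neural (or behavioral) geometry over the same stimuli.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(features):
    """Condensed RDM: 1 - Pearson correlation for every stimulus pair.
    `features` has one row per video clip, one column per unit."""
    return pdist(features, metric="correlation")

# Hypothetical precomputed inputs (file names are placeholders).
dcnn_layer = np.load("dcnn_late_intermediate_activations.npy")  # (n_clips, n_units)
neural_rdm = np.load("neural_rdm_condensed.npy")                # (n_pairs,)

rho, p = spearmanr(rdm(dcnn_layer), neural_rdm)
print(f"Model-brain geometry correlation: rho = {rho:.3f} (p = {p:.3g})")
```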
This content will become publicly available on December 15, 2025.

Comparing Representations in Static and Dynamic Vision Models to the Human Brain
We compared neural responses to naturalistic videos with representations in deep network models trained with static and dynamic information. Models trained with dynamic information showed greater correspondence with neural representations in all brain regions, including those previously associated with the processing of static information. Among the models trained with dynamic information, those based on optic flow accounted for unique variance in neural responses that was not captured by Masked Autoencoders. This effect was strongest in ventral and dorsal brain regions, indicating that despite the Masked Autoencoders’ effectiveness at a variety of tasks, their representations diverge from representations in the human brain in the early stages of visual processing.
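One common way to quantify the "unique variance" claim above is variance partitioning: fit a cross-validated encoding model from Masked Autoencoder features alone, then refit with optic-flow features added; the gain in held-out R^2 is the variance uniquely attributable to optic flow. A hedged sketch with synthetic stand-in data (the ridge encoder, feature dimensions, and variable names are all assumptions, not the authors' pipeline):

```python
# Variance-partitioning sketch with synthetic data standing in for
# real model features and neural responses.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_clips = 200
mae_feats = rng.standard_normal((n_clips, 64))   # stand-in MAE features
flow_feats = rng.standard_normal((n_clips, 32))  # stand-in optic-flow features
response = rng.standard_normal(n_clips)          # stand-in neural response

def cv_r2(X, y):
    """Mean cross-validated R^2 of a ridge encoding model."""
    return cross_val_score(RidgeCV(alphas=np.logspace(-3, 3, 13)),
                           X, y, cv=5, scoring="r2").mean()

r2_mae = cv_r2(mae_feats, response)
r2_joint = cv_r2(np.hstack([mae_feats, flow_feats]), response)
print(f"Unique variance attributable to optic flow: {r2_joint - r2_mae:.3f}")
```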
- Award ID(s): 1943862
- PAR ID: 10575760
- Publisher / Repository: Openreview.net
- Date Published:
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- According to a classical view of face perception (Bruce and Young, 1986; Haxby et al., 2000), face identity and facial expression recognition are performed by separate neural substrates (ventral and lateral temporal face-selective regions, respectively). However, recent studies challenge this view, showing that expression valence can also be decoded from ventral regions (Skerry and Saxe, 2014; Li et al., 2019), and identity from lateral regions (Anzellotti and Caramazza, 2017). These findings could be reconciled with the classical view if regions specialized for one task (either identity or expression) contain a small amount of information for the other task (enough to enable above-chance decoding). In this case, we would expect representations in lateral regions to be more similar to representations in deep convolutional neural networks (DCNNs) trained to recognize facial expression than to representations in DCNNs trained to recognize face identity (the converse should hold for ventral regions). We tested this hypothesis by analyzing neural responses to faces varying in identity and expression. Representational dissimilarity matrices (RDMs) computed from human intracranial recordings (n = 11 adults; 7 females) were compared with RDMs from DCNNs trained to label either identity or expression. We found that RDMs from DCNNs trained to recognize identity correlated with intracranial recordings more strongly in all regions tested, even in regions classically hypothesized to be specialized for expression. These results deviate from the classical view, suggesting that face-selective ventral and lateral regions contribute to the representation of both identity and expression. SIGNIFICANCE STATEMENT: Previous work proposed that separate brain regions are specialized for the recognition of face identity and facial expression. However, identity and expression recognition mechanisms might share common brain regions instead. We tested these alternatives using deep neural networks and intracranial recordings from face-selective brain regions. Deep neural networks trained to recognize identity and networks trained to recognize expression learned representations that correlate with neural recordings. Identity-trained representations correlated with intracranial recordings more strongly in all regions tested, including regions hypothesized to be expression-specialized in the classical hypothesis. These findings support the view that identity and expression recognition rely on common brain regions. This discovery may require reevaluation of the roles that the ventral and lateral neural pathways play in processing socially relevant stimuli.
- Recent shifts in the understanding of how the mind and brain retain information in working memory (WM) call for revision to traditional theories. Evidence of dynamic, “activity-silent,” short-term retention processes diverges from conventional models positing that information is always retained in WM by sustained neural activity in buffers. Such evidence comes from machine-learning methods that can decode patterns of brain activity and the simultaneous administration of transcranial magnetic stimulation (TMS) to causally manipulate brain activity in specific areas and time points. TMS can “ping” brain areas to both reactivate latent representations retained in WM and affect memory performance. On the basis of these findings, I argue for a supplement to sustained retention mechanisms. Brain-decoding methods also reveal that dynamic levels of representational codes are retained in WM, and these vary according to task context, from perceptual (sensory) codes in posterior areas to abstract, recoded representations distributed across frontoparietal regions. A dynamic-processing model of WM is advanced to account for the overall pattern of results.
- Recent neural evidence challenges the traditional view that face identity and facial expressions are processed by segregated neural pathways, showing that information about identity and expression are encoded within common brain regions. This article tests the hypothesis that integrated representations of identity and expression arise naturally within neural networks. Deep networks trained to recognize expression and deep networks trained to recognize identity spontaneously develop representations of identity and expression, respectively. These findings serve as a “proof-of-concept” that it is not necessary to discard task-irrelevant information for identity and expression recognition.
- Self-supervised learning with masked autoencoders has recently gained popularity for its ability to produce effective image or textual representations, which can be applied to various downstream tasks without retraining. However, we observe that current masked autoencoder models lack good generalization ability on graph data. To tackle this issue, we propose a novel graph masked autoencoder framework called GiGaMAE. Different from existing masked autoencoders that learn node representations by explicitly reconstructing the original graph components (e.g., features or edges), we propose to collaboratively reconstruct informative and integrated latent embeddings. By considering embeddings encompassing graph topology and attribute information as reconstruction targets, our model can capture more generalized and comprehensive knowledge. Furthermore, we introduce a mutual-information-based reconstruction loss that enables the effective reconstruction of multiple targets. This learning objective allows us to differentiate between the exclusive knowledge learned from a single target and common knowledge shared by multiple targets. We evaluate our method on three downstream tasks with seven datasets as benchmarks. Extensive experiments demonstrate the superiority of GiGaMAE against state-of-the-art baselines. We hope our results will shed light on the design of foundation models on graph-structured data. Our code is available at: https://github.com/sycny/GiGaMAE.
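The central move in the GiGaMAE item above is reconstructing latent target embeddings for masked nodes rather than raw graph components. The toy sketch below illustrates just that masking-and-latent-reconstruction pattern; it uses a plain MLP encoder instead of a graph network and an MSE loss instead of the paper's mutual-information-based objective, so it is a simplified stand-in, not the released implementation.

```python
# Toy masked-autoencoder objective on latent targets (simplified stand-in).
import torch
import torch.nn as nn
import torch.nn.functional as F

n_nodes, feat_dim, hidden, target_dim = 100, 16, 32, 8
x = torch.randn(n_nodes, feat_dim)              # node features
topo_target = torch.randn(n_nodes, target_dim)  # stand-in topology embedding
attr_target = torch.randn(n_nodes, target_dim)  # stand-in attribute embedding

mask = torch.rand(n_nodes) < 0.3                # mask ~30% of nodes
x_masked = x.clone()
x_masked[mask] = 0.0                            # zero out masked node features

encoder = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
decode_topo = nn.Linear(hidden, target_dim)     # one decoder head per target
decode_attr = nn.Linear(hidden, target_dim)

z = encoder(x_masked)
# Reconstruct each latent target only at the masked positions.
loss = (F.mse_loss(decode_topo(z)[mask], topo_target[mask])
        + F.mse_loss(decode_attr(z)[mask], attr_target[mask]))
loss.backward()
```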