The remarkable successes of convolutional neural networks (CNNs) in modern computer vision are by now well known, and they are increasingly being explored as computational models of the human visual system. In this paper, we ask whether CNNs might also provide a basis for modeling higher‐level cognition, focusing on the core phenomena of similarity and categorization. The most important advance comes from the ability of CNNs to learn high‐dimensional representations of complex naturalistic images, substantially extending the scope of traditional cognitive models that were previously only evaluated with simple artificial stimuli. In all cases, the most successful combinations arise when CNN representations are used with cognitive models that have the capacity to transform them to better fit human behavior. One consequence of these insights is a toolkit for the integration of cognitively motivated constraints back into CNN training paradigms in computer vision and machine learning, and we review cases where this leads to improved performance. A second consequence is a roadmap for how CNNs and cognitive models can be more fully integrated in the future, allowing for flexible end‐to‐end algorithms that can learn representations from data while still retaining the structured behavior characteristic of human cognition.
- Journal Name: Nature Communications
- Sponsoring Org: National Science Foundation
More Like this
Similarity is one of the most important relations humans perceive, arguably subserving category learning and categorization, generalization and discrimination, judgment and decision making, and other cognitive functions. Researchers have proposed a wide range of representations and metrics that could be at play in similarity judgment, yet have not comprehensively compared the power of these representations and metrics for predicting similarity within and across different semantic categories. We performed such a comparison by pairing nine prominent vector semantic representations with seven established similarity metrics that could operate on these representations, as well as supervised methods for dimensional weighting in the similarity function. This approach yields a factorial model structure with 126 distinct representation-metric pairs, which we tested on a novel dataset of similarity judgments between pairs of cohyponymic words in eight categories. We found that cosine similarity and Pearson correlation were the overall best performing unweighted similarity functions, and that word vectors derived from free association norms often outperformed word vectors derived from text (including those specialized for similarity). Importantly, models that used human similarity judgments to learn category-specific weights on dimensions yielded substantially better predictions than all unweighted approaches across all types of similarity functions and representations, although dimension …
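The two best-performing unweighted similarity functions in this abstract, cosine similarity and Pearson correlation, are closely related: Pearson correlation is cosine similarity applied to mean-centered vectors. A short sketch of both, plus a weighted variant of the kind the supervised models use (the weight vector here is an assumed input, however it was learned):

```python
import numpy as np

def cosine_similarity(u, v):
    # Angle-based similarity: dot product divided by the vector norms.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def pearson_similarity(u, v):
    # Pearson correlation = cosine similarity of mean-centered vectors.
    return cosine_similarity(u - u.mean(), v - v.mean())

def weighted_cosine(u, v, w):
    # Supervised variant: nonnegative per-dimension weights w (e.g. learned
    # from human judgments) rescale each dimension before comparison.
    return cosine_similarity(np.sqrt(w) * u, np.sqrt(w) * v)
```

With uniform weights the weighted variant reduces to plain cosine similarity, which is why the learned weights can only help (or match) the unweighted baseline on the training categories.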
Neuroimaging studies of human memory have consistently found that univariate responses in parietal cortex track episodic experience with stimuli (whether stimuli are 'old' or 'new'). More recently, pattern-based fMRI studies have shown that parietal cortex also carries information about the semantic content of remembered experiences. However, it is not well understood how memory-based and content-based signals are integrated within parietal cortex. Here, in humans (males and females), we used voxel-wise encoding models and a recognition memory task to predict the fMRI activity patterns evoked by complex natural scene images based on (1) the episodic history and (2) the semantic content of each image. Models were generated and compared across distinct subregions of parietal cortex and for occipitotemporal cortex. We show that parietal and occipitotemporal regions each encode memory and content information, but they differ in how they combine this information. Among parietal subregions, angular gyrus was characterized by robust and overlapping effects of memory and content. Moreover, subject-specific semantic tuning functions revealed that successful recognition shifted the amplitude of tuning functions in angular gyrus but did not change the selectivity of tuning. In other words, effects of memory and content were additive in angular gyrus. This pattern of data contrasted …
SIGNIFICANCE STATEMENT: Neuroimaging studies of human memory have identified multiple brain regions that not only carry information about "whether" a visual stimulus is successfully recognized but also "what" the content of that stimulus includes. However, a fundamental and open question concerns how the brain integrates these two types of information (memory and content). Here, using a powerful combination of fMRI analysis methods, we show that parietal cortex, particularly the angular gyrus, robustly combines memory- and content-related information, but these two forms of information are represented via additive, independent signals. In contrast, memory effects in high-level visual cortex critically depend on (and interact with) content representations. Together, these findings reveal multiple and distinct ways in which the brain combines memory- and content-related information.
Deep-learning methods can extract high-dimensional feature vectors for objects, concepts, images, and texts from large-scale digital data sets. These vectors are proxies for the mental representations that people use in everyday cognition and behavior. For this reason, they can serve as inputs into computational models of cognition, giving these models the ability to process and respond to naturalistic prompts. Over the past few years, researchers have applied this approach to topics such as similarity judgment, memory search, categorization, decision making, and conceptual knowledge. In this article, we summarize these applications, identify underlying trends, and outline directions for future research on the computational modeling of naturalistic cognition and behavior.
Speech processing often occurs amid competing inputs from other modalities, for example, listening to the radio while driving. We examined the extent to which dividing attention between auditory and visual modalities (bimodal divided attention) impacts neural processing of natural continuous speech from acoustic to linguistic levels of representation. We recorded electroencephalographic (EEG) responses when human participants performed a challenging primary visual task, imposing low or high cognitive load while listening to audiobook stories as a secondary task. The two dual-task conditions were contrasted with an auditory single-task condition in which participants attended to stories while ignoring visual stimuli. Behaviorally, the high load dual-task condition was associated with lower speech comprehension accuracy relative to the other two conditions. We fitted multivariate temporal response function encoding models to predict EEG responses from acoustic and linguistic speech features at different representation levels, including auditory spectrograms and information-theoretic models of sublexical-, word-form-, and sentence-level representations. Neural tracking of most acoustic and linguistic features remained unchanged with increasing dual-task load, despite unambiguous behavioral and neural evidence of the high load dual-task condition being more demanding. Compared to the auditory single-task condition, dual-task conditions selectively reduced neural tracking of only some acoustic and linguistic features …
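The temporal response function (TRF) encoding models mentioned above regress EEG onto time-lagged copies of stimulus features. A minimal single-feature, single-channel sketch with simulated data (the sampling rate, lag range, and response kernel here are illustrative assumptions, not the study's parameters):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated stimulus feature (e.g. an acoustic envelope) and EEG channel.
n_samples = 2000
stimulus = rng.normal(size=n_samples)

# Design matrix of time-lagged stimulus copies (the TRF approach).
lags = np.arange(20)               # e.g. roughly 0-150 ms at 128 Hz
X = np.zeros((n_samples, len(lags)))
for k, lag in enumerate(lags):
    X[lag:, k] = stimulus[:n_samples - lag]

# Simulated neural response: a decaying kernel plus noise.
true_trf = np.exp(-lags / 5.0)
eeg = X @ true_trf + 0.1 * rng.normal(size=n_samples)

# Ridge-regression estimate of the TRF.
lam = 1.0
trf = np.linalg.solve(X.T @ X + lam * np.eye(len(lags)), X.T @ eeg)

# Model fit: correlation between predicted and observed EEG, the usual
# "neural tracking" measure compared across attention conditions.
pred = X @ trf
r = np.corrcoef(pred, eeg)[0, 1]
```

In practice such models are fit per channel with many features at once, the ridge parameter is chosen by cross-validation, and the prediction correlation is computed on held-out data rather than the training segment.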