
Title: Capturing human categorization of natural images by combining deep networks and cognitive models
Abstract: Human categorization is one of the most important and successful targets of cognitive modeling, with decades of model development and assessment using simple, low-dimensional artificial stimuli. However, it remains unclear how these findings relate to categorization in more natural settings, involving complex, high-dimensional stimuli. Here, we take a step towards addressing this question by modeling human categorization over a large behavioral dataset, comprising more than 500,000 judgments over 10,000 natural images from ten object categories. We apply a range of machine learning methods to generate candidate representations for these images, and show that combining rich image representations with flexible cognitive models captures human decisions best. We also find that in the high-dimensional representational spaces these methods generate, simple prototype models can perform comparably to the more complex memory-based exemplar models dominant in laboratory settings.
Journal Name:
Nature Communications
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    The remarkable successes of convolutional neural networks (CNNs) in modern computer vision are by now well known, and they are increasingly being explored as computational models of the human visual system. In this paper, we ask whether CNNs might also provide a basis for modeling higher‐level cognition, focusing on the core phenomena of similarity and categorization. The most important advance comes from the ability of CNNs to learn high‐dimensional representations of complex naturalistic images, substantially extending the scope of traditional cognitive models that were previously only evaluated with simple artificial stimuli. In all cases, the most successful combinations arise when CNN representations are used with cognitive models that have the capacity to transform them to better fit human behavior. One consequence of these insights is a toolkit for the integration of cognitively motivated constraints back into CNN training paradigms in computer vision and machine learning, and we review cases where this leads to improved performance. A second consequence is a roadmap for how CNNs and cognitive models can be more fully integrated in the future, allowing for flexible end‐to‐end algorithms that can learn representations from data while still retaining the structured behavior characteristic of human cognition.

  2. Abstract

    Similarity is one of the most important relations humans perceive, arguably subserving category learning and categorization, generalization and discrimination, judgment and decision making, and other cognitive functions. Researchers have proposed a wide range of representations and metrics that could be at play in similarity judgment, yet have not comprehensively compared the power of these representations and metrics for predicting similarity within and across different semantic categories. We performed such a comparison by pairing nine prominent vector semantic representations with seven established similarity metrics that could operate on these representations, as well as supervised methods for dimensional weighting in the similarity function. This approach yields a factorial model structure with 126 distinct representation‐metric pairs, which we tested on a novel dataset of similarity judgments between pairs of cohyponymic words in eight categories. We found that cosine similarity and Pearson correlation were the overall best performing unweighted similarity functions, and that word vectors derived from free association norms often outperformed word vectors derived from text (including those specialized for similarity). Importantly, models that used human similarity judgments to learn category‐specific weights on dimensions yielded substantially better predictions than all unweighted approaches across all types of similarity functions and representations, although dimension weights did not generalize well across semantic categories, suggesting strong category context effects in similarity judgment. We discuss implications of these results for cognitive modeling and natural language processing, as well as for theories of the representations and metrics involved in similarity.

  3. Neuroimaging studies of human memory have consistently found that univariate responses in parietal cortex track episodic experience with stimuli (whether stimuli are 'old' or 'new'). More recently, pattern-based fMRI studies have shown that parietal cortex also carries information about the semantic content of remembered experiences. However, it is not well understood how memory-based and content-based signals are integrated within parietal cortex. Here, in humans (males and females), we used voxel-wise encoding models and a recognition memory task to predict the fMRI activity patterns evoked by complex natural scene images based on (1) the episodic history and (2) the semantic content of each image. Models were generated and compared across distinct subregions of parietal cortex and for occipitotemporal cortex. We show that parietal and occipitotemporal regions each encode memory and content information, but they differ in how they combine this information. Among parietal subregions, angular gyrus was characterized by robust and overlapping effects of memory and content. Moreover, subject-specific semantic tuning functions revealed that successful recognition shifted the amplitude of tuning functions in angular gyrus but did not change the selectivity of tuning. In other words, effects of memory and content were additive in angular gyrus. This pattern of data contrasted with occipitotemporal cortex where memory and content effects were interactive: memory effects were preferentially expressed by voxels tuned to the content of a remembered image. Collectively, these findings provide unique insight into how parietal cortex combines information about episodic memory and semantic content.

    SIGNIFICANCE STATEMENT: Neuroimaging studies of human memory have identified multiple brain regions that not only carry information about “whether” a visual stimulus is successfully recognized but also “what” the content of that stimulus includes. However, a fundamental and open question concerns how the brain integrates these two types of information (memory and content). Here, using a powerful combination of fMRI analysis methods, we show that parietal cortex, particularly the angular gyrus, robustly combines memory- and content-related information, but these two forms of information are represented via additive, independent signals. In contrast, memory effects in high-level visual cortex critically depend on (and interact with) content representations. Together, these findings reveal multiple and distinct ways in which the brain combines memory- and content-related information.

  4. Deep-learning methods can extract high-dimensional feature vectors for objects, concepts, images, and texts from large-scale digital data sets. These vectors are proxies for the mental representations that people use in everyday cognition and behavior. For this reason, they can serve as inputs into computational models of cognition, giving these models the ability to process and respond to naturalistic prompts. Over the past few years, researchers have applied this approach to topics such as similarity judgment, memory search, categorization, decision making, and conceptual knowledge. In this article, we summarize these applications, identify underlying trends, and outline directions for future research on the computational modeling of naturalistic cognition and behavior.

  5. Abstract

    Speech processing often occurs amid competing inputs from other modalities, for example, listening to the radio while driving. We examined the extent to which dividing attention between auditory and visual modalities (bimodal divided attention) impacts neural processing of natural continuous speech from acoustic to linguistic levels of representation. We recorded electroencephalographic (EEG) responses when human participants performed a challenging primary visual task, imposing low or high cognitive load while listening to audiobook stories as a secondary task. The two dual-task conditions were contrasted with an auditory single-task condition in which participants attended to stories while ignoring visual stimuli. Behaviorally, the high load dual-task condition was associated with lower speech comprehension accuracy relative to the other two conditions. We fitted multivariate temporal response function encoding models to predict EEG responses from acoustic and linguistic speech features at different representation levels, including auditory spectrograms and information-theoretic models of sublexical-, word-form-, and sentence-level representations. Neural tracking of most acoustic and linguistic features remained unchanged with increasing dual-task load, despite unambiguous behavioral and neural evidence of the high load dual-task condition being more demanding. Compared to the auditory single-task condition, dual-task conditions selectively reduced neural tracking of only some acoustic and linguistic features, mainly at latencies >200 ms, while earlier latencies were surprisingly unaffected. These findings indicate that behavioral effects of bimodal divided attention on continuous speech processing occur not because of impaired early sensory representations but likely at later cognitive processing stages. Crossmodal attention-related mechanisms may not be uniform across different speech processing levels.
