Abstract The remarkable successes of convolutional neural networks (CNNs) in modern computer vision are by now well known, and they are increasingly being explored as computational models of the human visual system. In this paper, we ask whether CNNs might also provide a basis for modeling higher‐level cognition, focusing on the core phenomena of similarity and categorization. The most important advance comes from the ability of CNNs to learn high‐dimensional representations of complex naturalistic images, substantially extending the scope of traditional cognitive models that were previously only evaluated with simple artificial stimuli. In all cases, the most successful combinations arise when CNN representations are used with cognitive models that have the capacity to transform them to better fit human behavior. One consequence of these insights is a toolkit for the integration of cognitively motivated constraints back into CNN training paradigms in computer vision and machine learning, and we review cases where this leads to improved performance. A second consequence is a roadmap for how CNNs and cognitive models can be more fully integrated in the future, allowing for flexible end‐to‐end algorithms that can learn representations from data while still retaining the structured behavior characteristic of human cognition.
Capturing human categorization of natural images by combining deep networks and cognitive models
Abstract Human categorization is one of the most important and successful targets of cognitive modeling, with decades of model development and assessment using simple, low-dimensional artificial stimuli. However, it remains unclear how these findings relate to categorization in more natural settings, involving complex, high-dimensional stimuli. Here, we take a step towards addressing this question by modeling human categorization over a large behavioral dataset, comprising more than 500,000 judgments over 10,000 natural images from ten object categories. We apply a range of machine learning methods to generate candidate representations for these images, and show that combining rich image representations with flexible cognitive models captures human decisions best. We also find that in the high-dimensional representational spaces these methods generate, simple prototype models can perform comparably to the more complex memory-based exemplar models dominant in laboratory settings.
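The abstract's contrast between prototype and exemplar accounts can be sketched over generic feature vectors. This is a minimal illustration of the two model families, not the authors' implementation; the Gaussian toy data and the `beta` sensitivity parameter are assumptions.

```python
import numpy as np

def prototype_predict(x, class_means):
    """Prototype model: classify x by distance to each class's mean vector."""
    return min(class_means, key=lambda c: np.linalg.norm(x - class_means[c]))

def exemplar_predict(x, exemplars, labels, beta=1.0):
    """GCM-style exemplar model: every stored item votes for its class,
    weighted by an exponentially decaying function of its distance to x."""
    sims = np.exp(-beta * np.linalg.norm(exemplars - x, axis=1))
    scores = {}
    for sim, label in zip(sims, labels):
        scores[label] = scores.get(label, 0.0) + float(sim)
    return max(scores, key=scores.get)
```

With well-separated clusters the two models agree; the paper's finding is that this near-equivalence also holds in the high-dimensional spaces produced by deep networks.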
- Award ID(s): 1932035
- PAR ID: 10309280
- Date Published:
- Journal Name: Nature Communications
- Volume: 11
- Issue: 1
- ISSN: 2041-1723
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Deep-learning methods can extract high-dimensional feature vectors for objects, concepts, images, and texts from large-scale digital data sets. These vectors are proxies for the mental representations that people use in everyday cognition and behavior. For this reason, they can serve as inputs into computational models of cognition, giving these models the ability to process and respond to naturalistic prompts. Over the past few years, researchers have applied this approach to topics such as similarity judgment, memory search, categorization, decision making, and conceptual knowledge. In this article, we summarize these applications, identify underlying trends, and outline directions for future research on the computational modeling of naturalistic cognition and behavior.
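The recipe described here, using deep feature vectors as stand-ins for mental representations, often reduces to computing similarities between those vectors. A minimal sketch, assuming the vectors have already been extracted (in practice they would come from a pretrained network's penultimate layer):

```python
import numpy as np

def predicted_similarity(u, v):
    """Cosine similarity between two feature vectors, a common proxy
    for the human-judged similarity of the corresponding stimuli."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical placeholder vectors standing in for network activations.
dog = np.array([0.9, 0.1, 0.8])
wolf = np.array([0.8, 0.2, 0.7])
car = np.array([0.1, 0.9, 0.1])
```

Under this sketch, `predicted_similarity(dog, wolf)` exceeds `predicted_similarity(dog, car)`, mirroring the ordinal structure expected from human judgments.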
- As the field of computational cognitive neuroscience continues to expand and generate new theories, there is a growing need for more advanced methods to test hypotheses about brain-behavior relationships. Recent progress in Bayesian cognitive modeling has enabled the combination of neural and behavioral models into a single unifying framework. However, these approaches require manual feature extraction and cannot discover previously unknown neural features in more complex data, which limits the expressiveness of the models. To address these challenges, we propose a Neurocognitive Variational Autoencoder (NCVA) to conjoin high-dimensional EEG with a cognitive model in both generative and predictive modeling analyses. Importantly, our NCVA enables both the prediction of EEG signals given behavioral data and the estimation of cognitive model parameters from EEG signals. This novel approach allows for a more comprehensive understanding of the triplet relationship between behavior, brain activity, and cognitive processes.
- Listeners track distributions of speech sounds along perceptual dimensions. We introduce a method for evaluating hypotheses about what those dimensions are, using a cognitive model whose prior distribution is estimated directly from speech recordings. We use this method to evaluate two speaker normalization algorithms against human data. Simulations show that representations that are normalized across speakers predict human discrimination data better than unnormalized representations, consistent with previous research. Results further reveal differences across normalization methods in how well each predicts human data. This work provides a framework for evaluating hypothesized representations of speech and lays the groundwork for testing models of speech perception on natural speech recordings from ecologically valid settings.
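One standard speaker-normalization algorithm of the kind this abstract evaluates is Lobanov normalization, which z-scores each speaker's formant measurements so vowel tokens are compared in speaker-relative units. The sketch below is illustrative only; the abstract does not specify which two algorithms were tested, and the input layout is an assumption.

```python
import numpy as np

def lobanov_normalize(formants_by_speaker):
    """Lobanov normalization: z-score each speaker's formant values
    (per formant dimension), removing speaker-specific vocal-tract scale.

    formants_by_speaker maps a speaker ID to a list of [F1, F2] rows.
    """
    normalized = {}
    for speaker, rows in formants_by_speaker.items():
        F = np.asarray(rows, dtype=float)
        normalized[speaker] = (F - F.mean(axis=0)) / F.std(axis=0)
    return normalized
```

After normalization, every speaker's formant distribution has zero mean and unit variance, so tokens from speakers with very different vocal tracts become directly comparable.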
- Fine-grained visual reasoning tasks in multi-agent environments such as event prediction, agent type identification, or missing data imputation are important for multiple applications (e.g., autonomous surveillance over sensor networks and subtasks for reinforcement learning (RL)). StarCraft II game replays encode intelligent (and adversarial) multi-agent behavior and could provide a testbed for these tasks; however, extracting simple and standardized representations for prototyping these tasks is laborious and hinders reproducibility. In contrast, MNIST and CIFAR10, despite their extreme simplicity, have enabled rapid prototyping and reproducibility of ML methods. Following the simplicity of these datasets, we construct a benchmark fine-grained multi-agent categorization dataset based on StarCraft II replays that exhibit complex multi-agent behaviors, while still being as easy to use as MNIST and CIFAR10. Specifically, we carefully summarize a window of 255 consecutive game states to create 3.6 million summary images from 60,000 replays, including all relevant metadata such as game outcome and player races. We develop three formats of decreasing complexity: Hyperspectral images that include one channel for every unit type (similar to multispectral geospatial images), RGB images that mimic CIFAR10, and grayscale images that mimic MNIST. We show how this dataset can be used for prototyping fine-grained multi-agent categorization methods. All datasets, code for extraction, and code for dataset loading can be found at https://starcraftdata.davidinouye.com/.
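The core idea of summarizing a window of game states into a single image can be illustrated with a toy rasterizer. This is a sketch of the general technique only, not the authors' extraction pipeline; it assumes unit positions normalized to [0, 1) and produces a grayscale image in the MNIST-style 0-255 range.

```python
import numpy as np

def summarize_positions(positions, size=64):
    """Toy sketch: rasterize unit positions gathered across a window of
    game states into one grayscale summary image. Each cell accumulates
    a count of units, then counts are scaled to the 0-255 range."""
    img = np.zeros((size, size), dtype=float)
    for x, y in positions:  # positions assumed normalized to [0, 1)
        img[int(y * size), int(x * size)] += 1.0
    if img.max() > 0:
        img /= img.max()
    return (img * 255).astype(np.uint8)
```

The hyperspectral format described above would extend this by allocating one such channel per unit type rather than a single grayscale plane.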