Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Abstract Anterior regions of the ventral visual stream encode substantial information about object categories. Are top-down category-level forces critical for arriving at this representation, or can this representation be formed purely through domain-general learning of natural image structure? Here we present a fully self-supervised model which learns to represent individual images, rather than categories, such that views of the same image are embedded nearby in a low-dimensional feature space, distinctly from other recently encountered views. We find that category information implicitly emerges in the local similarity structure of this feature space. Further, these models learn hierarchical features which capture the structure of brain responses across the human ventral visual stream, on par with category-supervised models. These results provide computational support for a domain-general framework guiding the formation of visual representation, where the proximate goal is not explicitly about category information, but is instead to learn unique, compressed descriptions of the visual world.more » « less
-
Free, publicly-accessible full text available October 22, 2026
-
Deep neural network models provide a powerful experimental platform for exploring core mechanisms underlying human visual perception, such as perceptual grouping and contour integration—the process of linking local edge elements to arrive at a unified perceptual representation of a complete contour. Here, we demonstrate that feedforward convolutional neural networks (CNNs) fine-tuned on contour detection show this human-like capacity, but without relying on mechanisms proposed in prior work, such as lateral connections, recurrence, or top-down feedback. We identified two key properties needed for ImageNet pre-trained, feed-forward models to yield human-like contour integration: first, progressively increasing receptive field structure served as a critical architectural motif to support this capacity; and second, biased fine-tuning for contour-detection specifically for gradual curves (~20 degrees) resulted in human-like sensitivity to curvature. We further demonstrate that fine-tuning ImageNet pretrained models uncovers other hidden human-like capacities in feed-forward networks, including uncrowding (reduced interference from distractors as the number of distractors increases), which is considered a signature of human perceptual grouping. Thus, taken together these results provide a computational existence proof that purely feedforward hierarchical computations are capable of implementing gestalt “good continuation” and perceptual organization needed for human-like contour-integration and uncrowding. More broadly, these results raise the possibility that in human vision, later stages of processing play a more prominent role in perceptual-organization than implied by theories focused on recurrence and early lateral connections.more » « lessFree, publicly-accessible full text available August 18, 2026
-
Brain responses in visual cortex are typically modeled as a positively and negatively weighted sum of all features within a deep neural network (DNN) layer. However, this linear fit can dramatically alter a given feature space, making it unclear whether brain prediction levels stem more from the DNN itself, or from the flexibility of the encoding model. As such, studies of alignment may benefit from a paradigm shift toward more constrained and theoretically driven mapping methods. As a proof of concept, here we present a case study of face and scene selectivity, showing that typical encoding analyses do not differentiate between aligned and misaligned tuning bases in model-to-brain predictivity. We introduce a new alignment complexity measure -- tuning reorientation -- which favors DNNs that achieve high brain alignment via minimal distortion of the original feature space. We show that this measure helps arbitrate between models that are superficially equal in their predictivity, but which differ in alignment complexity. Our experiments broadly signal the benefit of sparse, positive-weighted encoding procedures, which directly enforce an analogy between the tuning directions of model and brain feature spaces.more » « less
-
According to the efficient coding hypothesis, neural populations encode information optimally when representations are high-dimensional and uncorrelated. However, such codes may carry a cost in terms of generalization and robustness. Past empirical studies of early visual cortex (V1) in rodents have suggested that this tradeoff indeed constrains sensory representations. However, it remains unclear whether these insights generalize across the hierarchy of the human visual system, and particularly to object representations in high-level occipitotemporal cortex (OTC). To gain new empirical clarity, here we develop a family of object recognition models with parametrically varying dropout proportion , which induces systematically varying dimensionality of internal responses (while controlling all other inductive biases). We find that increasing dropout produces an increasingly smooth, low-dimensional representational space. Optimal robustness to lesioning is observed at around 70% dropout, after which both accuracy and robustness decline. Representational comparison to large-scale 7T fMRI data from occipitotemporal cortex in the Natural Scenes Dataset reveals that this optimal degree of dropout is also associated with maximal emergent neural predictivity. Finally, using new techniques for achieving denoised estimates of the eigenspectrum of human fMRI responses, we compare the rate of eigenspectrum decay between model and brain feature spaces. We observe that the match between model and brain representations is associated with a common balance between efficiency and robustness in the representational space. These results suggest that varying dropout may reveal an optimal point of balance between the efficiency of high-dimensional codes and the robustness of low dimensional codes in hierarchical vision systems.more » « less
An official website of the United States government

Full Text Available