Title: Generative decoding of visual stimuli
Reconstructing natural images from fMRI recordings is a challenging task of great importance in neuroscience. Current architectures are bottlenecked by their failure to effectively capture the hierarchical processing of visual stimuli in the human brain. Motivated by this, we introduce a novel neural network architecture for the problem of neural decoding. Our architecture uses Hierarchical Variational Autoencoders (HVAEs) to learn meaningful representations of natural images and leverages their latent space hierarchy to learn voxel-to-image mappings. By mapping the early stages of the visual pathway to the first set of latent variables and the higher visual cortex areas to the deeper layers of the latent hierarchy, we construct a latent variable neural decoding model that replicates hierarchical visual information processing. Our model achieves better reconstructions than the state of the art, and our ablation study indicates that the hierarchical structure of the latent space is responsible for this performance.
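The abstract's central idea, mapping early-visual voxels to shallow HVAE latents and higher-cortex voxels to deeper latents, can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the voxel counts, latent dimensions, and the use of closed-form ridge regression are all assumptions for the sake of the example.

```python
# Minimal sketch (not the authors' implementation) of the voxel-to-latent idea:
# early-visual voxels predict shallow HVAE latents, higher-cortex voxels
# predict deeper latents, each via closed-form ridge regression.
import numpy as np

rng = np.random.default_rng(0)

def fit_ridge(X, Z, lam=1.0):
    """Closed-form ridge regression: W = (X^T X + lam I)^{-1} X^T Z."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Z)

# Hypothetical sizes: 200 early-visual voxels -> 16-dim shallow latent,
# 300 higher-cortex voxels -> 64-dim deep latent, 50 training stimuli.
X_early, X_high = rng.normal(size=(50, 200)), rng.normal(size=(50, 300))
Z_shallow, Z_deep = rng.normal(size=(50, 16)), rng.normal(size=(50, 64))

W_shallow = fit_ridge(X_early, Z_shallow)  # early visual pathway -> first latents
W_deep = fit_ridge(X_high, Z_deep)         # higher visual cortex -> deeper latents

# At test time, the predicted latents at every level of the hierarchy
# would condition the HVAE decoder to generate the reconstructed image.
z_hat = [X_early[:1] @ W_shallow, X_high[:1] @ W_deep]
print([z.shape for z in z_hat])  # → [(1, 16), (1, 64)]
```

The per-level mapping is what distinguishes this from a flat voxel-to-latent regression: each cortical region only needs to explain the latent level it is hypothesized to correspond to.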
Award ID(s):
1932620
PAR ID:
10481599
Author(s) / Creator(s):
Editor(s):
Andreas Krause, Barbara Engelhardt
Publisher / Repository:
https://proceedings.mlr.press/v202/miliotou23a.html
Date Published:
Journal Name:
Proceedings of the 40th International Conference on Machine Learning
Edition / Version:
1
Volume:
1
Issue:
1
Subject(s) / Keyword(s):
Decoding of Visual Stimuli, Hierarchical Variational Autoencoders (HVAEs)
Format(s):
Medium: X; Size: 2MB; Other: 1
Size(s):
2MB
Location:
https://proceedings.mlr.press/v202/miliotou23a.html
Sponsoring Org:
National Science Foundation
More Like this
  1. Rubin, Jonathan (Ed.)
    We introduce dynamic predictive coding, a hierarchical model of spatiotemporal prediction and sequence learning in the neocortex. The model assumes that higher cortical levels modulate the temporal dynamics of lower levels, correcting their predictions of dynamics using prediction errors. As a result, lower levels form representations that encode sequences at shorter timescales (e.g., a single step) while higher levels form representations that encode sequences at longer timescales (e.g., an entire sequence). We tested this model using a two-level neural network, where the top-down modulation creates low-dimensional combinations of a set of learned temporal dynamics to explain input sequences. When trained on natural videos, the lower-level model neurons developed space-time receptive fields similar to those of simple cells in the primary visual cortex while the higher-level responses spanned longer timescales, mimicking temporal response hierarchies in the cortex. Additionally, the network’s hierarchical sequence representation exhibited both predictive and postdictive effects resembling those observed in visual motion processing in humans (e.g., in the flash-lag illusion). When coupled with an associative memory emulating the role of the hippocampus, the model allowed episodic memories to be stored and retrieved, supporting cue-triggered recall of an input sequence similar to activity recall in the visual cortex. When extended to three hierarchical levels, the model learned progressively more abstract temporal representations along the hierarchy. Taken together, our results suggest that cortical processing and learning of sequences can be interpreted as dynamic predictive coding based on a hierarchical spatiotemporal generative model of the visual world. 
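The core mechanism described above, a higher level modulating lower-level dynamics by forming low-dimensional combinations of learned temporal transitions, can be sketched as a toy computation. This is an assumed illustration, not the paper's network: the dimensionalities and the linear form of the dynamics bank are hypothetical.

```python
# Toy sketch (assumed, not the paper's model) of dynamic predictive coding:
# the higher level supplies mixture weights over a bank of learned
# lower-level transition matrices, so top-down signals modulate dynamics.
import numpy as np

rng = np.random.default_rng(1)

n_dyn, dim = 3, 8                       # hypothetical sizes
V = rng.normal(scale=0.1, size=(n_dyn, dim, dim))  # bank of temporal dynamics

def lower_level_step(r, w):
    """Predict the next lower-level state using a top-down-weighted
    combination of the learned dynamics."""
    T = np.tensordot(w, V, axes=1)      # (dim, dim) effective transition
    return T @ r

r = rng.normal(size=dim)                # lower-level representation
w = np.array([0.7, 0.2, 0.1])           # higher-level modulation weights

r_pred = lower_level_step(r, w)
err = r - r_pred                        # prediction error would drive learning
```

Because the higher level only chooses mixture weights, it naturally varies on a slower timescale than the lower-level states it modulates, which is the mechanism the abstract credits for the emergent temporal hierarchy.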
  2. We propose a hierarchical neural network architecture for unsupervised learning of equivariant part-whole decompositions of visual scenes. In contrast to the global equivariance of group-equivariant networks, the proposed architecture exhibits equivariance to part-whole transformations throughout the hierarchy, which we term hierarchical equivariance. The model achieves these structured internal representations via hierarchical Bayesian inference, which gives rise to rich bottom-up, top-down, and lateral information flows, hypothesized to underlie the mechanisms of perceptual inference in visual cortex. We demonstrate these useful properties of the model on a simple dataset of scenes with multiple objects under independent rotations and translations. 
This paper studies the fundamental problem of multi-layer generator models in learning hierarchical representations. The multi-layer generator model that consists of multiple layers of latent variables organized in a top-down architecture tends to learn multiple levels of data abstraction. However, such multi-layer latent variables are typically parameterized to be Gaussian, which can be less informative in capturing complex abstractions, resulting in limited success in hierarchical representation learning. On the other hand, the energy-based model (EBM) prior is known to be expressive in capturing the data regularities, but it often lacks the hierarchical structure needed to capture different levels of representation. In this paper, we propose a joint latent space EBM prior model with multi-layer latent variables for effective hierarchical representation learning. We develop a variational joint learning scheme that seamlessly integrates an inference model for efficient inference. Our experiments demonstrate that the proposed joint EBM prior is effective and expressive in capturing hierarchical representations and modeling data distribution.
  4. In models of visual spatial attention control, it is commonly held that top–down control signals originate in the dorsal attention network, propagating to the visual cortex to modulate baseline neural activity and bias sensory processing. However, the precise distribution of these top–down influences across different levels of the visual hierarchy is debated. In addition, it is unclear whether these baseline neural activity changes translate into improved performance. We analyzed attention-related baseline activity during the anticipatory period of a voluntary spatial attention task, using two independent functional magnetic resonance imaging datasets and two analytic approaches. First, as in prior studies, univariate analysis showed that covert attention significantly enhanced baseline neural activity in higher-order visual areas contralateral to the attended visual hemifield, while effects in lower-order visual areas (e.g., V1) were weaker and more variable. Second, in contrast, multivariate pattern analysis (MVPA) revealed significant decoding of attention conditions across all visual cortical areas, with lower-order visual areas exhibiting higher decoding accuracies than higher-order areas. Third, decoding accuracy, rather than the magnitude of univariate activation, was a better predictor of a subject's stimulus discrimination performance. Finally, the MVPA results were replicated across two experimental conditions, where the direction of spatial attention was either externally instructed by a cue or based on the participants’ free choice decision about where to attend. Together, these findings offer new insights into the extent of attentional biases in the visual hierarchy under top–down control and how these biases influence both sensory processing and behavioral performance. 
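The multivariate pattern analysis (MVPA) step described above, decoding the attended hemifield from distributed voxel patterns, can be illustrated with a toy cross-validated classifier. This is a hedged sketch on simulated data, not the study's pipeline: the trial and voxel counts, signal strength, and nearest-centroid classifier are all assumptions.

```python
# Illustrative sketch (not the study's analysis) of MVPA-style decoding:
# leave-one-out cross-validated nearest-centroid classification of two
# attention conditions from simulated voxel patterns.
import numpy as np

rng = np.random.default_rng(2)

n_trials, n_voxels = 40, 60             # hypothetical sizes
y = np.repeat([0, 1], n_trials // 2)    # attention condition labels
X = rng.normal(size=(n_trials, n_voxels)) + y[:, None] * 0.5  # weak signal

correct = 0
for i in range(n_trials):               # leave-one-out cross-validation
    train = np.arange(n_trials) != i
    c0 = X[train & (y == 0)].mean(axis=0)   # centroid, condition 0
    c1 = X[train & (y == 1)].mean(axis=0)   # centroid, condition 1
    pred = int(np.linalg.norm(X[i] - c1) < np.linalg.norm(X[i] - c0))
    correct += pred == y[i]

accuracy = correct / n_trials           # above chance (0.5) if patterns differ
```

The key property the abstract highlights is that such decoding accuracy can be significant even where the univariate mean activation difference is weak, because the classifier pools information distributed across many voxels.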
  5. Abbott, Derek (Ed.)
    Abstract Human vision, thought, and planning involve parsing and representing objects and scenes using structured representations based on part-whole hierarchies. Computer vision and machine learning researchers have recently sought to emulate this capability using neural networks, but a generative model formulation has been lacking. Generative models that leverage compositionality, recursion, and part-whole hierarchies are thought to underlie human concept learning and the ability to construct and represent flexible mental concepts. We introduce Recursive Neural Programs (RNPs), a neural generative model that addresses the part-whole hierarchy learning problem by modeling images as hierarchical trees of probabilistic sensory-motor programs. These programs recursively reuse learned sensory-motor primitives to model an image within different spatial reference frames, enabling hierarchical composition of objects from parts and implementing a grammar for images. We show that RNPs can learn part-whole hierarchies for a variety of image datasets, allowing rich compositionality and intuitive parts-based explanations of objects. Our model also suggests a cognitive framework for understanding how human brains can potentially learn and represent concepts in terms of recursively defined primitives and their relations with each other. 