Title: Dynamic predictive coding: A model of hierarchical sequence learning and prediction in the neocortex
We introduce dynamic predictive coding, a hierarchical model of spatiotemporal prediction and sequence learning in the neocortex. The model assumes that higher cortical levels modulate the temporal dynamics of lower levels, correcting their predictions of dynamics using prediction errors. As a result, lower levels form representations that encode sequences at shorter timescales (e.g., a single step) while higher levels form representations that encode sequences at longer timescales (e.g., an entire sequence). We tested this model using a two-level neural network, where the top-down modulation creates low-dimensional combinations of a set of learned temporal dynamics to explain input sequences. When trained on natural videos, the lower-level model neurons developed space-time receptive fields similar to those of simple cells in the primary visual cortex while the higher-level responses spanned longer timescales, mimicking temporal response hierarchies in the cortex. Additionally, the network’s hierarchical sequence representation exhibited both predictive and postdictive effects resembling those observed in visual motion processing in humans (e.g., in the flash-lag illusion). When coupled with an associative memory emulating the role of the hippocampus, the model allowed episodic memories to be stored and retrieved, supporting cue-triggered recall of an input sequence similar to activity recall in the visual cortex. When extended to three hierarchical levels, the model learned progressively more abstract temporal representations along the hierarchy. Taken together, our results suggest that cortical processing and learning of sequences can be interpreted as dynamic predictive coding based on a hierarchical spatiotemporal generative model of the visual world.
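The core mechanism in the abstract — a higher level choosing a low-dimensional combination of learned lower-level dynamics, with prediction errors correcting that choice — can be sketched in a few lines of NumPy. All sizes (`n_lower`, `n_higher`, `K`), the matrices `V` and `W`, and the step size are hypothetical illustration choices, not the paper's actual parameters or code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_lower, n_higher, K = 8, 3, 3         # hypothetical sizes, not the paper's

# V: a library of K learned lower-level transition matrices
# W: maps the higher-level state to mixing weights over that library
V = rng.standard_normal((K, n_lower, n_lower)) * 0.1
W = rng.standard_normal((n_higher, K)) * 0.1

def predict_next(r_lower, r_higher):
    """Top-down modulation: the effective lower-level transition matrix is a
    low-dimensional combination of the learned dynamics in V, with mixing
    weights set by the higher-level state."""
    w = r_higher @ W                    # (K,) mixing weights
    A = np.tensordot(w, V, axes=1)      # effective transition matrix
    return A @ r_lower

r_l = rng.standard_normal(n_lower)      # lower-level state (shorter timescale)
r_h = rng.standard_normal(n_higher)     # higher-level state (longer timescale)

pred = predict_next(r_l, r_h)
x_next = rng.standard_normal(n_lower)   # stand-in for the next input's representation
err = x_next - pred                     # prediction error

# The error corrects the higher level's estimate of the dynamics
# (gradient step on -0.5 * ||err||**2 with respect to r_h):
r_h = r_h + 0.01 * np.einsum('i,kij,j,hk->h', err, V, r_l, W)
```

Because the higher-level state changes only through accumulated prediction errors, it naturally varies more slowly than the lower-level state, matching the timescale hierarchy the abstract describes.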
Award ID(s):
2223495
PAR ID:
10559114
Author(s) / Creator(s):
Editor(s):
Rubin, Jonathan
Publisher / Repository:
PLOS
Date Published:
Journal Name:
PLOS Computational Biology
Volume:
20
Issue:
2
ISSN:
1553-7358
Page Range / eLocation ID:
e1011801
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We describe a sparse coding model of visual cortex that encodes image transformations in an equivariant and hierarchical manner. The model consists of a group-equivariant convolutional layer with internal recurrent connections that implement sparse coding through neural population attractor dynamics, consistent with the architecture of visual cortex. The layers can be stacked hierarchically by introducing recurrent connections between them. The hierarchical structure enables rich bottom-up and top-down information flows, hypothesized to underlie the visual system’s ability for perceptual inference. The model’s equivariant representations are demonstrated on time-varying visual scenes. 
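One standard way to implement sparse coding through recurrent population dynamics, as item 1 describes, is the locally competitive algorithm (LCA). The sketch below is a generic, non-convolutional LCA, not the group-equivariant model itself; the dictionary sizes and constants are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
D = rng.standard_normal((16, 32))                 # dictionary: 16-dim inputs, 32 units
D /= np.linalg.norm(D, axis=0, keepdims=True)     # unit-norm columns

def lca(x, D, lam=0.1, dt=0.05, steps=200):
    """Sparse coding via attractor dynamics: units receive feedforward drive
    D.T @ x, inhibit each other through their overlaps (D.T @ D), and settle
    into a sparse fixed point."""
    G = D.T @ D - np.eye(D.shape[1])              # lateral inhibition, no self-term
    b = D.T @ x                                   # feedforward drive
    u = np.zeros(D.shape[1])                      # internal (membrane) state
    a = np.zeros_like(u)
    for _ in range(steps):
        a = np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)  # soft threshold
        u += dt * (b - u - G @ a)                 # leaky integration
    return a

x = 2.0 * D[:, 0]          # input built from a single dictionary element
a = lca(x, D)              # settles to a code dominated by that element
```

At the fixed point the active coefficients solve a lasso-style sparse coding problem, which is why the recurrent network's attractor can be read as an inference result.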
  2. Our brains are "prediction machines": we continuously compare our surroundings against predictions from internal models generated by our brains. This is evident in our basic low-level sensory systems, which anticipate environmental changes as we move through space and time. Prediction operates at higher cognitive levels as well: we can predict how the laws of physics will affect people, places, and things, and even predict the end of someone's sentence. In our work, we sought to create an artificial model that mimics early, low-level biological predictive behavior in a computer vision system. Our predictive vision model uses spatiotemporal sequence memories learned from deep sparse coding. The model is implemented in a biologically inspired architecture: one that utilizes sequence memories, lateral inhibition, and top-down feedback in a generative framework. Our model learns the causes of the data in a completely unsupervised manner, simply by observing and learning about the world. Spatiotemporal features are learned by minimizing a reconstruction error convolved over space and time, and can subsequently be used for recognition, classification, and future video prediction. Our experiments show that we are able to accurately predict what will happen in the future; furthermore, we can use our predictions to detect anomalous, unexpected events in both synthetic and real video sequences.
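The learning principle in item 2 — inference and feature learning both driven by a reconstruction error taken over space and time — can be illustrated with a toy generative sketch. The clip, the features `Phi`, the sizes, and the learning rates are all hypothetical stand-ins, not the actual deep convolutional model.

```python
import numpy as np

rng = np.random.default_rng(2)
T, P, K = 5, 9, 6                           # clip length, patch size, feature count (hypothetical)
clip = rng.standard_normal((T, P))          # a tiny "video": T frames of P-pixel patches
Phi = rng.standard_normal((K, T, P)) * 0.1  # spatiotemporal features

def reconstruct(a, Phi):
    return np.tensordot(a, Phi, axes=1)     # sum_k a[k] * Phi[k]

a = np.zeros(K)                             # sparse code for this clip
for _ in range(300):
    err = clip - reconstruct(a, Phi)        # spatiotemporal reconstruction error
    # inference: gradient step on the code, with a small L1-style penalty
    a += 0.05 * (np.tensordot(Phi, err, axes=([1, 2], [0, 1])) - 0.01 * np.sign(a))
    # learning: the same error signal updates the features
    Phi += 0.01 * a[:, None, None] * err[None, :, :]

final_err = np.linalg.norm(clip - reconstruct(a, Phi))
```

Since each feature spans several frames, the learned code summarizes a short temporal sequence, which is what makes future-frame prediction from the inferred code possible in principle.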
  3. Identifying an object and distinguishing it from similar items depends upon the ability to perceive its component parts as conjoined into a cohesive whole, but the brain mechanisms underlying this ability remain elusive. The ventral visual processing pathway in primates is organized hierarchically: Neuronal responses in early stages are sensitive to the manipulation of simple visual features, whereas neuronal responses in subsequent stages are tuned to increasingly complex stimulus attributes. It is widely assumed that feature-coding dominates in early visual cortex whereas later visual regions employ conjunction-coding in which object representations are different from the sum of their simple feature parts. However, no study in humans has demonstrated that putative object-level codes in higher visual cortex cannot be accounted for by feature-coding and that putative feature codes in regions prior to ventral temporal cortex are not equally well characterized as object-level codes. Thus the existence of a transition from feature- to conjunction-coding in human visual cortex remains unconfirmed, and if a transition does occur its location remains unknown. By employing multivariate analysis of functional imaging data, we measure both feature-coding and conjunction-coding directly, using the same set of visual stimuli, and pit them against each other to reveal the relative dominance of one vs. the other throughout cortex. Our results reveal a transition from feature-coding in early visual cortex to conjunction-coding in both inferior temporal and posterior parietal cortices. This novel method enables the use of experimentally controlled stimulus features to investigate population-level feature and conjunction codes throughout human cortex. NEW & NOTEWORTHY We use a novel analysis of neuroimaging data to assess representations throughout visual cortex, revealing a transition from feature-coding to conjunction-coding along both ventral and dorsal pathways. 
Occipital cortex contains more information about spatial frequency and contour than about conjunctions of those features, whereas inferotemporal and parietal cortices contain conjunction coding sites in which there is more information about the whole stimulus than its component parts. 
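The logic of pitting feature-coding against conjunction-coding in item 3 can be illustrated with synthetic population responses: if a region's response to a conjunction is explained by the sum of its single-feature responses, a feature code suffices; residual structure indicates conjunction-coding. All populations and noise levels below are hypothetical, not the study's fMRI analysis.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50                            # hypothetical population size
fA = rng.standard_normal(n)       # response pattern to feature A alone
fB = rng.standard_normal(n)       # response pattern to feature B alone

# Feature-coding region: the conjunction response is just the sum of parts.
# Conjunction-coding region: it carries a component the parts cannot explain.
feature_region = fA + fB
conj_region = fA + fB + 3.0 * rng.standard_normal(n)

def feature_code_fit(resp_AB, fA, fB):
    """How well the sum of single-feature patterns explains the observed
    conjunction response; values near 1 indicate feature-coding."""
    return np.corrcoef(resp_AB, fA + fB)[0, 1]

r_feat = feature_code_fit(feature_region, fA, fB)   # near 1: parts explain the whole
r_conj = feature_code_fit(conj_region, fA, fB)      # well below 1: extra conjunction information
```

Comparing these fits region by region is the sense in which the two codes can be "pit against each other" with the same stimulus set.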
  4. Cortical representations supporting many cognitive abilities emerge from underlying circuits composed of several different cell types. However, cell type-specific contributions to rate- and timing-based cortical coding are not well understood. Here, we investigated the role of parvalbumin neurons in cortical complex scene analysis. Many complex scenes contain sensory stimuli which are highly dynamic in time and compete with stimuli at other spatial locations. Parvalbumin neurons play a fundamental role in balancing excitation and inhibition in cortex and sculpting cortical temporal dynamics; yet their specific role in encoding complex scenes via timing-based coding, and the robustness of temporal representations to spatial competition, has not been investigated. Here, we address these questions in the auditory cortex of mice using a cocktail-party-like paradigm, integrating electrophysiology, optogenetic manipulations, and a family of spike-distance metrics to dissect parvalbumin neurons' contributions to rate- and timing-based coding. We find that suppressing parvalbumin neurons degrades cortical discrimination of dynamic sounds in a cocktail-party-like setting via changes in rapid temporal modulations in rate and spike timing, over a wide range of timescales. Our findings suggest that parvalbumin neurons play a critical role in enhancing cortical temporal coding and reducing cortical noise, thereby improving representations of dynamic stimuli in complex scenes.
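One widely used member of the spike-distance family item 4 alludes to is the Victor-Purpura metric. The plain-Python sketch below is a generic implementation of that standard metric, not the study's analysis code.

```python
def victor_purpura(s1, s2, q):
    """Victor-Purpura spike-train distance: deleting or inserting a spike
    costs 1, shifting a spike by dt costs q * |dt|. The parameter q sets the
    time-scale: q = 0 compares only spike counts (pure rate coding), while
    large q penalizes timing differences (temporal coding)."""
    n, m = len(s1), len(s2)
    G = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        G[i][0] = float(i)                 # delete all spikes of s1
    for j in range(1, m + 1):
        G[0][j] = float(j)                 # insert all spikes of s2
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            G[i][j] = min(
                G[i - 1][j] + 1,           # delete a spike
                G[i][j - 1] + 1,           # insert a spike
                G[i - 1][j - 1] + q * abs(s1[i - 1] - s2[j - 1]),  # shift a spike
            )
    return G[n][m]
```

Sweeping `q` yields a family of metrics spanning pure rate to fine timing, which is how such a family can dissect rate versus timing contributions to coding.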
  5. According to the efficient coding hypothesis, neural populations encode information optimally when representations are high-dimensional and uncorrelated. However, such codes may carry a cost in terms of generalization and robustness. Past empirical studies of early visual cortex (V1) in rodents have suggested that this tradeoff indeed constrains sensory representations. However, it remains unclear whether these insights generalize across the hierarchy of the human visual system, and particularly to object representations in high-level occipitotemporal cortex (OTC). To gain new empirical clarity, here we develop a family of object recognition models with parametrically varying dropout proportion, which induces systematically varying dimensionality of internal responses (while controlling all other inductive biases). We find that increasing dropout produces an increasingly smooth, low-dimensional representational space. Optimal robustness to lesioning is observed at around 70% dropout, after which both accuracy and robustness decline. Representational comparison to large-scale 7T fMRI data from occipitotemporal cortex in the Natural Scenes Dataset reveals that this optimal degree of dropout is also associated with maximal emergent neural predictivity. Finally, using new techniques for achieving denoised estimates of the eigenspectrum of human fMRI responses, we compare the rate of eigenspectrum decay between model and brain feature spaces. We observe that the match between model and brain representations is associated with a common balance between efficiency and robustness in the representational space. These results suggest that varying dropout may reveal an optimal point of balance between the efficiency of high-dimensional codes and the robustness of low-dimensional codes in hierarchical vision systems.
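The eigenspectrum-decay comparison in item 5 can be illustrated with synthetic populations whose covariance spectra fall off as power laws: a fast-decaying spectrum corresponds to a low-dimensional code, a slowly decaying one to a high-dimensional code. The exponent fit below is a generic sketch with made-up data, not the paper's denoised fMRI estimator.

```python
import numpy as np

rng = np.random.default_rng(4)

def spectral_decay(R):
    """Power-law exponent fitted to the eigenspectrum of the response
    covariance (R: stimuli x units); steeper decay = lower-dimensional code."""
    lam = np.linalg.eigvalsh(np.cov(R.T))[::-1]    # eigenvalues, descending
    lam = lam[lam > 1e-12]
    k = np.arange(1, lam.size + 1)
    slope, _ = np.polyfit(np.log(k), np.log(lam), 1)
    return -slope

U = np.linalg.qr(rng.standard_normal((40, 40)))[0]  # random orthonormal axes

def make_responses(alpha, n=1000):
    # synthetic population whose covariance eigenvalues fall off as k**-alpha
    scales = np.arange(1, 41, dtype=float) ** (-alpha / 2)
    return (rng.standard_normal((n, 40)) * scales) @ U.T

flat = make_responses(0.5)    # slowly decaying spectrum: high-dimensional
steep = make_responses(2.0)   # fast-decaying spectrum: low-dimensional
```

Comparing such fitted exponents between model layers and brain responses is one concrete way to ask whether the two representational spaces strike the same efficiency-robustness balance.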