skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Spatiotemporal Sequence Memory for Prediction using Deep Sparse Coding
Our brains are, “prediction machines”, where we are continuously comparing our surroundings with predictions from internal models generated by our brains. This is demonstrated by observing our basic low level sensory systems and how they predict environmental changes as we move through space and time. Indeed, even at higher cognitive levels, we are able to do prediction. We can predict how the laws of physics affect people, places, and things and even predict the end of someone’s sentence. In our work, we sought to create an artificial model that is able to mimic early, low level biological predictive behavior in a computer vision system. Our predictive vision model uses spatiotemporal sequence memories learned from deep sparse coding. This model is implemented using a biologically inspired architecture: one that utilizes sequence memories, lateral inhibition, and top-down feed- back in a generative framework. Our model learns the causes of the data in a completely unsupervised manner, by simply observing and learning about the world. Spatiotemporal features are learned by minimizing a reconstruction error convolved over space and time, and can subsequently be used for recognition, classification, and future video prediction. Our experiments show that we are able to accurately predict what will happen in the future; furthermore, we can use our predictions to detect anomalous, unexpected events in both synthetic and real video sequences.  more » « less
Award ID(s):
1846023 1954364
PAR ID:
10157219
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
NICE '19: Proceedings of the 7th Annual Neuro-inspired Computational Elements Workshop
Page Range / eLocation ID:
1 to 7
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Rubin, Jonathan (Ed.)
    We introduce dynamic predictive coding, a hierarchical model of spatiotemporal prediction and sequence learning in the neocortex. The model assumes that higher cortical levels modulate the temporal dynamics of lower levels, correcting their predictions of dynamics using prediction errors. As a result, lower levels form representations that encode sequences at shorter timescales (e.g., a single step) while higher levels form representations that encode sequences at longer timescales (e.g., an entire sequence). We tested this model using a two-level neural network, where the top-down modulation creates low-dimensional combinations of a set of learned temporal dynamics to explain input sequences. When trained on natural videos, the lower-level model neurons developed space-time receptive fields similar to those of simple cells in the primary visual cortex while the higher-level responses spanned longer timescales, mimicking temporal response hierarchies in the cortex. Additionally, the network’s hierarchical sequence representation exhibited both predictive and postdictive effects resembling those observed in visual motion processing in humans (e.g., in the flash-lag illusion). When coupled with an associative memory emulating the role of the hippocampus, the model allowed episodic memories to be stored and retrieved, supporting cue-triggered recall of an input sequence similar to activity recall in the visual cortex. When extended to three hierarchical levels, the model learned progressively more abstract temporal representations along the hierarchy. Taken together, our results suggest that cortical processing and learning of sequences can be interpreted as dynamic predictive coding based on a hierarchical spatiotemporal generative model of the visual world. 
    more » « less
  2. Sequential memory, the ability to form and accurately recall a sequence of events or stimuli in the correct order, is a fundamental prerequisite for biological and artificial intelligence as it underpins numerous cognitive functions (e.g., language comprehension, planning, episodic memory formation, etc.) However, existing methods of sequential memory suffer from catastrophic forgetting, limited capacity, slow iterative learning procedures, low-order Markov memory, and, most importantly, the inability to represent and generate multiple valid future possibilities stemming from the same context. Inspired by biologically plausible neuroscience theories of cognition, we propose Predictive Attractor Models (PAM), a novel sequence memory architecture with desirable generative properties. PAM is a streaming model that learns a sequence in an online, continuous manner by observing each input only once. Additionally, we find that PAM avoids catastrophic forgetting by uniquely representing past context through lateral inhibition in cortical minicolumns, which prevents new memories from overwriting previously learned knowledge. PAM generates future predictions by sampling from a union set of predicted possibilities; this generative ability is realized through an attractor model trained alongside the predictor. We show that PAM is trained with local computations through Hebbian plasticity rules in a biologically plausible framework. Other desirable traits (e.g., noise tolerance, CPU-based learning, capacity scaling) are discussed throughout the paper. Our findings suggest that PAM represents a significant step forward in the pursuit of biologically plausible and computationally efficient sequential memory models, with broad implications for cognitive science and artificial intelligence research. Illustration videos and code are available on our project page: https://ramymounir.com/publications/pam. 
    more » « less
  3. Memories are an important part of how we think, understand the world around us, and plan out future actions. In the brain, memories are thought to be stored in a region called the hippocampus. When memories are formed, neurons store events that occur around the same time together. This might explain why often, in the brains of animals, the activity associated with retrieving memories is not just a snapshot of what happened at a specific moment-- it can also include information about what the animal might experience next. This can have a clear utility if animals use memories to predict what they might experience next and plan out future actions. Mathematically, this notion of predictiveness can be summarized by an algorithm known as the successor representation. This algorithm describes what the activity of neurons in the hippocampus looks like when retrieving memories and making predictions based on them. However, even though the successor representation can computationally reproduce the activity seen in the hippocampus when it is making predictions, it is unclear what biological mechanisms underpin this computation in the brain. Fang et al. approached this problem by trying to build a model that could generate the same activity patterns computed by the successor representation using only biological mechanisms known to exist in the hippocampus. First, they used computational methods to design a network of neurons that had the biological properties of neural networks in the hippocampus. They then used the network to simulate neural activity. The results show that the activity of the network they designed was able to exactly match the successor representation. Additionally, the data resulting from the simulated activity in the network fitted experimental observations of hippocampal activity in Tufted Titmice. One advantage of the network designed by Fang et al. is that it can generate predictions in flexible ways,. That is, it canmake both short and long-term predictions from what an individual is experiencing at the moment. This flexibility means that the network can be used to simulate how the hippocampus learns in a variety of cognitive tasks. Additionally, the network is robust to different conditions. Given that the brain has to be able to store memories in many different situations, this is a promising indication that this network may be a reasonable model of how the brain learns. The results of Fang et al. lay the groundwork for connecting biological mechanisms in the hippocampus at the cellular level to cognitive effects, an essential step to understanding the hippocampus, as well as its role in health and disease. For instance, their network may provide a concrete approach to studying how disruptions to the ways neurons make and break connections can impair memory formation. More generally, better models of the biological mechanisms involved in making computations in the hippocampus can help scientists better understand and test out theories about how memories are formed and stored in the brain. 
    more » « less
  4. null (Ed.)
    Abstract Understanding the physical drivers of seasonal hydroclimatic variability and improving predictive skill remains a challenge with important socioeconomic and environmental implications for many regions around the world. Physics-based deterministic models show limited ability to predict precipitation as the lead time increases, due to imperfect representation of physical processes and incomplete knowledge of initial conditions. Similarly, statistical methods drawing upon established climate teleconnections have low prediction skill due to the complex nature of the climate system. Recently, promising data-driven approaches have been proposed, but they often suffer from overparameterization and overfitting due to the short observational record, and they often do not account for spatiotemporal dependencies among covariates (i.e., predictors such as sea surface temperatures). This study addresses these challenges via a predictive model based on a graph-guided regularizer that simultaneously promotes similarity of predictive weights for highly correlated covariates and enforces sparsity in the covariate domain. This approach both decreases the effective dimensionality of the problem and identifies the most predictive features without specifying them a priori. We use large ensemble simulations from a climate model to construct this regularizer, reducing the structural uncertainty in the estimation. We apply the learned model to predict winter precipitation in the southwestern United States using sea surface temperatures over the entire Pacific basin, and demonstrate its superiority compared to other regularization approaches and statistical models informed by known teleconnections. Our results highlight the potential to combine optimally the space–time structure of predictor variables learned from climate models with new graph-based regularizers to improve seasonal prediction. 
    more » « less
  5. Anticipating how a person will interact with objects in an environment is essential for activity understanding, but existing methods are limited to the 2D space of video frames-capturing physically ungrounded predictions of "what" and ignoring the "where" and "how". We introduce FIction for 4D future interaction prediction from videos. Given an input video of a human activity, the goal is to predict which objects at what 3D locations the person will interact with in the next time period (e.g., cabinet, fridge), and how they will execute that interaction (e.g., poses for bending, reaching, pulling). Our novel model FIction fuses the past video observation of the person's actions and their environment to predict both the "where" and "how" of future interactions. Through comprehensive experiments on a variety of activities and real-world environments in EgoExo4D, we show that our proposed approach outperforms prior autoregressive and (lifted) 2D video models substantially, with more than 30% relative gains. 
    more » « less