skip to main content


Title: Computational mechanisms of distributed value representations and mixed learning strategies
Abstract

Learning appropriate representations of the reward environment is challenging in the real world where there are many options, each with multiple attributes or features. Despite existence of alternative solutions for this challenge, neural mechanisms underlying emergence and adoption of value representations and learning strategies remain unknown. To address this, we measure learning and choice during a multi-dimensional probabilistic learning task in humans and trained recurrent neural networks (RNNs) to capture our experimental observations. We find that human participants estimate stimulus-outcome associations by learning and combining estimates of reward probabilities associated with the informative feature followed by those of informative conjunctions. Through analyzing representations, connectivity, and lesioning of the RNNs, we demonstrate this mixed learning strategy relies on a distributed neural code and opponency between excitatory and inhibitory neurons through value-dependent disinhibition. Together, our results suggest computational and neural mechanisms underlying emergence of complex learning strategies in naturalistic settings.

 
more » « less
Award ID(s):
1943767
NSF-PAR ID:
10383473
Author(s) / Creator(s):
;
Publisher / Repository:
Nature Publishing Group
Date Published:
Journal Name:
Nature Communications
Volume:
12
Issue:
1
ISSN:
2041-1723
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    During spatial exploration, neural circuits in the hippocampus store memories of sequences of sensory events encountered in the environment. When sensory information is absent during ‘offline’ resting periods, brief neuronal population bursts can ‘replay’ sequences of activity that resemble bouts of sensory experience. These sequences can occur in either forward or reverse order, and can even include spatial trajectories that have not been experienced, but are consistent with the topology of the environment. The neural circuit mechanisms underlying this variable and flexible sequence generation are unknown. Here we demonstrate in a recurrent spiking network model of hippocampal area CA3 that experimental constraints on network dynamics such as population sparsity, stimulus selectivity, rhythmicity and spike rate adaptation, as well as associative synaptic connectivity, enable additional emergent properties, including variable offline memory replay. In an online stimulus‐driven state, we observed the emergence of neuronal sequences that swept from representations of past to future stimuli on the timescale of the theta rhythm. In an offline state driven only by noise, the network generated both forward and reverse neuronal sequences, and recapitulated the experimental observation that offline memory replay events tend to include salient locations like the site of a reward. These results demonstrate that biological constraints on the dynamics of recurrent neural circuits are sufficient to enable memories of sensory events stored in the strengths of synaptic connections to be flexibly read out during rest and sleep, which is thought to be important for memory consolidation and planning of future behaviour.image

    Key points

    A recurrent spiking network model of hippocampal area CA3 was optimized to recapitulate experimentally observed network dynamics during simulated spatial exploration.

    During simulated offline rest, the network exhibited the emergent property of generating flexible forward, reverse and mixed direction memory replay events.

    Network perturbations and analysis of model diversity and degeneracy identified associative synaptic connectivity and key features of network dynamics as important for offline sequence generation.

    Network simulations demonstrate that population over‐representation of salient positions like the site of reward results in biased memory replay.

     
    more » « less
  2. Biological agents do not have infinite resources to learn new things. For this reason, a central aspect of human learning is the ability to recycle previously acquired knowledge in a way that allows for faster, less resource-intensive acquisition of new skills. In spite of that, how neural networks in the brain leverage existing knowledge to learn new computations is not well understood. In this work, we study this question in artificial recurrent neural networks (RNNs) trained on a corpus of commonly used neuroscience tasks. Combining brain-inspired inductive biases we call functional and structural, we propose a system that learns new tasks by building on top of pre-trained latent dynamics organised into separate recurrent modules. These modules, acting as prior knowledge acquired previously through evolution or development, are pre-trained on the statistics of the full corpus of tasks so as to be independent and maximally informative. The resulting model, we call a Modular Latent Primitives (MoLaP) network, allows for learning multiple tasks while keeping parameter counts, and updates, low. We also show that the skills acquired with our approach are more robust to a broad range of perturbations compared to those acquired with other multi-task learning strategies, and that generalisation to new tasks is facilitated. This work offers a new perspective on achieving efficient multi-task learning in the brain, illustrating the benefits of leveraging pre-trained latent dynamical primitives. 
    more » « less
  3. Abstract The learning of stimulus-outcome associations allows for predictions about the environment. Ventral striatum and dopaminergic midbrain neurons form a larger network for generating reward prediction signals from sensory cues. Yet, the network plasticity mechanisms to generate predictive signals in these distributed circuits have not been entirely clarified. Also, direct evidence of the underlying interregional assembly formation and information transfer is still missing. Here we show that phasic dopamine is sufficient to reinforce the distinctness of stimulus representations in the ventral striatum even in the absence of reward. Upon such reinforcement, striatal stimulus encoding gives rise to interregional assemblies that drive dopaminergic neurons during stimulus-outcome learning. These assemblies dynamically encode the predicted reward value of conditioned stimuli. Together, our data reveal that ventral striatal and midbrain reward networks form a reinforcing loop to generate reward prediction coding. 
    more » « less
  4. Shekhar, Shashi ; Zhou, Zhi-Hua ; Chiang, Yao-Yi ; Stiglic, Gregor (Ed.)
    In many environmental applications, recurrent neural networks (RNNs) are often used to model physical variables with long temporal dependencies. However, due to minibatch training, temporal relationships between training segments within the batch (intra-batch) as well as between batches (inter-batch) are not considered, which can lead to limited performance. Stateful RNNs aim to address this issue by passing hidden states between batches. Since Stateful RNNs ignore intra-batch temporal dependency, there exists a trade-off between training stability and capturing temporal dependency. In this paper, we provide a quantitative comparison of different Stateful RNN modeling strategies, and propose two strategies to enforce both intra- and inter-batch temporal dependency. First, we extend Stateful RNNs by defining a batch as a temporally ordered set of training segments, which enables intra-batch sharing of temporal information. While this approach significantly improves the performance, it leads to much larger training times due to highly sequential training. To address this issue, we further propose a new strategy which augments a training segment with an initial value of the target variable from the timestep right before the starting of the training segment. In other words, we provide an initial value of the target variable as additional input so that the network can focus on learning changes relative to that initial value. By using this strategy, samples can be passed in any order (mini-batch training) which significantly reduces the training time while maintaining the performance. In demonstrating the utility of our approach in hydrological modeling, we observe that the most significant gains in predictive accuracy occur when these methods are applied to state variables whose values change more slowly, such as soil water and snowpack, rather than continuously moving flux variables such as streamflow. 
    more » « less
  5. Abstract

    The signed value and unsigned salience of reward prediction errors (RPEs) are critical to understanding reinforcement learning (RL) and cognitive control. Dorsomedial prefrontal cortex (dMPFC) and insula (INS) are key regions for integrating reward and surprise information, but conflicting evidence for both signed and unsigned activity has led to multiple proposals for the nature of RPE representations in these brain areas. Recently developed RL models allow neurons to respond differently to positive and negative RPEs. Here, we use intracranially recorded high frequency activity (HFA) to test whether this flexible asymmetric coding strategy captures RPE coding diversity in human INS and dMPFC. At the region level, we found a bias towards positive RPEs in both areas which paralleled behavioral adaptation. At the local level, we found spatially interleaved neural populations responding to unsigned RPE salience and valence-specific positive and negative RPEs. Furthermore, directional connectivity estimates revealed a leading role of INS in communicating positive and unsigned RPEs to dMPFC. These findings support asymmetric coding across distinct but intermingled neural populations as a core principle of RPE processing and inform theories of the role of dMPFC and INS in RL and cognitive control.

     
    more » « less