Title: Enhancing reinforcement learning models by including direct and indirect pathways improves performance on striatal dependent tasks
A major advance in understanding learning behavior stems from experiments showing that reward learning requires dopamine inputs to striatal neurons and arises from synaptic plasticity of cortico-striatal synapses. Numerous reinforcement learning models mimic this dopamine-dependent synaptic plasticity by using the reward prediction error, which resembles dopamine neuron firing, to learn the best action in response to a set of cues. Though these models can explain many facets of behavior, reproducing some types of goal-directed behavior, such as renewal and reversal, requires additional model components. Here we present a reinforcement learning model, TD2Q, which better corresponds to the basal ganglia with two Q matrices, one representing direct pathway neurons (G) and another representing indirect pathway neurons (N). Unlike previous two-Q architectures, a novel and critical aspect of TD2Q is that it updates the G and N matrices using the temporal difference reward prediction error. A best action is selected for N and G using a softmax with a reward-dependent adaptive exploration parameter, and then differences are resolved using a second selection step applied to the two action probabilities. The model is tested on a range of multi-step tasks, including extinction, renewal, and discrimination; switching reward probability learning; and sequence learning. Simulations show that TD2Q produces behaviors similar to rodents in choice and sequence learning tasks, and that use of the temporal difference reward prediction error is required to learn multi-step tasks. Blocking the update rule on the N matrix blocks discrimination learning, as observed experimentally. Performance in the sequence learning task is dramatically improved with two matrices.
These results suggest that including additional aspects of basal ganglia physiology can improve the performance of reinforcement learning models, better reproduce animal behaviors, and provide insight as to the role of direct- and indirect-pathway striatal neurons.
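The two-matrix scheme described in the abstract can be illustrated with a minimal Python sketch. This is not the published TD2Q implementation: the class name `TwoQAgent`, the opposite-sign update of N, the use of negated N values in its softmax, and the form of the second selection step are all assumptions made for illustration. The reward-dependent adaptation of the exploration parameter is omitted, and `beta` is held fixed.

```python
import math
import random


def softmax(values, beta):
    """Softmax probabilities with inverse temperature beta."""
    scaled = [beta * v for v in values]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


class TwoQAgent:
    """Hypothetical two-matrix agent: G (direct) and N (indirect)."""

    def __init__(self, n_states, n_actions, alpha_g=0.2, alpha_n=0.2,
                 gamma=0.9, beta=2.0, seed=0):
        self.G = [[0.0] * n_actions for _ in range(n_states)]
        self.N = [[0.0] * n_actions for _ in range(n_states)]
        self.alpha_g, self.alpha_n = alpha_g, alpha_n
        self.gamma, self.beta = gamma, beta
        self.rng = random.Random(seed)

    def choose(self, state):
        actions = range(len(self.G[state]))
        # First step: one candidate action from each matrix.
        p_g = softmax(self.G[state], self.beta)
        # Assumption: a large N value discourages an action, so negate it.
        p_n = softmax([-v for v in self.N[state]], self.beta)
        a_g = self.rng.choices(actions, weights=p_g)[0]
        a_n = self.rng.choices(actions, weights=p_n)[0]
        if a_g == a_n:
            return a_g
        # Second step (assumed form): softmax over the two action probabilities.
        w = softmax([p_g[a_g], p_n[a_n]], self.beta)
        return a_g if self.rng.random() < w[0] else a_n

    def update(self, state, action, reward, next_state, done=False):
        # Temporal difference reward prediction error, shared by both matrices.
        future = 0.0 if done else max(self.G[next_state])
        delta = reward + self.gamma * future - self.G[state][action]
        self.G[state][action] += self.alpha_g * delta
        # Assumption: N moves in the opposite direction to G.
        self.N[state][action] -= self.alpha_n * delta
        return delta
```

In this sketch, a single rewarded step raises the G entry and lowers the N entry for the chosen state-action pair, so both candidate selections become more likely to agree on that action.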
Award ID(s):
2018631
PAR ID:
10472794
Author(s) / Creator(s):
;
Editor(s):
Cai, Ming Bo
Publisher / Repository:
PLOS Computational Biology
Date Published:
Journal Name:
PLOS Computational Biology
Volume:
19
Issue:
8
ISSN:
1553-7358
Page Range / eLocation ID:
e1011385
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Morrison, Abigail (Ed.)
    The Drosophila mushroom body exhibits dopamine dependent synaptic plasticity that underlies the acquisition of associative memories. Recordings of dopamine neurons in this system have identified signals related to external reinforcement such as reward and punishment. However, other factors including locomotion, novelty, reward expectation, and internal state have also recently been shown to modulate dopamine neurons. This heterogeneity is at odds with typical modeling approaches in which these neurons are assumed to encode a global, scalar error signal. How is dopamine dependent plasticity coordinated in the presence of such heterogeneity? We develop a modeling approach that infers a pattern of dopamine activity sufficient to solve defined behavioral tasks, given architectural constraints informed by knowledge of mushroom body circuitry. Model dopamine neurons exhibit diverse tuning to task parameters while nonetheless producing coherent learned behaviors. Notably, reward prediction error emerges as a mode of population activity distributed across these neurons. Our results provide a mechanistic framework that accounts for the heterogeneity of dopamine activity during learning and behavior. 
  2. The learning of stimulus-outcome associations allows for predictions about the environment. Ventral striatum and dopaminergic midbrain neurons form a larger network for generating reward prediction signals from sensory cues. Yet, the network plasticity mechanisms to generate predictive signals in these distributed circuits have not been entirely clarified. Also, direct evidence of the underlying interregional assembly formation and information transfer is still missing. Here we show that phasic dopamine is sufficient to reinforce the distinctness of stimulus representations in the ventral striatum even in the absence of reward. Upon such reinforcement, striatal stimulus encoding gives rise to interregional assemblies that drive dopaminergic neurons during stimulus-outcome learning. These assemblies dynamically encode the predicted reward value of conditioned stimuli. Together, our data reveal that ventral striatal and midbrain reward networks form a reinforcing loop to generate reward prediction coding.
  3. The posterior medial (POm) thalamus is heavily interconnected with sensory and motor circuitry and is likely involved in behavioral modulation and sensorimotor integration. POm provides axonal projections to the dorsal striatum, a hotspot of sensorimotor processing, yet the role of POm-striatal projections has remained undetermined. Using optogenetics with mouse brain slice electrophysiology, we found that POm provides robust synaptic input to direct and indirect pathway striatal spiny projection neurons (D1- and D2-SPNs, respectively) and parvalbumin-expressing fast spiking interneurons (PVs). During the performance of a whisker-based tactile discrimination task in head-restrained mice, POm-striatal projections displayed learning-related activation correlating with anticipatory, but not reward-related, pupil dilation. Inhibition of POm-striatal axons across learning caused slower reaction times and an increase in the number of training sessions for expert performance. Our data indicate that POm-striatal inputs provide a behaviorally relevant arousal-related signal, which may prime striatal circuitry for efficient integration of subsequent choice-related inputs. 
  4. The striatum plays an important role in learning, selecting, and executing actions. As a major input hub of the basal ganglia, it receives and processes a diverse array of signals related to sensory, motor, and cognitive information. Aberrant neural activity in this area is implicated in a wide variety of neurological and psychiatric disorders. It is therefore important to understand the hallmarks of disrupted striatal signal processing. This review surveys literature examining how in vivo striatal microcircuit dynamics are impacted in animal models of one of the most widely studied movement disorders, Parkinson's disease. The review identifies four major features of aberrant striatal dynamics: altered relative levels of direct and indirect pathway activity, impaired information processing by projection neurons, altered information processing by interneurons, and increased synchrony.
  5. Activation of the cAMP pathway is one of the common mechanisms underlying long‐term potentiation (LTP). In the Drosophila mushroom body, simultaneous activation of odour‐coding Kenyon cells (KCs) and reinforcement‐coding dopaminergic neurons activates adenylyl cyclase in KC presynaptic terminals, which is believed to trigger synaptic plasticity underlying olfactory associative learning. However, learning induces long‐term depression (LTD) at these synapses, contradicting the universal role of cAMP as a facilitator of transmission. Here, we developed a system to electrophysiologically monitor both short‐term and long‐term synaptic plasticity at KC output synapses and demonstrated that they are indeed an exception in which activation of the cAMP–protein kinase A pathway induces LTD. Contrary to the prevailing model, our cAMP imaging found no evidence for synergistic action of dopamine and KC activity on cAMP synthesis. Furthermore, we found that forskolin‐induced cAMP increase alone was insufficient for plasticity induction; it additionally required simultaneous KC activation to replicate the presynaptic LTD induced by pairing with dopamine. On the other hand, activation of the cGMP pathway paired with KC activation induced slowly developing LTP, proving antagonistic actions of the two second‐messenger pathways predicted by behavioural study. Finally, KC subtype‐specific interrogation of synapses revealed that different KC subtypes exhibit distinct plasticity duration even among synapses on the same postsynaptic neuron. Thus, our work not only revises the role of cAMP in synaptic plasticity by uncovering the unexpected convergence point of the cAMP pathway and neuronal activity, but also establishes the methods to address physiological mechanisms of synaptic plasticity in this important model.
    Key points: Although presynaptic cAMP increase generally facilitates synapses, olfactory associative learning in Drosophila, which depends on dopamine and cAMP signalling genes, induces long‐term depression (LTD) at the mushroom body output synapses. By combining electrophysiology, pharmacology and optogenetics, we directly demonstrate that these synapses are an exception where activation of the cAMP–protein kinase A pathway leads to presynaptic LTD. Dopamine‐ or forskolin‐induced cAMP increase alone is not sufficient for LTD induction; neuronal activity, which has been believed to trigger cAMP synthesis in synergy with dopamine input, is required in the downstream pathway of cAMP. In contrast to cAMP, activation of the cGMP pathway paired with neuronal activity induces presynaptic long‐term potentiation, which explains behaviourally observed opposing actions of transmitters co‐released by dopaminergic neurons. Our work not only revises the role of cAMP in synaptic plasticity, but also provides essential methods to address physiological mechanisms of synaptic plasticity in this important model system.