Title: Models of heterogeneous dopamine signaling in an insect learning and memory center
The Drosophila mushroom body exhibits dopamine-dependent synaptic plasticity that underlies the acquisition of associative memories. Recordings of dopamine neurons in this system have identified signals related to external reinforcement such as reward and punishment. However, other factors including locomotion, novelty, reward expectation, and internal state have also recently been shown to modulate dopamine neurons. This heterogeneity is at odds with typical modeling approaches in which these neurons are assumed to encode a global, scalar error signal. How is dopamine-dependent plasticity coordinated in the presence of such heterogeneity? We develop a modeling approach that infers a pattern of dopamine activity sufficient to solve defined behavioral tasks, given architectural constraints informed by knowledge of mushroom body circuitry. Model dopamine neurons exhibit diverse tuning to task parameters while nonetheless producing coherent learned behaviors. Notably, reward prediction error emerges as a mode of population activity distributed across these neurons. Our results provide a mechanistic framework that accounts for the heterogeneity of dopamine activity during learning and behavior.
Award ID(s):
1707398
NSF-PAR ID:
10338062
Author(s) / Creator(s):
;
Editor(s):
Morrison, Abigail
Date Published:
Journal Name:
PLOS Computational Biology
Volume:
17
Issue:
8
ISSN:
1553-7358
Page Range / eLocation ID:
e1009205
Sponsoring Org:
National Science Foundation
More Like this
  1. Cai, Ming Bo (Ed.)

    A major advance in understanding learning behavior stems from experiments showing that reward learning requires dopamine inputs to striatal neurons and arises from synaptic plasticity of cortico-striatal synapses. Numerous reinforcement learning models mimic this dopamine-dependent synaptic plasticity by using the reward prediction error, which resembles dopamine neuron firing, to learn the best action in response to a set of cues. Though these models can explain many facets of behavior, reproducing some types of goal-directed behavior, such as renewal and reversal, requires additional model components. Here we present a reinforcement learning model, TD2Q, which better corresponds to the basal ganglia with two Q matrices, one representing direct pathway neurons (G) and another representing indirect pathway neurons (N). Unlike previous two-Q architectures, a novel and critical aspect of TD2Q is to update the G and N matrices utilizing the temporal difference reward prediction error. A best action is selected for N and G using a softmax with a reward-dependent adaptive exploration parameter, and then differences are resolved using a second selection step applied to the two action probabilities. The model is tested on a range of multi-step tasks including extinction, renewal, discrimination; switching reward probability learning; and sequence learning. Simulations show that TD2Q produces behaviors similar to rodents in choice and sequence learning tasks, and that use of the temporal difference reward prediction error is required to learn multi-step tasks. Blocking the update rule on the N matrix blocks discrimination learning, as observed experimentally. Performance in the sequence learning task is dramatically improved with two matrices.
These results suggest that including additional aspects of basal ganglia physiology can improve the performance of reinforcement learning models, better reproduce animal behaviors, and provide insight as to the role of direct- and indirect-pathway striatal neurons.

     
  2. The vertical lobe (VL) in the octopus brain plays an essential role in its sophisticated learning and memory. Early anatomical studies suggested that the VL is organized in a “fan-out fan-in” connectivity matrix comprising only three morphologically identified neuron types: input axons from the superior frontal lobe (SFL) innervating en passant millions of small amacrine interneurons (AMs), which converge sharply onto large VL output neurons (LNs). Recent physiological studies confirmed the feedforward excitatory connectivity: a glutamatergic synapse at the first SFL-to-AM synaptic layer and a cholinergic AM-to-LN synapse. SFL-to-AM synapses show a robust hippocampal-like activity-dependent long-term potentiation (LTP) of transmitter release. 5-HT, octopamine, dopamine, and nitric oxide modulate short- and long-term VL synaptic plasticity. Here we present a comprehensive histolabeling study to better characterize the neural elements in the VL. We generally confirmed glutamatergic SFLs and cholinergic AMs. Intense labeling for NOS activity in the AM neurites is consistent with the NO-dependent presynaptic LTP mechanism at the SFL-to-AM synapse. New discoveries here reveal more heterogeneity of the VL neurons than previously thought. GABAergic AMs suggest a subpopulation of inhibitory interneurons in the first input layer. Clear GABA labeling in the cell bodies of LNs supported an inhibitory VL output, yet the LNs co-expressed FMRFamide-like neuropeptides, suggesting an additional neuromodulatory role of the VL output. Furthermore, a group of LNs was glutamatergic. A new cluster of cells organized in a “deep nucleus” showed rich catecholaminergic labeling and may play a role in intrinsic neuromodulation. In situ hybridization and immunolabeling allowed characterization and localization of a rich array of neuropeptides and neuromodulators, likely involved in reward/punishment signals.
This analysis of the fast transmission system, together with the newly found cellular elements, helps integrate behavioral, physiological, pharmacological, and connectome findings into a more comprehensive understanding of an efficient learning and memory network.
  3. The survival of an organism is dependent on its ability to respond to cues in the environment. Such cues can attain control over behavior as a function of the value ascribed to them. Some individuals have an inherent tendency to attribute reward-paired cues with incentive motivational value, or incentive salience. For these individuals, termed sign-trackers, a discrete cue that precedes reward delivery becomes attractive and desirable in its own right. Prior work suggests that the behavior of sign-trackers is dopamine-dependent, and cue-elicited dopamine in the NAc is believed to encode the incentive value of reward cues. Here we exploited the temporal resolution of optogenetics to determine whether selective inhibition of ventral tegmental area (VTA) dopamine neurons during cue presentation attenuates the propensity to sign-track. Using male tyrosine hydroxylase (TH)-Cre Long Evans rats, it was found that, under baseline conditions, ∼84% of TH-Cre rats tend to sign-track. Laser-induced inhibition of VTA dopamine neurons during cue presentation prevented the development of sign-tracking behavior, without affecting goal-tracking behavior. When laser inhibition was terminated, these same rats developed a sign-tracking response. Video analysis using DeepLabCut™ revealed that, relative to rats that received laser inhibition, rats in the control group spent more time near the location of the reward cue even when it was not present and were more likely to orient toward and approach the cue during its presentation. These findings demonstrate that cue-elicited dopamine release is critical for the attribution of incentive salience to reward cues.

    SIGNIFICANCE STATEMENT: Activity of dopamine neurons in the ventral tegmental area (VTA) during cue presentation is necessary for the development of a sign-tracking, but not a goal-tracking, conditioned response in a Pavlovian task. We capitalized on the temporal precision of optogenetics to pair cue presentation with inhibition of VTA dopamine neurons. A detailed behavioral analysis with DeepLabCut™ revealed that cue-directed behaviors do not emerge without dopamine neuron activity in the VTA. Importantly, however, when optogenetic inhibition is lifted, cue-directed behaviors increase, and a sign-tracking response develops. These findings confirm the necessity of dopamine neuron activity in the VTA during cue presentation to encode the incentive value of reward cues.

     
  4. Dopaminergic neurons with distinct projection patterns and physiological properties compose memory subsystems in a brain. However, it is poorly understood whether or how they interact during complex learning. Here, we identify a feedforward circuit formed between dopamine subsystems and show that it is essential for second-order conditioning, an ethologically important form of higher-order associative learning. The Drosophila mushroom body comprises a series of dopaminergic compartments, each of which exhibits distinct memory dynamics. We find that a slow and stable memory compartment can serve as an effective ‘teacher’ by instructing other faster and transient memory compartments via a single key interneuron, which we identify by connectome analysis and neurotransmitter prediction. This excitatory interneuron acquires enhanced response to reward-predicting odor after first-order conditioning and, upon activation, evokes dopamine release in the ‘student’ compartments. These hierarchical connections between dopamine subsystems explain distinct properties of first- and second-order memory long known by behavioral psychologists. 
  5. Animals employ diverse learning rules and synaptic plasticity dynamics to record temporal and statistical information about the world. However, the molecular mechanisms underlying this diversity are poorly understood. The anatomically defined compartments of the insect mushroom body function as parallel units of associative learning, with different learning rates, memory decay dynamics and flexibility (Aso and Rubin, 2016). Here, we show that nitric oxide (NO) acts as a neurotransmitter in a subset of dopaminergic neurons in Drosophila. NO’s effects develop more slowly than those of dopamine and depend on soluble guanylate cyclase in postsynaptic Kenyon cells. NO acts antagonistically to dopamine; it shortens memory retention and facilitates the rapid updating of memories. The interplay of NO and dopamine enables memories stored in local domains along Kenyon cell axons to be specialized for predicting the value of odors based only on recent events. Our results provide key mechanistic insights into how diverse memory dynamics are established in parallel memory systems. 