The signed value and unsigned salience of reward prediction errors (RPEs) are critical to understanding reinforcement learning (RL) and cognitive control. Dorsomedial prefrontal cortex (dMPFC) and insula (INS) are key regions for integrating reward and surprise information, but conflicting evidence for both signed and unsigned activity has led to multiple proposals for the nature of RPE representations in these brain areas. Recently developed RL models allow neurons to respond differently to positive and negative RPEs. Here, we use intracranially recorded high frequency activity (HFA) to test whether this flexible asymmetric coding strategy captures RPE coding diversity in human INS and dMPFC. At the region level, we found a bias towards positive RPEs in both areas which paralleled behavioral adaptation. At the local level, we found spatially interleaved neural populations responding to unsigned RPE salience and valence-specific positive and negative RPEs. Furthermore, directional connectivity estimates revealed a leading role of INS in communicating positive and unsigned RPEs to dMPFC. These findings support asymmetric coding across distinct but intermingled neural populations as a core principle of RPE processing and inform theories of the role of dMPFC and INS in RL and cognitive control.
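To make the "flexible asymmetric coding" idea concrete, the sketch below implements a minimal Rescorla-Wagner learner in which a hypothetical neural unit scales positive and negative RPEs with separate gains. This is an illustrative toy under assumed parameter names and values (`alpha`, `kappa_pos`, `kappa_neg`), not the authors' fitted model.

```python
import numpy as np

def asymmetric_rl(outcomes, alpha=0.2, kappa_pos=1.0, kappa_neg=0.4):
    """Rescorla-Wagner value learning with sign-dependent RPE gain.

    The unit's trial-wise response is kappa_pos * RPE for positive RPEs and
    kappa_neg * |RPE| for negative RPEs: kappa_neg == kappa_pos gives unsigned
    (salience-like) coding, kappa_neg < 0 gives fully signed coding, and
    kappa_neg == 0 gives positive-RPE-specific coding.
    """
    v = 0.0
    values, rpes, responses = [], [], []
    for r in outcomes:
        rpe = r - v                                  # signed prediction error
        gain = kappa_pos if rpe >= 0 else kappa_neg
        responses.append(gain * abs(rpe))            # hypothetical unit response
        v += alpha * rpe                             # value update
        values.append(v)
        rpes.append(rpe)
    return np.array(values), np.array(rpes), np.array(responses)

# Toy usage: 50 trials with a 70% reward rate
outcomes = np.random.default_rng(0).binomial(1, 0.7, size=50)
values, rpes, responses = asymmetric_rl(outcomes)
```

Varying the two gains across simulated units reproduces the continuum from purely signed, to positive-specific, to purely unsigned salience responses that the abstract describes.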
Learning signals during reinforcement learning and cognitive control rely on valenced reward prediction errors (RPEs) and non-valenced salience prediction errors (PEs) driven by surprise magnitude. A core debate in reward learning focuses on whether valenced and non-valenced PEs can be isolated in the human electroencephalogram (EEG). We combine behavioral modeling and single-trial EEG regression to disentangle sequential PEs in an interval timing task dissociating outcome valence, magnitude, and probability. Multiple regression across temporal, spatial, and frequency dimensions characterized a spatio-tempo-spectral cascade from early valenced RPE value to non-valenced RPE magnitude, followed by outcome probability indexed by a late frontal positivity. Separating negative and positive outcomes revealed the valenced RPE value effect is an artifact of overlap between two non-valenced RPE magnitude responses: frontal theta feedback-related negativity on losses and posterior delta reward positivity on wins. These results reconcile longstanding debates on the sequence of components representing reward and salience PEs in the human EEG.
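For readers unfamiliar with the single-trial regression approach summarized above, the sketch below shows the general recipe: model-derived regressors (e.g., signed RPE value, unsigned RPE magnitude, outcome probability) are fit to single-trial EEG amplitude or band power at every channel and time point. The data shapes, regressor set, and use of scikit-learn are assumptions for illustration, not the authors' pipeline.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def single_trial_regression(eeg, design):
    """Fit a trial-wise design matrix to EEG at each channel/time point.

    eeg:    (n_trials, n_channels, n_times) single-trial amplitude or power
    design: (n_trials, n_regressors) z-scored model-derived regressors,
            e.g. [signed RPE value, unsigned RPE magnitude, outcome probability]
    Returns beta maps of shape (n_regressors, n_channels, n_times).
    """
    n_trials, n_channels, n_times = eeg.shape
    betas = np.zeros((design.shape[1], n_channels, n_times))
    model = LinearRegression()
    for ch in range(n_channels):
        for t in range(n_times):
            model.fit(design, eeg[:, ch, t])
            betas[:, ch, t] = model.coef_
    return betas

# Toy usage with random data standing in for real recordings
rng = np.random.default_rng(0)
eeg = rng.standard_normal((200, 32, 100))
design = rng.standard_normal((200, 3))
beta_maps = single_trial_regression(eeg, design)
```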
- NSF-PAR ID: 10279242
- Publisher / Repository: Nature Publishing Group
- Journal Name: Communications Biology
- Volume: 4
- Issue: 1
- ISSN: 2399-3642
- Sponsoring Org: National Science Foundation
More Like this
Abstract: The Reward-Positivity (RewP) is a frontocentral event-related potential elicited following reward and punishment feedback. Reinforcement learning theories propose the RewP reflects a reward prediction error that increases following more favorable (vs. unfavorable) outcomes. An alternative perspective, however, proposes this component indexes a salience-prediction error that increases following more salient outcomes. Evidence from prior studies that included both reward and punishment conditions is mixed, supporting both accounts. However, these studies often varied how feedback stimuli were repeated across reward and punishment conditions. Differences in the frequency of feedback stimuli may drive inconsistencies by introducing salience effects for infrequent stimuli regardless of whether they are associated with rewards or punishments. To test this hypothesis, the current study examined the effect of outcome valence and stimulus frequency on the RewP and neighboring P2 and P3 components in reward, punishment, and neutral contexts across two separate experiments that varied how often feedback stimuli were repeated between conditions. Experiment 1 revealed that infrequent feedback stimuli generated overlapping positivity across all three components. However, controlling for stimulus frequency, Experiment 2 revealed that favorable outcomes increased RewP and P3 positivity. Together, these results suggest the RewP reflects some combination of reward- and salience-prediction error encoding. Results also indicate that infrequent feedback stimuli elicited strong salience effects across all three components that may inflate, eliminate, or reverse outcome valence effects for the RewP and P3. These results resolve several inconsistencies in the literature and have important implications for electrocortical investigations of reward and punishment feedback processing.
Abstract: Prior work shows that people respond more plastically to environmental influences, including cultural influences, if they carry the 7- or 2-repeat (7/2R) allelic variant of the dopamine D4 receptor gene (DRD4). The 7/2R carriers are thus more likely to endorse the norms and values of their culture. So far, however, mechanisms underlying this moderation of cultural acquisition by DRD4 are unclear. To address this gap in knowledge, we tested the hypothesis that DRD4 modulates the processing of reward cues existing in the environment. About 72 young adults, preselected for their DRD4 status, performed a gambling task while the electroencephalogram was recorded. Principal components of event-related potentials aligned to the Reward-Positivity (associated with bottom-up processing of reward prediction errors) and frontal-P3 (associated with top-down attention) were both significantly more positive following gains than following losses. As predicted, the gain-loss differences were significantly larger for 7/2R carriers than for noncarriers. Also as predicted, the cultural backgrounds of the participants (East Asian vs. European American) did not moderate the effects of DRD4. Our findings suggest that the 7/2R variant of DRD4 enhances (a) the detection of reward prediction errors and (b) controlled attention that updates the context for the reward, thereby suggesting one possible mechanism underlying the DRD4 × Culture interactions.
Protection often involves the capacity to prospectively plan the actions needed to mitigate harm. The computational architecture of decisions involving protection remains unclear, as well as whether these decisions differ from other beneficial prospective actions such as reward acquisition. Here we compare protection acquisition to reward acquisition and punishment avoidance to examine overlapping and distinct features across the three action types. Protection acquisition is positively valenced, similar to reward: for both protection and reward, the more the actor gains, the greater the benefit. However, reward and protection occur in different contexts, with protection existing in aversive contexts. Punishment avoidance also occurs in aversive contexts, but differs from protection because punishment is negatively valenced and motivates avoidance. Across three independent studies (total N = 600) we applied computational modeling to examine model-based reinforcement learning for protection, reward, and punishment in humans. Decisions motivated by acquiring protection evoked a higher degree of model-based control than acquiring reward or avoiding punishment, with no significant differences in learning rate. The context-valence asymmetry characteristic of protection increased deployment of flexible decision strategies, suggesting model-based control depends on the context in which outcomes are encountered as well as the valence of the outcome.
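To illustrate what "degree of model-based control" typically means in computational modeling of this kind, the sketch below blends model-free (cached) and model-based (planned) action values with a weighting parameter w, as in standard hybrid reinforcement-learning accounts. The task structure, variable names, and parameter values are illustrative assumptions, not the authors' specific model.

```python
import numpy as np

def hybrid_q(q_mf, q_mb, w):
    """Blend model-free and model-based action values; w indexes the
    degree of model-based control (w = 1 is fully model-based)."""
    return w * q_mb + (1 - w) * q_mf

def softmax_choice(q, beta, rng):
    """Sample an action from a softmax over blended values."""
    p = np.exp(beta * q - np.max(beta * q))
    p /= p.sum()
    return rng.choice(len(q), p=p), p

# Toy example: two first-stage actions, a known transition model, and
# second-stage values assumed to be learned elsewhere.
rng = np.random.default_rng(1)
q_mf = np.array([0.3, 0.5])            # cached (model-free) values
transition = np.array([[0.7, 0.3],     # P(state' | action)
                       [0.3, 0.7]])
v_state2 = np.array([0.9, 0.1])        # second-stage state values
q_mb = transition @ v_state2           # model-based values via planning
action, p = softmax_choice(hybrid_q(q_mf, q_mb, w=0.8), beta=3.0, rng=rng)
```

Fitting w per participant and condition is one common way to quantify how strongly choices rely on flexible, model-based evaluation rather than cached values.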
Abstract: Learning about positive and negative outcomes of actions is crucial for survival and underpinned by conserved circuits including the striatum. How associations between actions and outcomes are formed is not fully understood, particularly when the outcomes have mixed positive and negative features. We developed a novel foraging ('bandit') task requiring mice to maximize rewards while minimizing punishments. By 2-photon Ca²⁺ imaging, we monitored activity of visually identified anterodorsal striatal striosomal and matrix neurons. We found that action-outcome associations for reward and punishment were encoded in parallel in partially overlapping populations. Single neurons could, for one action, encode outcomes of opposing valence. Striosome compartments consistently exhibited stronger representations of reinforcement outcomes than matrix, especially for high reward or punishment prediction errors. These findings demonstrate multiplexing of action-outcome contingencies by single identified striatal neurons and suggest that striosomal neurons are particularly important in action-outcome learning.