

Title: Outcome valence and stimulus frequency affect neural responses to rewards and punishments
Abstract

The Reward‐Positivity (RewP) is a frontocentral event‐related potential elicited following reward and punishment feedback. Reinforcement learning theories propose the RewP reflects a reward prediction error that increases following more favorable (vs. unfavorable) outcomes. An alternative perspective, however, proposes this component indexes a salience‐prediction error that increases following more salient outcomes. Evidence from prior studies that included both reward and punishment conditions is mixed, supporting both accounts. However, these studies often varied how feedback stimuli were repeated across reward and punishment conditions. Differences in the frequency of feedback stimuli may drive these inconsistencies by introducing salience effects for infrequent stimuli regardless of whether they are associated with rewards or punishments. To test this hypothesis, the current study examined the effect of outcome valence and stimulus frequency on the RewP and neighboring P2 and P3 components in reward, punishment, and neutral contexts across two separate experiments that varied how often feedback stimuli were repeated between conditions. Experiment 1 revealed that infrequent feedback stimuli generated overlapping positivities across all three components. However, when stimulus frequency was controlled, Experiment 2 revealed that favorable outcomes increased RewP and P3 positivity. Together, these results suggest the RewP reflects some combination of reward‐ and salience‐prediction error encoding. Results also indicate that infrequent feedback stimuli elicited strong salience effects across all three components that may inflate, eliminate, or reverse outcome valence effects for the RewP and P3. These results resolve several inconsistencies in the literature and have important implications for electrocortical investigations of reward and punishment feedback processing.
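The core distinction between the two competing accounts can be sketched in a few lines: a reward prediction error is signed (it grows with more favorable outcomes), whereas a salience prediction error is unsigned (it grows with more surprising outcomes of either valence). This is a hypothetical illustration of the concepts, not the study's analysis.

```python
# Signed vs. unsigned prediction errors (conceptual sketch; values are made up).

def reward_prediction_error(outcome, expected):
    """Signed RPE: more positive for favorable outcomes, negative for unfavorable."""
    return outcome - expected

def salience_prediction_error(outcome, expected):
    """Unsigned PE: large for any surprising outcome, regardless of valence."""
    return abs(outcome - expected)

# A surprising punishment (outcome = -1.0 when 0.5 was expected):
rpe = reward_prediction_error(-1.0, 0.5)    # -1.5 (negative: unfavorable)
spe = salience_prediction_error(-1.0, 0.5)  # 1.5 (positive: highly salient)
```

Under the reward account, the RewP would track the signed quantity; under the salience account, the unsigned one. The two predictions diverge precisely for unfavorable but surprising outcomes, as in the example.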

 
NSF-PAR ID:
10362005
Publisher / Repository:
Wiley-Blackwell
Date Published:
Journal Name:
Psychophysiology
Volume:
59
Issue:
3
ISSN:
0048-5772
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

Learning signals during reinforcement learning and cognitive control rely on valenced reward prediction errors (RPEs) and non-valenced salience prediction errors (PEs) driven by surprise magnitude. A core debate in reward learning focuses on whether valenced and non-valenced PEs can be isolated in the human electroencephalogram (EEG). We combine behavioral modeling and single-trial EEG regression to disentangle sequential PEs in an interval timing task dissociating outcome valence, magnitude, and probability. Multiple regression across temporal, spatial, and frequency dimensions characterized a spatio-temporo-spectral cascade from early valenced RPE value to non-valenced RPE magnitude, followed by outcome probability indexed by a late frontal positivity. Separating negative and positive outcomes revealed that the valenced RPE value effect is an artifact of overlap between two non-valenced RPE magnitude responses: frontal theta feedback-related negativity on losses and posterior delta reward positivity on wins. These results reconcile longstanding debates on the sequence of components representing reward and salience PEs in the human EEG.
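The single-trial regression approach described above can be sketched with a toy simulation: a delta-rule learner yields a signed RPE (value) and an unsigned RPE (magnitude) per trial, and ordinary least squares then separates their contributions to a simulated EEG amplitude. All parameters and the simulated signal are assumptions for illustration, not the paper's model.

```python
import numpy as np

# Hypothetical delta-rule learner producing signed and unsigned PEs per trial.
rng = np.random.default_rng(0)
n_trials, alpha, value = 500, 0.1, 0.0
rpe_value, rpe_magnitude = [], []
for _ in range(n_trials):
    outcome = rng.choice([2.0, 1.0, -1.0, -2.0])  # wins/losses of varying size
    delta = outcome - value              # signed RPE (valenced)
    rpe_value.append(delta)
    rpe_magnitude.append(abs(delta))     # surprise magnitude (non-valenced)
    value += alpha * delta               # delta-rule value update

# Simulated single-trial EEG amplitude driven only by surprise magnitude.
eeg = 2.0 * np.asarray(rpe_magnitude) + rng.normal(0.0, 0.1, n_trials)

# Single-trial multiple regression: EEG ~ intercept + RPE value + RPE magnitude.
X = np.column_stack([np.ones(n_trials), rpe_value, rpe_magnitude])
beta, *_ = np.linalg.lstsq(X, eeg, rcond=None)
# beta[1] (value) comes out near 0 and beta[2] (magnitude) near 2, correctly
# attributing the signal to the non-valenced regressor.
```

Including both regressors in one model is what lets an apparent valence effect be unmasked as overlapping magnitude responses, as the abstract describes.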

     
  2. Abstract

    Bipolar spectrum and unipolar depressive disorders have been associated with distinct and opposite profiles of reward‐related neural activity. These opposite profiles may reflect a differential preexisting vulnerability for both types of disorders. In support, recent ERP studies find that, following reward feedback, a larger reward positivity (RewP) is associated with greater vulnerability for bipolar spectrum disorders, whereas a smaller RewP is associated with greater vulnerability for depression. However, prior studies have investigated only immediate rewards and have not examined dimensions of both bipolar disorder and unipolar depression within the same sample. The present study is the first to investigate feedback‐related ERP correlates of proneness to hypomania and unipolar depressive tendencies within the same sample and to expand our scope to include future rewards. Participants completed a modified time estimation task where the same monetary reward was available immediately or at one of five different future dates. Results revealed proneness to hypomania and unipolar depressive tendencies were related to an elevated and blunted RewP, respectively, but only following immediate rewards (i.e., today). Following rewards in the distant future (e.g., 8 months), proneness to hypomania and depressive tendencies were associated with elevated and blunted amplitudes for the P3, respectively, a subsequent ERP component reflecting motivational salience during extended feedback processing. Furthermore, these opposing profiles were independent of, and significantly different from, one another. These results suggest that feedback‐related ERPs following immediate and future rewards are candidate biomarkers that can physiologically separate vulnerability for bipolar spectrum from unipolar depressive disorders.

     
  3. Abstract

Prior work shows that people respond more plastically to environmental influences, including cultural influences, if they carry the 7‐ or 2‐repeat (7/2R) allelic variant of the dopamine D4 receptor gene (DRD4). The 7/2R carriers are thus more likely to endorse the norms and values of their culture. So far, however, the mechanisms underlying this moderation of cultural acquisition by DRD4 are unclear. To address this gap in knowledge, we tested the hypothesis that DRD4 modulates the processing of reward cues existing in the environment. Seventy-two young adults, preselected for their DRD4 status, performed a gambling task while the electroencephalogram was recorded. Principal components of event‐related potentials aligned to the Reward‐Positivity (associated with bottom‐up processing of reward prediction errors) and frontal‐P3 (associated with top‐down attention) were both significantly more positive following gains than following losses. As predicted, the gain‐loss differences were significantly larger for 7/2R carriers than for noncarriers. Also as predicted, the cultural backgrounds of the participants (East Asian vs. European American) did not moderate the effects of DRD4. Our findings suggest that the 7/2R variant of DRD4 enhances (a) the detection of reward prediction errors and (b) controlled attention that updates the context for the reward, pointing to one possible mechanism underlying the DRD4 × Culture interactions.

     
  4. Abstract

Learning about positive and negative outcomes of actions is crucial for survival and underpinned by conserved circuits including the striatum. How associations between actions and outcomes are formed is not fully understood, particularly when the outcomes have mixed positive and negative features. We developed a novel foraging ('bandit') task requiring mice to maximize rewards while minimizing punishments. Using two-photon Ca²⁺ imaging, we monitored the activity of visually identified anterodorsal striatal striosomal and matrix neurons. We found that action-outcome associations for reward and punishment were encoded in parallel in partially overlapping populations. Single neurons could, for one action, encode outcomes of opposing valence. Striosome compartments consistently exhibited stronger representations of reinforcement outcomes than matrix, especially for high reward or punishment prediction errors. These findings demonstrate multiplexing of action-outcome contingencies by single identified striatal neurons and suggest that striosomal neurons are particularly important in action-outcome learning.
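A mixed-valence bandit of the kind described, with reward and punishment associations learned in parallel, can be sketched as follows. The learning rule, probabilities, and action-selection scheme are illustrative assumptions, not the paper's behavioral model.

```python
# Hypothetical mixed-valence bandit: each action can yield a reward and a
# punishment on the same trial; their associations are learned in parallel.
import random

random.seed(1)
n_actions, alpha = 2, 0.2
q_reward = [0.0] * n_actions   # learned reward association per action
q_punish = [0.0] * n_actions   # learned punishment association per action

# Assumed contingencies: action 0 is often rewarded, action 1 often punished.
p_reward = [0.8, 0.2]
p_punish = [0.1, 0.6]

for _ in range(1000):
    a = random.randrange(n_actions)               # random exploration
    r = 1.0 if random.random() < p_reward[a] else 0.0
    p = 1.0 if random.random() < p_punish[a] else 0.0
    q_reward[a] += alpha * (r - q_reward[a])      # reward prediction error update
    q_punish[a] += alpha * (p - q_punish[a])      # punishment prediction error update

# A net value combining both channels supports maximizing reward while
# minimizing punishment, as the task requires.
net = [qr - qp for qr, qp in zip(q_reward, q_punish)]
```

Keeping separate reward and punishment estimates, rather than a single collapsed value, mirrors the parallel encoding the abstract reports in striatal populations.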

     
  5. Cai, Ming Bo (Ed.)
Protection often involves the capacity to prospectively plan the actions needed to mitigate harm. The computational architecture of decisions involving protection remains unclear, as does whether these decisions differ from other beneficial prospective actions such as reward acquisition. Here we compare protection acquisition to reward acquisition and punishment avoidance to examine overlapping and distinct features across the three action types. Protection acquisition is positively valenced, similar to reward: for both protection and reward, the more the actor gains, the greater the benefit. However, reward and protection occur in different contexts, with protection arising in aversive contexts. Punishment avoidance also occurs in aversive contexts but differs from protection because punishment is negatively valenced and motivates avoidance. Across three independent studies (total N = 600), we applied computational modeling to examine model-based reinforcement learning for protection, reward, and punishment in humans. Decisions motivated by acquiring protection evoked a higher degree of model-based control than acquiring reward or avoiding punishment, with no significant differences in learning rate. The context–valence asymmetry characteristic of protection increased the deployment of flexible decision strategies, suggesting that model-based control depends on the context in which outcomes are encountered as well as on the valence of the outcome.
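The "degree of model-based control" in hybrid reinforcement-learning models is commonly captured by a weight that mixes model-based and model-free action values; a higher weight means more model-based control. A minimal sketch, with all names and numbers illustrative rather than taken from these studies:

```python
# Hybrid valuation: w indexes the degree of model-based control
# (w = 1 fully model-based, w = 0 fully model-free). Values are made up.

def hybrid_value(q_model_based, q_model_free, w):
    """Weighted mixture of model-based and model-free action values."""
    return w * q_model_based + (1.0 - w) * q_model_free

# A protection context fit with a higher w would weight the model-based
# estimate more heavily than a reward context fit with a lower w:
v_protect = hybrid_value(q_model_based=0.9, q_model_free=0.2, w=0.8)
v_reward  = hybrid_value(q_model_based=0.9, q_model_free=0.2, w=0.4)
```

Fitting w per condition while sharing the learning rate is one way such a model could show higher model-based control for protection with no difference in learning rate, as the abstract reports.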