Title: Single-trial modeling separates multiple overlapping prediction errors during reward processing in human EEG
Abstract

Learning signals during reinforcement learning and cognitive control rely on valenced reward prediction errors (RPEs) and non-valenced salience prediction errors (PEs) driven by surprise magnitude. A core debate in reward learning focuses on whether valenced and non-valenced PEs can be isolated in the human electroencephalogram (EEG). We combine behavioral modeling and single-trial EEG regression to disentangle sequential PEs in an interval timing task dissociating outcome valence, magnitude, and probability. Multiple regression across temporal, spatial, and frequency dimensions characterized a spatio-tempo-spectral cascade from early valenced RPE value to non-valenced RPE magnitude, followed by outcome probability indexed by a late frontal positivity. Separating negative and positive outcomes revealed that the valenced RPE value effect is an artifact of overlap between two non-valenced RPE magnitude responses: frontal theta feedback-related negativity on losses and posterior delta reward positivity on wins. These results reconcile longstanding debates on the sequence of components representing reward and salience PEs in the human EEG.
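The abstract's central argument is that a pooled "valenced RPE" effect can emerge from two valence-specific magnitude responses. A small simulation can illustrate the logic. This is a hedged sketch, not the authors' analysis. It uses synthetic single-trial amplitudes at one hypothetical electrode and a hypothetical win/loss task in which expected value is simply the win probability, rather than the study's interval timing task. Losses produce an FRN-like negativity and wins a RewP-like positivity, both scaled by unsigned RPE magnitude. The function names and effect sizes (`a_loss`, `b_win`, `noise_sd`) are illustrative assumptions.

```python
import numpy as np


def ols_slope(x, y):
    """Slope of y on x from ordinary least squares with an intercept."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(beta[1])


def simulate_and_fit(n_trials=5000, seed=0, a_loss=1.5, b_win=1.0, noise_sd=0.5):
    """Simulate single-trial amplitudes at one hypothetical electrode, then fit
    a pooled signed-RPE model and valence-split RPE-magnitude models."""
    rng = np.random.default_rng(seed)
    # Hypothetical task: each trial has a known win probability; expected value = p_win.
    p_win = rng.uniform(0.2, 0.8, n_trials)
    win = rng.uniform(size=n_trials) < p_win
    rpe = win.astype(float) - p_win        # signed (valenced) RPE
    magnitude = np.abs(rpe)                # unsigned RPE magnitude (surprise)

    # The simulated signal contains only magnitude responses:
    # losses -> negativity scaling with |RPE|, wins -> positivity scaling with |RPE|.
    amplitude = np.where(win, b_win * magnitude, -a_loss * magnitude)
    amplitude = amplitude + rng.normal(0.0, noise_sd, n_trials)

    return {
        # Pooled model over all trials: looks like a valenced RPE effect.
        "pooled_signed": ols_slope(rpe, amplitude),
        # Valence-split models: recover the two separate magnitude responses.
        "loss_magnitude": ols_slope(magnitude[~win], amplitude[~win]),
        "win_magnitude": ols_slope(magnitude[win], amplitude[win]),
    }


if __name__ == "__main__":
    for name, slope in simulate_and_fit().items():
        print(f"{name}: {slope:+.2f}")
```

With these settings the pooled slope on signed RPE is positive, although the simulated signal contains only magnitude terms. Splitting trials by valence recovers a negative slope on losses and a positive slope on wins.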

 
NSF-PAR ID: 10279242
Author(s) / Creator(s): ; ;
Publisher / Repository: Nature Publishing Group
Date Published:
Journal Name: Communications Biology
Volume: 4
Issue: 1
ISSN: 2399-3642
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract

    The signed value and unsigned salience of reward prediction errors (RPEs) are critical to understanding reinforcement learning (RL) and cognitive control. Dorsomedial prefrontal cortex (dMPFC) and insula (INS) are key regions for integrating reward and surprise information, but conflicting evidence for both signed and unsigned activity has led to multiple proposals for the nature of RPE representations in these brain areas. Recently developed RL models allow neurons to respond differently to positive and negative RPEs. Here, we use intracranially recorded high frequency activity (HFA) to test whether this flexible asymmetric coding strategy captures RPE coding diversity in human INS and dMPFC. At the region level, we found a bias towards positive RPEs in both areas which paralleled behavioral adaptation. At the local level, we found spatially interleaved neural populations responding to unsigned RPE salience and valence-specific positive and negative RPEs. Furthermore, directional connectivity estimates revealed a leading role of INS in communicating positive and unsigned RPEs to dMPFC. These findings support asymmetric coding across distinct but intermingled neural populations as a core principle of RPE processing and inform theories of the role of dMPFC and INS in RL and cognitive control.

     
  2. Abstract

    The Reward‐Positivity (RewP) is a frontocentral event‐related potential elicited following reward and punishment feedback. Reinforcement learning theories propose the RewP reflects a reward prediction error that increases following more favorable (vs. unfavorable) outcomes. An alternative perspective, however, proposes this component indexes a salience‐prediction error that increases following more salient outcomes. Evidence from prior studies that included both reward and punishment conditions is mixed, supporting both accounts. However, these studies often varied how feedback stimuli were repeated across reward and punishment conditions. Differences in the frequency of feedback stimuli may drive inconsistencies by introducing salience effects for infrequent stimuli regardless of whether they are associated with rewards or punishments. To test this hypothesis, the current study examined the effect of outcome valence and stimulus frequency on the RewP and neighboring P2 and P3 components in reward, punishment, and neutral contexts across two separate experiments that varied how often feedback stimuli were repeated between conditions. Experiment 1 revealed that infrequent feedback stimuli generated overlapping positivity across all three components. However, when stimulus frequency was controlled, experiment 2 revealed that favorable outcomes increased RewP and P3 positivity. Together, these results suggest the RewP reflects some combination of reward‐ and salience‐prediction error encoding. Results also indicate that infrequent feedback stimuli elicited strong salience effects across all three components that may inflate, eliminate, or reverse outcome valence effects for the RewP and P3. These results resolve several inconsistencies in the literature and have important implications for electrocortical investigations of reward and punishment feedback processing.

     
  3. Abstract

    Prior work shows that people respond more plastically to environmental influences, including cultural influences, if they carry the 7 or 2‐repeat (7/2R) allelic variant of the dopamine D4 receptor gene (DRD4). The 7/2R carriers are thus more likely to endorse the norms and values of their culture. So far, however, mechanisms underlying this moderation of cultural acquisition by DRD4 are unclear. To address this gap in knowledge, we tested the hypothesis that DRD4 modulates the processing of reward cues existing in the environment. About 72 young adults, preselected for their DRD4 status, performed a gambling task, while the electroencephalogram was recorded. Principal components of event‐related potentials aligned to the Reward‐Positivity (associated with bottom‐up processing of reward prediction errors) and frontal‐P3 (associated with top‐down attention) were both significantly more positive following gains than following losses. As predicted, the gain‐loss differences were significantly larger for 7/2R carriers than for noncarriers. Also, as predicted, the cultural backgrounds of the participants (East Asian vs. European American) did not moderate the effects of DRD4. Our findings suggest that the 7/2R variant of DRD4 enhances (a) the detection of reward prediction errors and (b) controlled attention that updates the context for the reward, thereby suggesting one possible mechanism underlying the DRD4 × Culture interactions.

     
  4. Introduction: Back pain is one of the most common causes of pain in the United States. Spinal cord stimulation (SCS) is an intervention for patients with chronic back pain (CBP). However, SCS decreases pain in only 58% of patients and relies on self-reported pain scores as outcome measures. A trial SCS device is temporarily implanted for seven days to help determine whether a permanent SCS is needed; patients who have a >50% reduction in pain with the trial stimulator are eligible for permanent implantation. However, self-reported measures reveal little about how mechanisms in the brain are altered. Other measures of pain intensity, onset, medication, disability, depression, and anxiety have been used with machine learning to predict outcomes, with accuracies below 70%. We aim to predict long-term SCS responders at 6 months using baseline resting EEG and machine learning. Materials and Methods: We obtained 10 minutes of resting electroencephalography (EEG) and pain questionnaires from nine participants with CBP at two time points: 1) pre-trial baseline and 2) six months after permanent SCS implant surgery. Subjects were designated as high or moderate responders based on the amount of pain relief provided by long-term (post six months) SCS, with pain scored on a scale of 0-10 (0 = no pain, 10 = intolerable). We used the resting EEG from baseline to predict long-term treatment outcome. Resting EEG data were fed through a pipeline for classification and dipole source mapping. EEG signals were preprocessed using the EEGLAB toolbox. Independent component analysis was used to linearly unmix the signal, and dipole fitting to map dipole sources in the brain. Spectral analysis was performed to obtain the frequency distribution of the signal. Each power band, delta (1-4 Hz), theta (4-8 Hz), alpha (8-13 Hz), beta (13-30 Hz), and gamma (30-100 Hz), as well as the entire spectrum (1-100 Hz), was used for classification.
Furthermore, dipole sources were ranked based on classification feature weights to determine the significance of specific brain regions. We used support vector machines to predict pain outcomes. Results and Discussion: We found that higher-frequency power bands provided an overall classification performance of 88.89%. Differences in power between moderate and high responders were seen in both the frontal and parietal regions for theta, alpha, beta, and the entire spectrum (Fig. 1). These differences could potentially be used to predict patient response to SCS. Conclusions: We found evidence of decreased power in theta, alpha, beta, and the entire spectrum in anterior parietal and posterior frontal regions between moderate and high responders, which can be used to predict long-term pain relief from SCS. Predicting long-term treatment outcome from baseline EEG data has the potential to inform decisions about permanent surgery, allow trial periods to be forgone, and improve clinical efficiency, while beginning to clarify the mechanism of action of SCS in the human brain.
    A key component of economic decisions is the integration of information about reward outcomes and probabilities in selecting between competing options. In many species, risky choice is influenced by the magnitude of available outcomes, probability of success and the possibility of extreme outcomes. Chimpanzees are generally regarded as risk-seeking. In this study, we examined two aspects of chimpanzees' risk preferences: first, whether setting the value of the non-preferred outcome of a risky option to zero changes chimpanzees’ risk preferences, and second, whether individual risk preferences are stable across two different measures. Across two experiments, we found chimpanzees (Pan troglodytes, n = 23) as a group to be risk-neutral to risk-avoidant with highly stable individual risk preferences. We discuss how the possibility of going empty-handed might reduce chimpanzees' risk-seeking relative to previous studies. This malleability in risk preferences as a function of experimental parameters and individual differences raises interesting questions about whether it is appropriate or helpful to categorize a species as a whole as risk-seeking or risk-avoidant.

    This article is part of the theme issue ‘Existence and prevalence of economic behaviours among non-human primates’.

     
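Related abstract 1 refers to RL models that let neurons respond differently to positive and negative RPEs. One common formulation, assumed here rather than taken from that study, gives positive and negative RPEs separate learning rates. In expectation the learned value then settles at an expectile of the outcome distribution with τ = α+ / (α+ + α−), so a positive bias (α+ > α−) pushes the estimate above the mean outcome. The function names and parameter values below are illustrative only.

```python
def asymmetric_update(value, outcome, alpha_pos, alpha_neg):
    """One value update with separate learning rates for positive and negative RPEs."""
    rpe = outcome - value
    alpha = alpha_pos if rpe > 0 else alpha_neg
    return value + alpha * rpe, rpe


def learn_value(outcomes, alpha_pos, alpha_neg, v0=0.0):
    """Apply the asymmetric update over a sequence of outcomes; return the final value."""
    value = v0
    for outcome in outcomes:
        value, _ = asymmetric_update(value, outcome, alpha_pos, alpha_neg)
    return value


if __name__ == "__main__":
    # Alternating wins (1) and losses (0): the mean outcome is 0.5.
    outcomes = [1.0, 0.0] * 200
    print(round(learn_value(outcomes, 0.1, 0.1), 3))    # symmetric coding; prints 0.474
    print(round(learn_value(outcomes, 0.2, 0.05), 3))   # positive-RPE bias; prints 0.792
```

With symmetric rates the value oscillates around the 0.5 mean. With α+ = 0.2 and α− = 0.05 (τ = 0.8) it oscillates around 0.8, which mirrors the kind of positive-RPE bias the abstract reports at the region level.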
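Related abstract 4 uses power in canonical EEG bands (delta 1-4 Hz through gamma 30-100 Hz, plus the 1-100 Hz spectrum) as classification features. The sketch below covers only that feature-extraction step and rests on several assumptions. It uses a single-channel signal and a plain periodogram, without the windowing or Welch averaging common in EEG pipelines. Band edges are half-open, so shared boundaries such as 4 Hz are counted once. The band limits come from the abstract. The function name `band_powers` and the synthetic test tone are hypothetical. The resulting values could then be passed to a support vector machine such as scikit-learn's `sklearn.svm.SVC`, which is not shown.

```python
import numpy as np

# Band limits (Hz) as listed in the abstract; each interval is treated as [low, high).
BANDS = {
    "delta": (1.0, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
    "gamma": (30.0, 100.0),
}


def band_powers(signal, fs):
    """Return power per band plus broadband (1-100 Hz) power from a simple periodogram."""
    if fs < 200:
        raise ValueError("fs must be at least 200 Hz to cover the 30-100 Hz gamma band")
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                                   # remove DC offset
    n = x.size
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    psd = (np.abs(np.fft.rfft(x)) ** 2) / (fs * n)     # periodogram; absolute scaling approximate
    df = freqs[1] - freqs[0]                           # frequency resolution (Hz)

    def power_in(low, high):
        mask = (freqs >= low) & (freqs < high)
        return float(psd[mask].sum() * df)             # rectangle-rule integral over the band

    powers = {name: power_in(lo, hi) for name, (lo, hi) in BANDS.items()}
    powers["broadband"] = power_in(1.0, 100.0)
    return powers


if __name__ == "__main__":
    fs = 250.0
    t = np.arange(int(4 * fs)) / fs                    # 4 s of synthetic data
    alpha_like = np.sin(2 * np.pi * 10.0 * t)          # 10 Hz test tone
    bp = band_powers(alpha_like, fs)
    print(max(BANDS, key=bp.get))                      # dominant band; prints alpha
```

A per-channel or per-source feature vector would then be built from these values, for example one entry per band, before cross-validated classification.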