Abstract In reinforcement learning (RL) experiments, participants learn to make rewarding choices in response to different stimuli; RL models use outcomes to estimate stimulus–response values that change incrementally. RL models consider any response type indiscriminately, ranging from more concretely defined motor choices (pressing a key with the index finger), to more general choices that can be executed in a number of ways (selecting dinner at the restaurant). However, does the learning process vary as a function of the choice type? In Experiment 1, we show that it does: Participants were slower and less accurate in learning correct choices of a general format compared with learning more concrete motor actions. Using computational modeling, we show that two mechanisms contribute to this. First, there was evidence of irrelevant credit assignment: The values of motor actions interfered with the values of other choice dimensions, resulting in more incorrect choices when the correct response was not defined by a single motor action; second, information integration for relevant general choices was slower. In Experiment 2, we replicated and further extended the findings from Experiment 1 by showing that slowed learning was attributable to weaker working memory use, rather than slowed RL. In both experiments, we ruled out the explanation that the difference in performance between two condition types was driven by difficulty/different levels of complexity. We conclude that defining a more abstract choice space used by multiple learning systems for credit assignment recruits executive resources, limiting how much such processes then contribute to fast learning.
This content will become publicly available on April 15, 2026
Neural mechanisms of credit assignment for delayed outcomes during contingent learning
Adaptive behavior in complex environments critically relies on the ability to appropriately link specific choices or actions to their outcomes. However, the neural mechanisms that support the ability to credit only those past choices believed to have caused the observed outcomes remain unclear. Here, we leverage multivariate pattern analyses of functional magnetic resonance imaging (fMRI) data and an adaptive learning task to shed light on the underlying neural mechanisms of such specific credit assignment. We find that the lateral orbitofrontal cortex (lOFC) and hippocampus (HC) code for the causal choice identity when credit needs to be assigned for choices that are separated from outcomes by a long delay, even when this delayed transition is punctuated by interim decisions. Further, we show that when interim decisions must be made, learning is additionally supported by lateral frontopolar cortex (lFPC). Our results indicate that lFPC holds previous causal choices in a ‘pending’ state until a relevant outcome is observed, and the fidelity of these representations predicts the fidelity of subsequent causal choice representations in lOFC and HC during credit assignment. Together, these results highlight the importance of the timely reinstatement of specific causes in lOFC and HC in learning choice-outcome relationships when delays and choices intervene, a critical component of real-world learning and decision making.
- Award ID(s): 1846578
- PAR ID: 10634372
- Publisher / Repository: eLife Sciences Publications Ltd
- Date Published:
- Journal Name: eLife
- Volume: 13
- ISSN: 2050-084X
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Abstract Understanding how cortical circuits generate complex behavior requires investigating the cell types that comprise them. Functional differences across pyramidal neuron (PyN) types have been observed within cortical areas, but it is not known whether these local differences extend throughout the cortex, nor whether additional differences emerge when larger-scale dynamics are considered. We used genetic and retrograde labeling to target pyramidal tract, intratelencephalic and corticostriatal projection neurons and measured their cortex-wide activity. Each PyN type drove unique neural dynamics, both at the local and cortex-wide scales. Cortical activity and optogenetic inactivation during an auditory decision task revealed distinct functional roles. All PyNs in parietal cortex were recruited during perception of the auditory stimulus, but, surprisingly, pyramidal tract neurons had the largest causal role. In frontal cortex, all PyNs were required for accurate choices but showed distinct choice tuning. Our results reveal that rich, cell-type-specific cortical dynamics shape perceptual decisions.
- Uncertainty permeates decisions from the trivial to the profound. Integrating brain and behavioral evidence, we discuss how probabilistic (varied outcomes) and temporal (delayed outcomes) uncertainty differ across age and individuals; how critical tests adjudicate between theories of uncertainty (prospect theory and fuzzy-trace theory); and how these mechanisms might be represented in the brain. The same categorical gist representations of gains and losses account for choices and eye-tracking data in both value-allocation (add money to gambles) and risky-choice tasks, disconfirming prospect theory and confirming predictions of fuzzy-trace theory. The analysis is extended to delay discounting and disambiguated choices, explaining hidden zero effects that similarly turn on categorical distinctions between some gain and no gain, certain gain and uncertain gain, gain and loss, and now and later. BOLD activation implicates dorsolateral prefrontal and posterior parietal cortices in gist strategies that are not just one tool in a grab-bag of cognitive options but rather are general strategies that systematically predict behaviors across many different tasks involving probabilistic and temporal uncertainty. High valuation (e.g., ventral striatum; ventromedial prefrontal cortex) and low executive control (e.g., lateral prefrontal cortex) contribute to risky and impatient choices, especially in youth. However, valuation in ventral striatum supports reward-maximizing and gist strategies in adulthood. Indeed, processing becomes less “rational” in the sense of maximizing gains and more noncompensatory (eye movements indicate fewer tradeoffs) as development progresses from adolescence to adulthood, as predicted. Implications for theoretically predicted “public-health paradoxes” are discussed, including gist versus verbatim thinking in drug experimentation and addiction.
- To make effective decisions, people need to consider the relationship between actions and outcomes. These are often separated by time and space. The neural mechanisms by which disjoint actions and outcomes are linked remain unknown. One promising hypothesis involves neural replay of nonlocal experience. Using a task that segregates direct from indirect value learning, combined with magnetoencephalography, we examined the role of neural replay in human nonlocal learning. After receipt of a reward, we found significant backward replay of nonlocal experience, with a 160-millisecond state-to-state time lag, which was linked to efficient learning of action values. Backward replay and behavioral evidence of nonlocal learning were more pronounced for experiences of greater benefit for future behavior. These findings support nonlocal replay as a neural mechanism for solving complex credit assignment problems during learning.
- Abstract We investigated how the human brain integrates experiences of specific events to build general knowledge about typical event structure. We examined an episodic memory area important for temporal relations, anterior-lateral entorhinal cortex, and a semantic memory area important for action concepts, middle temporal gyrus, to understand how and when these areas contribute to these processes. Participants underwent functional magnetic resonance imaging while learning and recalling temporal relations among novel events over two sessions 1 week apart. Across distinct contexts, individual temporal relations among events could either be consistent or inconsistent with each other. Within each context, during the recall phase, we measured associative coding as the difference of multivoxel correlations among related vs unrelated pairs of events. Neural regions that form integrative representations should exhibit stronger associative coding in the consistent than the inconsistent contexts. We found evidence of integrative representations that emerged quickly in anterior-lateral entorhinal cortex (at session 1), and only subsequently in middle temporal gyrus, which showed a significant change across sessions. A complementary pattern of findings was seen with signatures during learning. This suggests that integrative representations are established early in anterior-lateral entorhinal cortex and may be a pathway to the later emergence of semantic knowledge in middle temporal gyrus.
