Animals flexibly select actions that maximize future rewards despite facing uncertainty in sensory inputs, action-outcome associations, or contexts. The computational and circuit mechanisms underlying this ability are poorly understood. A clue to such computations can be found in the neural systems involved in representing sensory features, sensorimotor-outcome associations, and contexts. Specifically, the basal ganglia (BG) have been implicated in forming sensorimotor-outcome associations [1], while the thalamocortical loop between the prefrontal cortex (PFC) and mediodorsal thalamus (MD) has been shown to engage in contextual representations [2, 3]. Interestingly, both human and non-human animal experiments indicate that the MD represents different forms of uncertainty [3, 4]. However, finding evidence for uncertainty representation gives little insight into how it is utilized to drive behavior.

Normative theories have excelled at providing such computational insights. For example, deploying traditional machine learning algorithms to fit human decision-making behavior has clarified how associative uncertainty alters exploratory behavior [5, 6]. However, despite their computational insight and ability to fit behaviors, normative models cannot be directly related to neural mechanisms. A critical gap therefore exists between what we know about the neural representation of uncertainty on the one hand and the computational functions uncertainty serves in cognition on the other. This gap can be filled with mechanistic neural models that approximate normative models while also generating experimentally observed neural representations.

In this work, we build a mechanistic cortico-thalamo-BG loop network model that directly fills this gap. The model includes computationally relevant mechanistic details of both BG and thalamocortical circuits, such as distributional dopamine activity [7] and thalamocortical projections that modulate cortical effective connectivity [3] and plasticity [8] via interneurons. We show that our network explores various environments more efficiently and flexibly than commonly used machine learning algorithms, and that the mechanistic features we include are crucial for handling different types of uncertainty in decision-making. Furthermore, through derivations and mathematical proofs, we relate our model to two novel normative theories. We show mathematically that the first achieves near-optimal performance on bandit tasks. The second is a generalization of the well-known CUSUM algorithm, which is known to be optimal on single change-point detection tasks [9]; our normative model expands on this by detecting multiple sequential contextual changes. To our knowledge, our work is the first to link computational insights, normative models, and neural realization together in decision-making under various forms of uncertainty.
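For reference, the classical single change-point CUSUM detector mentioned above can be written in a few lines. The sketch below is a minimal illustration of that textbook algorithm only, not the multi-change generalization developed in the paper; the Gaussian observation models and the threshold value are illustrative assumptions.

```python
import numpy as np

def cusum_detect(obs, logpdf_pre, logpdf_post, threshold=5.0):
    """Classical CUSUM detector for a single change point.

    Accumulates the log-likelihood ratio of a post-change versus a
    pre-change observation model, floored at zero, and reports the first
    time index at which the statistic exceeds `threshold`.
    """
    stat = 0.0
    for t, x in enumerate(obs):
        stat = max(0.0, stat + logpdf_post(x) - logpdf_pre(x))
        if stat > threshold:
            return t  # first detection time
    return None  # no change detected

# Toy example: detect a mean shift from 0 to 1 in Gaussian observations.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(1.0, 1.0, 100)])
gauss_logpdf = lambda x, mu: -0.5 * (x - mu) ** 2  # constant terms cancel in the ratio
print(cusum_detect(data, lambda x: gauss_logpdf(x, 0.0), lambda x: gauss_logpdf(x, 1.0)))
```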
Thalamocortical contribution to flexible learning in neural systems. Network Neuroscience
Animal brains evolved to optimize behavior in dynamic environments, flexibly selecting actions that maximize future rewards in different contexts. A large body of experimental work indicates that such optimization changes the wiring of neural circuits, appropriately mapping environmental input onto behavioral outputs. A major unsolved scientific question is how optimal wiring adjustments, which must target the connections responsible for rewards, can be accomplished when the relation of sensory inputs, actions taken, and environmental context to rewards is ambiguous. The credit assignment problem can be categorized into context-independent structural credit assignment and context-dependent continual learning. In this perspective, we survey prior approaches to these two problems and advance the notion that the brain's specialized neural architectures provide efficient solutions. Within this framework, the thalamus, with its cortical and basal ganglia interactions, serves as a systems-level solution to credit assignment. Specifically, we propose that thalamocortical interaction is the locus of meta-learning, where the thalamus provides cortical control functions that parametrize the cortical activity association space. By selecting among these control functions, the basal ganglia hierarchically guide thalamocortical plasticity across two timescales to enable meta-learning. The faster timescale establishes contextual associations to enable behavioral flexibility, while the slower one enables generalization to new contexts.
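To make the two-timescale proposal concrete, the toy sketch below illustrates one way such an architecture could be arranged: a small set of thalamic "control functions" gates a shared cortical mapping, a basal-ganglia-like selector chooses among them on a fast timescale, and the shared cortical weights consolidate on a slow timescale. All variable names, learning rules, and rates are illustrative assumptions, not the model proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_out, n_ctx = 8, 4, 3

W_cortex = rng.normal(0.0, 0.1, (n_out, n_in))       # slow, shared cortical weights
G_thal = rng.normal(0.0, 0.1, (n_ctx, n_out, n_in))   # fast thalamic "control functions"
ctx_value = np.zeros(n_ctx)                           # BG-like value of each control function

def forward(x, k):
    """Cortical mapping gated by the k-th thalamic control function."""
    return (W_cortex * (1.0 + G_thal[k])) @ x

def step(x, reward_fn, lr_fast=0.1, lr_slow=0.01, beta=3.0):
    """One trial: BG selects a control function, the outcome drives two-timescale updates."""
    global W_cortex
    p = np.exp(beta * ctx_value)
    p /= p.sum()
    k = rng.choice(n_ctx, p=p)                        # softmax selection among control functions
    y = forward(x, k)
    r = reward_fn(y)
    ctx_value[k] += lr_fast * (r - ctx_value[k])      # fast: value of the selected control function
    G_thal[k] += lr_fast * r * np.outer(y, x)         # fast: context-specific plasticity
    W_cortex += lr_slow * r * np.outer(y, x)          # slow: consolidation supporting generalization
    return k, r

# Example trial: reward is higher when the first output unit is driven strongly.
x = rng.normal(size=n_in)
print(step(x, reward_fn=lambda y: float(y[0])))
```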
- PAR ID: 10405674
- Date Published:
- Journal Name: Network Neuroscience
- Volume: 6
- Issue: 4
- ISSN: 2472-1751
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Vocal learning in songbirds is mediated by cortico‐basal ganglia circuits that govern diverse functions during different stages of development. We investigated developmental changes in axonal projections to and from motor cortical regions that underlie learned vocal behavior in juvenile zebra finches (Taeniopygia guttata). Neurons in LMAN‐core project to RA, a motor cortical region that drives vocal output; these RA‐projecting neurons send a transient collateral projection to AId, a region adjacent to RA, during early vocal development. Both RA and AId project to a region of dorsal thalamus (DLM), which forms a feedback pathway to cortico‐basal ganglia circuitry. These projections provide pathways conveying efference copy and a means by which information about vocal motor output could be reintegrated into cortico‐basal ganglia circuitry, potentially aiding in the refinement of juvenile vocalizations during learning. We used tract‐tracing techniques to label the projections of LMAN‐core to AId and of RA to DLM in juvenile songbirds. The volume and density of terminal label in the LMAN‐core→AId projection declined substantially during early stages of sensorimotor learning. In contrast, the RA→DLM projection showed no developmental change. The retraction of LMAN‐core→AId axon collaterals indicates a loss of efference copy to AId and suggests that projections that are present only during early stages of sensorimotor learning mediate unique, temporally restricted processes of goal‐directed learning. Conversely, the persistence of the RA→DLM projection may serve to convey motor information forward to the thalamus to facilitate song production during both learning and maintenance of vocalizations.
- The basal ganglia play pivotal roles in motor control and cognitive functioning. These nuclei are embedded in an anatomical loop: cortex to basal ganglia to thalamus back to cortex. We focus here on an essential synapse for descending control, from cortical layer 5 (L5) onto the GABAergic spiny projection neurons (SPNs) of the caudoputamen (CP). We employed genetic labeling to distinguish L5 neurons from somatosensory (S1) and motor (M1) cortices in large-volume serial electron microscopy and electrophysiology datasets to better detail these inputs. First, M1 and S1 synapses showed a strong preference to innervate the spines of SPNs and rarely contacted aspiny cells, which are likely to be interneurons. Second, L5 inputs commonly converge from both areas onto single SPNs. Third, compared to unlabeled terminals in CP, those labeled from M1 and S1 show ultrastructural hallmarks of strong driver synapses: they innervate larger spines that were more likely to contain a spine apparatus, more often had embedded mitochondria, and more often contacted multiple targets. Finally, these inputs also demonstrated driver‐like functional properties: SPNs responded to optogenetic activation from S1 and M1 with large EPSP/Cs that depressed and were dependent on ionotropic but not metabotropic receptors. Together, our findings suggest that individual SPNs integrate driver input from multiple cortical areas, with implications for how the basal ganglia relay cortical input to provide inhibitory innervation of motor thalamus.
- Vocal learning in songbirds is mediated by a highly localized system of interconnected forebrain regions, including recurrent loops that traverse the cortex, basal ganglia, and thalamus. This brain-behavior system provides a powerful model for elucidating mechanisms of vocal learning, with implications for learning speech in human infants, as well as for advancing our understanding of skill learning in general. A long history of experiments in this area has tested neural responses to playback of different song stimuli in anesthetized birds at different stages of vocal development. These studies have demonstrated selectivity for different song types that provide neural signatures of learning. In contrast to the ease of obtaining responses to song playback in anesthetized birds, song-evoked responses in awake birds are greatly reduced or absent, indicating that behavioral state is an important determinant of neural responsivity. Song-evoked responses can be elicited during sleep as well as anesthesia, and the selectivity of responses to song playback in adult birds is highly similar between anesthetized and sleeping states, encouraging the idea that anesthesia and sleep are similar. In contrast to that idea, we report evidence that cortical responses to song playback in juvenile zebra finches (Taeniopygia guttata) differ greatly between sleep and urethane anesthesia. This finding indicates that behavioral states differ in sleep versus anesthesia and raises questions about relationships between developmental changes in sleep activity, selectivity for different song types, and the neural substrate for vocal learning.
- Learning optimal policies in real-world domains with delayed rewards is a major challenge in Reinforcement Learning. We address the credit assignment problem by proposing a Gaussian Process (GP)-based immediate reward approximation algorithm and evaluate its effectiveness in 4 contexts where rewards can be delayed for long trajectories. In one GridWorld game and 8 Atari games, where immediate rewards are available, our results showed that on 7 out of 9 games, the proposed GP-inferred reward policy performed at least as well as the immediate reward policy and significantly outperformed the corresponding delayed reward policy. In e-learning and healthcare applications, we combined GP-inferred immediate rewards with offline Deep Q-Network (DQN) policy induction and showed that the GP-inferred reward policies outperformed the policies induced using delayed rewards in both real-world contexts. (See the illustrative sketch after this list.)
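The last related item above describes inferring immediate rewards from delayed returns with a Gaussian Process. The sketch below is a hedged illustration of that general idea only, not the cited paper's algorithm: each trajectory's delayed return is spread uniformly over its steps as a crude provisional target, and a GP regressor smooths these targets across similar state-action features. The function name, feature format, and uniform-spread heuristic are assumptions made for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def infer_immediate_rewards(trajectories):
    """Fit a GP from state-action features to provisional per-step rewards.

    `trajectories` is a list of (features, delayed_return) pairs, where
    `features` has shape (T, d) and only the trajectory-level return is
    observed. The return is spread uniformly over the steps as an initial
    target; the GP then smooths these targets across trajectories that
    visit similar state-action features.
    """
    X, y = [], []
    for feats, delayed_return in trajectories:
        feats = np.asarray(feats)
        X.append(feats)
        y.append(np.full(len(feats), delayed_return / len(feats)))
    X, y = np.vstack(X), np.concatenate(y)
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(X, y)
    return gp  # gp.predict(step_features) yields inferred immediate rewards

# Usage idea: substitute gp.predict(step_features) for the sparse delayed
# reward when training a tabular Q-learning or DQN agent on the same data.
```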

