

Title: Model-based prioritization for acquiring protection
Protection often involves the capacity to prospectively plan the actions needed to mitigate harm. The computational architecture of protection decisions remains unclear, as does whether these decisions differ from other beneficial prospective actions, such as reward acquisition. Here we compare protection acquisition to reward acquisition and punishment avoidance to examine overlapping and distinct features across the three action types. Protection acquisition is positively valenced, similar to reward: for both, the more the actor gains, the greater the benefit. However, reward and protection occur in different contexts, with protection arising in aversive contexts. Punishment avoidance also occurs in aversive contexts, but differs from protection because punishment is negatively valenced and motivates avoidance. Across three independent studies (total N = 600), we applied computational modeling to examine model-based reinforcement learning for protection, reward, and punishment in humans. Decisions motivated by acquiring protection evoked a higher degree of model-based control than acquiring reward or avoiding punishment, with no significant differences in learning rate. The context-valence asymmetry characteristic of protection increased the deployment of flexible decision strategies, suggesting that model-based control depends on the context in which outcomes are encountered as well as on the valence of the outcome.
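The abstract does not specify the task, so purely as an illustration of how the degree of model-based control is commonly formalized, here is a minimal hybrid model-based/model-free learner sketch in the style of two-stage decision tasks. The task structure, transition matrix, and parameter names below are assumptions for illustration, not details from the paper; the paper's finding would correspond to a higher effective weight w for protection than for reward or punishment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-stage task (an assumption, not from the paper):
# 2 first-stage actions, each leading probabilistically to 1 of 2
# second-stage states with differing payoff probabilities.
P = np.array([[0.7, 0.3],   # P(second-stage state | action 0)
              [0.3, 0.7]])  # P(second-stage state | action 1)
reward_prob = np.array([0.8, 0.2])  # payoff probability per second-stage state

alpha = 0.3  # learning rate
beta = 5.0   # softmax inverse temperature
w = 0.7      # model-based weight; "more model-based control" = larger w

q_mf = np.zeros(2)  # model-free first-stage action values
q_s2 = np.zeros(2)  # learned second-stage state values

for trial in range(200):
    # Model-based values: expected second-stage value under known transitions.
    q_mb = P @ q_s2
    # Hybrid valuation: weighted mix of model-based and model-free values.
    q = w * q_mb + (1 - w) * q_mf
    p_choice = np.exp(beta * q) / np.exp(beta * q).sum()
    a = rng.choice(2, p=p_choice)

    s2 = rng.choice(2, p=P[a])                 # transition to second stage
    r = float(rng.random() < reward_prob[s2])  # binary outcome

    # Simple TD updates for the second-stage state value and the
    # model-free first-stage action value.
    q_s2[s2] += alpha * (r - q_s2[s2])
    q_mf[a] += alpha * (r - q_mf[a])
```

Fitting w (and alpha) per participant and condition is the usual way such analyses quantify model-based control separately from learning rate.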
Award ID(s):
2203522
PAR ID:
10400014
Author(s) / Creator(s):
Editor(s):
Cai, Ming Bo
Date Published:
Journal Name:
PLOS Computational Biology
Volume:
18
Issue:
12
ISSN:
1553-7358
Page Range / eLocation ID:
e1010805
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like This
  1. Reinforcement learning (RL) learns from experience. It solves sequential decision problems by optimizing reward and punishment through experimentation with the distinct actions available in an environment. Unlike supervised learning models, RL lacks static input-output mappings and an objective of minimizing a vector error. However, to find an optimal strategy, it is crucial to learn both from continuous feedback on training data and from the offline rules of past experience, with no explicit dependence on online samples. In this paper, we present a study of a multi-agent RL framework in which a Critic in semi-offline mode criticizes an online Actor-Critic network, namely the Critic-over-Actor-Critic (CoAC) model, to find optimal treatment plans for ICU patients as well as an optimal strategy in a combative battle game. For further validation, we also examine the model on an adversarial assignment.
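The abstract does not detail the CoAC architecture; as a rough sketch of the online actor-critic core that a semi-offline critic would sit above, here is a minimal tabular actor-critic in a toy environment. The environment, updates, and hyperparameters are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy environment (assumed for illustration): 3 states, 2 actions,
# reward 1 when the action index matches the state's parity.
n_states, n_actions = 3, 2
gamma, lr_actor, lr_critic = 0.95, 0.1, 0.2

theta = np.zeros((n_states, n_actions))  # actor: policy logits
V = np.zeros(n_states)                   # online critic: state values

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

s = 0
for step in range(1000):
    probs = softmax(theta[s])
    a = rng.choice(n_actions, p=probs)
    r = 1.0 if a == s % 2 else 0.0
    s_next = rng.integers(n_states)

    # The critic's TD error drives both updates. In a CoAC-style scheme,
    # a second, semi-offline critic would additionally score these
    # online (state, action) pairs against stored experience.
    td = r + gamma * V[s_next] - V[s]
    V[s] += lr_critic * td
    grad = -probs                 # d/d theta of log pi(a|s) for a softmax
    grad[a] += 1.0                # policy: one-hot(a) - probs
    theta[s] += lr_actor * td * grad
    s = s_next
```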
  2. Abstract—The functional dichotomy of anatomical regions of the medial prefrontal cortex (mPFC) has been tested with greater certainty in punishment-driven tasks and less so in reward-oriented paradigms. In the infralimbic cortex (IL), known for behavioral suppression (STOP), tasks linked with reward or punishment are encoded through firing-rate decreases or increases, respectively. Although the ventral tegmental area (VTA) is the brain region governing reward/aversion learning, the link between its excitatory neuron population and IL encoding of reward-linked behavioral expression is unclear. Here, we present evidence that IL ensembles use a population-based mechanism involving broad inhibition of principal cells at intervals when reward is presented or expected. The IL encoding mechanism was consistent across multiple sessions with randomized rewarded target sites. Most IL neurons exhibit firing-rate (FR) suppression during reward acquisition intervals (T1) and during subsequent exploration of previously rewarded targets when the reward is omitted (T2). Furthermore, FR suppression in putative IL ensembles persisted for intervals that followed reward-linked target events. Pairing VTA glutamate inhibition with reward acquisition events reduced the strength of the reward-target association, expressed as a lower affinity for previously rewarded targets. For these intervals, fewer IL neurons per mouse trial showed FR decreases, accompanied by an increase in the percentage of units with no change in FR. Together, we conclude that VTA glutamate neurons are likely involved in establishing IL inhibition states that encode reward acquisition and the subsequent reward-target association.
  3. Across animal species, dopamine-operated memory systems comprise anatomically segregated, functionally diverse subsystems. Although individual subsystems could operate independently to support distinct types of memory, the logical interplay between subsystems is expected to enable more complex memory processing by allowing existing memory to influence future learning. Recent comprehensive ultrastructural analysis of the Drosophila mushroom body revealed intricate networks interconnecting the dopamine subsystems—the mushroom body compartments. Here, we review the functions of some of these connections that are beginning to be understood. Memory consolidation is mediated by two different forms of network: a recurrent feedback loop within a compartment maintains the sustained dopamine activity required for consolidation, whereas feed-forward connections across compartments allow short-term memory formation in one compartment to open the gate for long-term memory formation in another compartment. Extinction and reversal of aversive memory rely on a similar feed-forward circuit motif that signals omission of punishment as a reward, which triggers plasticity that counteracts the original aversive memory trace. Finally, indirect feed-forward connections from a long-term memory compartment to short-term memory compartments mediate higher-order conditioning. Collectively, these emerging studies indicate that feedback control and hierarchical connectivity allow the dopamine subsystems to work cooperatively to support diverse and complex forms of learning.
  4. The automation of extracting argument structures faces two challenges: (1) encoding long-term contexts to facilitate comprehensive understanding, and (2) improving data efficiency, since constructing high-quality argument structures is time-consuming. In this work, we propose a novel context-aware Transformer-based argument structure prediction model which, on five different domains, significantly outperforms models that rely on features or encode only limited contexts. To tackle the difficulty of data annotation, we examine two complementary methods: (i) transfer learning, to leverage existing annotated data to boost model performance in a new target domain, and (ii) active learning, to strategically identify a small number of samples for annotation. We further propose model-independent sample acquisition strategies that generalize to diverse domains. With extensive experiments, we show that our simple-yet-effective acquisition strategies yield competitive results against three strong baselines. Combined with transfer learning, a substantial F1 score boost (5-25 points) can further be achieved during the early iterations of active learning across domains.
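The abstract does not spell out its acquisition strategies; as a generic illustration of what "model-independent" means here, the sketch below implements plain uncertainty sampling (an assumed stand-in, not the paper's method): it consumes only predicted class probabilities, so any underlying model can supply them.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Predictive entropy of each row of class probabilities."""
    return -np.sum(p * np.log(p + eps), axis=-1)

def select_batch(probs, k):
    """Pick the k unlabeled samples the model is least certain about.

    probs: (n_unlabeled, n_classes) probabilities from any model,
    which is what makes the strategy model-independent.
    """
    return np.argsort(-entropy(probs))[:k]

# Usage: fake predictions for 5 unlabeled samples, request 2 annotations.
probs = np.array([[0.90, 0.10], [0.50, 0.50], [0.60, 0.40],
                  [0.99, 0.01], [0.55, 0.45]])
print(select_batch(probs, 2))  # -> [1 4], the two most uncertain samples
```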
  5.
    Abstract: Designers make information acquisition decisions, such as where to search and when to stop the search. Such decisions are typically made sequentially, so that at every search step designers gain information by learning about the design space. When designers begin acquiring information, however, their decisions are based primarily on their prior knowledge. Prior knowledge shapes the initial set of assumptions that designers use to learn about the design space. These assumptions are collectively termed inductive biases. Identifying such biases can help us better understand how designers use their prior knowledge to solve problems under uncertainty. Thus, in this study, we identify inductive biases in humans in sequential information acquisition tasks. To do so, we analyze experimental data from a set of behavioral experiments conducted in the past [1-5]. All of these experiments were designed to study various factors that influence sequential information acquisition behaviors. Across these studies, we identify similar decision-making behaviors in the participants' very first decision to "choose x". We find that their choices of "x" are not uniformly distributed in the design space. Since such experiments are abstractions of real design scenarios, this implies that further contextualizing such experiments would only increase the influence of these biases. Thus, we highlight the need to study the influence of such biases to better understand designer behaviors. We conclude that, in the context of Bayesian modeling of designers' behaviors, utilizing the identified inductive biases would enable us to model designers' priors for design search contexts better than using non-informative priors.
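As a toy illustration of the closing point, the sketch below contrasts a non-informative prior with a prior informed by an identified first-choice bias. The one-dimensional design space and the clustered bias shape are assumptions for illustration, not results from the cited studies.

```python
import numpy as np

# Hypothetical 1-D design space x in [0, 1], discretized.
x = np.linspace(0, 1, 101)

# Non-informative prior: uniform over the design space.
uniform_prior = np.ones_like(x) / x.size

# Bias-informed prior (assumed shape): first choices cluster near the
# boundaries and the center, modeled as a mixture of Gaussian bumps.
def gauss(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2)

informed = gauss(x, 0.0, 0.08) + gauss(x, 0.5, 0.08) + gauss(x, 1.0, 0.08)
informed_prior = informed / informed.sum()

# The informed prior concentrates probability mass where first choices
# were observed, which is what "better modeling designers' priors" buys
# over the flat alternative in a Bayesian model of search behavior.
print(uniform_prior.max(), informed_prior.max())
```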