Title: Undermatching Is a Consequence of Policy Compression

The matching law describes the tendency of agents to match the ratio of choices allocated to the ratio of rewards received when choosing among multiple options (Herrnstein, 1961). Perfect matching, however, is infrequently observed. Instead, agents tend to undermatch or bias choices toward the poorer option. Overmatching, or the tendency to bias choices toward the richer option, is rarely observed. Despite the ubiquity of undermatching, it has received an inadequate normative justification. Here, we assume agents not only seek to maximize reward, but also seek to minimize cognitive cost, which we formalize as policy complexity (the mutual information between actions and states of the environment). Policy complexity measures the extent to which the policy of an agent is state dependent. Our theory states that capacity-constrained agents (i.e., agents that must compress their policies to reduce complexity) can only undermatch or perfectly match, but not overmatch, consistent with the empirical evidence. Moreover, using mouse behavioral data (male), we validate a novel prediction about which task conditions exaggerate undermatching. Finally, in patients with Parkinson's disease (male and female), we argue that a reduction in undermatching with higher dopamine levels is consistent with an increased policy complexity.

SIGNIFICANCE STATEMENT: The matching law describes the tendency of agents to match the ratio of choices allocated to different options to the ratio of reward received. For example, if option a yields twice as much reward as option b, matching states that agents will choose option a twice as much. However, agents typically undermatch: they choose the poorer option more frequently than expected. Here, we assume that agents seek to simultaneously maximize reward and minimize the complexity of their action policies. We show that this theory explains when and why undermatching occurs. Neurally, we show that policy complexity, and by extension undermatching, is controlled by tonic dopamine, consistent with other evidence that dopamine plays an important role in cognitive resource allocation.
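
The two quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the two-state example task, and the choice of base-2 logarithms (bits) are assumptions made here for illustration. It computes policy complexity as the mutual information between states and actions, and the choice fraction implied by perfect matching.

```python
import math


def policy_complexity(p_state, policy):
    """Mutual information I(S;A), in bits, between states and actions.

    p_state: list of state probabilities p(s).
    policy:  list of rows, policy[s][a] = pi(a|s).
    (Names and the bits convention are illustrative assumptions.)
    """
    n_states = len(p_state)
    n_actions = len(policy[0])
    # Marginal action distribution p(a) = sum_s p(s) * pi(a|s)
    p_action = [
        sum(p_state[s] * policy[s][a] for s in range(n_states))
        for a in range(n_actions)
    ]
    mi = 0.0
    for s in range(n_states):
        for a in range(n_actions):
            pi_as = policy[s][a]
            # Zero-probability terms contribute nothing to the sum
            if p_state[s] > 0 and pi_as > 0:
                mi += p_state[s] * pi_as * math.log2(pi_as / p_action[a])
    return mi


def matching_choice_fraction(reward_a, reward_b):
    """Fraction of choices allocated to option a under perfect matching."""
    return reward_a / (reward_a + reward_b)


# Hypothetical two-state task with equally likely states
p_state = [0.5, 0.5]
deterministic = [[1.0, 0.0], [0.0, 1.0]]      # fully state dependent
state_dependent = [[0.9, 0.1], [0.1, 0.9]]    # partly state dependent
state_independent = [[0.5, 0.5], [0.5, 0.5]]  # ignores the state entirely
```

In this sketch a policy that ignores the state has zero complexity, and the deterministic policy reaches the maximum of one bit for two equally likely states. With option a paying twice as much as option b, perfect matching gives a choice fraction of 2/3 for option a; undermatching corresponds to a fraction between 1/2 and 2/3.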

 
NSF-PAR ID:
10407477
Author(s) / Creator(s):
;
Publisher / Repository:
DOI PREFIX: 10.1523
Date Published:
Journal Name:
The Journal of Neuroscience
Volume:
43
Issue:
3
ISSN:
0270-6474
Page Range / eLocation ID:
p. 447-457
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The survival of an organism is dependent on its ability to respond to cues in the environment. Such cues can attain control over behavior as a function of the value ascribed to them. Some individuals have an inherent tendency to attribute reward-paired cues with incentive motivational value, or incentive salience. For these individuals, termed sign-trackers, a discrete cue that precedes reward delivery becomes attractive and desirable in its own right. Prior work suggests that the behavior of sign-trackers is dopamine-dependent, and cue-elicited dopamine in the NAc is believed to encode the incentive value of reward cues. Here we exploited the temporal resolution of optogenetics to determine whether selective inhibition of ventral tegmental area (VTA) dopamine neurons during cue presentation attenuates the propensity to sign-track. Using male tyrosine hydroxylase (TH)-Cre Long Evans rats, it was found that, under baseline conditions, ∼84% of TH-Cre rats tend to sign-track. Laser-induced inhibition of VTA dopamine neurons during cue presentation prevented the development of sign-tracking behavior, without affecting goal-tracking behavior. When laser inhibition was terminated, these same rats developed a sign-tracking response. Video analysis using DeepLabCut™ revealed that, relative to rats that received laser inhibition, rats in the control group spent more time near the location of the reward cue even when it was not present and were more likely to orient toward and approach the cue during its presentation. These findings demonstrate that cue-elicited dopamine release is critical for the attribution of incentive salience to reward cues.

    SIGNIFICANCE STATEMENT: Activity of dopamine neurons in the ventral tegmental area (VTA) during cue presentation is necessary for the development of a sign-tracking, but not a goal-tracking, conditioned response in a Pavlovian task. We capitalized on the temporal precision of optogenetics to pair cue presentation with inhibition of VTA dopamine neurons. A detailed behavioral analysis with DeepLabCut™ revealed that cue-directed behaviors do not emerge without dopamine neuron activity in the VTA. Importantly, however, when optogenetic inhibition is lifted, cue-directed behaviors increase, and a sign-tracking response develops. These findings confirm the necessity of dopamine neuron activity in the VTA during cue presentation to encode the incentive value of reward cues.

     
  2. Abstract

    Emerald ash borer (EAB), a wood‐boring insect native to Asia, was discovered near Detroit in 2002 and has spread and killed millions of ash trees throughout the eastern United States and Canada. EAB causes severe damage in urban areas where it kills high‐value ash trees that shade streets, homes, and parks and costs homeowners and local governments millions of dollars for treatment, removal, and replacement of infested trees. We present a multistage, stochastic, mixed‐integer programming model to help decision‐makers maximize the public benefits of preserving healthy ash trees in an urban environment. The model allocates resources to surveillance of the ash population and subsequent treatment and removal of infested trees over time. We explore the multistage dynamics of an EAB outbreak with a dispersal mechanism and apply the optimization model to explore surveillance, treatment, and removal options to manage an EAB outbreak in Winnipeg, a city of Manitoba, Canada.

    Recommendation to Resource Managers

    Our approach demonstrates that timely detection and early response are critical factors for maximizing the number of healthy trees in urban areas affected by the pest outbreak.

    Treatment of the infested trees is most effective when done at the earliest stage of infestation. Treating asymptomatic trees at the earliest stages of infestation provides higher net benefits than tree removal or no‐treatment options.

    Our analysis suggests the use of branch sampling as a more accurate method than the use of sticky traps to detect the infested asymptomatic trees, which enables treating and removing more infested trees at the early stages of infestation.

    Our results also emphasize the importance of allocating a sufficient budget for tree removal to manage emerald ash borer infestations in urban environments. Tree removal becomes a less useful option in small‐budget solutions where the optimal policy is to spend most of the budget on treatments.

     
  3. Humans and other animals make decisions under uncertainty. Choosing an option that provides information can improve decision making. However, subjects often choose information that does not increase the chances of obtaining reward. In a procedure that promotes such paradoxical choice, animals choose between two alternatives: The richer option is followed by a cue that is rewarded 50% of the time (No-info) and the leaner option is followed by one of two cues, one always rewarded (100%), and the other never rewarded, 0% (Info). Since decisions involve comparing the subjective value of options after integrating all their features perhaps including information value, preference for information may rely on cortico-amygdalar circuitry. To test this, male and female Long-Evans rats were prepared with bilateral inhibitory DREADDs in the anterior cingulate cortex (ACC), orbitofrontal cortex (OFC), basolateral amygdala (BLA), or null virus infusions as a control. Using a counterbalanced design, we inhibited these regions after stable preference was acquired and during learning of new Info and No-info cues. We found that inhibition of ACC, but not OFC or BLA, selectively destabilized choice preference in female rats without affecting latency to choose or the response rate to cues. A logistic regression fit revealed that the previous choice strongly predicted preference in control animals, but not in female rats following ACC inhibition. BLA inhibition tended to decrease the learning of new cues that signaled the Info option, but had no effect on preference. The results reveal a causal, sex-dependent role for ACC in decisions involving information. 
  4. Abstract

    The current research investigates how people decide which of two options produces a better reward by repeatedly sampling from the options. In particular, it investigates the roles of two features of search, optional stopping and switch rate, in participants' final judgments of which option is better. First, in two studies, we found evidence for a new optional stopping effect; when participants stopped sampling right after experiencing a rare outcome, they made decisions as if they overweighted the rare outcome. Second, we investigated an effect proposed by Hills and Hertwig (2010) that people who frequently switch between options when sampling are more likely to make decisions consistent with underweighting rare outcomes. We conducted a theoretical analysis examining how switch rate can influence underweighting and how the type of decision problem moderates this effect. Informed by the theoretical analysis, we conducted four studies designed to test this effect with high power. None of the studies produced significant effects of switch rate. Lastly, the studies replicated a prior finding that optional stopping and switch rate are negatively correlated. In sum, this research elaborates a fuller understanding of how search strategies (switch rate and optional stopping) relate to how people decide which option is better and to their tendency to overweight versus underweight rare outcomes.

     
  5. Abstract

    Major depressive disorder (MDD) is a leading cause of disability worldwide. Individuals with MDD exhibit decreased motivation and deficits in reward processing. In a subset of MDD patients, chronic dysregulation of the hypothalamic-pituitary-adrenal (HPA) axis occurs, resulting in increased levels of the ‘stress hormone’ cortisol during the normal rest period (i.e., evening and night). However, the mechanistic relationship between chronically elevated resting cortisol and behavioral deficits in motivation and reward processing remains unclear. Given that women are diagnosed with MDD at twice the rate of men, it is important to understand whether the mechanisms linking cortisol to the symptoms of MDD differ by sex. In this study, we used subcutaneous implants to chronically elevate free plasma corticosterone (the rodent homolog of cortisol; ‘CORT’) during the rest period in male and female mice and examined changes in behavior and dopamine system function. We found that chronic CORT treatment impaired motivated reward-seeking in both sexes. In female but not male mice, CORT treatment reduced dopamine content in the dorsomedial striatum (DMS). In male but not female mice, CORT treatment impaired the function of the dopamine transporter (DAT) in DMS. From these studies, we conclude that chronic CORT dysregulation impairs motivation by impairing dopaminergic transmission in the DMS, but via different mechanisms in male and female mice. A better understanding of these sex-specific mechanisms could lead to new directions in MDD diagnosis and treatment.

     