Title: Observational Learning with Negative Externalities
Observational learning models seek to understand how distributed agents learn from observing the actions of others. In the basic model, agents choose between two alternatives whose underlying value is the same for every agent. Agents do not know this value; each observes only a noisy signal of it and makes a decision based on that signal and on observations of other agents' actions. Here we instead consider a scenario in which the choices faced by an agent exhibit a negative externality, so that the value of a choice may decrease depending on the history of other agents selecting that choice. We study the learning behavior of Bayesian agents under such an externality and show that it can lead to very different outcomes compared to models without the externality.
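To make the setup concrete, here is a minimal simulation sketch of the kind of dynamics the abstract describes. It is not the paper's model: the signal accuracy q, the linear congestion cost c, and the simplifying assumption that each observed action reveals the acting agent's signal are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (not from the paper): a common unknown state theta in
# {0, 1} indicates which of two alternatives is better for everyone.
N = 50      # number of agents acting in sequence
q = 0.7     # probability a private signal matches the true state
V = 1.0     # base value of choosing the better alternative
c = 0.02    # congestion cost per earlier agent who chose the same alternative

theta = int(rng.integers(2))   # true best alternative
counts = np.zeros(2)           # how many agents have chosen each alternative so far
log_lr = 0.0                   # public log-likelihood ratio for theta = 1 vs theta = 0

for n in range(N):
    signal = theta if rng.random() < q else 1 - theta

    # Private posterior: combine the public belief with the private signal.
    private_lr = np.log(q / (1 - q)) * (1 if signal == 1 else -1)
    p1 = 1.0 / (1.0 + np.exp(-(log_lr + private_lr)))   # P(theta = 1 | history, signal)

    # Expected payoff of each alternative: value of being right minus congestion.
    payoff = np.array([(1 - p1) * V - c * counts[0],
                       p1 * V - c * counts[1]])
    choice = int(np.argmax(payoff))
    counts[choice] += 1

    # Simplification: treat each observed action as if it revealed that agent's
    # signal (exact Bayesian inference over actions is more involved).
    log_lr += np.log(q / (1 - q)) * (1 if choice == 1 else -1)

print("true best alternative:", theta, "final counts:", counts)
```

Raising the congestion cost c in this sketch tends to split agents across both alternatives, in contrast to the herding behavior typical of the externality-free model.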
Award ID(s): 1908807
NSF-PAR ID: 10391544
Journal Name: 2022 IEEE International Symposium on Information Theory
Page Range / eLocation ID: 1495 to 1496
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. We consider the problem of allocating divisible items among multiple agents, in a setting where any agent is allowed to introduce diversity constraints on the items they are allocated. We motivate this via settings where the items themselves correspond to user ad slots or task workers with attributes such as race and gender on which the principal seeks to achieve demographic parity. We consider the following question: when an agent introduces diversity constraints into an allocation rule, is the allocation of other agents hurt significantly? If so, the cost of introducing such constraints is disproportionately borne by agents who do not benefit from diversity. We codify this via two desiderata capturing robustness: no negative externality (other agents are not hurt) and monotonicity (the agent enforcing the constraint does not see a large increase in value). We show in a formal sense that the Nash Welfare rule, which maximizes the product of agent values, is uniquely positioned to be robust when diversity constraints are introduced, while almost all other natural allocation rules fail this criterion. We also show that the guarantees achieved by Nash Welfare are nearly optimal within a widely studied class of allocation rules. Finally, we perform an empirical simulation on real-world data modeling ad allocations to show that this gap between Nash Welfare and other rules persists in the wild.
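    As a point of reference, the Nash Welfare rule mentioned above maximizes the product of agent values, equivalently the sum of their logarithms. The toy instance and solver choice below (numpy plus scipy's SLSQP) are illustrative assumptions, not the paper's setup.

    ```python
    import numpy as np
    from scipy.optimize import minimize

    # Hypothetical toy instance: 2 agents, 3 divisible items, linear valuations
    # v[i, j] = value of item j to agent i.
    v = np.array([[0.6, 0.3, 0.1],
                  [0.2, 0.5, 0.3]])
    n_agents, n_items = v.shape

    def neg_log_nash_welfare(x_flat):
        x = x_flat.reshape(n_agents, n_items)     # x[i, j] = fraction of item j given to agent i
        utils = (v * x).sum(axis=1)
        return -np.sum(np.log(utils + 1e-12))     # maximize product <=> minimize -sum of logs

    # Each item is fully divided among the agents, and shares are nonnegative.
    constraints = [{"type": "eq",
                    "fun": lambda x_flat, j=j: x_flat.reshape(n_agents, n_items)[:, j].sum() - 1.0}
                   for j in range(n_items)]
    bounds = [(0.0, 1.0)] * (n_agents * n_items)

    x0 = np.full(n_agents * n_items, 1.0 / n_agents)
    res = minimize(neg_log_nash_welfare, x0, bounds=bounds,
                   constraints=constraints, method="SLSQP")
    print(np.round(res.x.reshape(n_agents, n_items), 3))
    ```

    A diversity constraint could then be added as an extra constraint on one agent's bundle, and the resulting change in the other agent's utility compared against the no-negative-externality and monotonicity desiderata.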
  2. We consider an InterDependent Security (IDS) game with networked agents and positive externalities, in which each agent chooses an effort/investment level for securing itself. The agents are interdependent in that the security of one agent depends not only on its own investment but also on the other agents' investments. Due to the positive externality, agents under-invest in security, which leads to an inefficient Nash equilibrium (NE). While much has been analyzed in the literature on this under-investment issue, in this study we take a different angle. Specifically, we consider the possibility of allowing agents to pool their resources, i.e., giving agents the ability to invest both in themselves and in other agents. We show that the interaction of strategic and selfish agents under resource pooling (RP) improves the agents' investment levels as well as their utility compared to a scenario without resource pooling. We show that the social welfare (total utility) at the NE of the game with resource pooling is higher than the maximum social welfare attainable in a game without resource pooling, even when the latter uses an optimal incentive mechanism. Furthermore, we show that while voluntary participation is not generally guaranteed in this latter scenario, it is guaranteed under resource pooling.
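    The under-investment at the Nash equilibrium can be illustrated with a toy model; the exponential breach probability, the complete network, and the specific parameters below are assumptions for illustration, and resource pooling itself is omitted for brevity.

    ```python
    import numpy as np

    # Toy IDS model (not the paper's exact formulation): n agents on a network.
    # Agent i's breach probability falls with its own investment x_i and, through
    # a positive externality, with its neighbors' average investment.
    n = 4
    L, c, alpha = 10.0, 1.0, 0.5              # loss if breached, unit cost of effort, spillover strength
    A = np.ones((n, n)) - np.eye(n)           # complete network of neighbors

    def utility(x):
        spill = A @ x / (n - 1)
        breach = np.exp(-(x + alpha * spill))
        return -L * breach - c * x            # per-agent utility

    # Best-response dynamics: each agent solves max_x -L*exp(-(x + alpha*s)) - c*x,
    # which gives x = max(0, ln(L/c) - alpha*s).
    x = np.zeros(n)
    for _ in range(200):
        spill = A @ x / (n - 1)
        x = np.maximum(0.0, np.log(L / c) - alpha * spill)

    print("NE investments:", np.round(x, 3), " NE welfare:", round(utility(x).sum(), 3))

    # Socially optimal symmetric investment by grid search, for comparison:
    grid = np.linspace(0, 5, 501)
    welfare = [utility(np.full(n, g)).sum() for g in grid]
    best = int(np.argmax(welfare))
    print("socially optimal investment:", round(grid[best], 3), " welfare:", round(welfare[best], 3))
    ```

    In this toy model the NE investment falls short of the symmetric social optimum, mirroring the under-investment the abstract describes; the paper's contribution is in showing how resource pooling closes and even exceeds that gap.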
  3. Abstract

    When human adults make decisions (e.g., wearing a seat belt), we often consider the negative consequences that would ensue if our actions were to fail, even if we have never experienced such a failure. Do the same considerations guide our understanding of other people's decisions? In this paper, we investigated whether adults, who have many years of experience making such decisions, and 6- and 7-year-old children, who have less experience and are demonstrably worse at judging the consequences of their own actions, conceive of others' actions as motivated both by reward (how good reaching one's intended goal would be) and by what we call "danger" (how badly one's action could end). In two pre-registered experiments, we tested whether adults and 6- and 7-year-old children tailor their predictions and explanations of an agent's action choices to the specific degree of danger and reward entailed by each action. Across four different tasks, we found that children and adults expected others to negatively appraise dangerous situations and minimize the danger of their actions. Children's and adults' judgments varied systematically in accord with both the degree of danger the agent faced and the value the agent placed on the goal state it aimed to achieve. However, children did not calibrate their inferences about how much an agent valued the goal state of a successful action in accord with the degree of danger the action entailed, and adults calibrated these inferences more weakly than inferences concerning the agent's future action choices. These results suggest that from childhood, people use degree of danger and reward to make quantitative, fine-grained explanations and predictions about other people's behavior, consistent with computational models of theory of mind that contain continuous representations of other agents' action plans.
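    For readers unfamiliar with the computational models referenced at the end, the small sketch below shows one way an observer could turn a reward value, a danger value, and a success probability into a graded prediction of whether an agent will act; the softmax form and parameters are illustrative assumptions, not the paper's fitted model.

    ```python
    import numpy as np

    def p_attempt(reward, danger, p_success, beta=2.0):
        """Predicted probability that an agent attempts an action, given the reward
        if it succeeds, the cost if it fails ("danger"), and its success probability."""
        eu_act = p_success * reward - (1 - p_success) * danger   # expected utility of acting
        eu_skip = 0.0                                            # utility of doing nothing
        return np.exp(beta * eu_act) / (np.exp(beta * eu_act) + np.exp(beta * eu_skip))

    print(p_attempt(reward=1.0, danger=0.5, p_success=0.9))  # safe, valuable goal: likely attempt
    print(p_attempt(reward=1.0, danger=5.0, p_success=0.6))  # dangerous: unlikely attempt
    ```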

  4. Abstract

    Beliefs about the controllability of positive or negative events in the environment can shape learning throughout the lifespan. Previous research has shown that adults' learning is modulated by beliefs about the causal structure of the environment, such that they update their value estimates to a lesser extent when the outcomes can be attributed to hidden causes. This study examined whether external causes similarly influenced outcome attributions and learning across development. Ninety participants, ages 7 to 25 years, completed a reinforcement learning task in which they chose between two options with fixed reward probabilities. Choices were made in three distinct environments in which different hidden agents occasionally intervened to generate positive, negative, or random outcomes. Participants' beliefs about hidden-agent intervention aligned with the true probabilities of the positive, negative, or random outcome manipulation in each of the three environments. Computational modeling of the learning data revealed that while the choices made by both adults (ages 18-25) and adolescents (ages 13-17) were best fit by Bayesian reinforcement learning models that incorporate beliefs about hidden-agent intervention, those of children (ages 7-12) were best fit by a one-learning-rate model that updates value estimates based on choice outcomes alone. Together, these results suggest that while children demonstrate explicit awareness of the causal structure of the task environment, they do not implicitly use beliefs about the causal structure of the environment to guide reinforcement learning in the same manner as adolescents and adults.
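    A rough sketch of the contrast between the two model classes follows: an outcome-only learner with a single learning rate versus a learner that discounts outcomes it attributes to the hidden agent. For simplicity the sketch assumes the aware learner knows which trials were intervened on, a crude stand-in for the Bayesian inference in the paper's models; all parameters are illustrative.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    # Illustrative two-armed bandit with fixed reward probabilities, where a hidden
    # agent sometimes intervenes and replaces the outcome with a coin flip.
    p_reward = np.array([0.7, 0.3])
    p_intervene = 0.3   # probability the hidden agent generated the outcome
    alpha = 0.3         # learning rate

    def run(aware):
        q = np.zeros(2)
        for _ in range(500):
            # Epsilon-greedy choice between the two options.
            choice = int(np.argmax(q)) if rng.random() > 0.1 else int(rng.integers(2))
            intervened = rng.random() < p_intervene
            reward = float(rng.random() < (0.5 if intervened else p_reward[choice]))
            if aware and intervened:
                continue  # attribute the outcome to the hidden agent; do not update
            q[choice] += alpha * (reward - q[choice])   # one-learning-rate update
        return q

    print("outcome-only learner:", np.round(run(aware=False), 2))
    print("hidden-agent-aware learner:", np.round(run(aware=True), 2))
    ```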

  5. In cooperative multi-agent reinforcement learning (Co-MARL), a team of agents must jointly optimize the team's long-term rewards to learn a designated task. Optimizing rewards as a team often requires inter-agent communication and data sharing, leading to potential privacy implications. We assume privacy considerations prohibit the agents from sharing their environment interaction data. Accordingly, we propose Privacy-Engineered Value Decomposition Networks (PE-VDN), a Co-MARL algorithm that models multi-agent coordination while provably safeguarding the confidentiality of the agents' environment interaction data. We integrate three privacy-engineering techniques to redesign the data flows of the VDN algorithm, an existing Co-MARL algorithm that consolidates the agents' environment interaction data to train a central controller that models multi-agent coordination, and thereby develop PE-VDN. In the first technique, we design a distributed computation scheme that eliminates Vanilla VDN's dependency on sharing environment interaction data. Then, we utilize a privacy-preserving multi-party computation protocol to guarantee that the data flows of the distributed computation scheme do not pose new privacy risks. Finally, we enforce differential privacy to preempt inference threats against the agents' training data (past environment interactions) when they take actions based on their neural network predictions. We implement PE-VDN in the StarCraft Multi-Agent Competition (SMAC) and show that it achieves 80% of Vanilla VDN's win rate while maintaining differential privacy levels that provide meaningful privacy guarantees. The results demonstrate that PE-VDN can safeguard the confidentiality of agents' environment interaction data without sacrificing multi-agent coordination.
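    For context on the base algorithm, VDN's core idea is that the team's action value decomposes into a sum of per-agent action values computed from local observations. The PyTorch sketch below shows only that decomposition; the network sizes are arbitrary, and none of PE-VDN's privacy machinery (distributed computation, secure summation, differential privacy) is reproduced here.

    ```python
    import torch
    import torch.nn as nn

    # Illustrative shapes, not from the paper.
    n_agents, obs_dim, n_actions = 3, 8, 4

    # One small Q-network per agent over its local observation.
    agent_q_nets = nn.ModuleList([
        nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(), nn.Linear(32, n_actions))
        for _ in range(n_agents)
    ])

    def team_q(observations, actions):
        """observations: (n_agents, obs_dim); actions: (n_agents,) chosen action indices."""
        per_agent = [agent_q_nets[i](observations[i])[actions[i]] for i in range(n_agents)]
        return torch.stack(per_agent).sum()   # VDN: Q_team = sum_i Q_i(o_i, a_i)

    obs = torch.randn(n_agents, obs_dim)
    acts = torch.randint(n_actions, (n_agents,))
    print(team_q(obs, acts))
    ```

    In PE-VDN, as described above, it is the data flows around this computation (who sees the observations, how the per-agent values are combined, and how training updates are protected) that are redesigned, not the decomposition itself.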