We introduce a sequential Bayesian binary hypothesis testing problem under social learning, termed selfish learning, in which agents work to maximize their individual rewards. In particular, each agent receives a private signal and is aware of the decisions made by earlier-acting agents. Besides inferring the underlying hypothesis, each agent also decides whether to stop and declare a decision, or to pass the inference to the next agent. The employer rewards only correct responses, and the reward per worker decreases with the number of employees used for decision making. We characterize the decision regions of agents in both the infinite- and finite-horizon settings. In particular, we show that the decision boundaries in the infinite horizon are the solution to a Markov decision process with discounted costs and can be computed using value iteration. In the finite horizon, we show that team performance improves under appropriate incentivization when compared to sequential social learning.
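The value-iteration approach invoked above can be sketched generically. The states, actions, transition matrices, rewards, and discount factor below are illustrative placeholders, not the paper's model:

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Generic value iteration for a finite MDP.

    P[a] : |S| x |S| transition matrix under action a
    R[a] : length-|S| expected reward vector under action a
    gamma: discount factor in (0, 1)
    Returns the optimal value function and a greedy policy.
    """
    n_states = P[0].shape[0]
    V = np.zeros(n_states)
    while True:
        # Bellman optimality backup: Q[a, s] = R[a][s] + gamma * E[V(s')]
        Q = np.array([R[a] + gamma * (P[a] @ V) for a in range(len(P))])
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new

# Toy two-state, two-action example (hypothetical numbers):
P = [np.eye(2),                            # action 0: stay put (absorbing)
     np.array([[0.5, 0.5], [0.5, 0.5]])]   # action 1: continue / pass on
R = [np.array([1.0, 0.0]), np.array([0.0, 0.5])]
V, policy = value_iteration(P, R)
```

Because the backup is a gamma-contraction, the iteration converges geometrically to the unique fixed point, which is what makes value iteration a reliable way to solve for the decision boundaries.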
Probability Reweighting in Social Learning: Optimality and Suboptimality
This work explores sequential Bayesian binary hypothesis testing in a social learning setup under expertise diversity. We consider a two-agent (say, advisor-learner) sequential binary hypothesis test in which the learner infers the hypothesis based on the decision of the advisor, its own private signal, and its individual belief. In addition, the agents have varying expertise, captured by the noise variance of the private signal. Under this setting, we first investigate the behavior of optimal agent beliefs and observe that the nature of the optimal agents can be inverted depending on expertise levels. We also discuss the suboptimality of the Prelec reweighting function under diverse expertise. Next, we consider an advisor selection problem in which the belief of the learner is fixed and an advisor is to be chosen for a given prior. We characterize the decision region for choosing such an advisor and argue that a learner whose beliefs deviate from the true prior often ends up selecting a suboptimal advisor.
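The Prelec reweighting function discussed above has a standard closed form, w(p) = exp(-beta * (-ln p)^alpha) (Prelec, 1998). A minimal sketch, with illustrative (not paper-specific) parameter values:

```python
import numpy as np

def prelec(p, alpha=0.65, beta=1.0):
    """Prelec probability reweighting: w(p) = exp(-beta * (-ln p) ** alpha).

    For alpha < 1 the function has the classic inverse-S shape:
    small probabilities are overweighted, large ones underweighted.
    alpha = beta = 1 recovers the identity w(p) = p.
    Valid for p in (0, 1]; p = 0 would hit log(0).
    """
    p = np.asarray(p, dtype=float)
    return np.exp(-beta * (-np.log(p)) ** alpha)

# Overweighting of a rare event vs. underweighting of a likely one:
w_small = prelec(0.01)   # greater than 0.01
w_large = prelec(0.90)   # less than 0.90
```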
- Award ID(s):
- 1717530
- PAR ID:
- 10059999
- Date Published:
- Journal Name:
- Proceedings of the 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
- Page Range / eLocation ID:
- 6966-6970
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
We consider sequential stochastic decision problems in which, at each time instant, an agent optimizes its local utility by solving a stochastic program and, subsequently, announces its decision to the world. Given this action, we study the problem of estimating the agent’s private belief (i.e., its posterior distribution over the set of states of nature based on its private observations). We demonstrate that it is possible to determine the set of private beliefs that are consistent with public data by leveraging techniques from inverse optimization. We further give a number of useful characterizations of this set; for example, tight bounds obtained by solving a set of linear programs (under concave utility). As an illustrative example, we consider estimating the private belief of an investor in regime-switching portfolio allocation. Finally, our theoretical results are illustrated and evaluated in numerical simulations.
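The private belief being estimated here is a Bayesian posterior over the states of nature. A minimal sketch of the forward (belief-update) step, with hypothetical priors and likelihoods:

```python
import numpy as np

def update_belief(prior, likelihood):
    """One step of Bayes' rule: posterior proportional to prior * likelihood.

    prior      : current belief over the states of nature
    likelihood : probability of the new private observation under each state
    """
    posterior = np.asarray(prior, dtype=float) * np.asarray(likelihood, dtype=float)
    return posterior / posterior.sum()

# Two states of nature, a uniform prior, and one observation that is
# four times likelier under state 0 (all numbers hypothetical):
belief = update_belief([0.5, 0.5], [0.8, 0.2])
```

The inverse-optimization problem in the abstract runs this map backwards: given the announced decisions, it characterizes the set of priors and posteriors consistent with them.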
-
Networks that provide agents with access to a common database of the agents' actions enable an agent to easily learn by observing the actions of others, but are also susceptible to manipulation by “fake” agents. Prior work has studied a model for the impact of such fake agents on ordinary (rational) agents in a sequential Bayesian observational learning framework. That model assumes that ordinary agents do not have an ex-ante bias in their actions and that they follow their private information in case of an ex-post tie between actions. This paper builds on that work to study the effect of fake agents on the welfare obtained by ordinary agents under different ex-ante biases and different tie-breaking rules. We show that varying either of these can lead to cases where, unlike in the prior work, the addition of fake agents leads to a gain in welfare. This implies that in such cases, if fake agents are absent or are not adequately present, an altruistic platform could artificially introduce fake actions to effect improved learning.
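The roles of ex-ante bias and tie-breaking can be made concrete with log-likelihood ratios (LLRs). The decision rule below is a generic sketch of a rational agent in sequential observational learning, not the paper's exact model:

```python
def choose_action(prior_llr, history_llr, private_llr, tie_break="private"):
    """Combine an ex-ante bias (prior LLR), the evidence implied by
    earlier agents' observed actions (history LLR), and the agent's own
    private-signal LLR, then take the action favored by the total.
    The tie-breaking rule only matters at an exact ex-post tie, yet it
    can change the welfare impact of fake agents downstream.
    """
    total = prior_llr + history_llr + private_llr
    if total > 0:
        return 1
    if total < 0:
        return 0
    # Ex-post tie between the two actions:
    if tie_break == "private":
        return 1 if private_llr > 0 else 0
    return 1  # e.g., a fixed-action tie-break favoring action 1
```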
-
This paper addresses incomplete-information dynamic games, where reward parameters of agents are private. Previous studies have shown that online belief update is necessary for deriving equilibrial policies of such games, especially for high-risk games such as vehicle interactions. However, updating beliefs in real time is computationally expensive as it requires continuous computation of Nash equilibria of the sub-games starting from the current states. In this paper, we consider the triggering mechanism of belief update as a policy defined on the agents’ physical and belief states, and propose learning this policy through reinforcement learning (RL). Using a two-vehicle uncontrolled intersection case, we show that intermittent belief update via RL is sufficient for safe interactions, reducing the computation cost of updates by 59% when agents have full observations of physical states. Simulation results also show that the belief update frequency will increase as noise becomes more significant in measurements of the vehicle positions.
-
We consider information design in spatial resource competition, motivated by ride-sharing platforms sharing information with drivers about rider demand. Each of N co-located agents (drivers) decides whether to move to another location with an uncertain and possibly higher resource level (rider demand), where the utility for moving increases in the resource level and decreases in the number of other agents that move. A principal who can observe the resource level wishes to share this information in a way that ensures a welfare-maximizing number of agents move. Analyzing the principal’s information design problem using the Bayesian persuasion framework, we study both private signaling mechanisms, where the principal sends personalized signals to each agent, and public signaling mechanisms, where the principal sends the same information to all agents. We show: 1) For private signaling, computing the optimal mechanism using the standard approach leads to a linear program with 2^N variables, rendering the computation challenging. We instead describe a computationally efficient two-step approach to finding the optimal private signaling mechanism. First, we perform a change of variables to solve a linear program with O(N^2) variables that provides the marginal probabilities of recommending that each agent move. Second, we describe an efficient sampling procedure over sets of agents consistent with these optimal marginal probabilities; the optimal private mechanism then asks the sampled set of agents to move and the rest to stay. 2) For public signaling, we first show that the welfare-maximizing equilibrium given any common belief has a threshold structure. Using this, we show that the optimal public mechanism with respect to the sender-preferred equilibrium can be computed in polynomial time. 3) We support our analytical results with numerical computations showing that the optimal private and public signaling mechanisms achieve substantially higher social welfare than no-information and full-information benchmarks.
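The flavor of the signaling computation can be illustrated in the simplest binary-state Bayesian persuasion setting (a standard toy example, not the paper's mechanism): the principal recommends moving whenever the resource level is high, and with just enough probability when it is low that the induced posterior still meets the receivers' action threshold.

```python
def low_state_recommend_prob(prior_high, threshold):
    """Probability q of recommending 'move' in the low state so that
    Pr(high | move) = prior_high / (prior_high + (1 - prior_high) * q)
    exactly equals the receivers' action threshold.
    """
    if prior_high >= threshold:
        return 1.0  # receivers already willing to move; no persuasion needed
    return prior_high * (1.0 - threshold) / (threshold * (1.0 - prior_high))

# With a 0.3 prior on the high state and a 0.5 posterior threshold:
q = low_state_recommend_prob(0.3, 0.5)
posterior = 0.3 / (0.3 + 0.7 * q)   # hits the 0.5 threshold exactly
```

With many agents and congestion effects, the same obedience constraints must hold jointly across agents, which is what blows the naive formulation up to exponentially many variables.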