Title: Incentive-Compatible Mechanisms for Norm Monitoring in Open Multi-Agent Systems
We consider the problem of detecting norm violations in open multi-agent systems (MAS). We show how, using ideas from scrip systems, we can design mechanisms where the agents comprising the MAS are incentivised to monitor the actions of other agents for norm violations. The cost of providing the incentives is not borne by the MAS and does not come from fines charged for norm violations (fines may be impossible to levy in a system where agents are free to leave and rejoin again under a different identity). Instead, monitoring incentives come from (scrip) fees for accessing the services provided by the MAS. In some cases, perfect monitoring (and hence enforcement) can be achieved: no norms will be violated in equilibrium. In other cases, we show that, while it is impossible to achieve perfect enforcement, we can get arbitrarily close; we can make the probability of a norm violation in equilibrium arbitrarily small. We show using simulations that our theoretical results, which apply to systems with a large number of agents, hold for multi-agent systems with as few as 1000 agents: the system rapidly converges to the steady-state distribution of scrip tokens necessary to ensure monitoring and then remains close to the steady state.
Award ID(s):
0534064 0812045 0911036 1214844
NSF-PAR ID:
10093618
Journal Name:
Journal of Artificial Intelligence Research
Volume:
62
ISSN:
1076-9757
Page Range / eLocation ID:
433 to 458
Sponsoring Org:
National Science Foundation
More Like this
  1. We consider the problem of detecting norm violations in open multi-agent systems (MAS). We show how, using ideas from scrip systems, we can design mechanisms where the agents comprising the MAS are incentivised to monitor the actions of other agents for norm violations. The cost of providing the incentives is not borne by the MAS and does not come from fines charged for norm violations (fines may be impossible to levy in a system where agents are free to leave and rejoin again under a different identity). Instead, monitoring incentives come from (scrip) fees for accessing the services provided by the MAS. In some cases, perfect monitoring (and hence enforcement) can be achieved: no norms will be violated in equilibrium. In other cases, we show that, while it is impossible to achieve perfect enforcement, we can get arbitrarily close; we can make the probability of a norm violation in equilibrium arbitrarily small. We show using simulations that our theoretical results, which apply to systems with a large number of agents, hold for multi-agent systems with as few as 1000 agents: the system rapidly converges to the steady-state distribution of scrip tokens necessary to ensure monitoring and then remains close to the steady state.
  2. In offline multi-agent reinforcement learning (MARL), agents estimate policies from a given dataset. We study reward-poisoning attacks in this setting where an exogenous attacker modifies the rewards in the dataset before the agents see the dataset. The attacker wants to guide each agent into a nefarious target policy while minimizing the Lp norm of the reward modification. Unlike attacks on single-agent RL, we show that the attacker can install the target policy as a Markov Perfect Dominant Strategy Equilibrium (MPDSE), which rational agents are guaranteed to follow. This attack can be significantly cheaper than separate single-agent attacks. We show that the attack works on various MARL agents including uncertainty-aware learners, and we exhibit linear programs to efficiently solve the attack problem. We also study the relationship between the structure of the datasets and the minimal attack cost. Our work paves the way for studying defense in offline MARL. 
  3. This work investigates local and global measures of disorder in large-scale directed networks of double-integrator systems connected over a multi-dimensional torus. We quantify these performance measures in systems subjected to distributed disturbances using an H2 norm with outputs corresponding to local state errors or deviations from the global average. We consider two directed uni-directional state feedback interconnections that correspond to relative position and relative velocity feedback in vehicle network applications. Our main result reveals that absolute state feedback plays a critical role in system robustness when local state measurements are uni-directional. Specifically, if absolute measurements of either state variable are available, then systems with uni-directional relative feedback perform as well as their symmetric bi-directional counterparts but have the advantage of reduced communication requirements. However, in the absence of absolute feedback, their performance is worse; in fact, it is impossible to maintain stability (i.e., a finite H2 norm) with uni-directional state measurements for arbitrarily large networks. Numerical examples illustrate the theory.
  4. Multi-agent systems provide a basis for developing systems of autonomous entities and thus find application in a variety of domains. We consider a setting where not only the member agents are adaptive but also the multi-agent system viewed as an entity in its own right is adaptive. Specifically, the social structure of a multi-agent system can be reflected in the social norms among its members. It is well recognized that the norms that arise in society are not always beneficial to its members. We focus on prosocial norms, which help achieve positive outcomes for society and often provide guidance to agents to act in a manner that takes into account the welfare of others. Specifically, we propose Cha, a framework for the emergence of prosocial norms. Unlike previous norm emergence approaches, Cha supports continual change to a system (agents may enter and leave) and dynamism (norms may change when the environment changes). Importantly, Cha agents incorporate prosocial decision-making based on inequity aversion theory, reflecting an intuition of guilt arising from being antisocial. In this manner, Cha brings together two important themes in prosociality: decision-making by individuals and fairness of system-level outcomes. We demonstrate via simulation that Cha can improve aggregate societal gains and fairness of outcomes. 
  5. We consider a multi-agent multi-armed bandit setting in which n honest agents collaborate over a network to minimize regret but m malicious agents can disrupt learning arbitrarily. Assuming the network is the complete graph, existing algorithms incur O((m + K/n) log(T)/Δ) regret in this setting, where K is the number of arms and Δ is the arm gap. For m ≪ K, this improves over the single-agent baseline regret of O(K log(T)/Δ). In this work, we show the situation is murkier beyond the case of a complete graph. In particular, we prove that if the state-of-the-art algorithm is used on the undirected line graph, honest agents can suffer (nearly) linear regret until time is doubly exponential in K and n. In light of this negative result, we propose a new algorithm for which the i-th agent has regret O((dmal(i) + K/n) log(T)/Δ) on any connected and undirected graph, where dmal(i) is the number of i's neighbors who are malicious. Thus, we generalize existing regret bounds beyond the complete graph (where dmal(i) = m), and show the effect of malicious agents is entirely local (in the sense that only the dmal(i) malicious agents directly connected to i affect its long-term regret).