Realizing a multiagent system involves implementing member agents who interact based on a protocol while making decisions in a decentralized manner. Current programming models for agents offer poor abstractions for decision making and fail to adequately bridge an agent’s internal decision logic with its public decisions. We present Kiko, a protocol-based programming model for agents. To implement an agent, a programmer writes one or more decision makers, each of which chooses from among a set of valid decisions and makes mutually compatible decisions on what messages to send. By completely abstracting away the underlying communication service and by supporting practical decision-making patterns, Kiko enables agent developers to focus on business logic. We provide an operational semantics for Kiko and establish that Kiko agents are protocol compliant and able to realize any protocol enactment.
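The Kiko abstract above centers on decision makers that choose mutually compatible decisions from a protocol-valid set. As a loose illustration only, here is a minimal Python sketch of that shape; the Decision type and the select() hook are my assumptions, not Kiko's actual API.

```python
# Hypothetical sketch of a decision maker in the style the Kiko abstract
# describes. The Decision type and select() hook are illustrative
# assumptions, not Kiko's actual API.
from dataclasses import dataclass

@dataclass(frozen=True)
class Decision:
    message: str  # message type permitted by the protocol
    price: float  # example payload field

class PriceQuoter:
    """Chooses mutually compatible decisions from the protocol-valid set,
    leaving transport and protocol bookkeeping to the runtime."""

    def select(self, valid: set[Decision]) -> set[Decision]:
        # Business logic only: respond to a quoting opportunity
        # with the single lowest-price quote.
        quotes = [d for d in valid if d.message == "Quote"]
        return {min(quotes, key=lambda d: d.price)} if quotes else set()
```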
Formal Ethical Obligations in Reinforcement Learning Agents: Verification and Policy Updates
When designing agents for operation in uncertain environments, designers need tools to automatically reason about what agents ought to do, how that conflicts with what is actually happening, and how a policy might be modified to remove the conflict. These obligations include ethical and social obligations, permissions, and prohibitions, which constrain how the agent achieves its mission and executes its policy. We propose a new deontic logic, Expected Act Utilitarian deontic logic, for enabling this reasoning at design time: for specifying and verifying the agent's strategic obligations, then modifying its policy from a reference policy to meet those obligations. Unlike approaches that work at the reward level, working at the logical level increases the transparency of the trade-offs. We introduce two algorithms: one for model-checking whether an RL agent has the right strategic obligations, and one for modifying a reference decision policy to make it meet obligations expressed in our logic. We illustrate our algorithms on DAC-MDPs, which accurately abstract neural decision policies, and on toy gridworld environments.
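As a rough illustration of the verification side, the sketch below checks an act-utilitarian obligation against a fixed policy in a small finite MDP: an action is treated as obligatory in a state iff it maximizes expected utility (its Q-value). The value-iteration setup, the data layout, and the tolerance eps are my assumptions, not the paper's algorithm.

```python
# Minimal sketch (not the paper's algorithm): check an act-utilitarian
# obligation against a fixed policy in a finite MDP.
import numpy as np

def q_values(P, R, gamma=0.95, iters=500):
    """P[s][a]: dict mapping next state -> probability; R[s][a]: reward."""
    nS, nA = len(P), len(P[0])
    V = np.zeros(nS)
    Q = np.zeros((nS, nA))
    for _ in range(iters):
        Q = np.array([[R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a].items())
                       for a in range(nA)] for s in range(nS)])
        V = Q.max(axis=1)
    return Q

def obligation_violations(policy, Q, eps=1e-6):
    """States where the policy's action is not expected-utility-maximal,
    i.e., where the agent fails this (simplified) act-utilitarian obligation."""
    return [s for s in range(len(Q)) if Q[s][policy[s]] < Q[s].max() - eps]
```

An empty violation list means the policy meets this simplified notion of its obligations; the paper's DAC-MDP setting would run such a check over the abstract states standing in for a neural policy.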
- Award ID(s): 2145291
- PAR ID: 10612074
- Publisher / Repository: AAAI
- Date Published:
- Journal Name: Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society
- Volume: 7
- ISSN: 3065-8365
- Page Range / eLocation ID: 1368 to 1378
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
This paper studies algorithmic decision-making in the presence of strategic individual behaviors, where an ML model is used to make decisions about human agents and the latter can adapt their behavior strategically to improve their future data. Existing results on strategic learning have largely focused on the linear setting, where agents with linear labeling functions best respond to a (noisy) linear decision policy. Instead, this work focuses on general non-linear settings where agents respond to the decision policy with only "local information" about the policy. Moreover, we simultaneously consider the objectives of maximizing decision-maker welfare (model prediction accuracy), social welfare (agent improvement caused by strategic behaviors), and agent welfare (the extent to which ML underestimates the agents). We first generalize the agent best-response model of previous works to the non-linear setting, then reveal the compatibility of the welfare objectives. We show that the three welfare objectives can attain their optima simultaneously only under restrictive conditions that are challenging to achieve in non-linear settings. The theoretical results imply that existing works solely maximizing the welfare of a subset of parties inevitably diminish the welfare of the others. We thus claim the necessity of balancing the welfare of each party in non-linear settings and propose an irreducible optimization algorithm suitable for general strategic learning. Experiments on synthetic and real data validate the proposed algorithm.
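As a toy rendering of the "local information" best response in a non-linear setting, the sketch below has an agent climb a finite-difference estimate of the policy's local gradient while paying a quadratic cost for moving; the step size, cost weight, and gradient estimator are all my assumptions, not this paper's model.

```python
# Illustrative sketch (my construction, not the paper's model): an agent
# best responds to a non-linear decision policy f using only local
# information about f, namely its gradient at the current features.
import numpy as np

def local_best_response(x, f, step=0.1, cost=1.0, n_steps=10, h=1e-4):
    """Move features x to raise the decision score f(x), paying a
    quadratic cost for deviating from the original features x0."""
    x = x.astype(float)
    x0 = x.copy()
    for _ in range(n_steps):
        # Central finite-difference estimate of the local gradient of f.
        grad = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h)
                         for e in np.eye(len(x))])
        # Ascend f(x) - cost * ||x - x0||^2 via its gradient.
        x = x + step * (grad - 2 * cost * (x - x0))
    return x
```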
-
Endriss, Ulle; Melo, Francisco (Ed.)
Alternating-time temporal logic (ATL) extends branching-time logic by enabling quantification over paths that result from the strategic choices made by multiple agents in various coalitions within the system. While classical temporal logics express properties of "closed" systems, ATL can express properties of "open" systems resulting from interactions among several agents. Reinforcement learning (RL) is a sampling-based approach to decision-making where learning agents, guided by a scalar reward function, discover optimal policies through repeated interactions with the environment. The challenge of translating high-level objectives into scalar rewards for RL has garnered increased interest, particularly following the success of model-free RL algorithms. This paper presents an approach for deploying model-free RL to verify multi-agent systems against ATL specifications. The key contribution of this paper is a verification procedure for model-free RL of quantitative and non-nested classic ATL properties, based on Q-learning, demonstrated on a natural subclass of non-nested ATL formulas.
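A rough sketch of one way such a check could look, under my own assumptions rather than the paper's procedure: tabular minimax Q-learning over a turn-based game in which the coalition maximizes and the remaining agents minimize, with the learned value at the initial state compared against a threshold. The env interface (reset, actions, step, coalition_turn) is hypothetical.

```python
# Sketch under stated assumptions, not the paper's verification procedure.
import random
from collections import defaultdict

def minimax_q(env, episodes=5000, alpha=0.1, gamma=0.99, eps=0.2):
    """Tabular minimax Q-learning for a turn-based game: the coalition
    maximizes, the remaining agents minimize."""
    Q = defaultdict(float)  # key: (state, action)

    def greedy(s, acts):
        pick = max if env.coalition_turn(s) else min
        return pick(acts, key=lambda b: Q[(s, b)])

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            acts = env.actions(s)
            a = random.choice(acts) if random.random() < eps else greedy(s, acts)
            s2, r, done = env.step(s, a)
            target = r if done else r + gamma * Q[(s2, greedy(s2, env.actions(s2)))]
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q

def coalition_can_enforce(env, Q, threshold):
    """Decide a quantitative, non-nested ATL-style property by comparing
    the learned game value at the initial state with a threshold."""
    s0 = env.reset()
    return max(Q[(s0, b)] for b in env.actions(s0)) >= threshold
```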
-
van Berkel, Kees; Ciabattoni, Agata; Horty, John (Ed.)
Markov Decision Processes (MDPs) are the most common model for decision making under uncertainty in the Machine Learning community. An MDP captures nondeterminism, probabilistic uncertainty, and an explicit model of action. A Reinforcement Learning (RL) agent learns to act in an MDP by maximizing a utility function. This paper considers the problem of learning a decision policy that maximizes utility subject to satisfying a constraint expressed in deontic logic. In this setup, the utility captures the agent's mission, such as going quickly from A to B. The deontic formula represents (ethical, social, situational) constraints on how the agent might achieve its mission by prohibiting classes of behaviors. We use the logic of Expected Act Utilitarianism, a probabilistic stit logic that can be interpreted over controlled MDPs. We develop a variation on policy improvement and show that it reaches a constrained local maximum of the mission utility. Given that in stit logic an agent's duty is derived from value maximization, this can be seen as a way of acting to simultaneously maximize two value functions, one of which is implicit, in a bi-level structure. We illustrate these results with experiments on sample MDPs.
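A simplified sketch of the core idea, as I read it (not the paper's algorithm): greedy policy improvement restricted to actions a deontic constraint permits, with the constraint encoded as a predicate that prohibits classes of behaviors.

```python
# Simplified sketch, my rendering of the idea rather than the paper's
# algorithm: policy improvement over permitted actions only.
def constrained_improvement(policy, Q, actions, permitted):
    """Greedy improvement restricted to permitted actions per state.

    policy: dict state -> action; Q: dict state -> dict action -> value;
    actions(s): available actions; permitted(s, a): deontic predicate.
    """
    new_policy = dict(policy)
    for s in policy:
        allowed = [a for a in actions(s) if permitted(s, a)]
        if allowed:  # keep the old action if everything is prohibited
            new_policy[s] = max(allowed, key=lambda a: Q[s][a])
    return new_policy
```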
-
Dynamic max-min fair allocation (DMMF) is a simple and popular mechanism for the repeated allocation of a shared resource among competing agents: in each round, each agent can choose whether to request the resource, which is then allocated to the requesting agent with the least number of allocations received so far. Recent work has shown that under DMMF, a simple threshold-based request policy enjoys surprisingly strong robustness properties, wherein each agent can realize a significant fraction of her optimal utility irrespective of how other agents behave. While this goes some way in mitigating the possibility of a 'tragedy of the commons' outcome, the robust policies require that an agent defend against arbitrary (possibly adversarial) behavior by other agents. This, however, may be far from optimal in real-world settings, where other agents are selfish optimizers rather than adversaries. Robust guarantees therefore give no insight into how agents behave in an equilibrium, and whether outcomes are improved under one. Our work aims to bridge this gap by studying the existence and properties of equilibria under DMMF. To this end, we first show that despite the strong robustness guarantees of the threshold-based strategies, no Nash equilibrium exists when agents participate in DMMF, each using some fixed threshold-based policy. On the positive side, however, we show that for the symmetric case, a simple data-driven request policy guarantees that no agent benefits from deviating to a different fixed threshold policy. In our proposed policy, agents aim to match the historical allocation rate, with a vanishing drift towards the rate optimizing overall welfare for all users. Furthermore, the resulting equilibrium outcome can be significantly better than what follows from the robustness guarantees. Our results are built on a complete characterization of the steady-state distribution under DMMF, as well as new techniques for analyzing strategic agent outcomes under dynamic allocation mechanisms; we hope these may prove of independent interest in related problems.
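To make the mechanism concrete, here is a toy simulation of DMMF with threshold-based request policies; the i.i.d. uniform value draws, the tie-breaking rule, and all parameters are my assumptions, not the paper's setup.

```python
# Toy DMMF simulation under stated assumptions (not the paper's setup):
# each round, agent i requests iff her value draw exceeds thresholds[i];
# the resource goes to the requester with the fewest allocations so far.
import random

def simulate_dmmf(thresholds, rounds=10_000, seed=0):
    rng = random.Random(seed)
    n = len(thresholds)
    allocs = [0] * n
    utility = [0.0] * n
    for _ in range(rounds):
        values = [rng.random() for _ in range(n)]
        requesters = [i for i in range(n) if values[i] >= thresholds[i]]
        if requesters:
            # Max-min fair allocation: fewest past allocations wins,
            # with random tie-breaking.
            winner = min(requesters, key=lambda i: (allocs[i], rng.random()))
            allocs[winner] += 1
            utility[winner] += values[winner]
    return allocs, utility
```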