In multiagent problems that require complex joint actions, reward shaping methods yield good behavior by incentivizing the agents’ potentially valuable actions. However, reward shaping often requires access to the functional form of the reward function and the global state of the system. In this work, we introduce the Exploratory Gaussian Reward (EGR), a new reward model that creates optimistic stepping stone rewards linking the agents’ potentially good actions to the desired joint action. EGR models the system reward as a Gaussian Process to leverage the inherent uncertainty in reward estimates, which pushes agents to explore unobserved regions of the state space. On the tightly coupled rover coordination problem, we show that EGR significantly outperforms a neural network approximation baseline and is comparable to a system with access to the functional form of the global reward. Finally, we demonstrate how EGR improves performance over other reward shaping methods by forcing agents to explore and escape local optima.
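To make the idea concrete, below is a minimal sketch of an optimistic GP-shaped reward in Python, assuming a UCB-style uncertainty bonus with weight `kappa` and an RBF kernel; these details are illustrative and not the paper's exact formulation of EGR.

```python
# A minimal sketch of an optimistic GP-shaped reward, assuming a UCB-style
# bonus; this illustrates the spirit of EGR, not its exact formulation.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

class OptimisticGPReward:
    def __init__(self, kappa=2.0):
        self.kappa = kappa  # uncertainty weight; larger values push harder toward unexplored states
        self.gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0))
        self.states, self.rewards = [], []

    def observe(self, state, reward):
        # Record an observed (state, system reward) pair and refit the GP.
        self.states.append(state)
        self.rewards.append(reward)
        self.gp.fit(np.array(self.states), np.array(self.rewards))

    def shaped_reward(self, state):
        # Optimistic estimate: posterior mean plus an uncertainty bonus, so
        # poorly observed regions act as stepping stones toward exploration.
        mean, std = self.gp.predict(np.array([state]), return_std=True)
        return float(mean[0] + self.kappa * std[0])
```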
Dirichlet-Multinomial Counterfactual Rewards for Heterogeneous Multiagent Systems
Multi-robot teams have been shown to be effective in accomplishing complex tasks that require tight coordination among team members. In homogeneous systems, recent work has demonstrated that “stepping stone” rewards are an effective way to provide agents with feedback on potentially valuable actions even when the agent-to-agent coupling requirements of an objective are not satisfied. In this work, we propose a new mechanism for inferring hypothetical partners in tightly-coupled, heterogeneous systems called Dirichlet-Multinomial Counterfactual Selection (DMCS). Using DMCS, we show that agents can learn to infer appropriate counterfactual partners and thereby receive more informative stepping stone rewards, testing in a modified multi-rover exploration problem. We also show that DMCS outperforms a random partner selection baseline by over 40%, and we demonstrate how domain knowledge can be used to induce a prior that guides the agent learning process. Finally, we show that DMCS maintains superior performance for up to 15 distinct rover types, while the baseline's performance degrades rapidly.
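As an illustration of the selection mechanism, the sketch below maintains a Dirichlet posterior over partner types, samples counterfactual partners from it, and reinforces types whose insertion yields informative stepping stone rewards. The count-based update and the `informative` test are assumptions for illustration, not the paper's exact rule.

```python
# A minimal sketch of Dirichlet-Multinomial counterfactual partner selection;
# the count-based update and the informativeness test are assumptions.
import numpy as np

class DMCSPartnerSelector:
    def __init__(self, n_types, prior=None):
        # `prior` lets domain knowledge bias selection toward likely partners.
        self.alpha = np.ones(n_types) if prior is None else np.asarray(prior, float)

    def sample_partner_type(self):
        # Draw type probabilities from the Dirichlet, then sample a type.
        probs = np.random.dirichlet(self.alpha)
        return np.random.choice(len(self.alpha), p=probs)

    def update(self, partner_type, informative):
        # Reinforce types whose counterfactual insertion yielded an
        # informative stepping stone reward.
        if informative:
            self.alpha[partner_type] += 1.0
```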
- Award ID(s): 1815886
- PAR ID: 10121068
- Date Published:
- Journal Name: IEEE Symposium on Multi-Robot and Multi-Agent Systems
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Multiagent teams have been shown to be effective in many domains that require coordination among team members. However, finding valuable joint-actions becomes increasingly difficult in tightly-coupled domains where each agent’s performance depends on the actions of many other agents. Reward shaping partially addresses this challenge by deriving more “tuned” rewards that provide agents with additional feedback, but this approach still relies on agents randomly discovering suitable joint-actions. In this work, we introduce Counterfactual Agent Suggestions (CAS) as a method for injecting knowledge into an agent’s learning process within the confines of existing reward structures. We show that CAS enables agent teams to converge towards desired behaviors more reliably. We also show that the improvement in team performance in the presence of suggestions extends to large teams and tightly-coupled domains.
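One plausible reading of suggestion injection, sketched below: an agent occasionally follows a suggested action for its current state instead of its own policy, and learns about the outcome through the existing reward structure. The `suggestions` table and the acceptance probability are hypothetical details, not CAS's actual mechanism.

```python
# A minimal sketch of suggestion injection, assuming a lookup table of
# suggested actions and a fixed acceptance probability (both hypothetical).
import random

def act_with_suggestions(policy, state, suggestions, accept_prob=0.2):
    # Occasionally follow a domain-knowledge suggestion instead of the
    # current policy; the outcome is evaluated under the existing reward
    # structure, so no reward shaping is required.
    if state in suggestions and random.random() < accept_prob:
        return suggestions[state]
    return policy(state)
```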
-
Diversity in behaviors is instrumental for robust team performance in many multiagent tasks that require agents to coordinate. Unfortunately, exhaustive search through the agents’ behavior spaces is often intractable. This paper introduces Behavior Exploration for Heterogeneous Teams (BEHT), a multi-level learning framework that enables agents to progressively explore regions of the behavior space that promote team coordination on diverse goals. By combining diversity search, to maximize agent-specific rewards, with evolutionary optimization, to maximize team-based fitness, our method effectively filters regions of the behavior space that are conducive to agent coordination. We demonstrate the diverse behaviors and synergies that our method allows agents to learn on a multiagent exploration problem.
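A minimal sketch of the two-level loop, under stated assumptions: behavioral novelty drives the diversity search, and team fitness drives evolutionary selection. The `describe`, `mutate`, and `team_fitness` callables are placeholders, not BEHT's actual components.

```python
# A minimal sketch of a two-level diversity/fitness loop; `describe`,
# `mutate`, and `team_fitness` are placeholder callables.
import numpy as np

def novelty(desc, archive, k=5):
    # Behavioral novelty: mean distance to the k nearest archived descriptors.
    if not archive:
        return float("inf")
    dists = sorted(float(np.linalg.norm(desc - a)) for a in archive)
    return sum(dists[:k]) / min(k, len(dists))

def beht_step(population, team_fitness, describe, mutate, archive):
    # Level 1 (diversity search): score each policy's behavior by novelty.
    descs = [describe(p) for p in population]
    novs = [novelty(d, archive) for d in descs]
    archive.extend(descs)
    # Level 2 (evolutionary optimization): keep the half of the population
    # with the best team fitness, using novelty to break ties, then refill
    # the population by mutating the survivors.
    ranked = sorted(zip(population, novs),
                    key=lambda pn: (team_fitness(pn[0]), pn[1]), reverse=True)
    parents = [p for p, _ in ranked[: len(ranked) // 2]]
    return parents + [mutate(p) for p in parents]
```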
-
We investigate robust data aggregation in a multi-agent online learning setting. In practice, multiple online learning agents are often deployed to perform similar tasks and receive similar feedback. We study how agents can improve their collective performance by sharing information with each other. In this paper, we formulate the ε-multi-player multi-armed bandit problem, in which a set of M players with similar reward distributions for each arm play concurrently. We develop an upper-confidence-bound-based algorithm that adaptively aggregates rewards collected by different players. To the best of our knowledge, we are the first to develop such a scheme in a multi-player bandit learning setting. We show that, under the assumption that pairwise distances between the means of the player-dependent distributions for each arm are small, the (collective) regret bound improves by nearly a factor of M compared with a baseline in which the players learn individually using the UCB-1 algorithm (Auer et al., 2002). Our algorithm also exhibits a fallback guarantee: if our task-similarity assumption fails to hold, its performance is no worse than the baseline's by more than a constant factor. Empirically, we validate our algorithm on synthetic data.
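A simplified sketch of the pooling idea: if players share their observations, a single UCB-1 index can be computed on the pooled statistics, shrinking each arm's confidence radius by roughly a factor of √M. The paper's actual algorithm aggregates adaptively and uses a different confidence radius, so treat this as illustrative only.

```python
# A simplified sketch of pooled UCB-1; the paper's adaptive aggregation rule
# and confidence radius differ from this naive pooling.
import math

class PooledUCB:
    def __init__(self, n_arms):
        self.counts = [0] * n_arms   # pulls per arm, pooled over all players
        self.sums = [0.0] * n_arms   # pooled reward totals per arm
        self.t = 0                   # total pooled pulls so far

    def select_arm(self):
        self.t += 1
        for a, c in enumerate(self.counts):
            if c == 0:
                return a  # pull every arm at least once
        # UCB-1 index on pooled statistics: sharing samples across M players
        # shrinks each arm's confidence radius by roughly a factor of sqrt(M).
        def index(a):
            mean = self.sums[a] / self.counts[a]
            return mean + math.sqrt(2.0 * math.log(self.t) / self.counts[a])
        return max(range(len(self.counts)), key=index)

    def record(self, arm, reward):
        # Fold in a reward observed by any player (shared feedback).
        self.counts[arm] += 1
        self.sums[arm] += reward
```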
-
Autonomous agents in a multi-agent system work with each other to achieve their goals. However, in a partially observable world, current multi-agent systems are often less effective at achieving their goals. This limitation is due to the agents’ lack of reasoning about other agents and their mental states, and to their inability to share required knowledge with other agents. This paper addresses these limitations by presenting a general approach for autonomous agents to work together in a multi-agent system. In this approach, an agent applies two main concepts: goal reasoning, to determine which goals to pursue and share, and theory of mind, to select the agent(s) with which to share goals and knowledge. We evaluate the performance of our multi-agent system in a Marine Life Survey domain and compare it to another multi-agent system that randomly selects agents to delegate its goals to.