Award ID contains: 1947419


  1. The generalization problem in reinforcement learning is especially important in dynamic environments. Conventional reinforcement learning methods solve problems under idealized assumptions and are difficult to apply directly in dynamic environments. In this paper, we propose a new multi-virtual-agent reinforcement learning (MVARL) approach for a predator-prey grid game. The designed method can find the optimal solution even when the predator moves. Specifically, instead of using actual agents, we design virtual agents that interact with simulated changing environments in parallel. Moreover, a global agent learns from these virtual agents while simultaneously interacting with the actual environment. This method not only effectively improves the generalization performance of reinforcement learning in dynamic environments but also reduces the overall computational cost. Two simulation studies validate the effectiveness of the designed method, and we compare the results with those of conventional reinforcement learning methods. The results indicate that the proposed method improves the robustness of reinforcement learning and, to a certain extent, its generalization.
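The abstract gives no implementation details, so the following is only a minimal sketch of the idea under stated assumptions, not the authors' MVARL method. It uses tabular Q-learning on a hypothetical 5x5 grid where the learner chases a randomly moving target. Several virtual agents each train in their own simulated copy of the environment (run round-robin rather than truly in parallel) and all write into one shared global Q-table; a global agent then acts greedily with that table in a separate "actual" environment. `ChaseEnv`, `train_mvarl`, `run_global_agent`, the reward values, and all hyperparameters are hypothetical.

```python
import random
from collections import defaultdict

GRID = 5  # hypothetical grid size
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up


def move(pos, delta):
    """Move one cell, clipped to the grid boundary."""
    return (min(max(pos[0] + delta[0], 0), GRID - 1),
            min(max(pos[1] + delta[1], 0), GRID - 1))


class ChaseEnv:
    """Hypothetical grid game: the learner chases a target that moves at random.

    Each instance has its own seed, so every virtual agent sees a different
    stream of target movements (its "simulated changing environment").
    """

    def __init__(self, seed):
        self.rng = random.Random(seed)

    def _cell(self):
        return (self.rng.randrange(GRID), self.rng.randrange(GRID))

    def reset(self):
        self.agent, self.target = self._cell(), self._cell()
        while self.target == self.agent:
            self.target = self._cell()
        return (self.agent, self.target)

    def step(self, a):
        self.agent = move(self.agent, ACTIONS[a])
        if self.agent != self.target:
            # The target takes a random step after the learner moves.
            self.target = move(self.target, self.rng.choice(ACTIONS))
        done = self.agent == self.target
        return (self.agent, self.target), (1.0 if done else -0.01), done


def greedy(q, s):
    """Index of the action with the highest Q-value in state s."""
    return max(range(len(ACTIONS)), key=lambda b: q[(s, b)])


def train_mvarl(n_virtual=4, episodes=300, alpha=0.5, gamma=0.9,
                eps=0.2, max_steps=50, seed=0):
    """Virtual agents explore their own environments; all update one global Q-table."""
    q = defaultdict(float)  # global Q-table shared by every virtual agent
    envs = [ChaseEnv(seed + i) for i in range(n_virtual)]
    rng = random.Random(seed)
    for _ in range(episodes):
        # Round-robin over the virtual agents stands in for parallel execution.
        for env in envs:
            s = env.reset()
            for _ in range(max_steps):
                if rng.random() < eps:
                    a = rng.randrange(len(ACTIONS))
                else:
                    a = greedy(q, s)
                s2, r, done = env.step(a)
                td_target = r if done else r + gamma * q[(s2, greedy(q, s2))]
                q[(s, a)] += alpha * (td_target - q[(s, a)])
                s = s2
                if done:
                    break
    return q


def run_global_agent(q, env, max_steps=50):
    """Global agent acts greedily in the 'actual' environment.

    Returns the number of steps needed to catch the target, or None.
    """
    s = env.reset()
    for t in range(1, max_steps + 1):
        s, _, done = env.step(greedy(q, s))
        if done:
            return t
    return None
```

For example, `run_global_agent(train_mvarl(), ChaseEnv(seed=999))` evaluates the learned table on a target-movement sequence that no virtual agent trained on. Sharing one table, rather than averaging separate per-agent tables, is a design choice of this sketch: states first visited by one virtual agent are immediately usable by the others.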
  2. In this article, we propose a novel semicentralized deep deterministic policy gradient (SCDDPG) algorithm for cooperative multiagent games. Specifically, we design a two-level actor-critic structure to support interaction and cooperation among the agents in StarCraft combat. A local actor-critic structure is established for each type of agent, using the partially observable information received from the environment. A global actor-critic structure is then built to give the local structures an overall view of the combat, based on limited centralized information such as health values. The two structures work together to generate the optimal control action for each agent and to achieve better cooperation in the games. Compared with fully centralized methods, this design reduces the communication burden because only limited information is sent to the global level during learning. Furthermore, reward functions are designed for both the local and global structures based on the agents' attributes, further improving learning performance in the stochastic environment. The developed method has been demonstrated on several scenarios in the real-time strategy game StarCraft. The simulation results show that the agents can effectively cooperate with their teammates and defeat the enemies in various StarCraft scenarios.
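The abstract describes the two-level structure only at a high level. As an illustration under stated assumptions (not the authors' SCDDPG implementation), the sketch below replaces the deep networks with linear functions so it runs with the Python standard library alone. Each `LocalAgent` maps its own partial observation to a continuous action, a `GlobalCritic` scores the joint action using only a health vector, and `actor_step` applies a deterministic policy gradient step in which each actor follows the summed action-gradient of its local critic and the global critic. All class names, the linear forms, the summation of gradients, and the learning rate are hypothetical; critic (TD) updates, replay buffers, and target networks from DDPG are omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class LocalAgent:
    """Local actor-critic for one agent type; it sees only its partial observation."""
    w: list          # actor weights (linear stand-in for a deep actor network)
    v: list          # local critic weights on the observation
    u: float = 0.0   # local critic weight on the agent's own action

    def act(self, obs):
        # Deterministic policy: one continuous action in (-1, 1).
        return math.tanh(sum(wi * oi for wi, oi in zip(self.w, obs)))

    def q(self, obs, a):
        return sum(vi * oi for vi, oi in zip(self.v, obs)) + self.u * a


@dataclass
class GlobalCritic:
    """Global critic fed only limited centralized info (health values) plus the actions."""
    g: list  # weights on the health vector
    c: list  # one weight per agent's action

    def q(self, health, actions):
        return (sum(gi * hi for gi, hi in zip(self.g, health))
                + sum(ci * ai for ci, ai in zip(self.c, actions)))


def actor_step(agents, critic, observations, lr=0.1):
    """One deterministic-policy-gradient step for every local actor.

    Assumption: dQ/da for agent i is the sum of its local critic's gradient
    and the global critic's gradient; critic training is omitted here.
    """
    for i, (agent, obs) in enumerate(zip(agents, observations)):
        a = agent.act(obs)
        dq_da = agent.u + critic.c[i]           # both critics are linear in a
        da_dw = [(1 - a * a) * o for o in obs]  # chain rule through tanh(w . o)
        agent.w = [wi + lr * dq_da * d for wi, d in zip(agent.w, da_dw)]
```

In this sketch the only centralized input is `health`, one number per agent, rather than every agent's full observation, which mirrors the reduced communication the abstract describes.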
  3.