skip to main content

Search for: All records

Award ID contains: 1947418

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Free, publicly-accessible full text available October 1, 2024
  2. Generalization problem of reinforcement learning is crucial especially for dynamic environments. Conventional reinforcement learning methods solve the problems with some ideal assumptions and are difficult to be applied in dynamic environments directly. In this paper, we propose a new multi-virtual- agent reinforcement learning (MVARL) approach for a predator-prey grid game. The designed method can find the optimal solution even when the predator moves. Specifically, we design virtual agents to interact with simulated changing environments in parallel instead of using actual agents. Moreover, a global agent learns information from these virtual agents and interacts with the actual environment at the same time. This method can not only effectively improve the generalization performance of reinforcement learning in dynamic environments, but also reduce the overall computational cost. Two simulation studies are considered in this paper to validate the effectiveness of the designed method. We also compare the results with the conventional reinforcement learning methods. The results indicate that our proposed method can improve the robustness of reinforcement learning method and contribute to the generalization to certain extent. 
    more » « less
  3. null (Ed.)
    In this article, we propose a novel semicentralized deep deterministic policy gradient (SCDDPG) algorithm for cooperative multiagent games. Specifically, we design a two-level actor-critic structure to help the agents with interactions and cooperation in the StarCraft combat. The local actor-critic structure is established for each kind of agents with partially observable information received from the environment. Then, the global actor-critic structure is built to provide the local design an overall view of the combat based on the limited centralized information, such as the health value. These two structures work together to generate the optimal control action for each agent and to achieve better cooperation in the games. Comparing with the fully centralized methods, this design can reduce the communication burden by only sending limited information to the global level during the learning process. Furthermore, the reward functions are also designed for both local and global structures based on the agents' attributes to further improve the learning performance in the stochastic environment. The developed method has been demonstrated on several scenarios in a real-time strategy game, i.e., StarCraft. The simulation results show that the agents can effectively cooperate with their teammates and defeat the enemies in various StarCraft scenarios. 
    more » « less
  4. This paper proposes an intelligent multi-agent approach in a real-time strategy game, StarCraft, based on the deep deterministic policy gradients (DDPG) techniques. An actor and a critic network are established to estimate the optimal control actions and corresponding value functions, respectively. A special reward function is designed based on the agents' own condition and enemies' information to help agents make intelligent control in the game. Furthermore, in order to accelerate the learning process, the transfer learning techniques are integrated into the training process. Specifically, the agents are trained initially in a simple task to learn the basic concept for the combat, such as detouring moving, avoiding and joining attacking. Then, we transfer this experience to the target task with a complex and difficult scenario. From the experiment, it is shown that our proposed algorithm with transfer learning can achieve better performance. 
    more » « less