Various methods for Multi-Agent Reinforcement Learning (MARL) have been developed with the assumption that agents’ policies are based on accurate state information. However, policies learned through Deep Reinforcement Learning (DRL) are susceptible to adversarial state perturbation attacks. In this work, we propose a State-Adversarial Markov Game (SAMG) and make the first attempt to investigate different solution concepts of MARL under state uncertainties. Our analysis shows that the commonly used solution concepts of optimal agent policy and robust Nash equilibrium do not always exist in SAMGs. To circumvent this difficulty, we consider a new solution concept called robust agent policy, where agents aim to maximize the worst-case expected state value. We prove the existence of robust agent policy for finite state and finite action SAMGs. Additionally, we propose a Robust Multi-Agent Adversarial Actor-Critic (RMA3C) algorithm to learn robust policies for MARL agents under state uncertainties. Our experiments demonstrate that our algorithm outperforms existing methods when faced with state perturbations and greatly improves the robustness of MARL policies. Our code is public on https://songyanghan.github.io/what_is_solution/. 
                        more » 
                        « less   
                    This content will become publicly available on March 1, 2026
                            
                            Stable In-Hand Manipulation With Finger-Specific Multi-Agent Shadow Critic Consensus and Information Sharing
                        
                    
    
            Deep Reinforcement Learning (DRL) has shown its capability to solve the high degrees of freedom in control and the complex interaction with the object in the multi-finger dexterous in-hand manipulation tasks. Current DRL approaches lack behavior constraints during the learning process, leading to aggressive and unstable policies that are insufficient for safety-critical in-hand manipulation tasks. The centralized learning strategy also limits the flexibility to fine-tune each robot finger's behavior. This work proposes the Finger-specific Multi-agent Shadow Critic Consensus (FMSC) method, which models the in-hand manipulation as a multi-agent collaboration task where each finger is an individual agent and trains the policies for the fingers to achieve a consensus across the critic networks through the Information Sharing (IS) across the neighboring agents and finger-specific stable manipulation objectives based on the state-action occupancy measure, a general utility of DRL that is approximated during the learning process. The methods are evaluated in two in-hand manipulation tasks on the Shadow Hand. The results show that FMSC+IS converges faster in training, achieving a comparable success rate and much better manipulation stability than conventional DRL methods. 
        more » 
        « less   
        
    
                            - Award ID(s):
- 2426469
- PAR ID:
- 10608647
- Editor(s):
- Jara, C; Borras_Sol, J
- Publisher / Repository:
- IEEE Robotics and Automation Letters ( Volume: 10, Issue: 3, March 2025)
- Date Published:
- Journal Name:
- IEEE Robotics and Automation Letters
- Volume:
- 10
- Issue:
- 3
- ISSN:
- 2377-3774
- Page Range / eLocation ID:
- 2407 to 2413
- Subject(s) / Keyword(s):
- Dexterous manipulation in-hand manipulation multi-fingered hands
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            
- 
            Various methods for Multi-Agent Reinforcement Learning (MARL) have been developed with the assumption that agents' policies are based on accurate state information. However, policies learned through Deep Reinforcement Learning (DRL) are susceptible to adversarial state perturbation attacks. In this work, we propose a State-Adversarial Markov Game (SAMG) and make the first attempt to investigate different solution concepts of MARL under state uncertainties. Our analysis shows that the commonly used solution concepts of optimal agent policy and robust Nash equilibrium do not always exist in SAMGs. To circumvent this difficulty, we consider a new solution concept called robust agent policy, where agents aim to maximize the worst-case expected state value. We prove the existence of robust agent policy for finite state and finite action SAMGs. Additionally, we propose a Robust Multi-Agent Adversarial Actor-Critic (RMA3C) algorithm to learn robust policies for MARL agents under state uncertainties. Our experiments demonstrate that our algorithm outperforms existing methods when faced with state perturbations and greatly improves the robustness of MARL policies. Our code is public on https://songyanghan.github.io/what_is_solution/.more » « less
- 
            Policy gradient methods have become popular in multi-agent reinforcement learning, but they suffer from high variance due to the presence of environmental stochasticity and exploring agents (i.e., non-stationarity), which is potentially worsened by the difficulty in credit assignment. As a result, there is a need for a method that is not only capable of efficiently solving the above two problems but also robust enough to solve a variety of tasks. To this end, we propose a new multi-agent policy gradient method, called Robust Local Advantage (ROLA) Actor-Critic. ROLA allows each agent to learn an individual action-value function as a local critic as well as ameliorating environment non-stationarity via a novel centralized training approach based on a centralized critic. By using this local critic, each agent calculates a baseline to reduce variance on its policy gradient estimation, which results in an expected advantage action-value over other agents’ choices that implicitly improves credit assignment. We evaluate ROLA across diverse benchmarks and show its robustness and effectiveness over a number of state-of-the-art multi-agent policy gradient algorithms.more » « less
- 
            The ability to learn from human demonstration endows robots with the ability to automate various tasks. However, directly learning from human demonstration is challenging since the structure of the human hand can be very different from the desired robot gripper. In this work, we show that manipulation skills can be transferred from a human to a robot through the use of micro-evolutionary reinforcement learning, where a five-finger human dexterous hand robot gradually evolves into a commercial two-finger-gripper robot, while repeated interacting in a physics simulator to continuously update the policy that is first learned from human demonstration. To deal with the high dimensions of robot parameters, we propose an algorithm for multi-dimensional evolution path searching that allows joint optimization of both the robot evolution path and the policy. Through experiments on human object manipulation datasets, we show that our framework can efficiently transfer the expert human agent policy trained from human demonstrations in diverse modalities to a target commercial robot.more » « less
- 
            We consider the problem of spectrum sharing by multiple cellular operators. We propose a novel deep Reinforcement Learning (DRL)-based distributed power allocation scheme which utilizes the multi-agent Deep Deterministic Policy Gradient (MA-DDPG) algorithm. In particular, we model the base stations (BSs) that belong to the multiple operators sharing the same band, as DRL agents that simultaneously determine the transmit powers to their scheduled user equipment (UE) in a synchronized manner. The power decision of each BS is based on its own observation of the radio environment (RF) environment, which consists of interference measurements reported from the UEs it serves, and a limited amount of information obtained from other BSs. One advantage of the proposed scheme is that it addresses the single-agent non-stationarity problem of RL in the multi-agent scenario by incorporating the actions and observations of other BSs into each BS's own critic which helps it to gain a more accurate perception of the overall RF environment. A centralized-training-distributed-execution framework is used to train the policies where the critics are trained over the joint actions and observations of all BSs while the actor of each BS only takes the local observation as input in order to produce the transmit power. Simulation with the 6 GHz Unlicensed National Information Infrastructure (U-NII)-5 band shows that the proposed power allocation scheme can achieve better throughput performance than several state-of-the-art approaches.more » « less
 An official website of the United States government
An official website of the United States government 
				
			 
					 
					
