NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Of Mice and Machines: A Comparison of Learning Between Real World Mice and RL Agents

Han, Shuo; Espinosa, German; Huang, Junda; Dombeck, Daniel A; MacIver, Malcolm A; Stadie, Bradly C (May 2025, arXivorg)

Recent advances in reinforcement learning (RL) have demonstrated impressive capabilities in complex decision-making tasks. This progress raises a natural question: how do these artificial systems compare to biological agents, which have been shaped by millions of years of evolution? To help answer this question, we undertake a comparative study of biological mice and RL agents in a predator-avoidance maze environment. Through this analysis, we identify a striking disparity: RL agents consistently demonstrate a lack of self-preservation instinct, readily risking ``death'' for marginal efficiency gains. These risk-taking strategies are in contrast to biological agents, which exhibit sophisticated risk-assessment and avoidance behaviors. Towards bridging this gap between the biological and artificial, we propose two novel mechanisms that encourage more naturalistic risk-avoidance behaviors in RL agents. Our approach leads to the emergence of naturalistic behaviors, including strategic environment assessment, cautious path planning, and predator avoidance patterns that closely mirror those observed in biological systems.
more » « less
Free, publicly-accessible full text available May 18, 2026
Adaptive Incentive Design for Markov Decision Processes with Unknown Rewards

MA, Haoxiang; Han, Shuo; Hemida, Ahmed; Kamhoua, Charles; Fu, Jie (March 2025, Transactions on machine learning research)

Free, publicly-accessible full text available March 28, 2026
Adaptive Incentive Design for Markov Decision Processes with Unknown Rewards

Ma, Haoxiang; Han, Shuo; Hemida, Ahmed; Kamhoua, Charles A; Fu, Jie (March 2025, Transactions on Machine Learning Research)
Poupart, Pascal (Ed.)
Incentive design, also known as model design or environment design for Markov decision processes(MDPs), refers to a class of problems in which a leader can incentivize his follower by modifying the follower's reward function, in anticipation that the follower's optimal policy in the resulting MDP can be desirable for the leader's objective. In this work, we propose gradient-ascent algorithms to compute the leader's optimal incentive design, despite the lack of knowledge about the follower's reward function. First, we formulate the incentive design problem as a bi-level optimization problem and demonstrate that, by the softmax temporal consistency between the follower's policy and value function, the bi-level optimization problem can be reduced to single-level optimization, for which a gradient-based algorithm can be developed to optimize the leader's objective. We establish several key properties of incentive design in MDPs and prove the convergence of the proposed gradient-based method. Next, we show that the gradient terms can be estimated from observations of the follower's best response policy, enabling the use of a stochastic gradient-ascent algorithm to compute a locally optimal incentive design without knowing or learning the follower's reward function. Finally, we analyze the conditions under which an incentive design remains optimal for two different rewards which are policy invariant. The effectiveness of the proposed algorithm is demonstrated using a small probabilistic transition system and a stochastic gridworld.
more » « less
Free, publicly-accessible full text available March 28, 2026
DeWinder: Single-Channel Wind Noise Reduction using Ultrasound Sensing

https://doi.org/10.21437/Interspeech.2024-2180

Yuan, Kuang; Han, Shuo; Kumar, Swarun; Raj, Bhiksha (September 2024, ISCA)

Full Text Available
Active Perception With Initial-State Uncertainty: A Policy Gradient Method

https://doi.org/10.1109/LCSYS.2024.3513896

Shi, Chongyang; Han, Shuo; Dorothy, Michael; Fu, Jie (January 2024, IEEE Control Systems Letters)

Full Text Available
Optimal Resource Allocation for Proactive Defense with Deception in Probabilistic Attack Graphs

https://doi.org/10.1007/978-3-031-50670-3_11

Ma, Haoxiang; Han, Shuo; Kamhoua, Charles; Fu, Jie (December 2023, Decision and Game Theory for Security)
Fu, J. (Ed.)
Full Text Available
What is the Solution for State-Adversarial Multi-Agent Reinforcement Learning?

Han, Songyang; Su, Sanbao; He, Sihong; Han, Shuo; Yang, Haizhao; Zou, Shaofeng; Miao, Fei (February 2024, Transactions on machine learning research)

Full Text Available
What is the Solution for State-Adversarial Multi-Agent Reinforcement Learning?

Han, Songyang; Su, Sanbao; He, Sihong; Han, Shuo; Yang, Haizhao; Zou, Shaofeng; Miao, Fei (February 2024, Transactions on Machine Learning Research)

Various methods for Multi-Agent Reinforcement Learning (MARL) have been developed with the assumption that agents’ policies are based on accurate state information. However, policies learned through Deep Reinforcement Learning (DRL) are susceptible to adversarial state perturbation attacks. In this work, we propose a State-Adversarial Markov Game (SAMG) and make the first attempt to investigate different solution concepts of MARL under state uncertainties. Our analysis shows that the commonly used solution concepts of optimal agent policy and robust Nash equilibrium do not always exist in SAMGs. To circumvent this difficulty, we consider a new solution concept called robust agent policy, where agents aim to maximize the worst-case expected state value. We prove the existence of robust agent policy for finite state and finite action SAMGs. Additionally, we propose a Robust Multi-Agent Adversarial Actor-Critic (RMA3C) algorithm to learn robust policies for MARL agents under state uncertainties. Our experiments demonstrate that our algorithm outperforms existing methods when faced with state perturbations and greatly improves the robustness of MARL policies. Our code is public on https://songyanghan.github.io/what_is_solution/.
more » « less
Full Text Available
What is the Solution for State-Adversarial Multi-Agent Reinforcement Learning?

Han, Songyang; Su, Sanbao; He, Sihong; Han, Shuo; Yang, Haizhao; Zou, Shaofeng; Miao, Fei (February 2024, Transactions on machine learning research)

Various methods for Multi-Agent Reinforcement Learning (MARL) have been developed with the assumption that agents' policies are based on accurate state information. However, policies learned through Deep Reinforcement Learning (DRL) are susceptible to adversarial state perturbation attacks. In this work, we propose a State-Adversarial Markov Game (SAMG) and make the first attempt to investigate different solution concepts of MARL under state uncertainties. Our analysis shows that the commonly used solution concepts of optimal agent policy and robust Nash equilibrium do not always exist in SAMGs. To circumvent this difficulty, we consider a new solution concept called robust agent policy, where agents aim to maximize the worst-case expected state value. We prove the existence of robust agent policy for finite state and finite action SAMGs. Additionally, we propose a Robust Multi-Agent Adversarial Actor-Critic (RMA3C) algorithm to learn robust policies for MARL agents under state uncertainties. Our experiments demonstrate that our algorithm outperforms existing methods when faced with state perturbations and greatly improves the robustness of MARL policies. Our code is public on https://songyanghan.github.io/what_is_solution/.
more » « less
Full Text Available
What is the Solution for State-Adversarial Multi-Agent Reinforcement Learning?

Han, Songyang; Su, Sanbao; He, Sihong; Han, Shuo; Yang, Haizhao; Zou, Shaofeng; Miao, Fei (January 2024, Transactions on Machine Learning Research)

Full Text Available

« Prev Next »

Search for: All records