NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Discovery and Deployment of Emergent Robot Swarm Behaviors via Representation Learning and Real2Sim2Real Transfer

Mattson, Connor; Raveendra, Varun; Vega, Ricardo; Nowzari, Cameron; Drew, Daniel S; Brown, Daniel S (June 2025, AAMAS '25: Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems)

Given a swarm of limited-capability robots, we seek to automatically discover the set of possible emergent behaviors. Prior approaches to behavior discovery rely on human feedback or hand-crafted behavior metrics to represent and evolve behaviors and only discover behaviors in simulation, without testing or considering the deployment of these new behaviors on real robot swarms. In this work, we present Real2Sim2Real Behavior Discovery via Self-Supervised Representation Learning, which combines representation learning and novelty search to discover possible emergent behaviors automatically in simulation and enable direct controller transfer to real robots. First, we evaluate our method in simulation and show that our proposed self-supervised representation learning approach outperforms previous hand-crafted metrics by more accurately representing the space of possible emergent behaviors. Then, we address the reality gap by incorporating recent work in sim2real transfer for swarms into our lightweight simulator design, enabling direct robot deployment of all behaviors discovered in simulation on an open-source and low-cost robot platform.
more » « less
Free, publicly-accessible full text available June 5, 2026
Toward Zero-Shot User Intent Recognition in Shared Autonomy

Belsare, Atharv; Karimi, Zohre; Mattson, Connor; Brown, Daniel S (March 2025, HRI '25: Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction)

A fundamental challenge of shared autonomy is to use high-DoF robots to assist, rather than hinder, humans by first inferring user intent and then empowering the user to achieve their intent. Although successful, prior methods either rely heavily on a priori knowledge of all possible human intents or require many demonstrations and interactions with the human to learn these intents before being able to assist the user. We propose and study a zero-shot, vision-only shared autonomy (VOSA) framework designed to allow robots to use end-effector vision to estimate zero-shot human intents in conjunction with blended control to help humans accomplish manipulation tasks with unknown and dynamically changing object locations. To demonstrate the effectiveness of our VOSA framework, we instantiate a simple version of VOSA on a Kinova Gen3 manipulator and evaluate our system by conducting a user study on three tabletop manipulation tasks. The performance of VOSA matches that of an oracle baseline model that receives privileged knowledge of possible human intents while also requiring significantly less effort than unassisted teleoperation. In more realistic settings, where the set of possible human intents is fully or partially unknown, we demonstrate that VOSA requires less human effort and time than baseline approaches while being preferred by a majority of the participants. Our results demonstrate the efficacy and efficiency of using off-the-shelf vision algorithms to enable flexible and beneficial shared control of a robot manipulator. Code and videos available here: https://sites.google.com/view/zeroshot-sharedautonomy/home
more » « less
Free, publicly-accessible full text available March 4, 2026
Can Differentiable Decision Trees Enable Interpretable Reward Learning from Human Feedback?

Kalra, Akansha; Brown, Daniel S (August 2024, Reinforcement Learning Journal (RLJ))

Reinforcement Learning from Human Feedback (RLHF) has emerged as a popular paradigm for capturing human intent to alleviate the challenges of hand-crafting the reward values. Despite the increasing interest in RLHF, most works learn black box reward functions that while expressive are difficult to interpret and often require running the whole costly process of RL before we can even decipher if these frameworks are actually aligned with human preferences. We propose and evaluate a novel approach for learning expressive and interpretable reward functions from preferences using Differentiable Decision Trees (DDTs). Our experiments across several domains, including CartPole, Visual Gridworld environments and Atari games, provide evidence that the tree structure of our learned reward function is useful in determining the extent to which the reward function is aligned with human preferences. We also provide experimental evidence that not only shows that reward DDTs can often achieve competitive RL performance when compared with larger capacity deep neural network reward functions but also demonstrates the diagnostic utility of our framework in checking alignment of learned reward functions. We also observe that the choice between soft and hard (argmax) output of reward DDT reveals a tension between wanting highly shaped rewards to ensure good RL performance, while also wanting simpler, more interpretable rewards. Videos and code, are available at: https://sites.google.com/view/ddt-rlhf
more » « less
Full Text Available
Representation Alignment from Human Feedback for Cross-Embodiment Reward Learning from Mixed-Quality Demonstrations

Mattson, Connor; Aribandi, Anurag; Brown, Daniel S (August 2024, Reinforcement Learning Journal)

We study the problem of cross-embodiment inverse reinforcement learning, where we wish to learn a reward function from video demonstrations in one or more embodiments and then transfer the learned reward to a different embodiment (e.g., different action space, dynamics, size, shape, etc.). Learning reward functions that transfer across embodiments is important in settings such as teaching a robot a policy via human video demonstrations or teaching a robot to imitate a policy from another robot with a different embodiment. However, prior work has only focused on cases where near-optimal demonstrations are available, which is often difficult to ensure. By contrast, we study the setting of cross-embodiment reward learning from mixed-quality demonstrations. We demonstrate that prior work struggles to learn generalizable reward representations when learning from mixed-quality data. We then analyze several techniques that leverage human feedback for representation learning and alignment to enable effective cross-embodiment learning. Our results give insight into how different representation learning techniques lead to qualitatively different reward shaping behaviors and the importance of human feedback when learning from mixed-quality, mixed-embodiment data.
more » « less
Full Text Available
Autonomous Assessment of Demonstration Sufficiency via Bayesian Inverse Reinforcement Learning

https://doi.org/10.1145/3610977.3634984

Trinh, Tu; Chen, Haoyu; Brown, Daniel S (March 2024, ACM)

Full Text Available
Exploring Behavior Discovery Methods for Heterogeneous Swarms of Limited-Capability Robots

https://doi.org/10.1109/MRS60187.2023.10416780

Mattson, Connor; Clark, Jeremy C; Brown, Daniel S (December 2023, IEEE)

Full Text Available
Leveraging Human Feedback to Evolve and Discover Novel Emergent Behaviors in Robot Swarms

https://doi.org/10.1145/3583131.3590443

Mattson, Connor; Brown, Daniel S (July 2023, ACM)

Search for: All records