This content will become publicly available on June 5, 2026

Title: Discovery and Deployment of Emergent Robot Swarm Behaviors via Representation Learning and Real2Sim2Real Transfer
Given a swarm of limited-capability robots, we seek to automatically discover the set of possible emergent behaviors. Prior approaches to behavior discovery rely on human feedback or hand-crafted behavior metrics to represent and evolve behaviors, and they only discover behaviors in simulation, without testing or considering the deployment of these new behaviors on real robot swarms. In this work, we present Real2Sim2Real Behavior Discovery via Self-Supervised Representation Learning, which combines representation learning and novelty search to discover possible emergent behaviors automatically in simulation and enable direct controller transfer to real robots. First, we evaluate our method in simulation and show that our proposed self-supervised representation learning approach outperforms previous hand-crafted metrics by more accurately representing the space of possible emergent behaviors. Then, we address the reality gap by incorporating recent work in sim2real transfer for swarms into our lightweight simulator design, enabling direct robot deployment of all behaviors discovered in simulation on an open-source and low-cost robot platform.
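The core loop pairs a learned behavior encoder with novelty search: controllers whose rollouts embed far from everything seen so far are archived as new behaviors. A minimal sketch of that combination follows; all names (sample_controller, simulate, encode_behavior) and their stand-in bodies are illustrative assumptions, not the paper's actual API.

```python
# Minimal sketch: novelty search over self-supervised behavior embeddings.
# All names and stand-in bodies are illustrative, not the paper's API.
import numpy as np

rng = np.random.default_rng(0)

def sample_controller():
    # Stand-in: a controller is a small parameter vector.
    return rng.uniform(-1.0, 1.0, size=4)

def simulate(controller, steps=200):
    # Stand-in for a lightweight swarm rollout producing a 2-D trajectory.
    return np.cumsum(rng.normal(scale=np.abs(controller).mean(),
                                size=(steps, 2)), axis=0)

def encode_behavior(trajectory):
    # Stand-in for a frozen self-supervised encoder (here: crude statistics).
    return np.concatenate([trajectory.mean(axis=0), trajectory.std(axis=0)])

def novelty(z, archive, k=15):
    # Mean distance from z to its k nearest archived embeddings.
    if not archive:
        return np.inf
    dists = np.sort(np.linalg.norm(np.asarray(archive) - z, axis=1))
    return float(dists[:k].mean())

def discover(n_iters=500, threshold=1.0):
    archive, controllers = [], []
    for _ in range(n_iters):
        c = sample_controller()
        z = encode_behavior(simulate(c))
        if novelty(z, archive) > threshold:  # keep only behaviorally novel controllers
            archive.append(z)
            controllers.append(c)
    return controllers
```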
Award ID(s):
2310759
PAR ID:
10609066
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
AAMAS '25: Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems
Date Published:
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Exploration and reward specification are fundamental and intertwined challenges for reinforcement learning. Solving sequential decision-making tasks that demand expansive exploration requires either careful design of reward functions or the use of novelty-seeking exploration bonuses. Human supervisors can provide effective in-the-loop guidance to direct exploration, but prior methods for leveraging this guidance require constant, synchronous, high-quality human feedback, which is expensive and impractical to obtain. In this work, we present a technique called Human Guided Exploration (HuGE), which uses low-quality feedback from non-expert users that may be sporadic, asynchronous, and noisy. HuGE guides exploration for reinforcement learning not only in simulation but also in the real world, all without meticulous reward specification. The key concept involves bifurcating human feedback and policy learning: human feedback steers exploration, while self-supervised learning from the exploration data yields unbiased policies. This procedure can leverage noisy, asynchronous human feedback to learn policies with no hand-crafted reward design or exploration bonuses. HuGE is able to learn a variety of challenging multi-stage robotic navigation and manipulation tasks in simulation using crowdsourced feedback from non-expert users. Moreover, this paradigm can be scaled to learning directly on real-world robots, using occasional, asynchronous feedback from human supervisors.
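A toy sketch of HuGE's central split between feedback-driven goal selection and self-supervised policy learning is below; the class and function names are assumptions for illustration, not the authors' implementation.

```python
# Toy sketch of HuGE's split: human comparisons steer goal selection,
# while policy learning stays self-supervised. Names are illustrative.
import random
from collections import defaultdict

class ProgressRanker:
    # Fit from noisy, asynchronous binary comparisons of the form
    # "which of these two states looks closer to the task goal?"
    def __init__(self):
        self.score = defaultdict(float)

    def update(self, preferred, other):
        self.score[preferred] += 1.0
        self.score[other] -= 1.0

def select_goal(frontier, ranker):
    # Feedback only steers WHERE to explore next.
    best = max(ranker.score[s] for s in frontier)
    return random.choice([s for s in frontier if ranker.score[s] == best])

def hindsight_relabel(trajectory):
    # Policy learning is self-supervised: treat the final reached state as
    # the goal, so noisy feedback cannot bias the learned policy.
    goal = trajectory[-1][0]
    return [(state, action, goal) for state, action in trajectory]
```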
  2. Ideally, we would place a robot in a real-world environment and leave it there, improving on its own as it autonomously gathers more experience. However, algorithms for autonomous robotic learning have been challenging to realize in the real world. While this has often been attributed to the challenge of sample complexity, even sample-efficient techniques are hampered by two major challenges: the difficulty of providing well-"shaped" rewards, and the difficulty of continual reset-free training. In this work, we describe a system for real-world reinforcement learning that enables agents to show continual improvement by training directly in the real world without requiring painstaking effort to hand-design reward functions or reset mechanisms. Our system leverages occasional non-expert human-in-the-loop feedback from remote users to learn informative distance functions that guide exploration, while leveraging a simple self-supervised learning algorithm for goal-directed policy learning. We show that in the absence of resets, it is particularly important to account for the current "reachability" of the exploration policy when deciding which regions of the space to explore. Based on this insight, we instantiate a practical learning system, GEAR, which enables robots to simply be placed in real-world environments and left to train autonomously without interruption. The system streams robot experience to a web interface, requiring only occasional asynchronous feedback from remote, crowdsourced, non-expert humans in the form of binary comparative feedback. We evaluate this system on a suite of robotic tasks in simulation and demonstrate its effectiveness at learning behaviors both in simulation and the real world.
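The "reachability" insight can be illustrated in a few lines; reach_prob and distance_to_goal below are assumed stand-ins for GEAR's learned estimates, not its actual interfaces.

```python
# Hedged sketch of reachability-aware goal selection (illustrative only;
# reach_prob and distance_to_goal stand in for GEAR's learned estimates).
def choose_exploration_goal(candidates, distance_to_goal, reach_prob,
                            min_reach=0.5):
    # Without resets, chasing goals the current policy cannot reach strands
    # the robot; restrict goal selection to plausibly reachable states.
    reachable = [s for s in candidates if reach_prob(s) >= min_reach]
    pool = reachable or candidates  # fall back if nothing clears the bar
    return min(pool, key=distance_to_goal)
```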
  3. When a mobile robot is deployed in a field environment, e.g., during a disaster response application, the capability of adapting its navigational behaviors to unstructured terrains is essential for effective and safe navigation. In this paper, we introduce a novel joint terrain representation and apprenticeship learning approach to implement robot adaptation to unstructured terrains. Different from conventional learning-based adaptation techniques, our approach provides a unified problem formulation that integrates representation and apprenticeship learning under a single regularized optimization framework, instead of treating them as separate, independent procedures. Our approach can also automatically identify discriminative feature modalities, which improves the robustness of robot adaptation. In addition, we implement a new optimization algorithm to solve the formulated problem, with a theoretical guarantee of convergence to the global optimal solution. In the experiments, we extensively evaluate the proposed approach in real-world scenarios in which a mobile robot navigates on familiar and unfamiliar unstructured terrains. Experimental results show that the proposed approach is able to transfer human expertise to robots with small errors, achieve superior performance compared with previous and baseline methods, and provide intuitive insights into the importance of terrain feature modalities.
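One generic way to write such a joint objective is sketched below; this is an illustrative form only, and the paper's exact variables and regularizers may differ.

```latex
% Illustrative joint objective; not the paper's exact formulation.
% U learns the shared terrain representation, w maps it to navigational
% behavior, x_i are multi-modal terrain features, y_i human demonstrations.
% The l_{2,1} norm on U induces row sparsity, i.e., automatic selection
% of discriminative feature modalities.
\min_{U,\,w}\; \sum_{i=1}^{n} \bigl\| w^{\top} U^{\top} x_i - y_i \bigr\|_2^2
  \;+\; \lambda_1 \|U\|_{2,1} \;+\; \lambda_2 \|w\|_2^2
```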
  4. Complex service robotics scenarios entail unpredictable task appearance both in space and time. This requires robots to continuously relocate and imposes a trade-off between motion costs and efficiency in task execution. In such scenarios, multi-robot systems and even swarms of robots can be exploited to service different areas in parallel. An efficient deployment needs to continuously determine the best allocation according to the actual service needs, while also taking relocation costs into account when such an allocation must be modified. For large-scale problems, centrally predicting optimal allocations and movement paths for each robot quickly becomes infeasible. Instead, decentralized solutions are needed that allow the robotic system to self-organize and adaptively respond to the task demands. In this paper, we propose a distributed and asynchronous approach to simultaneous task assignment and path planning for robot swarms, which combines a bio-inspired collective decision-making process for allocating robots to the areas to be serviced with a search-based path planning approach for routing robots toward the tasks to be executed. Task allocation exploits a hierarchical representation of the workspace, supporting robot deployment to the areas that most require service. We investigate four realistic environments of increasing complexity, where each task requires a robot to reach a location and work for a specific amount of time. The proposed approach improves over two different baseline algorithms in specific settings with statistical significance, while showing consistently good results overall. Moreover, the proposed solution is robust to limited communication and robot failures.
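As a rough illustration of the decentralized allocation idea (the actual bio-inspired decision process and hierarchical workspace model are more involved), each robot might occasionally re-sample its service area in proportion to locally observed demand:

```python
# Rough illustration of demand-proportional area re-allocation; not the
# paper's collective decision-making algorithm.
import random

def reallocate(current_area, observed_demand, switch_rate=0.1):
    # observed_demand: {area_id: pending_task_count}, gathered from local
    # communication. Robots switch rarely because relocation has a cost.
    if random.random() > switch_rate:
        return current_area
    total = sum(observed_demand.values())
    if total == 0:
        return current_area
    r, acc = random.uniform(0, total), 0.0
    for area, demand in observed_demand.items():
        acc += demand
        if r <= acc:
            return area
    return current_area
```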
  5. Robot swarms have, to date, been constructed from artificial materials. Motile biological constructs have been created from muscle cells grown on precisely shaped scaffolds. However, the exploitation of emergent self-organization and functional plasticity into a self-directed living machine has remained a major challenge. We report here a method for generation of in vitro biological robots from frog (Xenopus laevis) cells. These xenobots exhibit coordinated locomotion via cilia present on their surface. These cilia arise through normal tissue patterning and do not require complicated construction methods or genomic editing, making production amenable to high-throughput projects. The biological robots arise by cellular self-organization and do not require scaffolds or microprinting; the amphibian cells are highly amenable to surgical, genetic, chemical, and optical stimulation during the self-assembly process. We show that the xenobots can navigate aqueous environments in diverse ways, heal after damage, and show emergent group behaviors. We constructed a computational model to predict useful collective behaviors that can be elicited from a xenobot swarm. In addition, we provide proof of principle for a writable molecular memory using a photoconvertible protein that can record exposure to a specific wavelength of light. Together, these results introduce a platform that can be used to study many aspects of self-assembly, swarm behavior, and synthetic bioengineering, as well as provide versatile, soft-body living machines for numerous practical applications in biomedicine and the environment.