skip to main content


Title: Approximate Distributed Spatiotemporal Topic Models for Multi-Robot Terrain Characterization
Unsupervised learning techniques, such as Bayesian topic models, are capable of discovering latent structure directly from raw data. These unsupervised models can endow robots with the ability to learn from their observations without human supervision, and then use the learned models for tasks such as autonomous exploration, adaptive sampling, or surveillance. This paper extends single-robot topic models to the domain of multiple robots. The main difficulty of this extension lies in achieving and maintaining global consensus among the unsupervised models learned locally by each robot. This is especially challenging for multi-robot teams operating in communication-constrained environments, such as marine robots. We present a novel approach for multi-robot distributed learning in which each robot maintains a local topic model to categorize its observations and model parameters are shared to achieve global consensus. We apply a combinatorial optimization procedure that combines local robot topic distributions into a globally consistent model based on topic similarity, which we find mitigates topic drift when compared to a baseline approach that matches topics naively. We evaluate our methods experimentally by demonstrating multi-robot underwater terrain characterization using simulated missions on real seabed imagery. Our proposed method achieves similar model quality under bandwidth-constraints to that achieved by models that continuously communicate, despite requiring less than one percent of the data transmission needed for continuous communication.  more » « less
Award ID(s):
1734400
NSF-PAR ID:
10082622
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Proceedings of the ... IEEE/RSJ International Conference on Intelligent Robots and Systems
ISSN:
2153-0858
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Robot-mediated therapy is an emerging field of research seeking to improve therapy for children with Autism Spectrum Disorder (ASD). Current approaches to autonomous robot-mediated therapy often focus on having a robot teach a single skill to children with ASD and lack a personalized approach to each individual. More recently, Learning from Demonstration (LfD) approaches are being explored to teach socially assistive robots to deliver personalized interventions after they have been deployed but these approaches require large amounts of demonstrations and utilize learning models that cannot be easily interpreted. In this work, we present a LfD system capable of learning the delivery of autism therapies in a data-efficient manner utilizing learning models that are inherently interpretable. The LfD system learns a behavioral model of the task with minimal supervision via hierarchical clustering and then learns an interpretable policy to determine when to execute the learned behaviors. The system is able to learn from less than an hour of demonstrations and for each of its predictions can identify demonstrated instances that contributed to its decision. The system performs well under unsupervised conditions and achieves even better performance with a low-effort human correction process that is enabled by the interpretable model. 
    more » « less
  2. The Weighted-Mean Subsequence Reduced (W-MSR) algorithm, the state-of-the-art method for Byzantine-resilient design of decentralized multi-robot systems, is based on discarding outliers received over Linear Consensus Protocol (LCP). Although W-MSR provides theoretical guarantees relating network connectivity to the convergence of the underlying consensus, W-MSR comes with several limitations: the number of Byzantine robots, 𝐹 , to tolerate should be known a priori, each robot needs to maintain 2𝐹 + 1 neighbors, 𝐹 + 1 robots must independently make local measurements of the consensus property in order for the swarm’s decision to change, and W-MSR is specific to LCP and does not generalize to applications not implemented over LCP. In this work, we pro- pose a Decentralized Blocklist Protocol (DBP) based on inter-robot accusations. Accusations are made on the basis of locally-made observations of misbehavior, and once shared by cooperative robots across the network are used as input to a graph matching algorithm that computes a blocklist. DBP generalizes to applications not implemented via LCP, is adaptive to the number of Byzantine robots, and allows for fast information propagation through the multi- robot system while simultaneously reducing the required network connectivity relative to W-MSR. On LCP-type applications, DBP reduces the worst-case connectivity requirement of W-MSR from (2𝐹 + 1)-connected to (𝐹 + 1)-connected and the minimum number of cooperative observers required to propagate new information from 𝐹 + 1 to just 1 observer. We demonstrate that our approach to Byzantine resilience scales to hundreds of robots on target tracking, time synchronization, and localization case studies. 
    more » « less
  3. Multi-robot cooperative control has been extensively studied using model-based distributed control methods. However, such control methods rely on sensing and perception modules in a sequential pipeline design, and the separation of perception and controls may cause processing latencies and compounding errors that affect control performance. End-to-end learning overcomes this limitation by implementing direct learning from onboard sensing data, with control commands output to the robots. Challenges exist in end-to-end learning for multi-robot cooperative control, and previous results are not scalable. We propose in this article a novel decentralized cooperative control method for multi-robot formations using deep neural networks, in which inter-robot communication is modeled by a graph neural network (GNN). Our method takes LiDAR sensor data as input, and the control policy is learned from demonstrations that are provided by an expert controller for decentralized formation control. Although it is trained with a fixed number of robots, the learned control policy is scalable. Evaluation in a robot simulator demonstrates the triangular formation behavior of multi-robot teams of different sizes under the learned control policy.

     
    more » « less
  4. ABSTRACT Introduction

    Remote military operations require rapid response times for effective relief and critical care. Yet, the military theater is under austere conditions, so communication links are unreliable and subject to physical and virtual attacks and degradation at unpredictable times. Immediate medical care at these austere locations requires semi-autonomous teleoperated systems, which enable the completion of medical procedures even under interrupted networks while isolating the medics from the dangers of the battlefield. However, to achieve autonomy for complex surgical and critical care procedures, robots require extensive programming or massive libraries of surgical skill demonstrations to learn effective policies using machine learning algorithms. Although such datasets are achievable for simple tasks, providing a large number of demonstrations for surgical maneuvers is not practical. This article presents a method for learning from demonstration, combining knowledge from demonstrations to eliminate reward shaping in reinforcement learning (RL). In addition to reducing the data required for training, the self-supervised nature of RL, in conjunction with expert knowledge-driven rewards, produces more generalizable policies tolerant to dynamic environment changes. A multimodal representation for interaction enables learning complex contact-rich surgical maneuvers. The effectiveness of the approach is shown using the cricothyroidotomy task, as it is a standard procedure seen in critical care to open the airway. In addition, we also provide a method for segmenting the teleoperator’s demonstration into subtasks and classifying the subtasks using sequence modeling.

    Materials and Methods

    A database of demonstrations for the cricothyroidotomy task was collected, comprising six fundamental maneuvers referred to as surgemes. The dataset was collected by teleoperating a collaborative robotic platform—SuperBaxter, with modified surgical grippers. Then, two learning models are developed for processing the dataset—one for automatic segmentation of the task demonstrations into a sequence of surgemes and the second for classifying each segment into labeled surgemes. Finally, a multimodal off-policy RL with rewards learned from demonstrations was developed to learn the surgeme execution from these demonstrations.

    Results

    The task segmentation model has an accuracy of 98.2%. The surgeme classification model using the proposed interaction features achieved a classification accuracy of 96.25% averaged across all surgemes compared to 87.08% without these features and 85.4% using a support vector machine classifier. Finally, the robot execution achieved a task success rate of 93.5% compared to baselines of behavioral cloning (78.3%) and a twin-delayed deep deterministic policy gradient with shaped rewards (82.6%).

    Conclusions

    Results indicate that the proposed interaction features for the segmentation and classification of surgical tasks improve classification accuracy. The proposed method for learning surgemes from demonstrations exceeds popular methods for skill learning. The effectiveness of the proposed approach demonstrates the potential for future remote telemedicine on battlefields.

     
    more » « less
  5. In multi-robot systems, robots often gather data to improve the performance of their deep neural networks (DNNs) for perception and planning. Ideally, these robots should select the most informative samples from their local data distributions by employing active learning approaches. However, when the data collection is distributed among multiple robots, redundancy becomes an issue as different robots may select similar data points. To overcome this challenge, we propose a fleet active learning (FAL) framework in which robots collectively select informative data samples to enhance their DNN models. Our framework leverages submodular maximization techniques to prioritize the selection of samples with high information gain. Through an iterative algorithm, the robots coordinate their efforts to collectively select the most valuable samples while minimizing communication between robots. We provide a theoretical analysis of the performance of our proposed framework and show that it is able to approximate the NP-hard optimal solution. We demonstrate the effectiveness of our framework through experiments on real-world perception and classification datasets, which include autonomous driving datasets such as Berkeley DeepDrive. Our results show an improvement by up to 25.0% in classification accuracy, 9.2% in mean average precision and 48.5% in the submodular objective value compared to a completely distributed baseline. 
    more » « less