skip to main content


Title: Iterative Program Synthesis for Adaptable Social Navigation
Robot social navigation is influenced by human preferences and environment-specific scenarios such as elevators and doors, thus necessitating end-user adaptability. State-of-the-art approaches to social navigation fall into two categories: model-based social constraints and learning-based approaches. While effective, these approaches have fundamental limitations – model-based approaches require constraint and parameter tuning to adapt to preferences and new scenarios, while learning-based approaches require reward functions, significant training data, and are hard to adapt to new social scenarios or new domains with limited demonstrations.In this work, we propose Iterative Dimension Informed Program Synthesis (IDIPS) to address these limitations by learning and adapting social navigation in the form of human-readable symbolic programs. IDIPS works by combining pro-gram synthesis, parameter optimization, predicate repair, and iterative human demonstration to learn and adapt model-free action selection policies from orders of magnitude less data than learning-based approaches. We introduce a novel predicate repair technique that can accommodate previously unseen social scenarios or preferences by growing existing policies.We present experimental results showing that IDIPS: 1) synthesizes effective policies that model user preference, 2) can adapt existing policies to changing preferences, 3) can extend policies to handle novel social scenarios such as locked doors, and 4) generates policies that can be transferred from simulation to real-world robots with minimal effort.  more » « less
Award ID(s):
2046955
NSF-PAR ID:
10318551
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We study self-supervised adaptation of a robot's policy for social interaction, i.e., a policy for active communication with surrounding pedestrians through audio or visual signals. Inspired by the observation that humans continually adapt their behavior when interacting under varying social context, we propose Adaptive EXP4 (A-EXP4), a novel online learning algorithm for adapting the robot-pedestrian interaction policy. To address limitations of bandit algorithms in adaptation to unseen and highly dynamic scenarios, we employ a mixture model over the policy parameter space. Specifically, a Dirichlet Process Gaussian Mixture Model (DPMM) is used to cluster the parameters of sampled policies and maintain a mixture model over the clusters, hence effectively discovering policies that are suitable to the current environmental context in an unsupervised manner. Our simulated and real-world experiments demonstrate the feasibility of A-EXP4 in accommodating interaction with different types of pedestrians while jointly minimizing social disruption through the adaptation process. While the A-EXP4 formulation is kept general for application in a variety of domains requiring continual adaptation of a robot's policy, we specifically evaluate the performance of our algorithm using a suitcase-inspired assistive robotic platform. In this concrete assistive scenario, the algorithm observes how audio signals produced by the navigational system affect the behavior of pedestrians and adapts accordingly. Consequently, we find A-EXP4 to effectively adapt the interaction policy for gently clearing a navigation path in crowded settings, resulting in significant reduction in empirical regret compared to the EXP4 baseline. 
    more » « less
  2. Consider an assistive system that guides visually impaired users through speech and haptic feedback to their destination. Existing robotic and ubiquitous navigation technologies (e.g., portable, ground, or wearable systems) often operate in a generic, user-agnostic manner. However, to minimize confusion and navigation errors, our real-world analysis reveals a crucial need to adapt theinstructional guidance across different end-users with diverse mobility skills. To address this practical issue in scalable system design, we propose a novel model based reinforcement learning framework for personalizing the system-user interaction experience. When incrementally adapting the system to new users, we propose to use a weighted experts model for addressing data-efficiency limitations in transfer learning with deep models. A real-world dataset of navigation by blind users is used to show that the proposed approach allows for (1) more accurate long-term human behavior prediction (up to 20 seconds into the future) through improved reasoning over personal mobility characteristics, interaction with surrounding obstacles, and the current navigation goal, and (2) quick adaptation at the onset of learning, when data is limited. 
    more » « less
  3. Autonomous mobile robots deployed in outdoor environments must reason about different types of terrain for both safety (e.g., prefer dirt over mud) and deployer preferences (e.g., prefer dirt path over flower beds). Most existing solutions to this preference-aware path planning problem use semantic segmentation to classify terrain types from camera images, and then ascribe costs to each type. Unfortunately, there are three key limitations of such approaches -- they 1) require pre-enumeration of the discrete terrain types, 2) are unable to handle hybrid terrain types (e.g., grassy dirt), and 3) require expensive labelled data to train visual semantic segmentation. We introduce Visual Representation Learning for Preference-Aware Path Planning (VRL-PAP), an alternative approach that overcomes all three limitations: VRL-PAP leverages unlabeled human demonstrations of navigation to autonomously generate triplets for learning visual representations of terrain that are viewpoint invariant and encode terrain types in a continuous representation space. The learned representations are then used along with the same unlabeled human navigation demonstrations to learn a mapping from the representation space to terrain costs. At run time, VRL-PAP maps from images to representations and then representations to costs to perform preference-aware path planning. We present empirical results from challenging outdoor settings that demonstrate VRL-PAP 1) is successfully able to pick paths that reflect demonstrated preferences, 2) is comparable in execution to geometric navigation with a highly detailed manually annotated map (without requiring such annotations), 3) is able to generalize to novel terrain types with minimal additional unlabeled demonstrations. 
    more » « less
  4. Reinforcement learning (RL) presents numerous benefits compared to rule-based approaches in various applications. Privacy concerns have grown with the widespread use of RL trained with privacy- sensitive data in IoT devices, especially for human-in-the-loop systems. On the one hand, RL methods enhance the user experience by trying to adapt to the highly dynamic nature of humans. On the other hand, trained policies can leak the user’s private information. Recent attention has been drawn to designing privacy-aware RL algorithms while maintaining an acceptable system utility. A central challenge in designing privacy-aware RL, especially for human-in-the-loop systems, is that humans have intrinsic variability, and their preferences and behavior evolve. The effect of one privacy leak mitigation can differ for the same human or across different humans over time. Hence, we can not design one fixed model for privacy-aware RL that fits all. To that end, we propose adaPARL, an adaptive approach for privacy-aware RL, especially for human-in-the-loop IoT systems. adaPARL provides a personalized privacy-utility trade-off depend- ing on human behavior and preference. We validate the proposed adaPARL on two IoT applications, namely (i) Human-in-the-Loop Smart Home and (ii) Human-in-the-Loop Virtual Reality (VR) Smart Classroom. Results obtained on these two applications validate the generality of adaPARL and its ability to provide a personalized privacy-utility trade-off. On average, adaPARL improves the utility by 57% while reducing the privacy leak by 23% on average. 
    more » « less
  5. null (Ed.)
    Across a wide variety of domains, artificial agents that can adapt and personalize to users have potential to improve and transform how social services are provided. Because of the need for personalized interaction data to drive this process, long-term (or longitudinal) interactions between users and agents, which unfold over a series of distinct interaction sessions, have attracted substantial research interest. In recognition of the expanded scope and structure of a long-term interaction, researchers are also adjusting the personalization models and algorithms used, orienting toward “continual learning” methods, which do not assume a stationary modeling target and explicitly account for the temporal context of training data. In parallel, researchers have also studied the effect of “multitask personalization,” an approach in which an agent interacts with users over multiple different tasks contexts throughout the course of a long-term interaction and learns personalized models of a user that are transferrable across these tasks. In this paper, we unite these two paradigms under the framework of “Lifelong Personalization,” analyzing the effect of multitask personalization applied to dynamic, non-stationary targets. We extend the multi-task personalization approach to the more complex and realistic scenario of modeling dynamic learners over time, focusing in particular on interactive scenarios in which the modeling agent plays an active role in teaching the student whose knowledge the agent is simultaneously attempting to model. Inspired by the way in which agents use active learning to select new training data based on domain context, we augment a Gaussian Process-based multitask personalization model with a mechanism to actively and continually manage its own training data, allowing a modeling agent to remove or reduce the weight of observed data from its training set, based on interactive context cues. We evaluate this method in a series of simulation experiments comparing different approaches to continual and multitask learning on simulated student data. We expect this method to substantially improve learning in Gaussian Process models in dynamic domains, establishing Gaussian Processes as another flexible modeling tool for Long-term Human-Robot Interaction (HRI) Studies. 
    more » « less