Title: Robot Action Selection Learning via Layered Dimension Informed Program Synthesis
Action selection policies (ASPs), used to compose low-level robot skills into complex high-level tasks, are commonly represented as neural networks (NNs) in the state of the art. Such a paradigm, while very effective, suffers from a few key problems: 1) NNs are opaque to the user and hence not amenable to verification, 2) they require significant amounts of training data, and 3) they are hard to repair when the domain changes. We present two key insights about ASPs for robotics. First, ASPs need to reason about physically meaningful quantities derived from the state of the world, and second, there exists a layered structure for composing these policies. Leveraging these insights, we introduce layered dimension-informed program synthesis (LDIPS). By reasoning about the physical dimensions of state variables and the dimensional constraints on operators, LDIPS directly synthesizes ASPs in a human-interpretable domain-specific language that is amenable to program repair. We present empirical results to demonstrate that LDIPS 1) can synthesize effective ASPs for robot soccer and autonomous driving domains, 2) requires two orders of magnitude fewer training examples than a comparable NN representation, and 3) can repair the synthesized ASPs with only a small number of corrections when transferring from simulation to real robots.
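The idea of using physical dimensions to constrain synthesis can be illustrated with a minimal sketch. It is not the LDIPS implementation: the state variables, the two-component dimension encoding (length and time exponents), and the function names are all hypothetical and chosen only for illustration. The sketch enumerates binary expressions over a few state variables and keeps only those whose dimension matches a target.

```python
from itertools import product

# Dimensions encoded as (length_exponent, time_exponent).
# This encoding and every name below are hypothetical, for illustration only.
LENGTH = (1, 0)
TIME = (0, 1)
SPEED = (1, -1)

STATE_VARS = {"dist_to_ball": LENGTH, "robot_speed": SPEED, "time_left": TIME}


def combine(op, a, b):
    """Return the dimension of `a op b`, or None if the operation is ill-formed."""
    if op in ("+", "-"):
        return a if a == b else None  # only like dimensions can be added
    if op == "*":
        return (a[0] + b[0], a[1] + b[1])
    if op == "/":
        return (a[0] - b[0], a[1] - b[1])
    raise ValueError(op)


def candidates(target_dim):
    """Enumerate binary expressions over STATE_VARS whose dimension is target_dim."""
    out = []
    for op, (x, dx), (y, dy) in product("+-*/", STATE_VARS.items(), STATE_VARS.items()):
        if combine(op, dx, dy) == target_dim:
            out.append(f"{x} {op} {y}")
    return out


# Expressions that could be compared against a threshold measured in seconds.
time_exprs = candidates(TIME)
```

Of the 36 binary expressions in this toy space, only three have the dimension of time, among them `dist_to_ball / robot_speed`. Pruning of this kind is one way dimensional constraints could shrink a synthesis search space.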
Award ID(s):
2102291
NSF-PAR ID:
10220883
Journal Name:
Conference on Robot Learning (CoRL)
Sponsoring Org:
National Science Foundation
More Like this
  1. Action selection policies (ASPs), used to compose low-level robot skills into complex high-level tasks, are commonly represented as neural networks (NNs) in the state of the art. Such a paradigm, while very effective, suffers from a few key problems: 1) NNs are opaque to the user and hence not amenable to verification, 2) they require significant amounts of training data, and 3) they are hard to repair when the domain changes. We present two key insights about ASPs for robotics. First, ASPs need to reason about physically meaningful quantities derived from the state of the world, and second, there exists a layered structure for composing these policies. Leveraging these insights, we introduce layered dimension-informed program synthesis (LDIPS). By reasoning about the physical dimensions of state variables and the dimensional constraints on operators, LDIPS directly synthesizes ASPs in a human-interpretable domain-specific language that is amenable to program repair. We present empirical results to demonstrate that LDIPS 1) can synthesize effective ASPs for robot soccer and autonomous driving domains, 2) enables tractable synthesis for robot action selection policies not possible with state-of-the-art synthesis techniques, 3) requires two orders of magnitude fewer training examples than a comparable NN representation, and 4) can repair the synthesized ASPs with only a small number of corrections when transferring from simulation to real robots.
  2. Robot social navigation is influenced by human preferences and environment-specific scenarios such as elevators and doors, thus necessitating end-user adaptability. State-of-the-art approaches to social navigation fall into two categories: model-based social constraints and learning-based approaches. While effective, these approaches have fundamental limitations: model-based approaches require constraint and parameter tuning to adapt to preferences and new scenarios, while learning-based approaches require reward functions and significant training data, and are hard to adapt to new social scenarios or new domains with limited demonstrations. In this work, we propose Iterative Dimension Informed Program Synthesis (IDIPS) to address these limitations by learning and adapting social navigation in the form of human-readable symbolic programs. IDIPS works by combining program synthesis, parameter optimization, predicate repair, and iterative human demonstration to learn and adapt model-free action selection policies from orders of magnitude less data than learning-based approaches. We introduce a novel predicate repair technique that can accommodate previously unseen social scenarios or preferences by growing existing policies. We present experimental results showing that IDIPS: 1) synthesizes effective policies that model user preference, 2) can adapt existing policies to changing preferences, 3) can extend policies to handle novel social scenarios such as locked doors, and 4) generates policies that can be transferred from simulation to real-world robots with minimal effort.
  3. Human–exoskeleton interactions have the potential to bring about changes in human behavior for physical rehabilitation or skill augmentation. Despite significant advances in the design and control of these robots, their application to human training remains limited. The key obstacles to the design of such training paradigms are the prediction of human–exoskeleton interaction effects and the selection of interaction control to affect human behavior. In this article, we present a method to elucidate behavioral changes in the human–exoskeleton system and identify expert behaviors correlated with a task goal. Specifically, we observe the joint coordinations of the robot, also referred to as kinematic coordination behaviors, that emerge from human–exoskeleton interaction during learning. We demonstrate the use of kinematic coordination behaviors with two task domains through a set of three human-subject studies. We find that participants (1) learn novel tasks within the exoskeleton environment, (2) demonstrate similarity of coordination during successful movements within participants, (3) learn to leverage these coordination behaviors to maximize success within participants, and (4) tend to converge to similar coordinations for a given task strategy across participants. At a high level, we identify task-specific joint coordinations that are used by different experts for a given task goal. These coordinations can be quantified by observing experts and the similarity to these coordinations can act as a measure of learning over the course of training for novices. The observed expert coordinations may further be used in the design of adaptive robot interactions aimed at teaching a participant the expert behaviors.
  4. Over the past 50 years the diversity of higher education faculty in the mathematical, physical, computer, and engineering sciences (MPCES) has advanced very little at 4-year universities in the United States. This is despite laws and policies such as affirmative action, interventions by universities, and enormous financial investment by federal agencies to diversify science, technology, mathematics, and engineering (STEM) career pathways into academia. Data comparing the fraction of underrepresented minority (URM) postdoctoral scholars to the fraction of faculty at these institutions offer a straightforward empirical explanation for this state of affairs. URM postdoc appointments lag significantly behind progress in terms of both undergraduate and Ph.D.-level STEM student populations. Indeed, URM postdoc appointments lag well behind faculty diversity itself in the MPCES fields, most of which draw their faculty heavily from the postdoctoral ranks, particularly at research-intensive (R1) universities. Thus, a sea change in how postdocs are recruited, how their careers are developed, and how they are identified as potential faculty is required in order to diversify the nation’s faculty, and particularly the R1 MPCES professoriate. Our research shows that both Ph.D. students and postdocs benefit from intentional structure at various levels of their respective “apprentice” experiences, a factor that we believe has been neglected.
    Several key structural approaches are highly effective in these regards: (1) A collaborative approach in which leading research universities collectively identify outstanding URM candidates; (2) Faculty engagement in recruiting and supporting these postdocs; (3) Inter-institutional exchange programs to heighten the visibility and broaden the professional experiences of these postdocs; (4) Community-building activities that create a sense of belonging and encourage continuing in academia for each cohort; and (5) Continuing research based on outcomes and new experimental approaches. The California Alliance, consisting of UC Berkeley, UCLA, Caltech, and Stanford, has been engaged in such a program for almost a decade now, with most of the California Alliance URM postdocs now in tenure-track positions or on the path toward careers as faculty at research-intensive (R1) institutions. If this approach were brought to scale by involving the top 25 or so URM Ph.D.-producing R1 institutions in the MPCES fields, about 40% of the national URM postdoctoral population in these fields could be affected. Although this impact would fall short of bringing URM MPCES faculty ranks up to full representation of the United States population as a whole, it would vastly improve the outlook for URM students and their aspirations to take on leadership roles as scientists and engineers.
  5. A fundamental challenge in automated reasoning about programming assignments at scale is clustering student submissions based on their underlying algorithms. State-of-the-art clustering techniques are sensitive to control structure variations, cannot cluster buggy solutions with similar correct solutions, and require either expensive pairwise program analyses or training effort. We propose a novel technique that can cluster small imperative programs based on their algorithmic essence: (A) how the input space is partitioned into equivalence classes and (B) how the problem is uniquely addressed within individual equivalence classes. We capture these algorithmic aspects as two quantitative semantic program features that are merged into a program's vector representation. Programs are then clustered using their vector representations. The computation of our first semantic feature leverages model counting to identify the number of inputs belonging to an input equivalence class. The computation of our second semantic feature abstracts the program's data flow by tracking the number of occurrences of a unique pair of consecutive values of a variable during its lifetime. The comprehensive evaluation of our tool SemCluster on benchmarks drawn from solutions to small programming assignments shows that SemCluster (1) generates far fewer clusters than other clustering techniques, (2) precisely identifies distinct solution strategies, and (3) boosts the performance of clustering-based program repair, all within a reasonable amount of time.
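The second semantic feature described in item 5, counting how often each pair of consecutive values of a variable occurs, can be sketched in a few lines. This is only an illustration of the counting idea, not SemCluster's actual feature computation: the function name and the example trace are hypothetical, and the abstract does not say how the traces are collected or how the counts enter the vector representation.

```python
from collections import Counter


def consecutive_pair_counts(trace):
    """Count occurrences of each (previous, next) value pair in a variable's trace.

    `trace` is a hypothetical list of the values a variable takes, in order,
    during one program run.
    """
    return Counter(zip(trace, trace[1:]))


# Hypothetical trace of a loop counter that is reset once.
pairs = consecutive_pair_counts([0, 1, 2, 0, 1, 2])
```

Here `pairs[(0, 1)]` and `pairs[(1, 2)]` are both 2 and `pairs[(2, 0)]` is 1. Two programs that update a variable in the same pattern would produce the same counts, regardless of how their control structure is written.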