Title: Robot Action Selection Learning via Layered Dimension Informed Program Synthesis
Action selection policies (ASPs), used to compose low-level robot skills into complex high-level tasks, are commonly represented as neural networks (NNs) in the state of the art. Such a paradigm, while very effective, suffers from a few key problems: 1) NNs are opaque to the user and hence not amenable to verification, 2) they require significant amounts of training data, and 3) they are hard to repair when the domain changes. We present two key insights about ASPs for robotics. First, ASPs need to reason about physically meaningful quantities derived from the state of the world; second, there exists a layered structure for composing these policies. Leveraging these insights, we introduce layered dimension-informed program synthesis (LDIPS): by reasoning about the physical dimensions of state variables and the dimensional constraints on operators, LDIPS directly synthesizes ASPs in a human-interpretable domain-specific language that is amenable to program repair. We present empirical results demonstrating that LDIPS 1) can synthesize effective ASPs for robot soccer and autonomous driving domains, 2) enables tractable synthesis of robot action selection policies that is not possible with state-of-the-art synthesis techniques, 3) requires two orders of magnitude fewer training examples than a comparable NN representation, and 4) can repair the synthesized ASPs with only a small number of corrections when transferring from simulation to real robots.
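To make the idea of dimensional constraints on operators concrete, the following Python sketch tags a few state features with physical dimensions, enumerates binary terms over them, and prunes any term that breaks a dimensional rule (for example, subtracting a velocity from a distance). It is a minimal illustration under stated assumptions, not LDIPS itself: the (length, time) exponent encoding, the feature names, and the helpers `candidate_terms` and `predicates` are all hypothetical, and the numeric threshold in a guard such as `term < theta` would still have to be fit from demonstrations (not shown).

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List


@dataclass(frozen=True)
class Dim:
    """Physical dimension as exponents of (length, time); hypothetical encoding."""
    length: int = 0
    time: int = 0

    def __mul__(self, other: "Dim") -> "Dim":
        # Multiplying quantities adds dimension exponents.
        return Dim(self.length + other.length, self.time + other.time)

    def __truediv__(self, other: "Dim") -> "Dim":
        # Dividing quantities subtracts dimension exponents.
        return Dim(self.length - other.length, self.time - other.time)


METERS = Dim(length=1)
SECONDS = Dim(time=1)
VELOCITY = METERS / SECONDS

# Hypothetical robot-soccer state features tagged with their dimensions.
FEATURES: Dict[str, Dim] = {
    "dist_to_ball": METERS,
    "dist_to_goal": METERS,
    "ball_speed": VELOCITY,
    "time_to_kick": SECONDS,
}


def candidate_terms(features: Dict[str, Dim]) -> Dict[str, Dim]:
    """Enumerate binary terms, pruning any that break a dimensional rule."""
    terms: Dict[str, Dim] = {}
    for (x, dx), (y, dy) in combinations(features.items(), 2):
        # Subtraction is only defined between quantities of the same dimension.
        if dx == dy:
            terms[f"{x} - {y}"] = dx
        # Multiplication is always allowed; the result's dimension is derived.
        terms[f"{x} * {y}"] = dx * dy
    return terms


def predicates(features: Dict[str, Dim], target: Dim = METERS) -> List[str]:
    """Terms usable in a guard `term < theta`, where theta has dimension `target`."""
    terms = {**features, **candidate_terms(features)}
    return sorted(name for name, dim in terms.items() if dim == target)


print(predicates(FEATURES))
# ['ball_speed * time_to_kick', 'dist_to_ball', 'dist_to_ball - dist_to_goal', 'dist_to_goal']
```

Note that `ball_speed * time_to_kick` survives because velocity times time has the dimension of length, while `dist_to_ball - ball_speed` is never generated. This shows how dimensional typing can shrink the space of candidate expressions before any parameter search; the actual LDIPS DSL, operator set, and layered search are not reproduced here.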
Award ID(s):
2006404
NSF-PAR ID:
10222642
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Conference on Robot Learning
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Action selection policies (ASPs), used to compose low-level robot skills into complex high-level tasks, are commonly represented as neural networks (NNs) in the state of the art. Such a paradigm, while very effective, suffers from a few key problems: 1) NNs are opaque to the user and hence not amenable to verification, 2) they require significant amounts of training data, and 3) they are hard to repair when the domain changes. We present two key insights about ASPs for robotics. First, ASPs need to reason about physically meaningful quantities derived from the state of the world; second, there exists a layered structure for composing these policies. Leveraging these insights, we introduce layered dimension-informed program synthesis (LDIPS): by reasoning about the physical dimensions of state variables and the dimensional constraints on operators, LDIPS directly synthesizes ASPs in a human-interpretable domain-specific language that is amenable to program repair. We present empirical results demonstrating that LDIPS 1) can synthesize effective ASPs for robot soccer and autonomous driving domains, 2) requires two orders of magnitude fewer training examples than a comparable NN representation, and 3) can repair the synthesized ASPs with only a small number of corrections when transferring from simulation to real robots.
  2. Robot social navigation is influenced by human preferences and environment-specific scenarios such as elevators and doors, thus necessitating end-user adaptability. State-of-the-art approaches to social navigation fall into two categories: model-based social constraints and learning-based approaches. While effective, these approaches have fundamental limitations: model-based approaches require constraint and parameter tuning to adapt to preferences and new scenarios, while learning-based approaches require reward functions and significant training data, and are hard to adapt to new social scenarios or new domains with limited demonstrations. In this work, we propose Iterative Dimension Informed Program Synthesis (IDIPS) to address these limitations by learning and adapting social navigation in the form of human-readable symbolic programs. IDIPS works by combining program synthesis, parameter optimization, predicate repair, and iterative human demonstration to learn and adapt model-free action selection policies from orders of magnitude less data than learning-based approaches. We introduce a novel predicate repair technique that can accommodate previously unseen social scenarios or preferences by growing existing policies. We present experimental results showing that IDIPS: 1) synthesizes effective policies that model user preference, 2) can adapt existing policies to changing preferences, 3) can extend policies to handle novel social scenarios such as locked doors, and 4) generates policies that can be transferred from simulation to real-world robots with minimal effort.
  3. Obtaining annotations for large training sets is expensive, especially in settings where domain knowledge is required, such as behavior analysis. Weak supervision has been studied to reduce annotation costs by using weak labels from task-specific labeling functions (LFs) to augment ground truth labels. However, domain experts still need to hand-craft different LFs for different tasks, limiting scalability. To reduce expert effort, we present AutoSWAP: a framework for automatically synthesizing data-efficient task-level LFs. The key to our approach is to efficiently represent expert knowledge in a reusable domain-specific language and more general domain-level LFs, with which we use state-of-the-art program synthesis techniques and a small labeled dataset to generate task-level LFs. Additionally, we propose a novel structural diversity cost that allows for efficient synthesis of diverse sets of LFs, further improving AutoSWAP’s performance. We evaluate AutoSWAP in three behavior analysis domains and demonstrate that AutoSWAP outperforms existing approaches using only a fraction of the data. Our results suggest that AutoSWAP is an effective way to automatically generate LFs that can significantly reduce expert effort for behavior analysis.
  4. Over the past 50 years, the diversity of higher education faculty in the mathematical, physical, computer, and engineering sciences (MPCES) has advanced very little at 4-year universities in the United States. This is despite laws and policies such as affirmative action, interventions by universities, and enormous financial investment by federal agencies to diversify science, technology, engineering, and mathematics (STEM) career pathways into academia. Data comparing the fraction of underrepresented minority (URM) postdoctoral scholars to the fraction of faculty at these institutions offer a straightforward empirical explanation for this state of affairs. URM postdoc appointments lag significantly behind progress in terms of both undergraduate and Ph.D.-level STEM student populations. Indeed, URM postdoc appointments lag well behind faculty diversity itself in the MPCES fields, most of which draw their faculty heavily from the postdoctoral ranks, particularly at research-intensive (R1) universities. Thus, a sea change in how postdocs are recruited, how their careers are developed, and how they are identified as potential faculty is required in order to diversify the nation’s faculty, and particularly the R1 MPCES professoriate. Our research shows that both Ph.D. students and postdocs benefit from intentional structure at various levels of their respective “apprentice” experiences, a factor that we believe has been neglected.
    Several key structural approaches are highly effective in this regard: (1) a collaborative approach in which leading research universities collectively identify outstanding URM candidates; (2) faculty engagement in recruiting and supporting these postdocs; (3) inter-institutional exchange programs to heighten the visibility and broaden the professional experiences of these postdocs; (4) community-building activities that create a sense of belonging and encourage continuing in academia for each cohort; and (5) continuing research based on outcomes and new experimental approaches. The California Alliance, consisting of UC Berkeley, UCLA, Caltech, and Stanford, has been engaged in such a program for almost a decade now, with most of the California Alliance URM postdocs now in tenure-track positions or on the path toward careers as faculty at research-intensive (R1) institutions. If this approach were brought to scale by involving the top 25 or so URM Ph.D.-producing R1 institutions in the MPCES fields, about 40% of the national URM postdoctoral population in these fields could be affected. Although this impact would fall short of bringing URM MPCES faculty ranks up to full representation of the United States population as a whole, it would vastly improve the outlook for URM students and their aspirations to take on leadership roles as scientists and engineers.
  5. Emerging Industrial Internet-of-Things systems require wireless solutions to connect sensors, actuators, and controllers as part of high-data-rate feedback-control loops over real-time flows. A key challenge is to provide predictable performance and agility in response to fluctuations in link quality, variable workloads, and topology changes. We propose WARP to address this challenge. WARP uses programs to specify a network’s behavior and includes a synthesis procedure to automatically generate such programs from a high-level specification of the system’s workload and topology. WARP has three unique features: (1) WARP uses a domain-specific language to specify stateful programs that include conditional statements to control when a flow’s packets are transmitted. The execution paths of programs depend on the pattern of packet losses observed at runtime, thereby enabling WARP to readily adapt to packet losses due to short-term variations in link quality. (2) Our synthesis technique uses heuristics to improve network performance by considering multiple packet loss patterns and associated execution paths when determining the transmissions performed by nodes. Furthermore, the generated programs ensure that the likelihood of a flow delivering its packets by its deadline exceeds a user-specified threshold. (3) WARP can adapt to workload and topology changes without explicitly reconstructing a network’s program, based on the observation that nodes can independently synthesize the same program when they share the same workload and topology information. Simulations show that WARP improves network throughput for data collection, dissemination, and mixed workloads on two realistic topologies. Testbed experiments show that WARP reduces the time to add new flows by a factor of 5 compared with a state-of-the-art centralized control plane and meets the real-time and reliability requirements of all flows.
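The IDIPS abstract (item 2) combines parameter optimization with predicate repair that grows existing policies. The following Python sketch illustrates both ideas in their simplest form, as an assumption-laden stand-in rather than the IDIPS procedure: a single threshold for a human-readable guard `feature < theta` is chosen by exhaustive search over midpoints of demonstrated values, and a policy is "grown" by OR-ing in a new clause fit for a new scenario. The feature names, demonstration values, and helpers `fit_threshold`, `less_than`, and `grow` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]
Predicate = Callable[[State], bool]


def fit_threshold(samples: List[Tuple[float, bool]]) -> float:
    """Choose theta so that `x < theta` best agrees with demonstrated decisions.

    Each sample is (feature value, whether the demonstrator took the transition).
    """
    values = sorted({x for x, _ in samples})
    # Candidate thresholds: below every value, between neighbours, above every value.
    candidates = [values[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(values, values[1:])]
    candidates.append(values[-1] + 1.0)

    def errors(theta: float) -> int:
        return sum((x < theta) != took for x, took in samples)

    return min(candidates, key=errors)


def less_than(feature: str, theta: float) -> Predicate:
    """Human-readable guard `feature < theta`."""
    return lambda s: s[feature] < theta


def grow(guard: Predicate, extra: Predicate) -> Predicate:
    """Repair by growing: keep the existing guard and OR in a new clause."""
    return lambda s: guard(s) or extra(s)


# Hypothetical demonstrations: stop when a person is closer than about 1 m.
person_theta = fit_threshold([(0.4, True), (0.8, True), (1.2, False), (2.0, False)])
stop = less_than("person_dist", person_theta)

# Hypothetical new scenario: also stop in front of a (locked) door.
door_theta = fit_threshold([(0.3, True), (0.5, True), (0.9, False)])
stop_v2 = grow(stop, less_than("door_dist", door_theta))

state = {"person_dist": 3.0, "door_dist": 0.2}
print(round(person_theta, 2), round(door_theta, 2))  # 1.0 0.7
print(stop(state), stop_v2(state))  # False True
```

Growing the guard leaves the original clause, and therefore the behavior it already encodes, untouched, while the added clause covers the new case; a real synthesizer would also have to decide which feature and operator the new clause should use.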
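The AutoSWAP abstract (item 3) relies on weak supervision, where several labeling functions (LFs) each vote or abstain on an unlabeled example and their votes are combined into a weak label. The following Python sketch shows that pipeline with simple majority voting, a common baseline aggregator that is not necessarily the one AutoSWAP uses; the LFs here are hand-written and hypothetical, whereas AutoSWAP's contribution is synthesizing task-level LFs automatically.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

Example = Dict[str, float]
LabelingFunction = Callable[[Example], Optional[str]]
ABSTAIN = None


# Hypothetical task-level LFs for a two-animal behavior dataset.
def lf_close(ex: Example) -> Optional[str]:
    return "interact" if ex["dist"] < 2.0 else ABSTAIN


def lf_far(ex: Example) -> Optional[str]:
    return "other" if ex["dist"] > 10.0 else ABSTAIN


def lf_still(ex: Example) -> Optional[str]:
    return "interact" if ex["speed"] < 0.5 else ABSTAIN


def majority_vote(lfs: List[LabelingFunction], ex: Example) -> Optional[str]:
    """Combine non-abstaining LF votes; abstain when there are no votes or a tie."""
    votes = [label for label in (lf(ex) for lf in lfs) if label is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


LFS = [lf_close, lf_far, lf_still]
unlabeled = [
    {"dist": 1.0, "speed": 0.2},
    {"dist": 20.0, "speed": 3.0},
    {"dist": 20.0, "speed": 0.1},
    {"dist": 5.0, "speed": 2.0},
]
print([majority_vote(LFS, ex) for ex in unlabeled])
# ['interact', 'other', None, None]
```

The non-abstaining weak labels would then augment the small set of ground-truth labels when training a downstream classifier.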
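The WARP abstract (item 5) describes programs whose conditional statements decide when a flow's packets are retransmitted, synthesized so that the probability of meeting a deadline exceeds a user-specified threshold. The following Python sketch illustrates that reliability target under a strong simplifying assumption: each transmission attempt on a single link succeeds independently with probability p, so k attempts deliver with probability 1 - (1 - p)^k. The helper names are hypothetical, and this single-link model ignores the multiple loss patterns, scheduling, and topology that WARP's actual synthesis considers.

```python
import random
from typing import Optional


def delivery_probability(p: float, attempts: int) -> float:
    """P(at least one success in `attempts` independent tries with success prob. p)."""
    return 1.0 - (1.0 - p) ** attempts


def min_attempts(p: float, threshold: float) -> int:
    """Smallest number of transmission slots whose delivery probability meets the threshold."""
    if not (0.0 < p <= 1.0 and 0.0 < threshold < 1.0):
        raise ValueError("need 0 < p <= 1 and 0 < threshold < 1")
    attempts = 1
    while delivery_probability(p, attempts) < threshold:
        attempts += 1
    return attempts


def run_conditional_program(p: float, slots: int, rng: random.Random) -> Optional[int]:
    """Send, then retransmit only while no ACK has been seen; return slots used or None."""
    for slot in range(1, slots + 1):
        if rng.random() < p:  # packet and ACK got through in this slot
            return slot
    return None  # deadline missed: every allocated slot failed


p, threshold = 0.7, 0.99
k = min_attempts(p, threshold)
print(k, round(delivery_probability(p, k), 4))  # 4 0.9919

rng = random.Random(0)
results = [run_conditional_program(p, k, rng) for _ in range(10_000)]
delivered = [r for r in results if r is not None]
print(len(delivered) / len(results))  # empirical delivery rate
```

Because the program retransmits only after an observed loss, most runs finish in fewer than k slots; the unused slots are what a scheduler could reclaim for other flows, which is one way conditional programs can help throughput.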