- Award ID(s):
- 1728412
- PAR ID:
- 10105332
- Date Published:
- Journal Name:
- Annual American Control Conference
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Task and motion planning subject to linear temporal logic (LTL) specifications in complex, dynamic environments requires efficient exploration of many possible future worlds. model‐free reinforcement learning has proven successful in a number of challenging tasks, but shows poor performance on tasks that require long‐term planning. in this work, we integrate Monte Carlo tree search with hierarchical neural net policies trained on expressive LTL specifications. we use reinforcement learning to find deep neural networks representing both low‐level control policies and task‐level ``option policies'' that achieve high‐level goals. our combined architecture generates safe and responsive motion plans that respect theLTL constraints. we demonstrate our approach in a simulated autonomous driving setting, where a vehicle must drive down a road in traffic, avoid collisions, and navigate an intersection, all while obeying rules of the road.more » « less
-
Robots acting in human-scale environments must plan under uncertainty in large state–action spaces and face constantly changing reward functions as requirements and goals change. Planning under uncertainty in large state–action spaces requires hierarchical abstraction for efficient computation. We introduce a new hierarchical planning framework called Abstract Markov Decision Processes (AMDPs) that can plan in a fraction of the time needed for complex decision making in ordinary MDPs. AMDPs provide abstract states, actions, and transition dynamics in multiple layers above a base-level “flat” MDP. AMDPs decompose problems into a series of subtasks with both local reward and local transition functions used to create policies for subtasks. The resulting hierarchical planning method is independently optimal at each level of abstraction, and is recursively optimal when the local reward and transition functions are correct. We present empirical results showing significantly improved planning speed, while maintaining solution quality, in the Taxi domain and in a mobile-manipulation robotics problem. Furthermore, our approach allows specification of a decision-making model for a mobile-manipulation problem on a Turtlebot, spanning from low-level control actions operating on continuous variables all the way up through high-level object manipulation tasks.more » « less
-
Robots acting in human-scale environments must plan under uncertainty in large state–action spaces and face constantly changing reward functions as requirements and goals change. Planning under uncertainty in large state–action spaces requires hierarchical abstraction for efficient computation. We introduce a new hierarchical planning framework called Abstract Markov Decision Processes (AMDPs) that can plan in a fraction of the time needed for complex decision making in ordinary MDPs. AMDPs provide abstract states, actions, and transition dynamics in multiple layers above a base-level “flat” MDP. AMDPs decompose problems into a series of subtasks with both local reward and local transition functions used to create policies for subtasks. The resulting hierarchical planning method is independently optimal at each level of abstraction, and is recursively optimal when the local reward and transition functions are correct. We present empirical results showing significantly improved planning speed, while maintaining solution quality, in the Taxi domain and in a mobile-manipulation robotics problem. Furthermore, our approach allows specification of a decision-making model for a mobile-manipulation problem on a Turtlebot, spanning from low-level control actions operating on continuous variables all the way up through high-level object manipulation tasks.more » « less
-
This study proposes a hierarchically integrated framework for safe task and motion planning (TAMP) of bipedal locomotion in a partially observable environment with dynamic obstacles and uneven terrain. The high-level task planner employs linear temporal logic for a reactive game synthesis between the robot and its environment and provides a formal guarantee on navigation safety and task completion. To address environmental partial observability, a belief abstraction model is designed by partitioning the environment into multiple belief regions and employed at the high-level navigation planner to estimate the dynamic obstacles' location. This additional location information of dynamic obstacles offered by belief abstraction enables less conservative long-horizon navigation actions beyond guaranteeing immediate collision avoidance. Accordingly, a synthesized action planner sends a set of locomotion actions to the middle-level motion planner while incorporating safe locomotion specifications extracted from safety theorems based on a reduced-order model (ROM) of the locomotion process. The motion planner employs the ROM to design safety criteria and a sampling algorithm to generate nonperiodic motion plans that accurately track high-level actions. At the low level, a foot placement controller based on an angular-momentum linear inverted pendulum model is implemented and integrated with an ankle-actuated passivity-based controller for full-body trajectory tracking. To address external perturbations, this study also investigates the safe sequential composition of the keyframe locomotion state and achieves robust transitions against external perturbations through reachability analysis. The overall TAMP framework is validated with extensive simulations and hardware experiments on bipedal walking robots Cassie and Digit designed by Agility Robotics.more » « less
-
This paper studies temporal planning in probabilistic environments, modeled as labeled Markov decision processes (MDPs), with user preferences over multiple temporal goals. Existing works reflect such preferences as a prioritized list of goals. This paper introduces a new specification language, termed prioritized qualitative choice linear temporal logic on finite traces, which augments linear temporal logic on finite traces with prioritized conjunction and ordered disjunction from prioritized qualitative choice logic. This language allows for succinctly specifying temporal objectives with corresponding preferences accomplishing each temporal task. The finite traces that describe the system's behaviors are ranked based on their dissatisfaction scores with respect to the formula. We propose a systematic translation from the new language to a weighted deterministic finite automaton. Utilizing this computational model, we formulate and solve a problem of computing an optimal policy that minimizes the expected score of dissatisfaction given user preferences. We demonstrate the efficacy and applicability of the logic and the algorithm on several case studies with detailed analyses for each.