Standard temporal planning assumes that planning takes place offline, and then execution starts at time 0. Recently, situated temporal planning was introduced, where planning starts at time 0, and execution occurs after planning terminates. Situated temporal planning reflects a more realistic scenario where time passes during planning. However, in situated temporal planning a complete plan must be generated before any action is executed. In some problems with time pressure, timing is too tight to complete planning before the first action must be executed. For example, an autonomous car that has a truck backing towards it should probably move out of the way now, and plan how to get to its destination later. In this paper, we propose a new problem setting: concurrent planning and execution, in which actions can be dispatched (executed) before planning terminates. Unlike previous work on planning and execution, we must handle wall clock deadlines that affect action applicability and goal achievement (as in situated planning) while also supporting dispatching actions before a complete plan has been found. We extend previous work on metareasoning for situated temporal planning to develop an algorithm for this new setting. Our empirical evaluation shows that when there is strong time pressure, our approach outperforms situated temporal planning.
A Formal Metareasoning Model of Concurrent Planning and Execution
Agents that plan and act in the real world must deal with the fact that time passes as they are planning. When timing is tight, there may be insufficient time to complete the search for a plan before it is time to act. By commencing execution before search concludes, one gains time to search by making planning and execution concurrent. However, this incurs the risk of making incorrect action choices, especially if actions are irreversible. This tradeoff between opportunity and risk is the problem addressed in this paper. Our main contribution is to formally define this setting as an abstract metareasoning problem. We find that the abstract problem is intractable. However, we identify special cases that are solvable in polynomial time, develop greedy solution algorithms, and, through tests on instances derived from search problems, find several methods that achieve promising practical performance. This work lays the foundation for a principled time-aware executive that concurrently plans and executes.
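To make the opportunity-versus-risk tradeoff concrete, consider a minimal greedy sketch of the dispatch-or-plan decision. This is an illustration only, not the paper's formal model: the `Candidate` fields, the exponential completion-time model, and the utility estimates are all assumed for the example.

```python
import math
from dataclasses import dataclass
from typing import Callable

@dataclass
class Candidate:
    """A partial plan under consideration (all fields are illustrative)."""
    first_action_duration: float          # wall-clock time bought by dispatching now
    p_valid: float                        # est. probability the prefix belongs to a valid plan
    p_complete: Callable[[float], float]  # p_complete(t): prob. search finishes within t seconds

def utility_keep_planning(c: Candidate, time_left: float) -> float:
    """Search until the deadline without committing to any action."""
    return c.p_complete(time_left)

def utility_dispatch(c: Candidate, time_left: float) -> float:
    """Dispatching extends the effective deadline by the action's duration,
    but the episode succeeds only if the dispatched prefix was correct."""
    return c.p_valid * c.p_complete(time_left + c.first_action_duration)

def decide(c: Candidate, time_left: float) -> str:
    """Greedy rule: dispatch iff it raises the expected success probability."""
    if utility_dispatch(c, time_left) > utility_keep_planning(c, time_left):
        return "dispatch"
    return "keep planning"

# Usage: exponential completion-time model under a tight deadline.
c = Candidate(first_action_duration=5.0, p_valid=0.9,
              p_complete=lambda t: 1.0 - math.exp(-t / 10.0))
print(decide(c, time_left=3.0))  # -> "dispatch"
```

Under this toy rule, the executive dispatches exactly when the extra search time bought by the first action outweighs the risk that the committed prefix turns out to be wrong.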
- Award ID(s): 2008594
- PAR ID: 10451865
- Date Published:
- Journal Name: Proceedings of the AAAI Conference on Artificial Intelligence
- Volume: 37
- Issue: 10
- ISSN: 2159-5399
- Page Range / eLocation ID: 12427 to 12435
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Train routing is sensitive to delays that occur in the network. When a train is delayed, it is imperative that a new plan be found quickly, or else other trains may need to be stopped to ensure safety, potentially causing cascading delays. In this paper, we consider this class of multi-agent planning problems, which we call Multi-Agent Execution Delay Replanning. We show that these problems can be solved by reduction to an any-start-time safe-interval planning problem. When an agent has an any-start-time plan, it can react to a delay by simply looking up the precomputed plan for the delayed start time. We identify crucial real-world problem characteristics, such as the agent's speed, size, and safety envelope, and extend any-start-time planning to account for them. Experimental results on real-world train networks show that any-start-time plans are compact and can be computed in reasonable time while enabling agents to instantly recover a safe plan.
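To illustrate the lookup idea, here is a minimal sketch of an any-start-time plan stored as a sorted list of start-time intervals, each mapped to a precomputed plan; the representation and all names are assumptions for the example, not the paper's data structures.

```python
import bisect

class AnyStartTimePlan:
    """Sorted start-time intervals, each mapped to a precomputed safe plan.
    plans[i] covers [breakpoints[i], breakpoints[i+1]); the last plan
    covers [breakpoints[-1], infinity)."""

    def __init__(self, breakpoints, plans):
        assert len(breakpoints) == len(plans) and breakpoints == sorted(breakpoints)
        self.breakpoints = breakpoints
        self.plans = plans

    def plan_for(self, start_time):
        """O(log n) lookup of the plan valid for a (possibly delayed) start time."""
        i = bisect.bisect_right(self.breakpoints, start_time) - 1
        if i < 0:
            raise ValueError("start time precedes all precomputed intervals")
        return self.plans[i]

# Usage: a train delayed to t = 7.5 instantly recovers a safe plan.
astp = AnyStartTimePlan(
    [0.0, 5.0, 10.0],
    [["go A->B", "go B->C"],            # depart on time
     ["wait 1", "go A->B", "go B->C"],  # small delay: wait out a conflict
     ["go A->D", "go D->C"]])           # large delay: reroute
print(astp.plan_for(7.5))  # -> ['wait 1', 'go A->B', 'go B->C']
```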
The Multi-Agent Path Finding (MAPF) problem involves planning collision-free paths for multiple agents in a shared environment. Most MAPF solvers rely on the assumption that an agent can arrive at a specific location at a specific timestep. However, real-world execution uncertainties can cause agents to deviate from this assumption, leading to collisions and deadlocks. Prior research addresses this problem by having agents follow a Temporal Plan Graph (TPG), which enforces a consistent passing order at every location as defined in the MAPF plan. However, we show that TPGs are overly strict: in some circumstances, satisfying the passing order requires agents to wait unnecessarily, leading to longer execution times. To overcome this issue, we introduce a new graphical representation called a Bidirectional Temporal Plan Graph (BTPG), which allows passing orders to be switched during execution to avoid unnecessary waiting. We design two anytime algorithms for constructing a BTPG: BTPG-naïve and BTPG-optimized. Experimental results show that following BTPGs consistently outperforms following TPGs, reducing unnecessary waits by 8-20%.
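The distinction can be sketched as follows: a TPG ordering edge fixes which agent passes a location first, while a BTPG edge carries a reversed alternative that the executive may switch to when the designated leader is late. This toy sketch ignores the conditions the real construction must enforce (e.g., that a flip cannot introduce a deadlock); all names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class OrderEdge:
    """Ordering constraint at `location`: agent `first` passes before `second`."""
    location: str
    first: str
    second: str
    bidirectional: bool = False  # BTPG edges may have their order switched
    flipped: bool = False        # whether the order was switched during execution

    def leader(self):
        return self.second if self.flipped else self.first

    def may_pass(self, agent, already_passed):
        """An agent may pass if it is the leader, or the leader already passed."""
        return agent == self.leader() or self.leader() in already_passed

    def try_flip(self, waiting_agent, already_passed):
        """If the designated follower is ready but the leader has not yet passed,
        a bidirectional edge lets us switch the order instead of waiting.
        A plain TPG edge (bidirectional=False) never flips."""
        if (self.bidirectional and not self.flipped
                and waiting_agent == self.second
                and self.first not in already_passed):
            self.flipped = True
            return True
        return False

# Usage: agent B reaches cell (3,4) first; a TPG edge would force it to wait
# for A, but the BTPG edge flips the passing order and lets B proceed.
e = OrderEdge("cell(3,4)", first="A", second="B", bidirectional=True)
if not e.may_pass("B", set()):
    e.try_flip("B", set())
print(e.may_pass("B", set()))  # -> True
```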
Large workloads of event trend aggregation queries are widely deployed to derive high-level insights about current event trends in near real time. To speed up execution, we identify and leverage sharing opportunities among complex patterns with flat Kleene operators or even nested Kleene expressions. We propose Gloria, a graph-based sharing optimizer for event trend aggregation. First, we map the sharing-optimization problem to a graph path-search problem in the Gloria graph, with execution costs encoded as edge weights. Second, we shrink the search space by applying cost-driven pruning principles that guarantee optimality of the reduced Gloria graph in most cases. Lastly, we propose a path-search algorithm that identifies the sharing plan with minimum execution cost. Our experimental study on three real-world data sets demonstrates that the Gloria optimizer effectively reduces the search space, leading to a 5-fold speed-up in optimization time. The optimized plan consistently reduces query latency by 68%-93% compared to plans generated by state-of-the-art approaches.
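Once execution costs are encoded as edge weights, selecting the sharing plan amounts to a minimum-cost path search over the (pruned) graph. A generic sketch using Dijkstra's algorithm is shown below; the actual graph encoding is specific to Gloria and is not reproduced here, so the toy graph and names are assumptions.

```python
import heapq

def min_cost_plan(graph, source, target):
    """Lazy Dijkstra over a weighted digraph. `graph[u]` is an iterable of
    (v, weight) pairs, with weights encoding execution costs.
    Returns (total_cost, path), or (inf, None) if target is unreachable."""
    frontier = [(0.0, source, [source])]
    settled = set()
    while frontier:
        cost, u, path = heapq.heappop(frontier)
        if u == target:
            return cost, path
        if u in settled:
            continue
        settled.add(u)
        for v, w in graph.get(u, ()):
            if v not in settled:
                heapq.heappush(frontier, (cost + w, v, path + [v]))
    return float("inf"), None

# Usage on a toy graph: sharing pattern A is cheaper than evaluating alone.
g = {"start":   [("shareA", 2.0), ("noshare", 5.0)],
     "shareA":  [("agg", 1.0)],
     "noshare": [("agg", 1.0)],
     "agg":     []}
print(min_cost_plan(g, "start", "agg"))  # -> (3.0, ['start', 'shareA', 'agg'])
```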
Sequential decision-making under uncertainty is present in many important problems. Two popular approaches for tackling such problems are reinforcement learning and online search (e.g., Monte Carlo tree search). While the former learns a policy by interacting with the environment (typically done before execution), the latter uses a generative model of the environment to sample promising action trajectories at decision time. Decision-making is particularly challenging in non-stationary environments, where the environment in which an agent operates can change over time. Both approaches have shortcomings in such settings: policies learned before execution become stale when the environment changes, and relearning takes both time and computational effort, while online search can return sub-optimal actions when there are limits on allowed runtime. In this paper, we introduce Policy-Augmented Monte Carlo tree search (PA-MCTS), which combines action-value estimates from an out-of-date policy with an online search using an up-to-date model of the environment. We prove theoretical results showing conditions under which PA-MCTS selects the one-step optimal action and also bound the error accrued while following PA-MCTS as a policy. We compare and contrast our approach with AlphaZero, another hybrid planning approach, and Deep Q-Learning on several OpenAI Gym environments. Through extensive experiments, we show that under non-stationary settings with limited time constraints, PA-MCTS outperforms these baselines.
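The core decision rule can be sketched as a convex combination of the stale policy's action-value estimates with fresh search estimates. The exact weighting scheme and its theoretical guarantees are given in the paper, so the formulation below is only an assumed illustration.

```python
def pa_mcts_action(state, actions, q_policy, q_search, alpha):
    """Pick the action maximizing a convex combination of the out-of-date
    policy's value estimate and the up-to-date search estimate.
    alpha in [0, 1] weighs the (possibly stale) learned policy."""
    def score(a):
        return alpha * q_policy(state, a) + (1.0 - alpha) * q_search(state, a)
    return max(actions, key=score)

# Usage with toy estimates: the stale policy prefers a1, the fresh search
# prefers a2; with alpha = 0.3 the search estimate dominates.
q_pol = lambda s, a: {"a1": 1.0, "a2": 0.2}[a]
q_mcts = lambda s, a: {"a1": 0.1, "a2": 0.9}[a]
print(pa_mcts_action("s0", ["a1", "a2"], q_pol, q_mcts, alpha=0.3))  # -> a2
```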