While the development of proactive personal assistants has been a popular topic within AI research, most work in this direction focuses on a small subset of possible interaction settings. An important setting that is often overlooked is one where the user may have an incomplete or incorrect understanding of the task. This can lead the user to follow incorrect plans with potentially disastrous consequences. Supporting such settings requires agents that can detect when the user's actions might be leading them to an undesirable state and, if so, intervene so the user can correct their course of action. For the detection problem, we introduce a novel planning compilation that transforms the task of estimating the likelihood of task failure into a probabilistic goal recognition problem. This allows us to leverage existing goal recognition techniques to estimate the likelihood of failure. For the intervention problem, we use model search algorithms to identify minimal model updates that could help users identify valid plans; these identified model updates become the basis for agent intervention. We further extend the proposed approach with methods for pre-emptive intervention, which prevent users from performing actions that might result in eventual plan failure. We show how to identify such intervention points by using an efficient approximation of the true intervention problem, which is best represented as a partially observable Markov decision process (POMDP). To substantiate our claims and demonstrate the applicability of our methodology, we conduct extensive evaluations across a diverse range of planning benchmarks, which show the robustness and adaptability of our approach and its potential utility in real-world applications.
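As a rough, illustrative sketch of the detection step (not the authors' implementation), the compiled problem can be handed to a standard cost-difference probabilistic goal recognizer: each candidate goal, including the synthetic "failure" goal introduced by the compilation, is scored by how well the observed actions are explained by optimal plans for that goal. The names `plan_cost`, `failure_likelihood`, and the temperature `beta` are assumptions; plan costs would come from an off-the-shelf classical planner.

```python
import math

def goal_posterior(cost_with_obs, cost_without_obs, prior, beta=1.0):
    # Cost-difference likelihood: observations that are cheap to explain
    # under a goal make that goal more likely (beta is an assumed temperature).
    likelihood = math.exp(-beta * (cost_with_obs - cost_without_obs))
    return prior * likelihood

def failure_likelihood(goals, observations, plan_cost, priors):
    # goals: labels, one of which is the synthetic "failure" goal from the
    # compilation; plan_cost(goal, obs) is assumed to return the optimal plan
    # cost for `goal` while embedding the observed action sequence `obs`.
    unnormalized = {
        g: goal_posterior(plan_cost(g, observations), plan_cost(g, ()), priors[g])
        for g in goals
    }
    z = sum(unnormalized.values())
    return {g: p / z for g, p in unnormalized.items()}
```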
Belief-State Query Policies for User-Aligned POMDPs
Planning in real-world settings often entails addressing partial observability while aligning with users' requirements. We present a novel framework for expressing users' constraints and preferences about agent behavior in a partially observable setting using parameterized belief-state query (BSQ) policies in the setting of goal-oriented partially observable Markov decision processes (gPOMDPs). We present the first formal analysis of such constraints and prove that while the expected cost function of a parameterized BSQ policy w.r.t. its parameters is not convex, it is piecewise constant and yields an implicit discrete parameter search space that is finite for finite horizons. This theoretical result leads to novel algorithms that optimize gPOMDP agent behavior with guaranteed user alignment. Analysis proves that our algorithms converge to the optimal user-aligned behavior in the limit. Empirical results show that parameterized BSQ policies provide a computationally feasible approach for user-aligned planning in partially observable settings.
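As a minimal, hypothetical sketch of the idea (assumed names, simplified to a single query; not the paper's code), a parameterized BSQ policy compares the belief mass of a queried state set against a threshold parameter and branches on the answer; because the expected cost is piecewise constant in that parameter, it can be optimized by a finite, discrete search over candidate thresholds, e.g., the belief-mass values reachable within the horizon.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

Belief = Dict[str, float]  # state name -> probability

@dataclass
class BSQPolicy:
    """Single-query policy: compare the belief mass of a state set
    against a threshold parameter and branch on the result."""
    query_states: FrozenSet[str]
    theta: float            # the query's threshold parameter
    action_if_true: str
    action_if_false: str

    def act(self, belief: Belief) -> str:
        mass = sum(p for s, p in belief.items() if s in self.query_states)
        return self.action_if_true if mass >= self.theta else self.action_if_false

def best_threshold(candidate_thetas: List[float],
                   expected_cost: Callable[[float], float]) -> float:
    # Expected cost is piecewise constant in theta, so it suffices to evaluate
    # one representative per piece (e.g., reachable belief-mass values) and
    # keep the cheapest; expected_cost would be estimated by simulating the gPOMDP.
    return min(candidate_thetas, key=expected_cost)
```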
- Award ID(s): 1942856
- PAR ID: 10616193
- Editor(s): Globerson, A; Mackey, L; Belgrave, D; Fan, A; Paquet, U; Tomczak, J; Zhang, C
- Publisher / Repository: 38th Conference on Neural Information Processing Systems
- Date Published:
- ISBN: 9798331314385
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- This work presents novel techniques for tightly integrated online information fusion and planning in human-autonomy teams operating in partially known environments. Motivated by dynamic target search problems, we present a new map-based sketch interface for online soft-hard data fusion. This interface lets human collaborators efficiently update map information and continuously build their own highly flexible ad hoc dictionaries for making language-based semantic observations, which can be actively exploited by autonomous agents in optimal search and information-gathering problems. We formally link these capabilities to POMDP algorithms for optimal planning under uncertainty, and develop a new Dynamically Observable Monte Carlo Planning (DOMCP) algorithm as an efficient means of updating online sampling-based planning policies for POMDPs with non-static observation models. DOMCP is validated on a small-scale robot localization problem, and then demonstrated with our new user interface on a simulated dynamic target search scenario in a partially known outdoor environment.
- This paper studies the synthesis of control policies for an agent that must satisfy a temporal logic specification in a partially observable environment, in the presence of an adversary. The interaction of the agent (defender) with the adversary is modeled as a partially observable stochastic game. The search for policies is limited to the space of finite-state controllers, which leads to a tractable approach to determining policies. The goal is to generate a defender policy that maximizes satisfaction of a given temporal logic specification under any adversary policy. We relate satisfaction of the specification to reaching (a subset of) the recurrent states of a Markov chain. We then present a procedure to determine a set of defender and adversary finite-state controllers of given sizes that will satisfy the temporal logic specification. We illustrate our approach with an example.
- We study the problem of analyzing the effects of inconsistencies in perception, intent prediction, and decision making among interacting agents. When accounting for these effects, planning is akin to synthesizing policies in uncertain and potentially partially observable environments. We consider the case where each agent, in an effort to avoid a difficult planning problem, does not consider its inconsistencies with other agents when computing its policy. In particular, each agent assumes that other agents compute their policies in the same way as it does, i.e., with the same objective and based on the same system model. While finding policies on the composed system model, which accounts for the agent interactions, scales exponentially, we efficiently provide quantifiable performance metrics in the form of deltas in the probability of satisfying a given specification. We showcase our approach using two realistic autonomous vehicle case studies and implement it in an autonomous vehicle simulator.
- We study multi-agent reinforcement learning (MARL) in a stochastic network of agents. The objective is to find localized policies that maximize the (discounted) global reward. In general, scalability is a challenge in this setting because the size of the global state/action space can be exponential in the number of agents. Scalable algorithms are only known in cases where dependencies are static, fixed, and local, e.g., between neighbors in a fixed, time-invariant underlying graph. In this work, we propose a Scalable Actor Critic framework that applies in settings where the dependencies can be non-local and stochastic, and provide a finite-time error bound that shows how the convergence rate depends on the speed of information spread in the network. Additionally, as a byproduct of our analysis, we obtain novel finite-time convergence results for a general stochastic approximation scheme and for temporal difference learning with state aggregation, which apply beyond the setting of MARL in networked systems.