
This content will become publicly available on December 13, 2024

Title: Oblivious Markov Decision Processes: Planning and Policy Execution
We examine a novel setting in which two parties each have partial knowledge of the elements that make up a Markov Decision Process (MDP) and must cooperate to compute and execute an optimal policy for the problem constructed from those elements. This situation arises when one party wants to give a robot some task but does not wish to divulge the details of that task to a second party, while the second party possesses sensitive data about the robot's dynamics (information needed for planning). Both parties want the robot to perform the task successfully, but neither is willing to disclose any more information than is absolutely necessary. We utilize techniques from secure multi-party computation, combining primitives and algorithms to construct protocols that can compute an optimal policy while ensuring that the policy remains opaque by being split across both parties. To execute a split policy, we also give a protocol that enables the robot to determine what actions to trigger, while the second party guards against attempts to probe for information inconsistent with the policy's prescribed execution. To improve scalability, we find that basis functions and constraint sampling methods are useful in forming effective approximate MDPs. We report simulation results examining performance and precision, and assess the scaling properties of our Python implementation. We also describe a hardware proof-of-feasibility implementation using inexpensive physical robots, which, being a small-scale instance, can be solved directly.
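To make the idea of a policy "split across both parties" concrete, the following is a minimal sketch of additive secret sharing applied to a small tabular policy. It is an illustration under stated assumptions, not the protocol from the paper: the function names `split_policy` and `reconstruct_action`, the example policy, and the choice of sharing modulo the number of actions are all hypothetical. In a real secure multi-party computation the shares would never be combined in the clear by one party as the last step does here.

```python
import secrets


def split_policy(policy, num_actions):
    """Split a tabular policy into two additive shares modulo num_actions.

    policy[s] is the action index chosen in state s. Each share on its own
    is uniformly distributed, so neither holder learns the action alone.
    (Hypothetical helper for illustration only.)
    """
    # Party A's share is a fresh random mask per state.
    share_a = [secrets.randbelow(num_actions) for _ in policy]
    # Party B's share is the action minus that mask, modulo num_actions.
    share_b = [(action - mask) % num_actions
               for action, mask in zip(policy, share_a)]
    return share_a, share_b


def reconstruct_action(share_a, share_b, state, num_actions):
    """Recombine the two shares for a single state to recover its action."""
    return (share_a[state] + share_b[state]) % num_actions


# Assumed toy policy over 5 states and 4 actions.
policy = [2, 0, 1, 3, 3]
num_actions = 4
share_a, share_b = split_policy(policy, num_actions)

# Only the combination of both shares reveals the prescribed action.
recovered = [reconstruct_action(share_a, share_b, s, num_actions)
             for s in range(len(policy))]
assert recovered == policy
```

Sharing modulo the number of actions keeps each share in the same range as a valid action index, so a single share is indistinguishable from a random action choice.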
Award ID(s):
2034123 2024733
Page Range / eLocation ID:
3850 to 3857
Medium: X
Singapore, Singapore
Sponsoring Org:
National Science Foundation
More Like this
  1. Using the context of human-supervised object collection tasks, we explore policies for a robot to seek assistance from a human supervisor and avoid loss of human trust in the robot. We consider a human-robot interaction scenario in which a mobile manipulator chooses to collect objects either autonomously or through human assistance, while the human supervisor monitors the robot’s operation, assists when asked, or intervenes if the human perceives that the robot may not accomplish its goal. We design an optimal assistance-seeking policy for the robot using a Partially Observable Markov Decision Process (POMDP) setting in which human trust is a hidden state and the objective is to maximize collaborative performance. We conduct two sets of human-robot interaction experiments. The data from the first set of experiments is used to estimate POMDP parameters, which are used to compute an optimal assistance-seeking policy that is used in the second experiment. For most participants, the estimated POMDP reveals that humans are more likely to intervene when their trust is low and the robot is performing a high-complexity task, and that the robot asking for assistance in high-complexity tasks can increase human trust in the robot. Our experimental results show that the proposed trust-aware policy yields superior performance compared with an optimal trust-agnostic policy.
  2. We describe a new paradigm for multi-party private set intersection cardinality (PSI-CA) that allows $n$ parties to compute the intersection size of their datasets without revealing any additional information. We explore a variety of instantiations of this paradigm. By operating under the assumption that a particular subset of parties refrains from collusion, our protocols avoid computationally expensive public-key operations and are secure in the presence of a semi-honest adversary. We demonstrate the practicality of our PSI-CA with an implementation. For $n=16$ parties with datasets of $2^{20}$ items each, our server-aided variant takes 71 seconds. Interestingly, in the server-less setting, the same task takes only 7 seconds. To the best of our knowledge, this is the first 'special purpose' implementation of a multi-party PSI-CA from symmetric-key techniques (i.e., an implementation that does not rely on a generic underlying MPC). We study two interesting applications, heatmap computation and associated rule learning (ARL), that can be computed securely using a dot-product as a building block. We analyse the performance of securely computing heatmap and ARL using our protocol and compare that to the state-of-the-art.
  3. Private set intersection (PSI) allows two mutually distrusting parties, each with a set as input, to learn the intersection of both their sets without revealing anything more about their respective input sets. Traditionally, PSI studies the static setting where the computation is performed only once on both parties’ input sets. We initiate the study of updatable private set intersection (UPSI), which allows parties to compute the intersection of their private sets on a regular basis with sets that also constantly get updated. We consider two specific settings. In the first setting, called UPSI with addition, parties can add new elements to their old sets. We construct two protocols in this setting, one allowing both parties to learn the output and the other only allowing one party to learn the output. In the second setting, called UPSI with weak deletion, parties can additionally delete their old elements every t days. We present a protocol for this setting allowing both parties to learn the output. All our protocols are secure against semi-honest adversaries and have the guarantee that both the computational and communication complexity only grow with the set updates instead of the entire sets. Finally, we implement our UPSI with addition protocols and compare with the state-of-the-art PSI protocols. Our protocols compare favorably when the total set size is sufficiently large, the new updates are sufficiently small, or in networks with low bandwidth.
  4. Private matching for compute (PMC) establishes a match between two datasets owned by mutually distrusted parties (C and P) and allows the parties to input more data for the matched records for arbitrary downstream secure computation without rerunning the private matching component. The state-of-the-art PMC protocols only support two parties and assume that both parties can participate in computationally intensive secure computation. We observe that such operational overhead limits the adoption of these protocols to powerful entities alone, as small data owners or devices with minimal computing power will not be able to participate. We introduce two protocols to delegate PMC from party P to untrusted cloud servers, called delegates, allowing multiple smaller P parties to provide inputs containing identifiers and associated values. Our Delegated Private Matching for Compute protocols, called DPMC and DsPMC, establish a join between the datasets of party C and multiple delegators P based on multiple identifiers and compute secret shares of associated values for the identifiers that the parties have in common. We introduce a rerandomizable encrypted oblivious pseudorandom function (OPRF) primitive, called EO, which allows two parties to encrypt, mask, and shuffle their data. Note that EO may be of independent interest. Our DsPMC protocol limits the leakages of DPMC by combining our EO scheme and secure three-party shuffling. Finally, our implementation demonstrates the efficiency of our constructions by outperforming related works by approximately 10x for the total protocol execution and by at least 20x for the computation on the delegators.
  5. Selecting appropriate tutoring help actions that account for both a student’s content mastery and engagement level is essential for effective human tutors, indicating the critical need for these skills in autonomous tutors. In this work, we formulate the robot-student tutoring help action selection problem as the Assistive Tutor partially observable Markov decision process (AT-POMDP). We designed the AT-POMDP and derived its parameters based on data from a prior robot-student tutoring study. The policy that results from solving the AT-POMDP allows a robot tutor to decide upon the optimal tutoring help action to give a student, while maintaining a belief of the student’s mastery of the material and engagement with the task. This approach is validated through a between-subjects field study, which involved 4th-grade students (n = 28) interacting with a social robot solving long division problems over five sessions. Students who received help from a robot using the AT-POMDP policy demonstrated significantly greater learning gains than students who received help from a robot with a fixed help action selection policy. Our results demonstrate that this robust computational framework can be used effectively to deliver diverse and personalized tutoring support over time for students.