Reward learning as a method for inferring human intent and preferences has been studied extensively. Prior approaches implicitly assume that the human maintains a correct belief about the robot's domain dynamics. This assumption may not always hold: the human's belief may be biased, which can ultimately misguide the estimation of the human's intent and preferences derived from human feedback on the robot's behaviors. In this paper, we remove this restrictive assumption by allowing the human to have an inaccurate understanding of the robot. We propose Generalized Reward Learning with biased beliefs about domain dynamics (GeReL), which infers both the reward function and the human's belief about the robot in a Bayesian setting from human ratings. Because the posteriors take complex forms, we formulate the inference as a variational inference problem that simultaneously infers the posteriors of the parameters governing the reward function and the human's belief about the robot. We evaluate our method in a simulated domain and in a user study where the user's bias stems from the robot's appearance. The results show that our method recovers the true human preferences even under such biased beliefs, whereas prior approaches could have misinterpreted them completely.
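The joint inference the abstract describes, recovering a reward parameter and a dynamics-belief parameter together from ratings, can be sketched with a toy grid posterior in place of the paper's variational machinery. The 1-D reward weight, the additive belief bias, and the Gaussian rating model below are illustrative assumptions, not GeReL's actual formulation.

```python
import numpy as np

# Toy joint inference of a reward weight `theta` and a human belief
# parameter `b` (an additive bias in the outcome the human believes
# the robot achieves) from noisy ratings.  A grid posterior stands in
# for variational inference; the model and numbers are illustrative.
rng = np.random.default_rng(0)
theta_true, b_true = 1.5, 0.7

actions = rng.uniform(0.0, 1.0, size=30)
# The human rates the return they *believe* each action earns,
# theta * (outcome + bias), observed with Gaussian noise.
ratings = theta_true * (actions + b_true) + rng.normal(0.0, 0.1, size=30)

thetas = np.linspace(0.5, 2.5, 101)
bs = np.linspace(0.0, 1.5, 101)
T, B = np.meshgrid(thetas, bs, indexing="ij")

# Flat priors + Gaussian likelihood -> unnormalized log posterior on a grid
log_post = np.zeros_like(T)
for a, r in zip(actions, ratings):
    log_post += -0.5 * ((r - T * (a + B)) / 0.1) ** 2

i, j = np.unravel_index(np.argmax(log_post), log_post.shape)
theta_hat, b_hat = thetas[i], bs[j]  # joint MAP estimate
```

In higher-dimensional settings the grid becomes intractable, which is precisely where a variational approximation of the joint posterior, as in the paper, takes over.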
Belief-prefix Control for Autonomously Dodging Switching Disturbances
This paper presents a new method of controller synthesis for hidden-mode switched systems, where the disturbances are the quantities affected by the unobserved switches. Rather than using model-discrimination techniques that modify the desired control actions to achieve identification, the controller uses consistency sets that map the measured external behaviors to both a belief about which mode signal is being executed and a control action. This hybrid controller is a prefix-based controller, where the prefixes come from an offline-constructed belief graph that combines prior information about switching sequences with potential reachable sets of the dynamics. While the mode signal is hidden from the controller, the system's location on the belief graph is fully observed, which transforms the problem into a design problem in which a discrete mode, expressed in terms of beliefs, is directly observed. Finally, it is shown that affine controllers dependent on prefixes of such beliefs can be synthesized via linear programming.
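A minimal sketch of the final step, synthesizing a control action by linear programming against the disturbance bounds of every mode still consistent with the belief, for a hypothetical scalar system. The modes, bounds, and one-step worst-case objective are illustrative assumptions, not the paper's synthesis procedure.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical scalar system x+ = x + u + w, where the disturbance bound
# depends on a hidden mode: |w| <= d[m].  The belief (set of modes still
# consistent with observations) determines which bounds must be hedged.
d = {0: 0.2, 1: 1.0}  # per-mode disturbance bounds (assumed)

def control_for_belief(x0, belief):
    # LP variables [u, t]: minimize the worst-case bound t on |x0 + u + w|
    # over the disturbance vertices of every mode in the belief.
    A_ub, b_ub = [], []
    for m in belief:
        for w in (-d[m], d[m]):
            A_ub.append([1.0, -1.0]);  b_ub.append(-(x0 + w))  #  x0+u+w <= t
            A_ub.append([-1.0, -1.0]); b_ub.append(x0 + w)     # -(x0+u+w) <= t
    res = linprog(c=[0.0, 1.0], A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None), (None, None)])
    return res.x[0], res.x[1]  # control action, worst-case bound

u_all, t_all = control_for_belief(1.0, {0, 1})  # both modes still possible
u_one, t_one = control_for_belief(1.0, {0})     # prefix ruled out mode 1
```

Refining the belief (here, pruning mode 1) shrinks the certified worst-case bound, which is the benefit of conditioning the affine controller on belief prefixes.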
- Award ID(s): 1553873
- PAR ID: 10211272
- Date Published:
- Journal Name: European Control Conference
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Robots working in human environments often encounter a wide range of articulated objects, such as tools, cabinets, and other jointed objects. Such articulated objects can take an infinite number of possible poses, as a point in a potentially high-dimensional continuous space. A robot must perceive this continuous pose to manipulate the object to a desired pose. This problem of perception and manipulation of articulated objects remains a challenge due to its high dimensionality and multimodal uncertainty. Here, we describe a factored approach to estimate the poses of articulated objects using an efficient approach to nonparametric belief propagation. We consider inputs as geometrical models with articulation constraints and observed RGBD (red, green, blue, and depth) sensor data. The described framework produces object-part pose beliefs iteratively. The problem is formulated as a pairwise Markov random field (MRF), where each hidden node (continuous pose variable) is an observed object-part’s pose and the edges denote the articulation constraints between the parts. We describe articulated pose estimation by a “pull” message passing algorithm for nonparametric belief propagation (PMPNBP) and evaluate its convergence properties over scenes with articulated objects. Robot experiments are provided to demonstrate the necessity of maintaining beliefs to perform goal-driven manipulation tasks.
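The "pull" flavor of message passing can be illustrated on the smallest possible MRF: two 1-D part poses joined by an articulation constraint, each belief a particle set. This is a toy sketch under assumed Gaussian potentials, not the PMPNBP implementation.

```python
import numpy as np

# Toy "pull" message on a two-node pairwise MRF: part B's pose must lie
# near part A's pose plus a joint offset.  Beliefs are particle sets;
# B pulls samples from A's belief and reweights its own particles by
# the articulation potential.  All parameters are illustrative.
rng = np.random.default_rng(1)
offset, sigma_joint, sigma_obs = 0.5, 0.05, 0.2
true_a = 1.0

# Noisy unary observations (stand-ins for RGBD likelihoods)
obs_a, obs_b = true_a + 0.1, true_a + offset - 0.15

# Particle beliefs initialized from the unary observations
part_a = obs_a + rng.normal(0.0, sigma_obs, 200)
part_b = obs_b + rng.normal(0.0, sigma_obs, 200)

def pull_message(receiver, sender, off):
    # For each receiver particle, "pull" the sender's particles and score
    # the articulation constraint receiver ≈ sender + off.
    diff = receiver[:, None] - (sender[None, :] + off)
    w = np.exp(-0.5 * (diff / sigma_joint) ** 2).mean(axis=1)
    return w / w.sum()

# One belief update for B: resample its particles by the pulled weights
w_b = pull_message(part_b, part_a, offset)
part_b = rng.choice(part_b, size=200, p=w_b)
est_b = part_b.mean()
```

Iterating such updates over every edge of the articulated object's MRF, in both directions, is what lets the beliefs converge jointly rather than each part being estimated in isolation.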
-
Recent years have seen a surge in research on why people fall for misinformation and what can be done about it. Drawing on a framework that conceptualizes truth judgments of true and false information as a signal-detection problem, the current article identifies three inaccurate assumptions in the public and scientific discourse about misinformation: (1) People are bad at discerning true from false information, (2) partisan bias is not a driving force in judgments of misinformation, and (3) gullibility to false information is the main factor underlying inaccurate beliefs. Counter to these assumptions, we argue that (1) people are quite good at discerning true from false information, (2) partisan bias in responses to true and false information is pervasive and strong, and (3) skepticism against belief-incongruent true information is much more pronounced than gullibility to belief-congruent false information. These conclusions have significant implications for person-centered misinformation interventions to tackle inaccurate beliefs.
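The signal-detection framing separates discrimination ability from response bias, which is how good discernment and strong partisan skepticism can coexist. A standard sketch, with hit and false-alarm rates that are purely illustrative, not the article's data:

```python
from statistics import NormalDist

# Signal-detection view of truth judgments: "true" items are signal,
# "false" items are noise.  Discrimination (d') and criterion (c) are
# computed from the hit rate (accepting true items) and the false-alarm
# rate (accepting false items).  Rates below are made-up examples.
z = NormalDist().inv_cdf

def sdt(hit_rate, fa_rate):
    d_prime = z(hit_rate) - z(fa_rate)       # discrimination ability
    c = -0.5 * (z(hit_rate) + z(fa_rate))    # response bias (criterion)
    return d_prime, c

# Similar discrimination in both cases, but belief-incongruent items
# meet a conservative criterion (skepticism), not gullibility.
d_congruent, c_congruent = sdt(hit_rate=0.85, fa_rate=0.30)
d_incongruent, c_incongruent = sdt(hit_rate=0.55, fa_rate=0.10)
```

In this made-up example d′ barely changes while c flips from liberal to conservative, mirroring the article's point that partisan bias shows up as a criterion shift rather than as an inability to discern.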
-
This work explores sequential Bayesian binary hypothesis testing in the social learning setup under expertise diversity. We consider a two-agent (say advisor-learner) sequential binary hypothesis test where the learner infers the hypothesis based on the decision of the advisor, a prior private signal, and individual belief. In addition, the agents have varying expertise, in terms of the noise variance in the private signal. Under such a setting, we first investigate the behavior of optimal agent beliefs and observe that the nature of optimal agents could be inverted depending on expertise levels. We also discuss suboptimality of the Prelec reweighting function under diverse expertise. Next, we consider an advisor selection problem wherein the belief of the learner is fixed and the advisor is to be chosen for a given prior. We characterize the decision region for choosing such an advisor and argue that a learner with beliefs varying from the true prior often ends up selecting a suboptimal advisor.
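The learner's fusion step, combining the advisor's binary decision with its own private signal via Bayes' rule, can be sketched for Gaussian signals with agent-specific noise. The hypothesis means, noise levels, threshold rule, and prior below are illustrative assumptions, not the paper's setup.

```python
from statistics import NormalDist

# Two-agent sequential test of H0: mean=-1 vs H1: mean=+1; each agent's
# private signal is Gaussian with agent-specific noise (its expertise).
# The learner fuses the advisor's binary decision with its own signal.
N = NormalDist()
mu = {0: -1.0, 1: +1.0}
sigma_adv, sigma_lrn, prior_h1 = 0.5, 1.5, 0.4  # illustrative values

def advisor_decides(y, threshold=0.0):
    return int(y > threshold)

def advisor_likelihood(decision, h, threshold=0.0):
    # P(advisor says 1 | h) for a threshold rule on a Gaussian signal
    p1 = 1.0 - N.cdf((threshold - mu[h]) / sigma_adv)
    return p1 if decision == 1 else 1.0 - p1

def learner_posterior(decision, x):
    post = []
    for h in (0, 1):
        prior = prior_h1 if h == 1 else 1.0 - prior_h1
        lik_x = N.pdf((x - mu[h]) / sigma_lrn) / sigma_lrn
        post.append(prior * advisor_likelihood(decision, h) * lik_x)
    return post[1] / sum(post)  # P(H1 | advisor decision, private signal)

p = learner_posterior(decision=advisor_decides(0.8), x=0.3)
```

Because `sigma_adv < sigma_lrn` here, the advisor's decision dominates the learner's weak private evidence, which is the kind of expertise asymmetry under which the paper studies optimal beliefs and advisor selection.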
-
In networked control systems, the sensory signals are often quantized before being transmitted to the controller. Consequently, performance is affected by the coarseness of this quantization process. Modern communication technologies allow users to obtain resolution-varying quantized measurements based on the prices paid. In this paper, we consider the problem of joint optimal controller synthesis and quantizer scheduling for a partially observed quantized-feedback linear-quadratic-Gaussian system, where the measurements are quantized before being sent to the controller. The system is presented with several choices of quantizers, along with the cost of using each quantizer. The objective is to jointly select the quantizers and synthesize the controller to strike an optimal balance between control performance and quantization cost. When the innovation signal is quantized instead of the measurement, the problem is decoupled into two optimization problems: one for optimal controller synthesis, and the other for optimal quantizer selection. The optimal controller is found by solving a Riccati equation and the optimal quantizer-selection policy is found by solving a linear program---both of which can be solved offline.
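The decoupling can be sketched on a scalar toy problem: a Riccati recursion yields the controller, after which quantizer selection is a separate trade-off between accuracy and price. The system matrices, the quantizer menu, and enumeration standing in for the paper's linear program are all illustrative assumptions.

```python
# Scalar LQG sketch: the optimal gain comes from iterating the Riccati
# recursion; quantizer selection then trades the performance penalty of
# quantization noise against the price of each quantizer.  The system,
# costs, and quantizer menu below are illustrative.
a, b, q, r = 1.2, 1.0, 1.0, 0.5

# Riccati iteration converging to the scalar DARE solution P
P = q
for _ in range(200):
    P = q + a * a * P - (a * b * P) ** 2 / (r + b * b * P)
K = a * b * P / (r + b * b * P)  # optimal feedback gain

# Quantizer menu: (noise variance added by quantization, price per use);
# coarser quantizers are cheaper.  Enumeration over this small menu
# stands in for the paper's linear program.
quantizers = [(1.00, 0.1), (0.25, 0.5), (0.04, 2.0)]

def total_cost(v, price, weight=1.0):
    return weight * P * v + price  # performance penalty + quantizer price

best = min(quantizers, key=lambda qz: total_cost(*qz))
```

Both computations depend only on the model and the menu, not on运行-time data, which is why, as the abstract notes, they can be carried out entirely offline.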