

Title: A Q-Learning Approach for Adherence-Aware Recommendations
In many real-world scenarios with high stakes and safety implications, a human decision-maker (HDM) may receive recommendations from an artificial intelligence while retaining ultimate responsibility for the decisions made. In this letter, we develop an “adherence-aware Q-learning” algorithm to address this problem. The algorithm learns the “adherence level,” which captures the frequency with which the HDM follows the recommended actions, and derives the best recommendation law in real time. We prove the convergence of the proposed Q-learning algorithm to the optimal value and evaluate its performance across various scenarios.
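As a rough illustration of the idea, the sketch below shows a single adherence-aware Q-update in Python. It assumes a tabular setting in which the recommended action is adopted with probability equal to the estimated adherence level and is otherwise replaced by the action of a fixed human baseline policy; the function names, the running-average adherence estimate, and the human_policy argument are illustrative assumptions, not the paper's exact formulation.

    import numpy as np

    def adherence_q_update(Q, s, rec, a_taken, r, s_next, stats, human_policy,
                           gamma=0.95, lr=0.1):
        # Update the running estimate of the adherence level from whether the
        # action actually taken matched the recommendation.
        stats["total"] += 1
        stats["followed"] += int(a_taken == rec)
        alpha = stats["followed"] / stats["total"]
        # Value of recommending action a in s_next: the HDM adopts it with
        # probability alpha and otherwise falls back on their own policy.
        n_actions = Q.shape[1]
        v_next = max(alpha * Q[s_next, a] + (1 - alpha) * Q[s_next, human_policy(s_next)]
                     for a in range(n_actions))
        Q[s, a_taken] += lr * (r + gamma * v_next - Q[s, a_taken])
        return Q, alpha

Running this update along observed trajectories plays the role of the usual Q-learning step, with the Bellman target replaced by the adherence-weighted value of the best recommendation.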
Award ID(s):
2401007 2348381
PAR ID:
10508513
Author(s) / Creator(s):
; ;
Publisher / Repository:
IEEE
Date Published:
Journal Name:
IEEE Control Systems Letters
Volume:
7
ISSN:
2475-1456
Page Range / eLocation ID:
3645 to 3650
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Chaudhuri, Kamalika; Jegelka, Stefanie; Song, Le; Szepesvari, Csaba; Niu, Gang; Sabato, Sivan (Ed.)
    Reinforcement learning (RL) has demonstrated remarkable achievements in simulated environments. However, carrying this success over to real environments requires robustness, which existing RL algorithms often lack because they assume that the future deployment environment is the same as the training environment (i.e., the simulator) in which the policy is learned. This assumption often does not hold due to the discrepancy between the simulator and the real environment, which renders the learned policy fragile when deployed. In this paper, we propose a novel distributionally robust Q-learning algorithm that learns the best policy under the worst distributional perturbation of the environment. Our algorithm first transforms the infinite-dimensional learning problem (since the environment MDP perturbation lies in an infinite-dimensional space) into a finite-dimensional dual problem and subsequently uses a multi-level Monte-Carlo scheme to approximate the dual value using samples from the simulator. Despite the complexity, we show that the resulting distributionally robust Q-learning algorithm asymptotically converges to the optimal worst-case policy, thus making it robust to future environment changes. Simulation results further demonstrate its strong empirical robustness.
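    As a hedged illustration of how a distributionally robust Bellman target can be formed from simulator samples, the sketch below evaluates the standard Kullback-Leibler dual of the worst-case expectation with plain Monte Carlo; the paper's multi-level Monte-Carlo estimator, the radius delta, and the function names are replaced here by illustrative simplifications.

        import numpy as np
        from scipy.optimize import minimize_scalar

        def worst_case_value(next_values, delta=0.1):
            # Dual of the worst-case expectation over a KL ball of radius delta
            # around the sampling distribution:
            #   inf_{KL(Q||P) <= delta} E_Q[V] = sup_{lam >= 0} -lam*log E_P[exp(-V/lam)] - lam*delta
            v = np.asarray(next_values, dtype=float)
            def neg_dual(lam):
                # shift by v.min() for numerical stability of the log-mean-exp
                return -(v.min() - lam * np.log(np.mean(np.exp(-(v - v.min()) / lam)))
                         - lam * delta)
            res = minimize_scalar(neg_dual, bounds=(1e-6, 1e3), method="bounded")
            return -res.fun

        def robust_q_update(Q, s, a, r, next_state_samples, gamma=0.95, lr=0.1, delta=0.1):
            # next_state_samples: a batch of s' drawn from the simulator for (s, a)
            targets = [Q[s2].max() for s2 in next_state_samples]
            Q[s, a] += lr * (r + gamma * worst_case_value(targets, delta) - Q[s, a])
            return Q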
  2. Abstract We demonstrate that the key components of cognitive architectures (declarative and procedural memory) and their key capabilities (learning, memory retrieval, probability judgment, and utility estimation) can be implemented as algebraic operations on vectors and tensors in a high‐dimensional space using a distributional semantics model. High‐dimensional vector spaces underlie the success of modern machine learning techniques based on deep learning. However, while neural networks have an impressive ability to process data to find patterns, they do not typically model high‐level cognition, and it is often unclear how they work. Symbolic cognitive architectures can capture the complexities of high‐level cognition and provide human‐readable, explainable models, but scale poorly to naturalistic, non‐symbolic, or big data. Vector‐symbolic architectures, where symbols are represented as vectors, bridge the gap between the two approaches. We posit that cognitive architectures, if implemented in a vector‐space model, represent a useful, explanatory model of the internal representations of otherwise opaque neural architectures. Our proposed model, Holographic Declarative Memory (HDM), is a vector‐space model based on distributional semantics. HDM accounts for primacy and recency effects in free recall, the fan effect in recognition, probability judgments, and human performance on an iterated decision task. HDM provides a flexible, scalable alternative to symbolic cognitive architectures at a level of description that bridges symbolic, quantum, and neural models of cognition. 
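    The core algebraic operation behind vector-symbolic models of this kind is binding by circular convolution and unbinding by circular correlation. The sketch below shows only that generic holographic-reduced-representation operation, not HDM's full declarative and procedural machinery; the dimensionality and vector statistics are illustrative.

        import numpy as np

        def bind(a, b):
            # Circular convolution: combines two vectors into a single vector
            # of the same dimensionality (the "binding" operation).
            return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

        def unbind(trace, a):
            # Circular correlation: approximately recovers b from bind(a, b).
            return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(trace)))

        d = 1024
        rng = np.random.default_rng(0)
        role = rng.normal(0, 1 / np.sqrt(d), d)      # e.g. a slot such as "agent"
        filler = rng.normal(0, 1 / np.sqrt(d), d)    # e.g. a symbol such as "dog"
        trace = bind(role, filler)                   # one vector stores the pair
        decoded = unbind(trace, role)
        # The decoded vector is noisy but far more similar to the original filler
        # than to an unrelated random vector.
        print(np.dot(decoded, filler), np.dot(decoded, rng.normal(0, 1 / np.sqrt(d), d)))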
  3. We study transfer learning for estimation in latent variable network models. In our setting, the conditional edge probability matrices given the latent variables are represented by P for the source and Q for the target. We wish to estimate Q given two kinds of data: (1) edge data from a subgraph induced by an o(1) fraction of the nodes of Q, and (2) edge data from all of P. If the source P has no relation to the target Q, the estimation error must be Ω(1). However, we show that if the latent variables are shared, then vanishing error is possible. We give an efficient algorithm that utilizes the ordering of a suitably defined graph distance. Our algorithm achieves o(1) error and does not assume a parametric form on the source or target networks. Next, for the specific case of Stochastic Block Models we prove a minimax lower bound and show that a simple algorithm achieves this rate. Finally, we empirically demonstrate our algorithm's use on real-world and simulated graph transfer problems. 
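    A rough sketch of the smoothing idea, under the simplifying assumptions that source and target share node identities and that similarity of source connectivity profiles can stand in for the paper's graph distance; the parameter k and all function names are illustrative, not the paper's algorithm.

        import numpy as np

        def transfer_estimate(P, Q_sub, observed, k=20):
            # P:        full source edge-probability matrix (n x n)
            # Q_sub:    observed target adjacency among the nodes in `observed`
            # observed: indices of the small fraction of target nodes with edge data
            n = P.shape[0]
            pos = {u: t for t, u in enumerate(observed)}
            # For each target node, the k observed nodes whose source connectivity
            # profile is closest (a crude proxy for distance between latent variables).
            near = []
            for i in range(n):
                d = np.abs(P[observed] - P[i]).max(axis=1)
                near.append([observed[t] for t in np.argsort(d)[:k]])
            Q_hat = np.zeros((n, n))
            for i in range(n):
                for j in range(n):
                    vals = [Q_sub[pos[u], pos[v]]
                            for u in near[i] for v in near[j] if u != v]
                    Q_hat[i, j] = np.mean(vals) if vals else P[i, j]
            return Q_hat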
  4. Recent years have witnessed the superior performance of heterogeneous graph neural networks (HGNNs) in dealing with heterogeneous information networks (HINs). Nonetheless, the success of HGNNs often depends on the availability of sufficient labeled training data, which can be very expensive to obtain in real scenarios. Active learning provides an effective solution to tackle the data scarcity challenge. The vast majority of existing work on active learning for graphs focuses on homogeneous graphs and thus falls short, or even becomes inapplicable, on HINs. In this paper, we study the active learning problem with HGNNs and propose a novel meta-reinforced active learning framework, MetRA. Previous reinforced active learning algorithms train the policy network on labeled source graphs and directly transfer the policy to the target graph without any adaptation. To better exploit the information from the target graph in the adaptation phase, we propose a novel policy transfer algorithm based on meta-Q-learning, termed per-step MQL. Empirical evaluations on HINs demonstrate the effectiveness of our proposed framework. The improvement over the best baseline is up to 7% in Micro-F1.
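    As a loose illustration of a reinforced active-learning loop being adapted on the target graph, the PyTorch sketch below scores unlabeled nodes with a small policy network and keeps updating it with a Q-style target after every query; the node-state features, the target_env interface, and the reward definition (validation Micro-F1 gain) are illustrative assumptions, not MetRA's per-step MQL procedure itself.

        import torch
        import torch.nn as nn

        class QueryPolicy(nn.Module):
            # Scores each unlabeled node from a small per-node state vector
            # (e.g. prediction entropy, degree, distance to the labeled set).
            def __init__(self, n_feats=3, hidden=32):
                super().__init__()
                self.net = nn.Sequential(nn.Linear(n_feats, hidden), nn.ReLU(),
                                         nn.Linear(hidden, 1))

            def forward(self, node_states):          # (num_unlabeled, n_feats)
                return self.net(node_states).squeeze(-1)

        def adapt_on_target(policy, target_env, budget=20, gamma=0.9, lr=1e-3):
            # After transferring a policy trained on labeled source graphs, keep
            # updating it on the target graph after every single query.
            opt = torch.optim.Adam(policy.parameters(), lr=lr)
            states = target_env.node_states()                  # hypothetical interface
            for _ in range(budget):
                q = policy(states)
                node = int(q.argmax())                         # greedy query
                reward, states_next = target_env.label(node)   # retrain HGNN, return F1 gain
                with torch.no_grad():
                    target = reward + gamma * policy(states_next).max()
                loss = (q[node] - target) ** 2
                opt.zero_grad()
                loss.backward()
                opt.step()
                states = states_next
            return policy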
  5.
    We propose a new, simple, and natural algorithm for learning the optimal Q-value function of a discounted-cost Markov decision process (MDP) when the transition kernels are unknown. Unlike classical learning algorithms for MDPs, such as Q-learning and actor-critic algorithms, this algorithm does not rely on a stochastic approximation-based method. We show that our algorithm, which we call the empirical Q-value iteration algorithm, converges to the optimal Q-value function. We also give a rate of convergence, i.e., a nonasymptotic sample complexity bound, and show that an asynchronous (or online) version of the algorithm also works. Preliminary experimental results suggest that our algorithm reaches a ballpark estimate of the optimal Q-values faster than stochastic approximation-based algorithms.
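    A minimal sketch of the empirical Q-value iteration idea for a discounted-cost MDP, assuming access to a simulator that draws next states; the sample size, iteration count, and function names are illustrative.

        import numpy as np

        def empirical_q_value_iteration(sample_next_state, cost, n_states, n_actions,
                                        gamma=0.95, n_samples=50, iters=200):
            # Each iteration replaces the expectation in the Bellman operator with an
            # empirical average over fresh simulator samples, instead of taking a
            # stochastic-approximation step with a decaying learning rate.
            Q = np.zeros((n_states, n_actions))
            for _ in range(iters):
                Q_new = np.empty_like(Q)
                for s in range(n_states):
                    for a in range(n_actions):
                        nxt = [sample_next_state(s, a) for _ in range(n_samples)]
                        Q_new[s, a] = cost(s, a) + gamma * np.mean([Q[s2].min() for s2 in nxt])
                Q = Q_new
            return Q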