skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Multi-Armed Bandits for Human-Machine Decision Making
Building an integrated human-machine decision-making system requires developing effective interfaces between the human and the machine. We develop such an interface by studying the multi-armed bandit problem, a simple sequential decision-making paradigm that can model a variety of tasks. We construct Bayesian algorithms for the multi-armed ban- dit problem, prove conditions under which these algorithms achieve good performance, and empirically show that, with appropriate priors, these algorithms effectively model human choice behavior; the priors then form a principled interface from human to machine. We take a signal processing perspective on the prior estimation problem and develop methods to estimate the priors given human choice data.  more » « less
Award ID(s):
1734272
PAR ID:
10108283
Author(s) / Creator(s):
;
Date Published:
Journal Name:
2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Page Range / eLocation ID:
6986 to 6990
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Pure exploration in multi-armed bandits has emerged as an important framework for modeling decision making and search under uncertainty. In modern applications however, one is often faced with a tremendously large number of options and even obtaining one observation per option may be too costly rendering traditional pure exploration algorithms ineffective. Fortunately, one often has access to similarity relationships amongst the options that can be leveraged. In this paper, we consider the pure exploration problem in stochastic multi-armed bandits where the similarities between the arms is captured by a graph and the rewards may be represented as a smooth signal on this graph. In particular, we consider the problem of finding the arm with the maximum reward (i.e., the maximizing problem) or one that has sufficiently high reward (i.e., the satisficing problem) under this model. We propose novel algorithms GRUB (GRaph based UcB) and zeta-GRUB for these problems and provide theoretical characterization of their performance which specifically elicits the benefit of the graph side information. We also prove a lower bound on the data requirement that shows a large class of problems where these algorithms are near-optimal. We complement our theory with experimental results that show the benefit of capitalizing on such side information. 
    more » « less
  2. We study distributed cooperative decision-making in a multi-agent stochastic multi-armed bandit (MAB) problem in which agents are connected through an undirected graph and observe the actions and rewards of their neighbors. We develop a novel policy based on partitions of the communication graph and propose a distributed method for selecting an arbitrary number of leaders and partitions. We analyze this new policy and evaluate its performance using Monte-Carlo simulations. 
    more » « less
  3. We extend the classic multi-armed bandit (MAB) model to the setting of noncompliance, where the arm pull is a mere instrument and the treatment applied may differ from it, which gives rise to the instrument-armed bandit (IAB) problem. The IAB setting is relevant whenever the experimental units are human since free will, ethics, and the law may prohibit unrestricted or forced application of treatment. In particular, the setting is relevant in bandit models of dynamic clinical trials and other controlled trials on human interventions. Nonetheless, the setting has not been fully investigate in the bandit literature. We show that there are various and divergent notions of regret in this setting, all of which coincide only in the classic MAB setting. We characterize the behavior of these regrets and analyze standard MAB algorithms. We argue for a particular kind of regret that captures the causal effect of treatments but show that standard MAB algorithms cannot achieve sublinear control on this regret. Instead, we develop new algorithms for the IAB problem, prove new regret bounds for them, and compare them to standard MAB algorithms in numerical examples. 
    more » « less
  4. This paper introduces a novel multi-armed bandits framework, termed Contextual Restless Bandits (CRB), for complex online decision-making. This CRB framework incorporates the core features of contextual bandits and restless bandits, so that it can model both the internal state transitions of each arm and the influence of external global environmental contexts. Using the dual decomposition method, we develop a scalable index policy algorithm for solving the CRB problem, and theoretically analyze the asymptotical optimality of this algorithm. In the case when the arm models are unknown, we further propose a model-based online learning algorithm based on the index policy to learn the arm models and make decisions simultaneously. Furthermore, we apply the proposed CRB framework and the index policy algorithm specifically to the demand response decision-making problem in smart grids. The numerical simulations demonstrate the performance and efficiency of our proposed CRB approaches. 
    more » « less
  5. Predicting and understanding how people make decisions has been a long-standing goal in many fields, with quantitative models of human decision-making informing research in both the social sciences and engineering. We show how progress toward this goal can be accelerated by using large datasets to power machine-learning algorithms that are constrained to produce interpretable psychological theories. Conducting the largest experiment on risky choice to date and analyzing the results using gradient-based optimization of differentiable decision theories implemented through artificial neural networks, we were able to recapitulate historical discoveries, establish that there is room to improve on existing theories, and discover a new, more accurate model of human decision-making in a form that preserves the insights from centuries of research. 
    more » « less