Title: A conjecture on the Feldman bandit problem
Abstract: We consider the Bernoulli bandit problem where one of the arms has win probability α and the others β, with the identity of the α arm specified by initial probabilities. With u = max(α, β), v = min(α, β), call an arm with win probability u a good arm. Whereas it is known that the strategy of always playing the arm with the largest probability of being a good arm maximizes the expected number of wins in the first n games for all n, we conjecture that it also stochastically maximizes the number of wins. That is, we conjecture that this strategy maximizes the probability of at least k wins in the first n games for all k, n. The conjecture is proven when k = 1, when k = n, and when there are only two arms and k = n - 1.
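As a concrete illustration of the strategy in question, the following sketch simulates the rule that always plays the arm with the largest posterior probability of being the good arm, and uses it to estimate the probability of at least k wins in the first n games by Monte Carlo. The function name, argument layout, and numerical values are illustrative choices, not anything taken from the paper.

```python
import random

def greedy_feldman(prior, u, v, n, rng):
    """Play n games with the rule that always pulls the arm whose posterior
    probability of being the good arm (win probability u) is largest.
    prior[i] is the initial probability that arm i is the good arm; every
    other arm wins with probability v.  Returns the number of wins."""
    post = list(prior)
    good = rng.choices(range(len(prior)), weights=prior)[0]  # true good arm
    wins = 0
    for _ in range(n):
        i = max(range(len(post)), key=lambda j: post[j])     # greedy choice
        win = rng.random() < (u if i == good else v)
        wins += win
        # Bayes update: likelihood of the observed outcome if arm j were good.
        like = [(u if j == i else v) if win else 1 - (u if j == i else v)
                for j in range(len(post))]
        norm = sum(l * q for l, q in zip(like, post))
        post = [l * q / norm for l, q in zip(like, post)]
    return wins

# Crude Monte Carlo estimate of P(at least k wins in the first n games).
rng = random.Random(1)
k, n, runs = 3, 5, 20_000
hits = sum(greedy_feldman([0.5, 0.3, 0.2], 0.7, 0.4, n, rng) >= k
           for _ in range(runs))
print(hits / runs)
```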
Award ID(s): 1662442
PAR ID: 10070434
Author(s) / Creator(s):
Date Published:
Journal Name: Journal of Applied Probability
Volume: 55
Issue: 01
ISSN: 0021-9002
Page Range / eLocation ID: 318 to 324
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Consider a set of n players. We suppose that each game involves two players, that there is some unknown player who wins each game it plays with a probability greater than 1/2, and that our objective is to determine this best player. Under the requirement that the policy employed guarantees a correct choice with a probability of at least some specified value, we look for a policy that has a relatively small expected number of games played before a decision is reached. We consider this problem both under the assumption that the best player wins each game with a probability of at least some specified value >1/2, and under a Bayesian assumption that the probability that player i wins a game against player j is player i's value divided by the sum of the two players' values, where the values are the unknown values of n independent and identically distributed exponential random variables. In the former case, we propose a policy where chosen pairs play a match that ends when one of them has had a specified number of wins more than the other; in the latter case, we propose a Thompson-sampling-type rule.
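A minimal sketch of the first proposed policy's building block, under stated assumptions: a chosen pair of players keeps playing until one of them is some specified number m of wins ahead of the other. The names play_match, p_ij, and m are illustrative placeholders rather than notation from the paper.

```python
import random

def play_match(p_ij, m, rng):
    """Players i and j play repeated games, i winning each game with
    probability p_ij, until one player is m wins ahead of the other.
    Returns (True if i won the match, number of games played)."""
    lead, games = 0, 0          # lead = wins of i minus wins of j
    while abs(lead) < m:
        lead += 1 if rng.random() < p_ij else -1
        games += 1
    return lead > 0, games

# Example: a player who wins 60% of individual games, required lead m = 5.
print(play_match(0.6, 5, random.Random(0)))
```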
  2. Consider a gambler who on each bet either wins 1 with probability p or loses 1 with probability q=1-p, with the results of successive bets being independent. The gambler will stop betting when they are either up k or down k. Letting N be the number of bets made, we show that N is a new better than used random variable. Moreover, we show that if k is even then N/2 has an increasing failure rate, and if k is odd then (N+1)/2 has an increasing failure rate. 
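The stopping time N is easy to sample, so the new-better-than-used property P(N > s + t) ≤ P(N > s)·P(N > t) can at least be eyeballed by simulation. The sketch below is only a rough Monte Carlo illustration under assumed parameter values (p = 0.55, k = 4), with names chosen for this example.

```python
import random

def stopping_time(p, k, rng):
    """Number of +/-1 bets (win probability p) until the gambler's profit
    first reaches +k or -k."""
    pos, n = 0, 0
    while abs(pos) < k:
        pos += 1 if rng.random() < p else -1
        n += 1
    return n

rng = random.Random(0)
samples = [stopping_time(0.55, 4, rng) for _ in range(100_000)]

def surv(t):
    """Empirical survival function P(N > t)."""
    return sum(x > t for x in samples) / len(samples)

# Rough check of the new-better-than-used inequality P(N > s+t) <= P(N > s) P(N > t).
for s, t in [(2, 2), (4, 4), (4, 8)]:
    print(s, t, surv(s + t), surv(s) * surv(t))
```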
  3. Mixed strategies are often evaluated based on the expected payoff that they guarantee. This is not always desirable. In this paper, we consider games for which maximizing the expected payoff deviates from the actual goal of the players. To address this issue, we introduce the notion of a (u, p)-maxmin strategy, which ensures receiving a minimum utility of u with probability at least p. We then give approximation algorithms for the problem of finding a (u, p)-maxmin strategy for these games. The first game that we consider is Colonel Blotto, a well-studied game that was introduced in 1921. In the Colonel Blotto game, two colonels divide their troops among a set of battlefields. Each battlefield is won by the colonel who puts more troops in it. The payoff of each colonel is the weighted number of battlefields that she wins. We show that maximizing the expected payoff of a player does not necessarily maximize her winning probability for certain applications of Colonel Blotto. For example, in presidential elections, the players' goal is to maximize the probability of winning more than half of the votes, rather than maximizing the expected number of votes that they get. We give an exact algorithm for a natural variant of the continuous version of this game. More generally, we provide constant and logarithmic approximation algorithms for finding (u, p)-maxmin strategies. We also introduce a security game version of Colonel Blotto, which we call the auditing game. It is played between two players, a defender and an attacker. The goal of the defender is to prevent the attacker from changing the outcome of an instance of Colonel Blotto. Again, maximizing the expected payoff of the defender is not necessarily optimal. Therefore, we give a constant approximation for (u, p)-maxmin strategies.
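To make the difference between the two criteria concrete, here is a small sketch, with illustrative names and data only, that evaluates a pair of mixed strategies in a discrete Blotto instance under both the expected-payoff criterion and the (u, p) criterion (the probability that the payoff reaches a threshold u).

```python
from itertools import product

def payoff(alloc_a, alloc_b, weights):
    """Weighted number of battlefields won by A; a battlefield is won by the
    colonel who puts strictly more troops on it."""
    return sum(w for a, b, w in zip(alloc_a, alloc_b, weights) if a > b)

def evaluate(mixed_a, mixed_b, weights, u):
    """mixed_a, mixed_b: mixed strategies as lists of (probability, allocation).
    Returns (expected payoff of A, probability that A's payoff is >= u)."""
    exp_pay, prob_u = 0.0, 0.0
    for (pa, xa), (pb, xb) in product(mixed_a, mixed_b):
        val = payoff(xa, xb, weights)
        exp_pay += pa * pb * val
        prob_u += pa * pb * (val >= u)
    return exp_pay, prob_u

# Three equal-weight battlefields, six troops each; A needs payoff >= 2
# (a majority of the battlefields) to "win" in the election sense.
A = [(0.5, (4, 1, 1)), (0.5, (1, 4, 1))]
B = [(1.0, (2, 2, 2))]
print(evaluate(A, B, (1, 1, 1), u=2))   # expected payoff 1.0, P(win) 0.0
```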
  4. Let $u_k$ be a solution of the Helmholtz equation with the wave number $k$, $\Delta u_k + k^2 u_k = 0$, on (a small ball in) either $\mathbb{R}^n$, $\mathbb{S}^n$, or $\mathbb{H}^n$. For a fixed point $p$, we define $M_{u_k}(r) = \max_{d(x,p) \le r} |u_k(x)|$. The three-ball inequality $M_{u_k}(2r) \le C(k,r,\alpha)\, M_{u_k}(r)^{\alpha} M_{u_k}(4r)^{1-\alpha}$ is well known; it holds for some $\alpha \in (0,1)$ and $C(k,r,\alpha) > 0$ independent of $u_k$. We show that the constant $C(k,r,\alpha)$ grows exponentially in $k$ (when $r$ is fixed and small). We also compare our result with the increased stability for solutions of the Cauchy problem for the Helmholtz equation on Riemannian manifolds.
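For readability, the three-ball inequality quoted in the abstract above, in display form (this is only a restatement of what is already stated there, not an additional result):

```latex
% Three-ball inequality for solutions of the Helmholtz equation
% \Delta u_k + k^2 u_k = 0, with M_{u_k}(r) = \max_{d(x,p) \le r} |u_k(x)|.
% The paper shows C(k, r, \alpha) grows exponentially in k for fixed small r.
\[
  M_{u_k}(2r) \;\le\; C(k, r, \alpha)\, M_{u_k}(r)^{\alpha}\, M_{u_k}(4r)^{1-\alpha},
  \qquad \alpha \in (0, 1).
\]
```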
  5. Boosting engagement with educational software has been promoted as a means of improving student performance. Various engagement factors have been explored, including choice, personalization, badges, bonuses, and competition. We examine two promising and relatively understudied manipulations from the realm of gambling: the near-win effect and anticipation. The near-win effect occurs when an individual comes close to achieving a goal, e.g., getting two cherries and a lemon in a slot machine. Anticipation refers to the build-up of suspense as an outcome is revealed, e.g., revealing cherry-cherry-lemon in that order drives expectations of winning more than revealing lemon-cherry-cherry. Gambling psychologists have long studied how near-wins affect engagement in pure-chance games, but it is difficult to do the same in an educational context where outcomes are based on skill. In this paper, we manipulate the display of outcomes in a manner that allows us to introduce artificial near-wins largely independent of a student's performance. In a study involving thousands of students using an online math tutor, we examine how this manipulation affects a behavioral measure of engagement: whether or not a student repeats a lesson. We find a near-win effect on engagement when the 'win' indicates to the student that they have attained critical competence on a lesson, namely the competence that allows them to continue to the next lesson. Nonetheless, when we experimentally induce near wins in a randomized controlled trial, we do not obtain a reliable effect of the near win. We discuss this mismatch of results in terms of the role of anticipation in making near wins effective. We conclude by describing manipulations that might increase the effect of near wins on engagement.