Title: Treatment choice, mean square regret and partial identification
Abstract: We consider a decision maker who faces a binary treatment choice when their welfare is only partially identified from data. We contribute to the literature by anchoring our finite-sample analysis on mean square regret, a decision criterion advocated by Kitagawa et al. (2022) in "Treatment Choice with Nonlinear Regret". We find that optimal rules are always fractional, irrespective of the width of the identified set and the precision of its estimate. The optimal treatment fraction is a simple logistic transformation of the commonly used t-statistic multiplied by a factor calculated by a simple constrained optimization. This treatment fraction moves closer to 0.5 as the identified set becomes wider, implying that the decision maker becomes more cautious against an adversarial Nature.
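The functional form described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the function name `treatment_fraction` and its parameters are hypothetical, and the multiplying factor is passed in as a given number, because the constrained optimization that produces it is not described in this record.

```python
import math


def treatment_fraction(estimate: float, std_error: float, factor: float) -> float:
    """Logistic transformation of (factor * t-statistic).

    `factor` stands in for the multiplier that the abstract says comes from a
    constrained optimization; here it is an assumed input, not a computed one.
    """
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    t_stat = estimate / std_error  # the usual t-statistic
    z = factor * t_stat
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


if __name__ == "__main__":
    # Hypothetical inputs chosen purely for illustration.
    for factor in (2.0, 1.0, 0.25):
        share = treatment_fraction(estimate=1.2, std_error=0.8, factor=factor)
        print(f"factor={factor}: treat a fraction {share:.3f} of the population")
```

In this sketch, a smaller factor pulls the fraction toward 0.5 for any fixed t-statistic, which is one way the abstract's "more cautious" behaviour could arise. How the factor actually depends on the width of the identified set is not stated here.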
Award ID(s):
2315600
PAR ID:
10517661
Author(s) / Creator(s):
; ;
Publisher / Repository:
Springer
Date Published:
Journal Name:
The Japanese Economic Review
Volume:
74
Issue:
4
ISSN:
1352-4739
Page Range / eLocation ID:
573 to 602
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Considerable work has focused on optimal stopping problems where random IID offers arrive sequentially for a single available resource which is controlled by the decision-maker. After viewing the realization of the offer, the decision-maker irrevocably rejects it, or accepts it, collecting the reward and ending the game. We consider an important extension of this model to a dynamic setting where the resource is "renewable" (a rental, a work assignment, or a temporary position) and can be allocated again after a delay period d. In the case where the reward distribution is known a priori, we design an (asymptotically optimal) 1/2-competitive Prophet Inequality, namely, a policy that collects in expectation at least half of the expected reward collected by a prophet who a priori knows all the realizations. This policy has a particularly simple characterization as a thresholding rule which depends on the reward distribution and the blocking period d, and arises naturally from an LP-relaxation of the prophet's optimal solution. Moreover, it provides the key to extending the result to the case of unknown distributions; here, we construct a dynamic threshold rule using the reward samples collected when the resource is not blocked. We provide a regret guarantee for our algorithm against the best policy in hindsight, and prove a complementing minimax lower bound on the best achievable regret, establishing that our policy achieves, up to poly-logarithmic factors, the best possible regret in this setting.
  2. This paper considers online convex optimization (OCO) with stochastic constraints, which generalizes Zinkevich’s OCO over a known simple fixed set by introducing multiple stochastic functional constraints that are i.i.d. generated at each round and are disclosed to the decision maker only after the decision is made. This formulation arises naturally when decisions are restricted by stochastic environments or deterministic environments with noisy observations. It also includes many important problems as special cases, such as OCO with long term constraints, stochastic constrained convex optimization, and deterministic constrained convex optimization. To solve this problem, this paper proposes a new algorithm that achieves O(√T) expected regret and constraint violations and O(√T log(T)) high probability regret and constraint violations. Experiments on a real-world data center scheduling problem further verify the performance of the new algorithm.
  4. In this work we consider the problem of online submodular maximization under a cardinality constraint with differential privacy (DP). A stream of T submodular functions over a common finite ground set U arrives online, and at each time-step the decision maker must choose at most k elements of U before observing the function. The decision maker obtains a profit equal to the function evaluated on the chosen set and aims to learn a sequence of sets that achieves low expected regret. In the full-information setting, we develop an $(\varepsilon,\delta)$-DP algorithm with expected $(1-1/e)$-regret bound of $O\left(\frac{k^2 \log|U| \sqrt{T \log k/\delta}}{\varepsilon}\right)$. This algorithm contains k ordered experts that learn the best marginal increments for each item over the whole time horizon while maintaining privacy of the functions. In the bandit setting, we provide an $(\varepsilon,\delta + O(e^{-T^{1/3}}))$-DP algorithm with expected $(1-1/e)$-regret bound of $O\left(\frac{\sqrt{\log k/\delta}}{\varepsilon}\,\bigl(k(|U|\log|U|)^{1/3}\bigr)^{2}\,T^{2/3}\right)$. One challenge for privacy in this setting is that the payoff and feedback of expert i depends on the actions taken by her i-1 predecessors. This particular type of information leakage is not covered by post-processing, and new analysis is required. Our techniques for maintaining privacy with feedforward may be of independent interest.
  5. Understanding why animals (including humans) choose one thing over another is one of the key questions underlying the fields of behavioural ecology, behavioural economics and psychology. Most traditional studies of food choice in animals focus on simple, single-attribute decision tasks. However, animals in the wild are often faced with multi-attribute choice tasks where options in the choice set vary across multiple dimensions. Multi-attribute decision-making is particularly relevant for flower-visiting insects faced with deciding between flowers that may differ in reward attributes such as sugar concentration, nectar volume and pollen composition as well as non-rewarding attributes such as colour, symmetry and odour. How do flower-visiting insects deal with complex multi-attribute decision tasks? Here we review and synthesise research on the decision strategies used by flower-visiting insects when making multi-attribute decisions. In particular, we review how different types of foraging frameworks (classic optimal foraging theory, nutritional ecology, heuristics) conceptualise multi-attribute choice and we discuss how phenomena such as innate preferences, flower constancy and context dependence influence our understanding of flower choice. We find that multi-attribute decision-making is a complex process that can be influenced by innate preferences, flower constancy, the composition of the choice set and economic reward value. We argue that to understand and predict flower choice in flower-visiting insects, we need to move beyond simplified choice sets towards a view of multi-attribute choice which integrates the role of non-rewarding attributes and which includes flower constancy, innate preferences and context dependence. We further caution that behavioural experiments need to consider the possibility of context dependence in the design and interpretation of preference experiments. We conclude with a discussion of outstanding questions for future research. We also present a conceptual framework that incorporates the multiple dimensions of choice behaviour.