Title: Treatment choice, mean square regret and partial identification
Abstract: We consider a decision maker who faces a binary treatment choice when their welfare is only partially identified from data. We contribute to the literature by anchoring our finite-sample analysis on mean square regret, a decision criterion advocated by Kitagawa et al. (2022) in "Treatment Choice with Nonlinear Regret". We find that optimal rules are always fractional, irrespective of the width of the identified set and the precision of its estimate. The optimal treatment fraction is a simple logistic transformation of the commonly used t-statistic multiplied by a factor calculated by a simple constrained optimization. This treatment fraction gets closer to 0.5 as the identified set becomes wider, implying that the decision maker becomes more cautious against an adversarial Nature.
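As a rough illustration of the rule's shape only: the abstract describes the treatment fraction as a logistic transformation of a scaled t-statistic. The sketch below assumes the t-statistic is an estimate divided by its standard error, and takes the multiplying factor as a given input; the function name `treatment_fraction` and the parameter `scale` are hypothetical placeholders, and the constrained optimization that produces the factor is not described in the abstract and is not reproduced here.

```python
import math


def treatment_fraction(estimate: float, std_error: float, scale: float) -> float:
    """Logistic transform of a scaled t-statistic (illustrative sketch).

    `scale` stands in for the factor that the abstract says comes from a
    constrained optimization; its value is an assumption supplied by the caller.
    """
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    z = scale * (estimate / std_error)  # scaled t-statistic
    # Numerically stable logistic function 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# A zero t-statistic, or a zero factor, gives an even split.
print(treatment_fraction(0.0, 1.0, 1.0))
print(treatment_fraction(2.0, 1.0, 0.0))
```

Under this sketch, a smaller `scale` pulls the fraction toward 0.5 for any fixed t-statistic, which matches the qualitative behavior the abstract reports for wider identified sets.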
Award ID(s):
2315600
PAR ID:
10517661
Author(s) / Creator(s):
; ;
Publisher / Repository:
Springer
Date Published:
Journal Name:
The Japanese Economic Review
Volume:
74
Issue:
4
ISSN:
1352-4739
Page Range / eLocation ID:
573 to 602
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We derive asymptotically optimal statistical decision rules for discrete choice problems when payoffs depend on a partially-identified parameter θ and the decision maker can use a point-identified parameter μ to deduce restrictions on θ. Examples include treatment choice under partial identification and pricing with rich unobserved heterogeneity. Our notion of optimality combines a minimax approach to handle the ambiguity from partial identification of θ given μ with an average risk minimization approach for μ. We show how to implement optimal decision rules using the bootstrap and (quasi-)Bayesian methods in both parametric and semiparametric settings. We provide detailed applications to treatment choice and optimal pricing. Our asymptotic approach is well suited for realistic empirical settings in which the derivation of finite-sample optimal rules is intractable. 
  2. Considerable work has focused on optimal stopping problems where random IID offers arrive sequentially for a single available resource which is controlled by the decision-maker. After viewing the realization of the offer, the decision-maker irrevocably rejects it, or accepts it, collecting the reward and ending the game. We consider an important extension of this model to a dynamic setting where the resource is "renewable" (a rental, a work assignment, or a temporary position) and can be allocated again after a delay period d. In the case where the reward distribution is known a priori, we design an (asymptotically optimal) 1/2-competitive Prophet Inequality, namely, a policy that collects in expectation at least half of the expected reward collected by a prophet who a priori knows all the realizations. This policy has a particularly simple characterization as a thresholding rule which depends on the reward distribution and the blocking period d, and arises naturally from an LP-relaxation of the prophet's optimal solution. Moreover, it gives the key for extending to the case of unknown distributions; here, we construct a dynamic threshold rule using the reward samples collected when the resource is not blocked. We provide a regret guarantee for our algorithm against the best policy in hindsight, and prove a complementing minimax lower bound on the best achievable regret, establishing that our policy achieves, up to poly-logarithmic factors, the best possible regret in this setting.
  3. This paper considers online convex optimization (OCO) with stochastic constraints, which generalizes Zinkevich's OCO over a known simple fixed set by introducing multiple stochastic functional constraints that are i.i.d. generated at each round and are disclosed to the decision maker only after the decision is made. This formulation arises naturally when decisions are restricted by stochastic environments or deterministic environments with noisy observations. It also includes many important problems as special cases, such as OCO with long term constraints, stochastic constrained convex optimization, and deterministic constrained convex optimization. To solve this problem, this paper proposes a new algorithm that achieves O(√T) expected regret and constraint violations and O(√T log(T)) high probability regret and constraint violations. Experiments on a real-world data center scheduling problem further verify the performance of the new algorithm.
  4. This paper considers online convex optimization (OCO) with stochastic constraints, which generalizes Zinkevich's OCO over a known simple fixed set by introducing multiple stochastic functional constraints that are i.i.d. generated at each round and are disclosed to the decision maker only after the decision is made. This formulation arises naturally when decisions are restricted by stochastic environments or deterministic environments with noisy observations. It also includes many important problems as special cases, such as OCO with long term constraints, stochastic constrained convex optimization, and deterministic constrained convex optimization. To solve this problem, this paper proposes a new algorithm that achieves O(√T) expected regret and constraint violations and O(√T log(T)) high probability regret and constraint violations. Experiments on a real-world data center scheduling problem further verify the performance of the new algorithm.
  5. In this work we consider the problem of online submodular maximization under a cardinality constraint with differential privacy (DP). A stream of T submodular functions over a common finite ground set U arrives online, and at each time-step the decision maker must choose at most k elements of U before observing the function. The decision maker obtains a profit equal to the function evaluated on the chosen set and aims to learn a sequence of sets that achieves low expected regret. In the full-information setting, we develop an $(\varepsilon,\delta)$-DP algorithm with expected (1-1/e)-regret bound of $O\left(\frac{k^2 \log|U| \sqrt{T \log k/\delta}}{\varepsilon}\right)$. This algorithm contains k ordered experts that learn the best marginal increments for each item over the whole time horizon while maintaining privacy of the functions. In the bandit setting, we provide an $(\varepsilon,\delta+O(e^{-T^{1/3}}))$-DP algorithm with expected (1-1/e)-regret bound of $O\left(\frac{\sqrt{\log k/\delta}}{\varepsilon}\left(k(|U|\log|U|)^{1/3}\right)^2 T^{2/3}\right)$. One challenge for privacy in this setting is that the payoff and feedback of expert i depends on the actions taken by her i-1 predecessors. This particular type of information leakage is not covered by post-processing, and new analysis is required. Our techniques for maintaining privacy with feedforward may be of independent interest.