Title: Distributionally Robust Behavioral Cloning for Robust Imitation Learning
Award ID(s): 2045783
PAR ID: 10501612
Author(s) / Creator(s):
Publisher / Repository: IEEE
Date Published:
ISBN: 979-8-3503-0124-3
Page Range / eLocation ID: 1342 to 1347
Format(s): Medium: X
Location: Singapore, Singapore
Sponsoring Org: National Science Foundation
More Like This
  1. Abstract: We introduce an arbitrage-free framework for robust valuation adjustments. An investor trades a credit default swap portfolio with a risky counterparty and hedges credit risk by taking a position in defaultable bonds. The investor does not know the exact return rate of her counterparty's bond, but she knows it lies within an uncertainty interval. We derive both upper and lower bounds for the XVA process of the portfolio, and show that these bounds can be recovered as solutions of nonlinear ordinary differential equations. The presence of collateralization and closeout payoffs leads to important differences from classical credit risk valuation. The value of the super-replicating portfolio cannot be obtained by simply plugging one of the extremes of the uncertainty interval into the valuation equation; rather, it depends on the relation between the XVA replicating portfolio and the closeout value throughout the life of the transaction. Our comparative statics analysis indicates that credit contagion has a nonlinear effect on the replication strategies and on the XVA. (A numerical sketch of this kind of worst-case ODE bound appears after this list.)
  2. Off-policy evaluation and learning (OPE/L) use offline observational data to make better decisions, which is crucial in applications where online experimentation is limited. However, because it depends entirely on logged data, OPE/L is sensitive to environment distribution shifts: discrepancies between the data-generating environment and the one where policies are deployed. Si et al. (2020) proposed distributionally robust OPE/L (DROPE/L) to address this, but the proposal relies on inverse-propensity weighting, whose estimation error and regret deteriorate if propensities are nonparametrically estimated and whose variance is suboptimal even if they are not. For standard, non-robust OPE/L, this is solved by doubly robust (DR) methods, but these do not naturally extend to the more complex DROPE/L, which involves a worst-case expectation. In this paper, we propose the first DR algorithms for DROPE/L with KL-divergence uncertainty sets. For evaluation, we propose Localized Doubly Robust DROPE (LDR2OPE) and show that it achieves semiparametric efficiency under weak product rate conditions. Thanks to a localization technique, LDR2OPE only requires fitting a small number of regressions, just like DR methods for standard OPE. For learning, we propose Continuum Doubly Robust DROPL (CDR2OPL) and show that, under a product rate condition involving a continuum of regressions, it enjoys a fast regret rate of O(N^{-1/2}) even when unknown propensities are nonparametrically estimated. We empirically validate our algorithms in simulations and further extend our results to general f-divergence uncertainty sets. (A sketch of the KL-dual worst-case value that such estimators target appears after this list.)
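The first abstract above recovers robust XVA bounds as solutions of nonlinear ODEs in which the worst-case bond return rate switches with the sign of the gap between the replicating portfolio and the closeout value. The Python sketch below is purely illustrative: the dynamics, rate interval, and closeout profile are invented placeholders, not the paper's model. It only demonstrates why neither endpoint of the uncertainty interval can be plugged in for the whole horizon.

```python
# Hypothetical sketch: an XVA-style upper bound as the solution of a nonlinear ODE.
# All names, dynamics, and parameter values are illustrative assumptions.
import numpy as np
from scipy.integrate import solve_ivp

R_LO, R_HI = 0.01, 0.05  # assumed uncertainty interval for the bond return rate
T = 5.0                  # assumed horizon (years)

def closeout(t):
    # Assumed closeout value profile; in the paper this comes from the CDS portfolio.
    return np.exp(-0.02 * t)

def upper_bound_rhs(t, v):
    # The worst-case rate depends on the sign of (V - closeout), so the maximizing
    # rate can switch during the life of the transaction: neither interval endpoint
    # is worst-case for the whole horizon.
    spread = v[0] - closeout(t)
    r = R_HI if spread >= 0.0 else R_LO
    return [-r * spread]

sol = solve_ivp(upper_bound_rhs, (0.0, T), [1.0], dense_output=True)
print("terminal upper bound:", sol.y[0, -1])
```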
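The second abstract works with worst-case expectations over KL-divergence uncertainty sets. A standard way to evaluate such a worst case is the dual representation inf_{KL(Q||P) <= delta} E_Q[X] = sup_{alpha > 0} ( -alpha log E_P[exp(-X/alpha)] - alpha * delta ). The sketch below evaluates this plug-in dual on simulated logged rewards; it is not the paper's doubly robust LDR2OPE estimator, and the data and parameter values are placeholders.

```python
# Minimal sketch of the KL-dual worst-case value that DROPE-style estimators target.
# Assumed setup: i.i.d. logged rewards from the data-generating distribution P; we
# compute inf over Q with KL(Q||P) <= delta of E_Q[reward] via the standard dual.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
rewards = rng.uniform(0.0, 1.0, size=5000)  # placeholder logged rewards
delta = 0.1                                 # assumed radius of the KL uncertainty set

def dual_objective(alpha):
    # alpha * log E[exp(-reward / alpha)] + alpha * delta, with a stable
    # log-mean-exp; minimizing this over alpha gives minus the robust value.
    x = -rewards / alpha
    lme = np.log(np.mean(np.exp(x - x.max()))) + x.max()
    return alpha * lme + alpha * delta

res = minimize_scalar(dual_objective, bounds=(1e-3, 1e3), method="bounded")
print("robust (worst-case) policy value:", -res.fun)
```

As delta shrinks to 0 the robust value approaches the empirical mean reward, and larger delta drives it toward the worst observed reward, which is the knob the DROPE/L formulation exposes.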