Free, publicly-accessible full text available May 11, 2025
-
Uehara, Masatoshi; Kiyohara, Haruka; Bennett, Andrew; Chernozhukov, Victor; Jiang, Nan; Kallus, Nathan; Shi, Chengchun; Sun, Wen (Neural Information Processing Systems (NeurIPS 2023))
We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general function approximation. Existing methods such as sequential importance sampling estimators suffer from the curse of horizon in POMDPs. To circumvent this problem, we develop a novel model-free OPE method by introducing future-dependent value functions that take future proxies as inputs and perform a similar role to that of classical value functions in fully-observable MDPs. We derive a new off-policy Bellman equation for future-dependent value functions as conditional moment equations that use history proxies as instrumental variables. We further propose a minimax learning method to learn future-dependent value functions using the new Bellman equation. We obtain the PAC result, which implies our OPE estimator is close to the true policy value under Bellman completeness, as long as futures and histories contain sufficient information about latent states.
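The curse of horizon mentioned in the abstract can be illustrated with a minimal sketch of a trajectory-wise sequential importance sampling estimator. This is not the paper's method and not a POMDP: it assumes a stateless toy problem with two actions, where the reward equals the action taken, and every name below (`sis_estimate`, `max_weight`, `pi_e`, `pi_b`) is hypothetical. The point is only that the importance weight is a product over time steps, so its largest possible value grows exponentially with the horizon.

```python
def sis_estimate(trajectories, pi_e, pi_b):
    """Trajectory-wise sequential importance sampling estimate of a policy value.

    trajectories: list of trajectories, each a list of (action, reward) pairs
                  collected under the behavior policy pi_b.
    pi_e, pi_b:   dicts mapping action -> probability (toy, state-free policies).
    """
    total = 0.0
    for traj in trajectories:
        weight = 1.0   # product of per-step probability ratios
        ret = 0.0      # undiscounted return of the trajectory
        for action, reward in traj:
            weight *= pi_e[action] / pi_b[action]
            ret += reward
        total += weight * ret
    return total / len(trajectories)


def max_weight(horizon, pi_e, pi_b):
    """Largest importance weight a trajectory of the given horizon can have."""
    return max(pi_e[a] / pi_b[a] for a in pi_e) ** horizon


# Assumed toy policies: the evaluation policy prefers action 1,
# the behavior policy is uniform.
pi_e = {0: 0.1, 1: 0.9}
pi_b = {0: 0.5, 1: 0.5}

# All four horizon-2 trajectories are equally likely under the uniform
# behavior policy, so averaging over them gives the estimator's exact mean.
all_trajs = [
    [(1, 1.0), (1, 1.0)],
    [(1, 1.0), (0, 0.0)],
    [(0, 0.0), (1, 1.0)],
    [(0, 0.0), (0, 0.0)],
]
exact_mean = sis_estimate(all_trajs, pi_e, pi_b)
```

In this toy setting the estimator's mean matches the evaluation policy's true value (0.9 per step), while `max_weight` grows as 1.8 raised to the horizon, which is the source of the variance blow-up that motivates alternatives to importance sampling over long horizons.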
-
Uehara, Masatoshi; Kiyohara, Haruka; Bennett, Andrew; Chernozhukov, Victor; Jiang, Nan; Kallus, Nathan; Shi, Chengchun; Sun, Wen (37th Conference on Neural Information Processing Systems (NeurIPS 2023))
-
Uehara, Masatoshi; Kiyohara, Haruka; Bennett, Andrew; Chernozhukov, Victor; Jiang, Nan; Kallus, Nathan; Shi, Chengchun; Sun, Wen (Advances in Neural Information Processing Systems)