-
In applications of offline reinforcement learning to observational data, such as in healthcare or education, a general concern is that observed actions might be affected by unobserved factors, inducing confounding and biasing estimates derived assuming a perfect Markov decision process (MDP) model. In "Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes," A. Bennett and N. Kallus tackle this by considering off-policy evaluation in a partially observed MDP (POMDP). Specifically, they consider estimating the value of a given target policy in an unknown POMDP, given observations of trajectories generated by a different and unknown policy, which may depend on the unobserved states. They consider both when the target policy value can be identified from the observed data and, given identification, how best to estimate it. Both problems are addressed by extending the framework of proximal causal inference to POMDP settings, using sequences of so-called bridge functions. This results in a novel framework for off-policy evaluation in POMDPs that they term proximal reinforcement learning, which they validate in various empirical settings.
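To make the bridge-function construction above concrete, here is a minimal sketch of the static (single-decision) proximal causal inference identity that the paper generalizes to POMDPs. The notation (W an outcome-inducing proxy, Z an action-inducing proxy, A the action, Y the outcome) is illustrative rather than the paper's, and the time-indexed recursion is only summarized in the comments.

```latex
% Static proximal causal inference sketch (illustrative notation):
% an outcome bridge function h solves, for each action a,
\[
  \mathbb{E}\left[\, Y \mid Z,\ A = a \,\right]
  \;=\; \mathbb{E}\left[\, h(W, a) \mid Z,\ A = a \,\right],
\]
% and under the usual proximal causal inference conditions the counterfactual
% mean is recovered by averaging the bridge function over the proxy:
\[
  \mathbb{E}\left[\, Y(a) \,\right] \;=\; \mathbb{E}\left[\, h(W, a) \,\right].
\]
% The POMDP extension described in the abstract chains a sequence of
% time-indexed bridge functions h_t, each defined through a conditional
% moment restriction involving h_{t+1}, to identify the target policy value.
```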
-
We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general function approximation. Existing methods such as sequential importance sampling estimators suffer from the curse of horizon in POMDPs. To circumvent this problem, we develop a novel model-free OPE method by introducing future-dependent value functions that take future proxies as inputs and perform a similar role to that of classical value functions in fully observable MDPs. We derive a new off-policy Bellman equation for future-dependent value functions as conditional moment equations that use history proxies as instrumental variables. We further propose a minimax learning method to learn future-dependent value functions using the new Bellman equation. We obtain a PAC result, which implies our OPE estimator is close to the true policy value under Bellman completeness, as long as futures and histories contain sufficient information about latent states.
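As an illustration of what such a conditional moment formulation looks like, the sketch below is a schematic rendering, not the paper's exact statement: g is a future-dependent value function taking a future proxy F as input, F' is the next-step future proxy, H is a history proxy used as an instrumental variable, O, A, R are the observation, action, and reward, gamma is the discount factor, and mu(O, A) stands for an observable per-step policy ratio; all of this notation, and the regularizer in the minimax objective, are assumptions of the sketch.

```latex
% Schematic off-policy Bellman equation as a conditional moment restriction
% (illustrative form): the history proxy H serves as the instrument.
\[
  \mathbb{E}\!\left[\, \mu(O, A)\,\bigl( R + \gamma\, g(F') \bigr) - g(F) \;\middle|\; H \,\right] \;=\; 0 .
\]
% A corresponding minimax learner searches for g in a class \mathcal{G}
% against test functions \xi(H) in a class \Xi, with a simple quadratic
% regularizer added here purely as a stabilizing assumption:
\[
  \hat{g} \;\in\; \arg\min_{g \in \mathcal{G}}\, \max_{\xi \in \Xi}\;
  \mathbb{E}_n\!\left[\, \xi(H)\,\bigl\{ \mu(O, A)\,\bigl( R + \gamma\, g(F') \bigr) - g(F) \bigr\} \,\right]
  \;-\; \lambda\, \mathbb{E}_n\!\left[\, \xi(H)^2 \,\right].
\]
```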
-
This paper shares four Sea Grant-funded projects from across the United States. The Hawai‘i project integrates Western science and Hawaiian culture in place- and community-based teaching. The Maryland program takes a project-based learning approach to aquaculture education in the formal education system. The Massachusetts (MIT) project focuses on state-of-the-art technology in engineering, robotics, and ocean science. The Virginia project emphasizes science communication and lesson plan design. What all four projects have in common is their focus on environmental literacy and teacher professional development in formal education. This approach aims to raise the quality of STEM instruction by expanding teachers' knowledge, skills, and resources. Training teachers also uses resources efficiently by maximizing the number of students we ultimately reach, thereby making the effort sustainable.
-
The conditional moment problem is a powerful formulation for describing structural causal parameters in terms of observables, a prominent example being instrumental variable regression. We introduce a very general class of estimators called the variational method of moments (VMM), motivated by a variational minimax reformulation of optimally weighted generalized method of moments for finite sets of moments. VMM controls infinitely many moments characterized by flexible function classes such as neural nets and kernel methods, while, unlike existing related minimax estimators, provably maintaining statistical efficiency. We also develop inference algorithms and demonstrate the empirical strengths of VMM estimation and inference in experiments.
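To unpack the minimax reformulation mentioned above: for a finite vector of moments, optimally weighted GMM admits a standard variational form, and the VMM construction generalizes it by letting the dual variable range over a flexible class of instrument functions. The sample objective below is a schematic rendering under assumed notation (rho the residual function, Z the conditioning variable, theta-tilde a preliminary estimate standing in for the optimal weighting), not the paper's exact estimator.

```latex
% Variational identity behind optimally weighted GMM: for a moment vector
% g = g(\theta) with positive-definite covariance \Omega,
\[
  g^\top \Omega^{-1} g \;=\; \sup_{f \in \mathbb{R}^m} \; f^\top g \;-\; \tfrac{1}{4}\, f^\top \Omega f ,
  \qquad \text{attained at } f = 2\,\Omega^{-1} g .
\]
% For a conditional moment restriction \mathbb{E}[\rho(Y;\theta_0) \mid Z] = 0,
% letting f range over a flexible function class \mathcal{F} (neural nets,
% kernels) instead of \mathbb{R}^m gives a minimax estimator of the
% schematic form
\[
  \hat\theta \;\in\; \arg\min_{\theta}\, \sup_{f \in \mathcal{F}}\;
  \mathbb{E}_n\!\left[\, f(Z)\,\rho(Y;\theta) \,\right]
  \;-\; \tfrac{1}{4}\, \mathbb{E}_n\!\left[\, f(Z)^2\, \rho(Y;\tilde\theta)^2 \,\right],
\]
% where \tilde\theta is a preliminary estimate used for the weighting
% (an assumption of this sketch).
```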