<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dcq="http://purl.org/dc/terms/">
  <records count="1" morepages="false" start="1" end="1">
    <record rownumber="1">
      <dc:product_type>Journal Article</dc:product_type>
      <dc:title>Offline Reinforcement Learning with Closed-Form Policy Improvement Operators</dc:title>
      <dc:creator>Li, Jiachen; Zhang, Edwin; Yin, Ming; Bai, Qinxun; Wang, Yu-Xiang; Wang, William Yang</dc:creator>
      <dc:corporate_author/>
      <dc:editor>Krause, Andreas; Brunskill, Emma; Cho, Kyunghyun; Engelhardt, Barbara; Sabato, Sivan; Scarlett, Jonathan</dc:editor>
      <dc:description>Behavior constrained policy optimization has been demonstrated to be a successful paradigm for tackling Offline Reinforcement Learning. By exploiting historical transitions, a policy is trained to maximize a learned value function while constrained by the behavior policy to avoid a significant distributional shift. In this paper, we propose our closed-form policy improvement operators. We make a novel observation that the behavior constraint naturally motivates the use of first-order Taylor approximation, leading to a linear approximation of the policy objective. Additionally, as practical datasets are usually collected by heterogeneous policies, we model the behavior policies as a Gaussian Mixture and overcome the induced optimization difficulties by leveraging the LogSumExp's lower bound and Jensen's Inequality, giving rise to a closed-form policy improvement operator. We instantiate both one-step and iterative offline RL algorithms with our novel policy improvement operators and empirically demonstrate their effectiveness over state-of-the-art algorithms on the standard D4RL benchmark. Our code is available at https://cfpi-icml23.github.io/.</dc:description>
      <dc:publisher>ICML 2023</dc:publisher>
      <dc:date>2023-07-23</dc:date>
      <dc:nsf_par_id>10466939</dc:nsf_par_id>
      <dc:journal_name>Proceedings of Machine Learning Research</dc:journal_name>
      <dc:journal_volume>202</dc:journal_volume>
      <dc:journal_issue/>
      <dc:page_range_or_elocation>20485--20528</dc:page_range_or_elocation>
      <dc:issn>2640-3498</dc:issn>
      <dc:isbn/>
      <dc:doi>https://doi.org/</dc:doi>
      <dcq:identifierAwardId>2007117; 2003257</dcq:identifierAwardId>
      <dc:subject/>
      <dc:version_number/>
      <dc:location/>
      <dc:rights/>
      <dc:institution/>
      <dc:sponsoring_org>National Science Foundation</dc:sponsoring_org>
    </record>
  </records>
</rdf:RDF>
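
The abstract above outlines a derivation (first-order Taylor approximation of the policy objective, Gaussian-mixture behavior modeling, LogSumExp lower bounds, Jensen's Inequality). The following LaTeX sketch illustrates the kind of closed-form step such a derivation yields; it is a minimal illustration, not the paper's exact operator, and all notation (Q, pi_beta, mu, Sigma, delta, g, w_i, x_i) is assumed here rather than taken from the record.

```latex
% Hedged sketch: a linearized, behavior-constrained policy improvement step.
% All symbols below are assumed notation, not drawn from the metadata record.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
Around a Gaussian behavior policy
$\pi_\beta(\cdot \mid s) = \mathcal{N}(\mu(s), \Sigma(s))$,
a trust-region improvement step can be posed as
\[
  \max_{a} \; Q(s, a)
  \quad \text{s.t.} \quad
  (a - \mu(s))^\top \Sigma(s)^{-1} (a - \mu(s)) \le 2\delta .
\]
A first-order Taylor expansion at $\mu(s)$ linearizes the objective,
\[
  Q(s, a) \approx Q(s, \mu(s)) + g^\top (a - \mu(s)),
  \qquad
  g = \nabla_a Q(s, a)\big|_{a = \mu(s)} ,
\]
and maximizing a linear objective over an ellipsoid has the
standard closed-form solution
\[
  a^\ast = \mu(s) + \sqrt{2\delta}\,
           \frac{\Sigma(s)\, g}{\sqrt{g^\top \Sigma(s)\, g}} .
\]
With a Gaussian-mixture behavior policy the constraint involves a
LogSumExp, which the standard bounds
\[
  \log \sum_i w_i e^{x_i} \ge \max_i \bigl( x_i + \log w_i \bigr),
  \qquad
  \log \sum_i w_i e^{x_i} \ge \sum_i w_i x_i \quad (\text{Jensen}),
\]
relax back into tractable single-Gaussian-style constraints.
\end{document}
```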