Near-Optimal Differentially Private Reinforcement Learning

Qiao, Dan; Wang, Yu-Xiang

Motivated by personalized healthcare and other applications involving sensitive data, we study online exploration in reinforcement learning with differential privacy (DP) constraints. Existing work on this problem established that no-regret learning is possible under joint differential privacy (JDP) and local differential privacy (LDP) but did not provide an algorithm with optimal regret. We close this gap for the JDP case by designing an $\epsilon$-JDP algorithm with a regret of $\widetilde{O}(\sqrt{SAH^2T}+S^2AH^3/\epsilon)$, which matches the information-theoretic lower bound for non-private learning for all choices of $\epsilon> S^{1.5}A^{0.5} H^2/\sqrt{T}$. In the above, $S$ and $A$ denote the number of states and actions, $H$ denotes the planning horizon, and $T$ is the number of steps. To the best of our knowledge, this is the first private RL algorithm that achieves privacy for free asymptotically as $T\rightarrow \infty$. Our techniques, which could be of independent interest, include privately releasing Bernstein-type exploration bonuses and an improved method for releasing visitation statistics. The same techniques also imply a slightly improved regret bound for the LDP case.
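
As a worked check of the threshold on $\epsilon$ quoted in the abstract, the following sketch compares the two terms of the regret bound. It ignores the constants and logarithmic factors hidden by $\widetilde{O}$, which the abstract does not specify, so it should be read as an order-of-magnitude argument rather than the paper's own derivation. The privacy-dependent term is dominated by the non-private term exactly when

$$
\frac{S^2AH^3}{\epsilon} \le \sqrt{SAH^2T}
\quad\Longleftrightarrow\quad
\epsilon \ge \frac{S^2AH^3}{S^{0.5}A^{0.5}H\sqrt{T}} = \frac{S^{1.5}A^{0.5}H^2}{\sqrt{T}}.
$$

For any fixed $\epsilon>0$ the right-hand side tends to $0$ as $T\rightarrow\infty$, so the condition eventually holds and the bound reduces to $\widetilde{O}(\sqrt{SAH^2T})$, which is the sense in which privacy is free asymptotically.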

Award ID(s): 2007117

NSF-PAR ID: 10466932

Author(s) / Creator(s): Qiao, Dan; Wang, Yu-Xiang

Editor(s): Ruiz, Francisco and

Publisher / Repository: PMLR

Date Published: 2023-04-25

Journal Name: Proceedings of Machine Learning Research

Volume: 206

ISSN: 2640-3498

Page Range / eLocation ID: 9914--9940

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text: Accepted Manuscript 1.0
Journal Article: The DOI is not currently available.
