
Chaudhuri, Kamalika, and […] (Eds.)
We study the problem of reinforcement learning (RL) with low (policy) switching cost, a problem well motivated by real-life RL applications in which deployments of new policies are costly and the number of policy updates must be kept low. In this paper, we propose a new algorithm based on stage-wise exploration and adaptive policy elimination that achieves a regret of $\widetilde{O}(\sqrt{H^4S^2AT})$ while requiring a switching cost of only $O(HSA \log\log T)$. This is an exponential improvement over the best-known switching cost $O(H^2SA\log T)$ among existing methods with $\widetilde{O}(\mathrm{poly}(H,S,A)\sqrt{T})$ regret. In the above, $S$ and $A$ denote the number of states and actions in an […]
Free, publicly-accessible full text available June 1, 2023.
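The exponential improvement in switching cost, from $O(\log T)$ to $O(\log\log T)$, stems from stage lengths that grow doubly exponentially, so the policy is switched only at stage boundaries. A minimal sketch of such a schedule, assuming stage $k$ runs for roughly $T^{1-2^{-k}}$ steps (a common choice for this kind of schedule; the paper's exact stage design may differ):

```python
import math

def stage_lengths(T):
    """Stage k has length ~ T^(1 - 2^-k): lengths grow doubly
    exponentially, so only O(log log T) stages cover horizon T."""
    lengths, total, k = [], 0, 1
    while total < T:
        L = min(int(T ** (1 - 2 ** (-k))) + 1, T - total)
        lengths.append(L)
        total += L
        k += 1
    return lengths

T = 10 ** 6
stages = stage_lengths(T)
print(len(stages), math.ceil(math.log2(math.log2(T))))
```

For $T = 10^6$ this schedule uses 5 stages, on the order of $\log_2\log_2 T \approx 4.3$, whereas a plain doubling schedule would use $O(\log_2 T) \approx 20$.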

Goal-oriented reinforcement learning, where the agent must reach a goal state while simultaneously minimizing its cost, has received significant attention in real-world applications. Its theoretical formulation, the stochastic shortest path (SSP) problem, has been intensively studied in the online setting. Nevertheless, it remains understudied when such online interaction is prohibited and only historical data is provided. In this paper, we consider the offline stochastic shortest path problem when the state space and the action space are finite. We design simple value-iteration-based algorithms for tackling both the offline policy evaluation (OPE) and offline policy learning tasks. Notably, our analysis of […]
Free, publicly-accessible full text available May 1, 2023.
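In the tabular setting, the value-iteration backbone of such algorithms can be sketched as repeated Bellman backups of the policy's cost-to-go with the goal state pinned at zero. This is a generic sketch on a given (e.g., empirically estimated) policy-induced model, not the paper's exact offline procedure:

```python
import numpy as np

def ssp_policy_evaluation(P, c, goal, iters=1000):
    """Tabular value iteration for evaluating a fixed policy in a
    stochastic shortest path: V(s) = c(s) + sum_s' P[s, s'] V(s'),
    with V(goal) = 0.  P and c are the policy-induced transition
    matrix and per-state costs (estimated from data in the offline
    setting; taken as given here)."""
    V = np.zeros(len(c))
    for _ in range(iters):
        V = c + P @ V        # Bellman backup of cost-to-go
        V[goal] = 0.0        # the goal state is absorbing and free
    return V

# 3-state chain: 0 -> 1 -> 2 (goal), unit cost per step
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])
c = np.array([1.0, 1.0, 0.0])
V = ssp_policy_evaluation(P, c, goal=2)
print(V)  # cost-to-go: 2 steps from state 0, 1 step from state 1
```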

Offline reinforcement learning, which seeks to use offline/historical data to optimize sequential decision-making strategies, has gained surging prominence in recent studies. Because appropriate function approximators can help mitigate the sample complexity burden in modern reinforcement learning problems, existing endeavors usually employ powerful function representation models (e.g., neural networks) to learn the optimal policies. However, a precise understanding of the statistical limits of function representations remains elusive, even when such a representation is linear. Toward this goal, we study the statistical limits of offline reinforcement learning with linear model representations. To derive the tight offline learning bound, […]
Free, publicly-accessible full text available April 1, 2023.

We consider how to privately share the personalized privacy losses incurred by objective perturbation, using per-instance differential privacy (pDP). Standard differential privacy (DP) gives a worst-case bound that might be orders of magnitude larger than the privacy loss to a particular individual relative to a fixed dataset. The pDP framework provides a more fine-grained analysis of the privacy guarantee for a target individual, but the per-instance privacy loss itself might be a function of sensitive data. In this paper, we analyze the per-instance privacy loss of releasing a private empirical risk minimizer learned via objective perturbation, and propose a […]
Free, publicly-accessible full text available December 1, 2022.
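Objective perturbation itself is simple to sketch: add a random linear term to the regularized empirical risk before minimizing. Below is an illustrative ridge-regression instance; the noise scale `1.0 / epsilon` is a placeholder, not a calibrated DP or pDP guarantee:

```python
import numpy as np

def objective_perturbed_erm(X, y, lam, epsilon, rng):
    """Objective perturbation sketch for ridge regression: minimize
    ||Xw - y||^2 / n + lam * ||w||^2 + b^T w, where b is a random
    perturbation vector.  Setting the gradient to zero gives the
    closed form (X^T X / n + lam * I) w = X^T y / n - b / 2.
    The noise scale below is illustrative only."""
    n, d = X.shape
    b = rng.normal(scale=1.0 / epsilon, size=d)  # perturbation vector
    A = X.T @ X / n + lam * np.eye(d)
    return np.linalg.solve(A, X.T @ y / n - b / 2)
```

As the noise scale shrinks (large `epsilon`), the output approaches the ordinary ridge solution; the privacy analysis concerns how much it must differ.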

We study the \emph{offline reinforcement learning} (offline RL) problem, where the goal is to learn a reward-maximizing policy in an unknown \emph{Markov Decision Process} (MDP) using data coming from a policy $\mu$. In particular, we consider the sample complexity of offline RL for finite-horizon MDPs. Prior works derive information-theoretic lower bounds based on different data-coverage assumptions, and their upper bounds are expressed by covering coefficients that lack an explicit characterization in terms of system quantities. In this work, we analyze the \emph{Adaptive Pessimistic Value Iteration} (APVI) algorithm and derive a suboptimality upper bound that nearly matches […]
Free, publicly-accessible full text available December 1, 2022.
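The pessimism principle behind such algorithms can be sketched as a Bellman backup with a count-based confidence penalty; the `beta / sqrt(n)` bonus below is an illustrative stand-in for APVI's calibrated confidence bounds, not the algorithm itself:

```python
import numpy as np

def pessimistic_q_update(P_hat, r, V_next, counts, beta=1.0):
    """One backward step of pessimistic value iteration (sketch):
    subtract a count-based bonus b(s, a) = beta / sqrt(n(s, a)) from
    the empirical Bellman backup, so state-action pairs the offline
    data rarely covers are penalized.  P_hat: (S, A, S) empirical
    transitions; r: (S, A) rewards; counts: (S, A) visit counts."""
    bonus = beta / np.sqrt(np.maximum(counts, 1.0))
    Q = r + np.einsum('sax,x->sa', P_hat, V_next) - bonus
    return np.maximum(Q, 0.0)  # pessimistic estimates clipped at 0

# well-covered action (n=100) barely penalized; rare action (n=1) is
P_hat = np.full((2, 2, 2), 0.5)
r = np.ones((2, 2))
counts = np.array([[100.0, 1.0], [100.0, 1.0]])
Q = pessimistic_q_update(P_hat, r, np.array([1.0, 0.0]), counts)
print(Q)
```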

We consider the problem of offline reinforcement learning (RL), a well-motivated setting of RL that aims at policy optimization using only historical data. Despite its wide applicability, theoretical understanding of offline RL, such as its optimal sample complexity, remains largely open even in basic settings such as \emph{tabular} Markov Decision Processes (MDPs). In this paper, we propose Off-Policy Double Variance Reduction (OPDVR), a new variance-reduction-based algorithm for offline RL. Our main result shows that OPDVR provably identifies an $\epsilon$-optimal policy with $\widetilde{O}(H^2/d_m\epsilon^2)$ episodes of offline data in the finite-horizon stationary transition setting, where $H$ is the horizon […]
Free, publicly-accessible full text available December 1, 2022.
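The variance-reduction idea can be sketched with a reference value function used as a control variate in the backup estimate; this expectation-estimation version is only illustrative of the mechanism, not OPDVR itself:

```python
import numpy as np

def vr_backup(V, V_ref, next_states_big, next_states_small):
    """Variance-reduced estimate of E[V(s')] at some (s, a): the
    reference part E[V_ref(s')] is estimated on a large batch of
    sampled next states, while the correction E[(V - V_ref)(s')],
    which has small magnitude whenever V_ref is close to V, is
    estimated on a fresh small batch.  Splitting the backup this way
    is the control-variate idea behind variance reduction."""
    ref_part = np.mean(V_ref[next_states_big])
    corr_part = np.mean((V - V_ref)[next_states_small])
    return ref_part + corr_part
```

When `V_ref` already equals `V`, the correction term vanishes and the estimate inherits the accuracy of the large reference batch.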

Krause, Andreas (Ed.)
The Private Aggregation of Teacher Ensembles (PATE) framework is one of the most promising recent approaches to differentially private learning. Existing theoretical analysis shows that PATE consistently learns any VC class in the realizable setting, but falls short of explaining its success in more general cases where the error rate of the optimal classifier is bounded away from zero. We fill in this gap by introducing the Tsybakov Noise Condition (TNC) and establishing stronger and more interpretable learning bounds. These bounds provide new insights into when PATE works and improve over existing results even in the narrower realizable setting. We also […]
Free, publicly-accessible full text available November 22, 2022.
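For context, PATE's core aggregation step releases only a noisy majority vote over the teacher ensemble's predictions; a minimal sketch (the noise scale is a free parameter here and must be calibrated to the privacy budget in practice):

```python
import numpy as np

def pate_aggregate(teacher_votes, num_classes, scale, rng):
    """PATE's noisy-max aggregation: count teacher votes per class,
    add Laplace noise of the given scale to each count, and release
    only the argmax label.  'scale' is illustrative; a real deployment
    calibrates it to the desired differential privacy guarantee."""
    counts = np.bincount(teacher_votes, minlength=num_classes).astype(float)
    counts += rng.laplace(scale=scale, size=num_classes)
    return int(np.argmax(counts))
```

When the teachers agree by a wide margin, the noise rarely flips the outcome, which is why consensus examples cost little privacy.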

This work studies the statistical limits of uniform convergence for offline policy evaluation (OPE) problems with model-based methods (for episodic MDPs) and provides a unified framework toward optimal learning for several well-motivated offline tasks. Uniform OPE, $\sup_{\pi\in\Pi}\|Q^\pi-\widehat{Q}^\pi\|<\epsilon$, is a stronger measure than pointwise OPE and ensures offline learning when $\Pi$ contains all policies (the global class). In this paper, we establish an $\Omega(H^2S/d_m\epsilon^2)$ lower bound (over the model-based family) for global uniform OPE, and our main result establishes an upper bound of $\widetilde{O}(H^2/d_m\epsilon^2)$ for \emph{local} uniform convergence that applies to all \emph{near-empirically-optimal} policies for the […]
Free, publicly-accessible full text available October 1, 2022.
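Model-based OPE plugs the empirical model into backward induction to estimate $Q^\pi$; uniform OPE then asks that this estimate be accurate simultaneously over a whole policy class. A minimal sketch assuming time-homogeneous rewards and transitions (an assumption of this illustration, not of the paper):

```python
import numpy as np

def model_based_ope(P_hat, r, pi, H):
    """Model-based OPE for an episodic MDP: run H steps of backward
    induction on the empirical model P_hat (shape (S, A, S)) to get
    Q^pi and V^pi for a fixed policy pi (shape (S, A), rows are
    action distributions)."""
    S, A, _ = P_hat.shape
    V = np.zeros(S)
    for _ in range(H):                       # backward over H steps
        Q = r + np.einsum('sax,x->sa', P_hat, V)
        V = np.einsum('sa,sa->s', pi, Q)     # average Q under pi
    return Q, V

# 2 states, 1 action, self-loops, unit reward: V = H
P_hat = np.zeros((2, 1, 2))
P_hat[0, 0, 0] = 1.0
P_hat[1, 0, 1] = 1.0
Q, V = model_based_ope(P_hat, np.ones((2, 1)), np.ones((2, 1)), H=3)
print(V)
```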


The COVID-19 pandemic has had an unprecedented impact all over the world since early 2020. During this public health crisis, reliable forecasting of the disease becomes critical for resource allocation and administrative planning. Results from compartmental models such as SIR and SEIR are widely referenced by the CDC and news media. With more and more COVID-19 data becoming available, we examine the following question: can a direct data-driven approach, without modeling the disease-spreading dynamics, outperform the widely referenced compartmental models and their variants? In this paper, we show the possibility. It is observed that as COVID-19 spreads at different speeds and […]
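For reference, the SIR compartmental model that such data-driven approaches are compared against evolves by moving mass between susceptible, infected, and recovered compartments; a minimal discrete-time sketch with illustrative parameter values:

```python
def sir_step(S, I, R, beta, gamma, N):
    """One discrete-time step of the SIR model: new infections
    beta * S * I / N move mass from S to I, and recoveries
    gamma * I move mass from I to R.  Total population N is
    conserved by construction."""
    new_inf = beta * S * I / N
    new_rec = gamma * I
    return S - new_inf, I + new_inf - new_rec, R + new_rec

# simulate a small outbreak (beta, gamma chosen for illustration)
S, I, R = 990.0, 10.0, 0.0
for _ in range(100):
    S, I, R = sir_step(S, I, R, beta=0.3, gamma=0.1, N=1000.0)
print(round(S + I + R))  # population is conserved: 1000
```

The data-driven approach the abstract proposes would instead fit the observed case counts directly, without committing to these dynamics.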