On Computing the Nonlinearity Interval in Parametric Semidefinite Optimization
This paper revisits the parametric analysis of semidefinite optimization problems with respect to the perturbation of the objective function along a fixed direction. We review the notions of invariancy set, nonlinearity interval, and transition point of the optimal partition, and we investigate their characterizations. We show that the set of transition points is finite and that the continuity of the optimal set mapping, in the sense of Painlevé–Kuratowski set convergence, might fail on a nonlinearity interval. Under a local nonsingularity condition, we then develop a methodology, stemming from numerical algebraic geometry, to efficiently compute nonlinearity intervals and transition points of the optimal partition. Finally, we support the theoretical results by applying our procedure to some numerical examples.
- Award ID(s): 1812746
- PAR ID: 10356414
- Date Published:
- Journal Name: Mathematics of Operations Research
- ISSN: 0364-765X
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
We study off-dynamics reinforcement learning (RL), where the policy is trained on a source domain and deployed to a distinct target domain. We address this problem via online distributionally robust Markov decision processes (DRMDPs), where the learning algorithm actively interacts with the source domain while seeking optimal performance under the worst-case dynamics within an uncertainty set around the source domain's transition kernel. We provide the first study of online DRMDPs with function approximation for off-dynamics RL. We find that the dual formulation of DRMDPs can induce nonlinearity, even when the nominal transition kernel is linear, which leads to error propagation. By designing a $$d$$-rectangular uncertainty set using the total variation distance, we remove this additional nonlinearity and bypass the error propagation. We then introduce DR-LSVI-UCB, the first provably efficient online DRMDP algorithm for off-dynamics RL with function approximation, and establish a polynomial suboptimality bound that is independent of the sizes of the state and action spaces. Our work takes a first step towards a deeper understanding of the provable efficiency of online DRMDPs with linear function approximation. Finally, we substantiate the performance and robustness of DR-LSVI-UCB through numerical experiments.
-
Simulation models commonly describe complex systems with no closed-form analytical representation. This paper proposes an algorithm for functions on continuous domains that fits into the nested partition framework and uses quantile estimation to rank regions and identify the most promising region. Additionally, we apply the optimal computational budget allocation (OCBA) method for allocating sample points using the normality property of quantile estimators. We prove that, for functions satisfying the Lipschitz condition, the algorithm converges in probability to a region that contains the true global optimum. The paper concludes with some numerical results.
-
Let P be a set of n points in d-dimensional space. Tverberg's theorem says that, if n is at least (k − 1)(d + 1) + 1, then P can be partitioned into k sets whose convex hulls intersect. Partitions with this property are called Tverberg partitions. A partition has tolerance t if it remains a Tverberg partition after the removal of any set of t points from P. A tolerant Tverberg partition exists in any dimension provided that n is sufficiently large. Let N(d,k,t) be the smallest value of n such that tolerant Tverberg partitions exist for any set of n points in R^d. Only a few exact values of N(d,k,t) are known. In this paper, we study the problem of finding Radon partitions (Tverberg partitions for k = 2) for a given set of points. We develop several algorithms and find new lower bounds for N(d,2,t).
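As a rough illustration of what a Radon partition is, the sketch below computes one for the classical, tolerance-free case (t = 0) from an affine dependence among the points. It is not one of the algorithms developed in the paper summarized above; the function name `radon_partition` and the tolerance `tol` are assumptions made for this example, and NumPy is assumed to be available.

```python
import numpy as np

def radon_partition(points, tol=1e-12):
    """Split at least d + 2 points in R^d into two index sets whose convex hulls meet.

    Hypothetical helper for illustration: classical Radon partition only (t = 0).
    Returns (left_indices, right_indices, radon_point).
    """
    pts = np.asarray(points, dtype=float)
    n, d = pts.shape
    if n < d + 2:
        raise ValueError("need at least d + 2 points")

    # Find a nonzero a with sum_i a_i * x_i = 0 and sum_i a_i = 0.
    # The system has d + 1 equations and n > d + 1 unknowns, so the last
    # right singular vector lies in its null space.
    system = np.vstack([pts.T, np.ones(n)])
    _, _, vt = np.linalg.svd(system)
    a = vt[-1]

    # Points with positive coefficient go left, all others go right.
    positive = a > tol
    left = [int(i) for i in np.flatnonzero(positive)]
    right = [int(i) for i in np.flatnonzero(~positive)]

    # The normalized positive combination lies in both convex hulls.
    radon_point = pts[positive].T @ a[positive] / a[positive].sum()
    return left, right, radon_point

# Usage: the four corners of the unit square split into its two diagonals.
left, right, point = radon_partition([(0, 0), (1, 0), (0, 1), (1, 1)])
```

Which diagonal is returned as `left` depends on the sign of the singular vector, so only the unordered pair of index sets is determined.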
-
We investigate dynamic versions of geometric set cover and hitting set where points and ranges may be inserted or deleted, and we want to efficiently maintain an (approximately) optimal solution for the current problem instance. While their static versions have been extensively studied in the past, surprisingly little is known about dynamic geometric set cover and hitting set. For instance, even for the most basic case of one-dimensional interval set cover and hitting set, no nontrivial results were known. The main contributions of our paper are two frameworks that lead to efficient data structures.