NSF PAR Search | NSF Public Access Repository

TOPS: Transition-Based Volatility-Reduced Policy Search

Liangliang, X.; Daoming, L.; Yangchen, P. (November 2022, Lecture Notes in Computer Science)
Melo, S. F.; Fang, F. (Eds.)
Existing risk-averse reinforcement learning approaches still face several challenges, including the lack of a global optimality guarantee and the need to learn from long-term consecutive trajectories. Such trajectories are prone to visiting hazardous states, which is a major concern in the risk-averse setting. This paper proposes Transition-based vOlatility-controlled Policy Search (TOPS), a novel algorithm that solves risk-averse problems by learning from transitions. We prove that, under the over-parameterized neural network regime, our algorithm finds a globally optimal policy at a sublinear rate with proximal policy optimization and natural policy gradient. The convergence rate is comparable to that of state-of-the-art risk-neutral policy-search methods. The algorithm is evaluated on challenging MuJoCo robot simulation tasks under the mean-variance evaluation metric. Both theoretical analysis and experimental results demonstrate that TOPS performs at a state-of-the-art level among existing risk-averse policy search methods.
Full Text Available
Self-supervised multi-scale pyramid fusion networks for realistic bokeh effect rendering

https://doi.org/10.1016/j.jvcir.2022.103580

Wang, Zhifeng; Jiang, Aiwen; Zhang, Chunjie; Li, Hanxi; Liu, Bo (August 2022, Journal of Visual Communication and Image Representation)

Full Text Available
Principles and requirements for simulation-driven incremental learning of causal explanatory models

Yilmaz, Levent (January 2022, Proceedings of the 2022 Winter Simulation Conference)

Full Text Available
Ensemble single image deraining network via progressive structural boosting constraints

https://doi.org/10.1016/j.image.2021.116460

Peng, Long; Jiang, Aiwen; Wei, Haoran; Liu, Bo; Wang, Mingwen (November 2021, Signal Processing: Image Communication)

Full Text Available
TDM: Trustworthy Decision-Making Via Interpretability Enhancement

https://doi.org/10.1109/TETCI.2021.3084290

Lyu, Daoming; Yang, Fangkai; Kwon, Hugh; Dong, Wen; Yilmaz, Levent; Liu, Bo (June 2021, IEEE Transactions on Emerging Topics in Computational Intelligence)
Full Text Available
Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning

Zhang, S. (April 2021, Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI))

Full Text Available
Model credibility revisited: Concepts and considerations for appropriate trust

https://doi.org/10.1080/17477778.2020.1821587

Yilmaz, Levent; Liu, Bo (September 2020, Journal of Simulation)
Full Text Available
Provably Convergent Two-Timescale Off-Policy Actor-Critic with Function Approximation

Zhang, S.; Liu, B.; Yao, H.; Whiteson, S. (July 2020, International Conference on Machine Learning)

We present the first provably convergent two-timescale off-policy actor-critic algorithm (COF-PAC) with function approximation. Key to COF-PAC is the introduction of a new critic, the emphasis critic, which is trained via Gradient Emphasis Learning (GEM), a novel combination of the key ideas of Gradient Temporal Difference Learning and Emphatic Temporal Difference Learning. With the help of the emphasis critic and the canonical value function critic, we show convergence for COF-PAC, where the critics are linear and the actor can be nonlinear.
Full Text Available
GradientDICE: Rethinking generalized offline estimation of stationary values

Zhang, S.; Liu, B.; Whiteson, S. (July 2020, International Conference on Machine Learning)

We present GradientDICE for estimating the density ratio between the state distribution of the target policy and the sampling distribution in off-policy reinforcement learning. GradientDICE fixes several problems of GenDICE (Zhang et al., 2020a), the state of the art for estimating such density ratios. Namely, the optimization problem in GenDICE is not a convex-concave saddle-point problem once nonlinearity in the parameterization of the optimization variable is introduced to ensure positivity, so no primal-dual algorithm is guaranteed to converge or find the desired solution. However, such nonlinearity is essential to ensure the consistency of GenDICE even with a tabular representation. This is a fundamental contradiction resulting from GenDICE’s original formulation of the optimization problem. In GradientDICE, we optimize a different objective from GenDICE by using the Perron-Frobenius theorem and eliminating GenDICE’s use of divergence. Consequently, nonlinearity in parameterization is not necessary for GradientDICE, which is provably convergent under linear function approximation.
Full Text Available
Learning Rule-based Explanatory Models From Exploratory Multi-simulation For Decision-support Under Uncertainty

Rodriguez, B.; Yilmaz, L. (January 2020, Proceedings of the 2020 Winter Simulation Conference)

Exploratory modeling and simulation is an effective strategy when there is substantial contextual uncertainty and representational ambiguity in problem formulation. However, two significant challenges impede the use of an ensemble of models in exploratory simulation. The first challenge involves streamlining the maintenance and synthesis of multiple models from plausible features that are identified from and subject to the constraints of the research hypothesis. The second challenge is making sense of the data generated by multi-simulation over a model ensemble. To address both challenges, we introduce a computational framework that integrates feature-driven variability management with an anticipatory learning classifier system to generate explanatory rules from multi-simulation data.
Full Text Available
