NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Deductive Synthesis of Reinforcement Learning Agents for Infinite Horizon Tasks

https://doi.org/10.1007/978-3-031-98685-7_4

Wang, Yuning; Zhu, He (July 2025, Springer Nature Switzerland)

Abstract We propose a deductive synthesis framework for constructing reinforcement learning (RL) agents that provably satisfy temporal reach-avoid specifications over infinite horizons. Our approach decomposes these temporal specifications into a sequence of finite-horizon subtasks, for which we synthesize individual RL policies. Using formal verification techniques, we ensure that the composition of a finite number of subtask policies guarantees satisfaction of the overall specification over infinite horizons. Experimental results on a suite of benchmarks show that our synthesized agents outperform standard RL methods in both task performance and compliance with safety and temporal requirements.
more » « less
Free, publicly-accessible full text available July 23, 2026
Disentangling Logic: The Role of Context in Large Language Model Reasoning Capabilities

https://doi.org/10.18653/v1/2025.findings-acl.983

Hua, Wenyue; Zhu, Kaijie; Li, Lingyao; Fan, Lizhou; Jin, Mingyu; Lin, Shuhang; Xue, Haochen; Li, Zelong; Wang, Jindong; Zhang, Yongfeng (October 2025, Association for Computational Linguistics)

Free, publicly-accessible full text available October 3, 2026
Abstraction Refinement-Guided Program Synthesis for Robot Learning from Demonstrations

Cui, Guofeng; Wang, Yuning; Mao, Wensen; Duan, Yuanlin; Zhu, He (October 2025, Proc. ACM Program. Lang., Vol. 9, No. OOPSLA2, Article 292)

Over the past decade, deep reinforcement learning (RL) techniques have significantly advanced robotic systems. However, due to the complex architectures of neural network models, ensuring their trustworthiness is a considerable challenge. Programmatic reinforcement learning has surfaced as a promising approach. Nonetheless, synthesizing robot-control programs remains challenging. Existing methods rely on domain-specific languages (DSLs) populated with user-defined state abstraction predicates and a library of low-level controllers as abstract actions to boot synthesis, which is impractical in unknown environments that lack such predefined components. To address this limitation, we introduce RoboScribe, a novel abstraction refinement-guided program synthesis framework that automatically derives robot state and action abstractions from raw, unsegmented task demonstrations in high-dimensional, continuous spaces. It iteratively enriches and refines an initially coarse abstraction until it generates a task-solving program over the abstracted robot environment. RoboScribe is effective in synthesizing iterative programs by inferring recurring subroutines directly from the robot’s raw, continuous state and action spaces, without needing predefined abstractions. Experimental results show that RoboScribe programs inductively generalize to long-horizon robot tasks involving arbitrary numbers of objects, outperforming baseline methods in terms of both interpretability and efficiency.
more » « less
Free, publicly-accessible full text available October 1, 2026
Deductive Synthesis of Reinforcement Learning Agents for Infinite Horizon Tasks

Wang, Yuning; Zhu, He (July 2025, 37th International Conference on Computer Aided Verification (CAV))

We propose a deductive synthesis framework for construct- ing reinforcement learning (RL) agents that provably satisfy temporal reach-avoid specifications over infinite horizons. Our approach decomposes these temporal specifications into a sequence of finite-horizon subtasks, for which we synthesize individual RL policies. Using formal verification techniques, we ensure that the composition of a finite number of subtask policies guarantees satisfaction of the overall specification over infinite horizons. Experimental results on a suite of benchmarks show that our synthesized agents outperform standard RL methods in both task performance and compliance with safety and temporal requirements.
more » « less
Free, publicly-accessible full text available July 21, 2026
Safe Exploration in Reinforcement Learning by Reachability Analysis over Learned Models

Wang, Yuning; Zhu, He (July 2024, 36th International Conference on Computer Aided Verification (CAV))

We introduce VELM, a reinforcement learning (RL) framework grounded in verification principles for safe exploration in unknown environments. VELM ensures that an RL agent systematically explores its environment, adhering to safety properties throughout the learning process. VELM learns environment models as symbolic formulas and conducts formal reachability analysis over the learned models for safety verification. An online shielding layer is then constructed to confine the RL agent’s exploration solely within a state space verified as safe in the learned model, thereby bolstering the overall safety profile of the RL system. Our experimental results demonstrate the efficacy of VELM across diverse RL environments, highlighting its capacity to significantly reduce safety violations in comparison to existing safe learning techniques, all without compromising the RL agent’s reward performance.
more » « less
Full Text Available
Reward-Guided Synthesis of Intelligent Agents with Control Structures

https://doi.org/10.1145/3656447

Cui, Guofeng; Wang, Yuning; Qiu, Wenjie; Zhu, He (June 2024, Proceedings of the ACM on Programming Languages)

Deep reinforcement learning (RL) has led to encouraging successes in numerous challenging robotics applications. However, the lack of inductive biases to support logic deduction and generalization in the representation of a deep RL model causes it less effective in exploring complex long-horizon robot-control tasks with sparse reward signals. Existing program synthesis algorithms for RL problems inherit the same limitation, as they either adapt conventional RL algorithms to guide program search or synthesize robot-control programs to imitate an RL model. We propose ReGuS, a reward-guided synthesis paradigm, to unlock the potential of program synthesis to overcome the exploration challenges. We develop a novel hierarchical synthesis algorithm with decomposed search space for loops, on-demand synthesis of conditional statements, and curriculum synthesis for procedure calls, to effectively compress the exploration space for long-horizon, multi-stage, and procedural robot-control tasks that are difficult to address by conventional RL techniques. Experiment results demonstrate that ReGuS significantly outperforms state-of-the-art RL algorithms and standard program synthesis baselines on challenging robot tasks including autonomous driving, locomotion control, and object manipulation. CCS Concepts: •Software and its engineering → Automatic programming.
more » « less
OpenAGI: When LLM Meets Domain Experts

Ge, Yingqiang; Hua, Wenyue; Mei, Kai; Ji, Jianchao; Tan, Juntao; Xu, Shuyuan; Li, Zelong; Zhang, Yongfeng (December 2023, Advances in Neural Information Processing Systems 36 (NeurIPS 2023))

Full Text Available
Instructing Goal-Conditioned Reinforcement Learning Agents with Temporal Logic Objectives

Qiu, Wenjie; Mao, Wensen; Zhu, He (December 2023, Advances in Neural Information Processing Systems (NeurIPS))

Goal-conditioned reinforcement learning (RL) is a powerful approach for learning general-purpose skills by reaching diverse goals. However, it has limitations when it comes to task-conditioned policies, where goals are specified by temporally extended instructions written in the Linear Temporal Logic (LTL) formal language. Existing approaches for finding LTL-satisfying policies rely on sampling a large set of LTL instructions during training to adapt to unseen tasks at inference time. However, these approaches do not guarantee generalization to out-of-distribution LTL objectives, which may have increased complexity. In this paper, we propose a novel approach to address this challenge. We show that simple goal-conditioned RL agents can be instructed to follow arbitrary LTL specifications without additional training over the LTL task space. Unlike existing approaches that focus on LTL specifications expressible as regular expressions, our technique is unrestricted and generalizes to ω-regular expressions. Experiment results demonstrate the effectiveness of our approach in adapting goal-conditioned RL agents to satisfy complex temporal logic task specifications zero-shot.
more » « less
Full Text Available
How to Index Item IDs for Recommendation Foundation Models

https://doi.org/10.1145/3624918.3625339

Hua, Wenyue; Xu, Shuyuan; Ge, Yingqiang; Zhang, Yongfeng (November 2023, ACM)

Full Text Available
User-Controllable Recommendation via Counterfactual Retrospective and Prospective Explanations

Tan, Juntao; Ge, Yingqiang; Zhu, Yan; Xia, Yinglong; Luo, Jiebo; Ji, Jianchao; Zhang, Yongfeng (September 2023, Frontiers in Artificial Intelligence and Applications (ECAI 2023))

Full Text Available

« Prev Next »

Search for: All records