NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Abstraction Refinement-Guided Program Synthesis for Robot Learning from Demonstrations

Cui, Guofeng; Wang, Yuning; Mao, Wensen; Duan, Yuanlin; Zhu, He (October 2025, Proc. ACM Program. Lang., Vol. 9, No. OOPSLA2, Article 292)

Over the past decade, deep reinforcement learning (RL) techniques have significantly advanced robotic systems. However, due to the complex architectures of neural network models, ensuring their trustworthiness is a considerable challenge. Programmatic reinforcement learning has surfaced as a promising approach. Nonetheless, synthesizing robot-control programs remains challenging. Existing methods rely on domain-specific languages (DSLs) populated with user-defined state abstraction predicates and a library of low-level controllers as abstract actions to boot synthesis, which is impractical in unknown environments that lack such predefined components. To address this limitation, we introduce RoboScribe, a novel abstraction refinement-guided program synthesis framework that automatically derives robot state and action abstractions from raw, unsegmented task demonstrations in high-dimensional, continuous spaces. It iteratively enriches and refines an initially coarse abstraction until it generates a task-solving program over the abstracted robot environment. RoboScribe is effective in synthesizing iterative programs by inferring recurring subroutines directly from the robot’s raw, continuous state and action spaces, without needing predefined abstractions. Experimental results show that RoboScribe programs inductively generalize to long-horizon robot tasks involving arbitrary numbers of objects, outperforming baseline methods in terms of both interpretability and efficiency.
more » « less
Free, publicly-accessible full text available October 1, 2026
Deductive Synthesis of Reinforcement Learning Agents for Infinite Horizon Tasks

Wang, Yuning; Zhu, He (July 2025, 37th International Conference on Computer Aided Verification (CAV))

We propose a deductive synthesis framework for construct- ing reinforcement learning (RL) agents that provably satisfy temporal reach-avoid specifications over infinite horizons. Our approach decomposes these temporal specifications into a sequence of finite-horizon subtasks, for which we synthesize individual RL policies. Using formal verification techniques, we ensure that the composition of a finite number of subtask policies guarantees satisfaction of the overall specification over infinite horizons. Experimental results on a suite of benchmarks show that our synthesized agents outperform standard RL methods in both task performance and compliance with safety and temporal requirements.
more » « less
Free, publicly-accessible full text available July 21, 2026
Deductive Synthesis of Reinforcement Learning Agents for Infinite Horizon Tasks

https://doi.org/10.1007/978-3-031-98685-7_4

Wang, Yuning; Zhu, He (July 2025, Springer Nature Switzerland)

Abstract We propose a deductive synthesis framework for constructing reinforcement learning (RL) agents that provably satisfy temporal reach-avoid specifications over infinite horizons. Our approach decomposes these temporal specifications into a sequence of finite-horizon subtasks, for which we synthesize individual RL policies. Using formal verification techniques, we ensure that the composition of a finite number of subtask policies guarantees satisfaction of the overall specification over infinite horizons. Experimental results on a suite of benchmarks show that our synthesized agents outperform standard RL methods in both task performance and compliance with safety and temporal requirements.
more » « less
Free, publicly-accessible full text available July 23, 2026
Safe Exploration in Reinforcement Learning by Reachability Analysis over Learned Models

Wang, Yuning; Zhu, He (July 2024, 36th International Conference on Computer Aided Verification (CAV))

We introduce VELM, a reinforcement learning (RL) framework grounded in verification principles for safe exploration in unknown environments. VELM ensures that an RL agent systematically explores its environment, adhering to safety properties throughout the learning process. VELM learns environment models as symbolic formulas and conducts formal reachability analysis over the learned models for safety verification. An online shielding layer is then constructed to confine the RL agent’s exploration solely within a state space verified as safe in the learned model, thereby bolstering the overall safety profile of the RL system. Our experimental results demonstrate the efficacy of VELM across diverse RL environments, highlighting its capacity to significantly reduce safety violations in comparison to existing safe learning techniques, all without compromising the RL agent’s reward performance.
more » « less
Full Text Available
Reward-Guided Synthesis of Intelligent Agents with Control Structures

https://doi.org/10.1145/3656447

Cui, Guofeng; Wang, Yuning; Qiu, Wenjie; Zhu, He (June 2024, Proceedings of the ACM on Programming Languages)

Deep reinforcement learning (RL) has led to encouraging successes in numerous challenging robotics applications. However, the lack of inductive biases to support logic deduction and generalization in the representation of a deep RL model causes it less effective in exploring complex long-horizon robot-control tasks with sparse reward signals. Existing program synthesis algorithms for RL problems inherit the same limitation, as they either adapt conventional RL algorithms to guide program search or synthesize robot-control programs to imitate an RL model. We propose ReGuS, a reward-guided synthesis paradigm, to unlock the potential of program synthesis to overcome the exploration challenges. We develop a novel hierarchical synthesis algorithm with decomposed search space for loops, on-demand synthesis of conditional statements, and curriculum synthesis for procedure calls, to effectively compress the exploration space for long-horizon, multi-stage, and procedural robot-control tasks that are difficult to address by conventional RL techniques. Experiment results demonstrate that ReGuS significantly outperforms state-of-the-art RL algorithms and standard program synthesis baselines on challenging robot tasks including autonomous driving, locomotion control, and object manipulation. CCS Concepts: •Software and its engineering → Automatic programming.
more » « less
Instructing Goal-Conditioned Reinforcement Learning Agents with Temporal Logic Objectives

Qiu, Wenjie; Mao, Wensen; Zhu, He (December 2023, Advances in Neural Information Processing Systems (NeurIPS))

Goal-conditioned reinforcement learning (RL) is a powerful approach for learning general-purpose skills by reaching diverse goals. However, it has limitations when it comes to task-conditioned policies, where goals are specified by temporally extended instructions written in the Linear Temporal Logic (LTL) formal language. Existing approaches for finding LTL-satisfying policies rely on sampling a large set of LTL instructions during training to adapt to unseen tasks at inference time. However, these approaches do not guarantee generalization to out-of-distribution LTL objectives, which may have increased complexity. In this paper, we propose a novel approach to address this challenge. We show that simple goal-conditioned RL agents can be instructed to follow arbitrary LTL specifications without additional training over the LTL task space. Unlike existing approaches that focus on LTL specifications expressible as regular expressions, our technique is unrestricted and generalizes to ω-regular expressions. Experiment results demonstrate the effectiveness of our approach in adapting goal-conditioned RL agents to satisfy complex temporal logic task specifications zero-shot.
more » « less
Full Text Available
Bacterial Community Structure Modulates Soil Phosphorus Turnover at Early Stages of Primary Succession

https://doi.org/10.1029/2024GB008174

Wang, Yuhan; Bing, Haijian; Moorhead, Daryl L; Hou, Enqing; Wu, Yanhong; Wang, Jipeng; Duan, Chengjiao; Cui, Qingliang; Zhang, Zhiqin; Zhu, He; et al (October 2024, Global Biogeochemical Cycles)

Abstract Microbes are the drivers of soil phosphorus (P) cycling in terrestrial ecosystems; however, the role of soil microbes in mediating P cycling in P‐rich soils during primary succession remains uncertain. This study examined the impacts of bacterial community structure (diversity and composition) and its functional potential (absolute abundances of P‐cycling functional genes) on soil P cycling along a 130‐year glacial chronosequence on the eastern Tibetan Plateau. Bacterial community structure was a better predictor of soil P fractions than P‐cycling genes along the chronosequence. After glacier retreat, the solubilization of inorganic P and the mineralization of organic P were significantly enhanced by increased bacterial diversity, changed interspecific interactions, and abundant species involved in soil P mineralization, thereby increasing P availability. Although 84% of P‐cycling genes were associated with organic P mineralization, these genes were more closely associated with soil organic carbon than with organic P. Bacterial carbon demand probably determined soil P turnover, indicating the dominant role of organic matter decomposition processes in P‐rich alpine soils. Moreover, the significant decrease in the complexity of the bacterial co‐occurrence network and the taxa‐gene‐P network at the later stage indicates a declining dominance of the bacterial community in driving soil P cycling with succession. Our results reveal that bacteria with a complex community structure have a prominent potential for biogeochemical P cycling in P‐rich soils during the early stages of primary succession.
more » « less
Full Text Available
A life cycle and product type based estimator for quantifying the carbon stored in wood products

https://doi.org/10.1186/s13021-022-00220-y

Wei, Xinyuan; Zhao, Jianheng; Hayes, Daniel J; Daigneault, Adam; Zhu, He (December 2023, Carbon Balance and Management)

Abstract BackgroundTimber harvesting and industrial wood processing laterally transfer the carbon stored in forest sectors to wood products creating a wood products carbon pool. The carbon stored in wood products is allocated to end-use wood products (e.g., paper, furniture), landfill, and charcoal. Wood products can store substantial amounts of carbon and contribute to the mitigation of greenhouse effects. Therefore, accurate accounts for the size of wood products carbon pools for different regions are essential to estimating the land-atmosphere carbon exchange by using the bottom-up approach of carbon stock change. ResultsTo quantify the carbon stored in wood products, we developed a state-of-the-art estimator (Wood Products Carbon Storage Estimator, WPsCS Estimator) that includes the wood products disposal, recycling, and waste wood decomposition processes. The wood products carbon pool in this estimator has three subpools: (1) end-use wood products, (2) landfill, and (3) charcoal carbon. In addition, it has a user-friendly interface, which can be used to easily parameterize and calibrate an estimation. To evaluate its performance, we applied this estimator to account for the carbon stored in wood products made from the timber harvested in Maine, USA, and the carbon storage of wood products consumed in the United States. ConclusionThe WPsCS Estimator can efficiently and easily quantify the carbon stored in harvested wood products for a given region over a specific period, which was demonstrated with two illustrative examples. In addition, WPsCS Estimator has a user-friendly interface, and all parameters can be easily modified.
more » « less
Full Text Available
Verification-guided Programmatic Controller Synthesis

https://doi.org/10.1007/978-3-031-30820-8_16

Wang, Yuning; Zhu, He (April 2023, ools and Algorithms for the Construction and Analysis of Systems. TACAS 2023. LNCS)

We present a verification-based learning framework VEL that synthesizes safe programmatic controllers for environments with continuous state and action spaces. The key idea is the integration of program reasoning techniques into controller training loops. VEL performs abstraction-based program verification to reason about a programmatic controller and its environment as a closed-loop system. Based on a novel verification-guided synthesis loop for training, VEL minimizes the amount of safety violation in the proof space of the system, which approximates the worst-case safety loss, using gradient-descent style optimization. Experimental results demonstrate the substantial benefits of leveraging verification feedback for synthesizing provably correct controllers.
more » « less
Full Text Available
Learn Basic Skills and Reuse: Modularized Adaptive Neural Architecture Search (MANAS)

https://doi.org/10.1145/3511808.3557385

Chen, Hanxiong; Li, Yunqi; Zhu, He; Zhang, Yongfeng (October 2022, CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management)

Full Text Available

« Prev Next »

Search for: All records