NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Model-Based Diffusion for Trajectory Optimization

Pan, Chaoyi; Yi, Zeji; Shi, Guanya; Qu, Guannan (December 2024, Neural Information Processing Systems 2024)

Free, publicly-accessible full text available December 9, 2025
Locally Interdependent Multi-Agent MDP: Theoretical Framework for Decentralized Agents with Dynamic Dependencies

Deweese, Alex; Qu, Guannan (July 2024, Proceedings of the 41st International Conference on Machine Learning, PMLR 235)

Full Text Available
Efficient Reinforcement Learning for Routing Jobs in Heterogeneous Queueing Systems

Jali, Neharika; Qu, Guannan; Wang, Weina; Joshi, Gauri (May 2024, Proceedings of The 27th International Conference on Artificial Intelligence and Statistics)

We consider the problem of efficiently routing jobs that arrive into a central queue to a system of heterogeneous servers. Unlike homogeneous systems, a threshold policy, that routes jobs to the slow server(s) when the queue length exceeds a certain threshold, is known to be optimal for the one-fast-one-slow two-server system. But an optimal policy for the multi-server system is unknown and non-trivial to find. While Reinforcement Learning (RL) has been recognized to have great potential for learning policies in such cases, our problem has an exponentially large state space size, rendering standard RL inefficient. In this work, we propose ACHQ, an efficient policy gradient-based algorithm with a low dimensional soft threshold policy parameterization that leverages the underlying queueing structure. We provide stationary-point convergence guarantees for the general case and despite the low-dimensional parameterization prove that ACHQ converges to an approximate global optimum for the special case of two servers. Simulations demonstrate an improvement in expected response time of up to ∼30 over the greedy policy that routes to the fastest available server.
more » « less
Full Text Available
Efficient Reinforcement Learning for Routing Jobs in Heterogeneous Queueing Systems

Jali, Neharika; Qu, Guannan; Wang, Weina; Joshi, Gauri (March 2024, Artificial Intelligence and Statistics 2024)
Decentralized graph-based multi-agent reinforcement learning using reward machines

https://doi.org/10.1016/j.neucom.2023.126974

Hu, Jueming; Xu, Zhe; Wang, Weichang; Qu, Guannan; Pang, Yutian; Liu, Yongming (January 2024, Neurocomputing)

In multi-agent reinforcement learning (MARL), it is challenging for a collection of agents to learn complex temporally extended tasks. The difficulties lie in computational complexity and how to learn the high-level ideas behind reward functions. We study the graph-based Markov Decision Process (MDP), where the dynamics of neighboring agents are coupled. To learn complex temporally extended tasks, we use a reward machine (RM) to encode each agent’s task and expose reward function internal structures. RM has the capacity to describe high-level knowledge and encode non-Markovian reward functions. We propose a decentralized learning algorithm to tackle computational complexity, called decentralized graph-based reinforcement learning using reward machines (DGRM), that equips each agent with a localized policy, allowing agents to make decisions independently based on the information available to the agents. DGRM uses the actor-critic structure, and we introduce the tabular Q-function for discrete state problems. We show that the dependency of the Q-function on other agents decreases exponentially as the distance between them increases. To further improve efficiency, we also propose the deep DGRM algorithm, using deep neural networks to approximate the Q-function and policy function to solve large-scale or continuous state problems. The effectiveness of the proposed DGRM algorithm is evaluated by three case studies, two wireless communication case studies with independent and dependent reward functions, respectively, and COVID-19 pandemic mitigation. Experimental results show that local information is sufficient for DGRM and agents can accomplish complex tasks with the help of RM. DGRM improves the global accumulated reward by 119% compared to the baseline in the case of COVID-19 pandemic mitigation.
more » « less
Full Text Available
Stability Constrained Reinforcement Learning for Decentralized Real-Time Voltage Control

https://doi.org/10.1109/TCNS.2023.3338240

Feng, Jie; Shi, Yuanyuan; Qu, Guannan; Low, Steven H.; Anandkumar, Anima; Wierman, Adam (January 2024, IEEE Transactions on Control of Network Systems)

Full Text Available
KCRL: Krasovskii-Constrained Reinforcement Learning with Guaranteed Stability in Nonlinear Discrete-Time Systems

https://doi.org/10.1109/CDC49753.2023.10384011

Lale, Sahin; Shi, Yuanyuan; Qu, Guannan; Azizzadenesheli, Kamyar; Wierman, Adam; Anandkumar, Anima (December 2023, 2023 62nd IEEE Conference on Decision and Control (CDC))

Learning a dynamical system requires stabilizing the unknown dynamics to avoid state blow-ups. However, the standard reinforcement learning (RL) methods lack formal stabilization guarantees, which limits their applicability for the control of real-world dynamical systems. We propose a novel policy optimization method that adopts Krasovskii's family of Lyapunov functions as a stability constraint. We show that solving this stability-constrained optimization problem using a primal-dual approach recovers a stabilizing policy for the underlying system even under modeling error. Combining this method with model learning, we propose a model-based RL framework with formal stability guarantees, Krasovskii-Constrained Reinforcement Learning (KCRL). We theoretically study KCRL with kernel-based feature representation in model learning and provide a sample complexity guarantee to learn a stabilizing controller for the underlying system. Further, we empirically demonstrate the effectiveness of KCRL in learning stabilizing policies in online voltage control of a distributed power system. We show that KCRL stabilizes the system under various real-world solar and electricity demand profiles, whereas standard RL methods often fail to stabilize.
more » « less
Full Text Available
Compositional Neural Certificates for Networked Dynamical Systems

Zhang, Songyuan; Xiu, Yumeng; Qu, Guannan; Fan, Chuchu (June 2023, Proceedings of Machine Learning Research)

Full Text Available
Reinforcement Learning for Selective Key Applications in Power Systems: Recent Advances and Future Challenges

https://doi.org/10.1109/TSG.2022.3154718

Chen, Xin; Qu, Guannan; Tang, Yujie; Low, Steven; Li, Na (July 2022, IEEE Transactions on Smart Grid)

Full Text Available
Stability Constrained Reinforcement Learning for Real-Time Voltage Control

https://doi.org/10.23919/ACC53348.2022.9867476

Shi, Yuanyuan; Qu, Guannan; Low, Steven; Anandkumar, Anima; Wierman, Adam (June 2022, 2022 American Control Conference (ACC))

Full Text Available

« Prev Next »

Search for: All records